Configuring VectorH for Use with Apache Ranger
Ranger controls access to various Hadoop services, so appropriate policies must be in place to give VectorH access to the services it requires.
Access to HDFS
VectorH needs full and exclusive access to all files under the /Actian directory in HDFS. Also, some of the Hadoop services require the VectorH user (actian) to have access to additional directories for housekeeping purposes.
The easiest way to ensure such access is to enable "federated mode" for Ranger in HDFS (in the Ambari console: HDFS, Configs, Advanced, Advanced ranger-hdfs-security, xasecure.add-hadoop-authorization=true) and let HDFS access control default to the existing POSIX permissions.
When access control through Ranger policies is desired, you must define a policy to allow VectorH to access the appropriate locations in HDFS. In its simplest form, such a policy could look like this:
• Policy Name: Actian
• Resource Path: /Actian, /Actian/*, /ats/active, /ats/active/*, /user/actian, /user/actian/*, /app-logs, /app-logs/actian
• Recursive: On
• User and Group Permissions
– User: actian
– Permissions: Read/Write/Execute
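The policy above can also be expressed as a JSON document of the kind Ranger's policy REST API accepts. The following is a minimal sketch that only builds and prints such a payload; it does not contact Ranger. The field names (service, resources, policyItems, accesses) follow the Ranger policy model as commonly documented, and the service name "cl1_hadoop" is a hypothetical placeholder for the HDFS service name defined in your Ranger instance. Verify both against the Ranger documentation for your version before use.

```python
import json

# Resource paths taken from the example policy above.
ACTIAN_PATHS = [
    "/Actian", "/Actian/*",
    "/ats/active", "/ats/active/*",
    "/user/actian", "/user/actian/*",
    "/app-logs", "/app-logs/actian",
]


def build_hdfs_policy(service_name, paths, user="actian"):
    """Return a dict describing a recursive read/write/execute policy."""
    return {
        # Assumption: service_name is the HDFS service as named in Ranger.
        "service": service_name,
        "name": "Actian",
        "isEnabled": True,
        "resources": {
            "path": {"values": list(paths), "isRecursive": True},
        },
        "policyItems": [
            {
                "users": [user],
                "accesses": [
                    {"type": access, "isAllowed": True}
                    for access in ("read", "write", "execute")
                ],
            }
        ],
    }


# "cl1_hadoop" is a hypothetical service name used for illustration only.
hdfs_policy = build_hdfs_policy("cl1_hadoop", ACTIAN_PATHS)
print(json.dumps(hdfs_policy, indent=2))
```

Keeping the payload in a script makes the policy reviewable and repeatable across clusters; the same settings can equally be entered by hand in the Ranger Admin UI.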
Access to YARN
VectorH must have access to the queue that is configured in the VectorH YARN_AM_QUEUE configuration parameter, typically "default".
The easiest way to achieve this is to keep the YARN-ACL fallback enabled and let YARN access control default to the existing YARN ACL permissions. If strict Ranger access control is desired, you must disable the fallback (YARN, Configs, Advanced, Custom ranger-yarn-security, ranger.add-yarn-authorization=false), and then define a suitable policy to allow VectorH access to the queue.
The policy for YARN in Ranger that is automatically created upon install typically gives a given list of users access to all queues. Simply adding the actian user to that list will work, but will give VectorH more rights than it strictly needs. Defining a separate policy that gives the actian user access to only the configured queue is the more secure option.
In its simplest form, such a policy could look like this:
• Policy Name: Default Queue
• Queue: root.default
• User and Group Permissions
– User: actian
– Permissions: submit-app
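As with HDFS, the queue policy above can be written as a Ranger policy payload. This is a minimal sketch under the same assumptions: the field names follow the Ranger policy model as commonly documented, "cl1_yarn" is a hypothetical name for the YARN service in Ranger, and the queue name should match the value of YARN_AM_QUEUE on your installation. The script only builds and prints the payload.

```python
import json


def build_yarn_queue_policy(service_name, queue, user="actian"):
    """Return a dict granting submit-app on a single YARN queue."""
    return {
        # Assumption: service_name is the YARN service as named in Ranger.
        "service": service_name,
        "name": "Default Queue",
        "isEnabled": True,
        "resources": {
            # Not recursive: the grant covers only the named queue.
            "queue": {"values": [queue], "isRecursive": False},
        },
        "policyItems": [
            {
                "users": [user],
                "accesses": [{"type": "submit-app", "isAllowed": True}],
            }
        ],
    }


# "cl1_yarn" is a hypothetical service name used for illustration only.
yarn_policy = build_yarn_queue_policy("cl1_yarn", "root.default")
print(json.dumps(yarn_policy, indent=2))
```

Granting only submit-app on one queue mirrors the recommendation above to give the actian user no more rights than it needs.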
Note: This document only describes the changes specific to VectorH. For information on setting up a working Ranger configuration on a cluster, see the appropriate Hadoop documentation.
Kerberos and YARN
When running on a Kerberos-enabled Hadoop system, YARN may reject user actian because its default user ID of 300 is below the usual minimum ID allowed for Hadoop users. In this case, add user actian to the "allowed.system.users" parameter in container-executor.cfg on all nodes.
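The effect of this setting can be illustrated with a small check against the contents of container-executor.cfg. The sketch below is a simplified approximation, not the actual container executor logic: it assumes the file uses key=value lines, that the minimum ID is set through min.user.id, and that allowed.system.users is a comma-separated list. The sample file contents and the fallback minimum of 1000 are assumptions made for illustration.

```python
def parse_cfg(text):
    """Parse key=value lines (ignoring blanks and # comments) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def user_allowed(settings, user, uid):
    """Approximate check: uid meets the minimum, or user is explicitly allowed."""
    # Assumption: fall back to 1000 when min.user.id is not set.
    min_uid = int(settings.get("min.user.id", "1000"))
    allowed = [
        name.strip()
        for name in settings.get("allowed.system.users", "").split(",")
        if name.strip()
    ]
    return uid >= min_uid or user in allowed


# Hypothetical file contents before and after the change described above.
before = parse_cfg("min.user.id=1000\nallowed.system.users=\n")
after = parse_cfg("min.user.id=1000\nallowed.system.users=actian\n")

print(user_allowed(before, "actian", 300))
print(user_allowed(after, "actian", 300))
```

Running a check like this against the file on each node is one way to confirm that the change has been applied everywhere.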