Hadoop Authentication
Last modified : 25 September, 2017
How does a Hadoop server know your identity? I owe a lot of this understanding to Heesoo Kim. Here are my notes from Hadoop-2.7.4:
FilterInitializer
- Defines which filters are configured for the server. Specified via “hadoop.http.filter.initializers”, which defaults to “org.apache.hadoop.http.lib.StaticUserWebFilter” (the defaults are for non-kerberized clusters).
- AuthenticationFilterInitializer is used by most servers (instead of the StaticUserWebFilter) on kerberized clusters. This initializes the AuthenticationFilter class with all configuration prefixed with “hadoop.http.authentication.”
- RMAuthenticationFilterInitializer uses the proxy-user configs, sets the delegation token kind to “RM_DELEGATION_TOKEN”, etc. Only ever used in the ResourceManager.
- TimelineAuthenticationFilterInitializer is a copy-paste of RMAuthenticationFilterInitializer with the delegation token kind changed to “TIMELINE_DELEGATION_TOKEN”. Only ever used in the TimelineServer.
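To make the "all configuration prefixed with “hadoop.http.authentication.”" idea concrete, here is a minimal sketch using only the JDK. It is a simplified model, not Hadoop's code: the class name PrefixedConfigSketch and the method filterConfig are hypothetical, and stripping the prefix from each key is my understanding of what the initializer does rather than something taken from these notes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model (NOT Hadoop's implementation) of selecting the
// configuration that is handed to the authentication filter.
final class PrefixedConfigSketch {
    static final String PREFIX = "hadoop.http.authentication.";

    // Keep only entries whose key starts with PREFIX.
    // Assumption: the prefix is dropped from the key before it is passed on.
    static Map<String, String> filterConfig(Map<String, String> conf) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                out.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("hadoop.http.authentication.type", "kerberos");
        conf.put("hadoop.http.authentication.kerberos.principal", "HTTP/resourcemanager@REALM");
        conf.put("fs.defaultFS", "hdfs://namenode:8020"); // unrelated key, hypothetical host
        // Only the two prefixed entries survive, with the prefix removed.
        System.out.println(filterConfig(conf));
    }
}
```

Unrelated keys such as fs.defaultFS never reach the filter, which is why a custom handler only sees its own settings.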
AuthenticationFilter
- For HTTP requests, AuthenticationFilter implements javax.servlet.Filter. All HTTP requests handled by the server pass through this filter (except for the RM web app and the Timeline Server, whose requests pass through subclasses; see the last bullets).
- The AuthenticationFilter selects all configuration starting with a certain prefix and uses it to initialize the configured AuthenticationHandler (more on this later).
- If the HTTP request carries a cookie named “hadoop.auth”, it is parsed as an AuthenticationToken. The presence of a valid AuthenticationToken means authentication has already happened. The token is signed with a secret known only to the server, so the server alone can verify it and trust the identity it carries.
- Every request is passed to AuthenticationHandler.managementOperation() in case a token needs to be obtained, renewed or canceled.
- DelegationTokenAuthenticationFilter extends AuthenticationFilter and allows the use of Delegation Tokens (issued by Hadoop daemons) in addition to whatever AuthenticationHandler is configured.
- RMAuthenticationFilter extends DelegationTokenAuthenticationFilter and allows setting the SecretManager for ResourceManager.
- TimelineAuthenticationFilter extends DelegationTokenAuthenticationFilter and allows setting the SecretManager for TimelineServer.
- SecretManager is the server-side store for all issued DelegationTokens.
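One way to picture why only the server can vouch for the “hadoop.auth” cookie: the token travels together with a signature computed from a server-side secret. The sketch below is a simplified model under stated assumptions, not Hadoop's Signer: it uses HMAC-SHA256 as a stand-in algorithm, the class and method names are hypothetical, and the token layout (u=, p=, t=, e=) is my recollection of the format rather than something taken from these notes.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Simplified model of a signed cookie (NOT Hadoop's actual Signer).
final class SignedCookieSketch {
    private static final String SEP = "&s="; // separates the token from its signature

    static String hmac(String data, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] raw = mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // What the server would put in the cookie after a successful authentication.
    static String sign(String token, byte[] secret) {
        return token + SEP + hmac(token, secret);
    }

    // Returns the token if the signature matches, otherwise null.
    static String verify(String cookie, byte[] secret) {
        int i = cookie.lastIndexOf(SEP);
        if (i < 0) {
            return null; // no signature at all
        }
        String token = cookie.substring(0, i);
        byte[] given = cookie.substring(i + SEP.length()).getBytes(StandardCharsets.UTF_8);
        byte[] expected = hmac(token, secret).getBytes(StandardCharsets.UTF_8);
        // Constant-time comparison of the two signatures.
        return MessageDigest.isEqual(expected, given) ? token : null;
    }
}
```

A client can read the token but cannot forge one: changing the user name without knowing the secret makes verify() return null, and the request falls back to the configured AuthenticationHandler.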
AuthenticationHandler
- This is the class you want to extend if you want to use your own mechanism for authentication. It must be configured as “hadoop.http.authentication.type” in core-site.xml.
- To reiterate, if it's an AuthenticationFilter that has been configured, then all configuration starting with “hadoop.http.authentication.” is passed along to the initializer.
- The method AuthenticationHandler.getType() is important to override. The AuthenticationToken sent by a client must carry the same type that getType() returns; otherwise the token is ignored, the authentication procedure is repeated on every request, and you may see the following error:
2017-07-24 20:22:49,826 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: AuthenticationToken ignored: Invalid AuthenticationToken type
- AuthenticationHandler.managementOperation() should return true when the request processing should continue. If false is returned, it means the response has been populated by this method.
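The type check behind that warning can be modelled in a few lines. This is a sketch under assumptions, not the filter's real code: Handler and Token are hypothetical stand-ins for AuthenticationHandler and AuthenticationToken, and returning null stands in for "ignore the cookie and authenticate again".

```java
// Simplified model of the token-type check (NOT Hadoop's AuthenticationFilter).
final class TokenTypeCheckSketch {
    // Hypothetical stand-in for AuthenticationHandler.
    interface Handler {
        String getType();
    }

    // Hypothetical stand-in for AuthenticationToken.
    static final class Token {
        final String user;
        final String type;

        Token(String user, String type) {
            this.user = user;
            this.type = type;
        }
    }

    // Returns the token if it can be trusted as-is, or null if the
    // handler has to authenticate the request from scratch.
    static Token accept(Token fromCookie, Handler handler) {
        if (fromCookie == null) {
            return null; // no cookie yet: first request
        }
        if (!fromCookie.type.equals(handler.getType())) {
            // Mirrors the log line quoted above.
            System.err.println("AuthenticationToken ignored: Invalid AuthenticationToken type");
            return null;
        }
        return fromCookie;
    }
}
```

If a custom handler creates tokens with one type string but getType() returns another, every request takes the "ignored" branch, which is the repeated authentication described above.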
Let's look at some of the implementations of AuthenticationHandler:
KerberosAuthenticationHandler
- Allows Kerberos SPNEGO authentication. The configuration must contain PREFIX + “kerberos.principal” (a principal, or possibly a list of principals), PREFIX + “kerberos.keytab” (a keytab file for the specified principals) and PREFIX + “kerberos.name-rules” (rules that map a Kerberos principal to an operating system user). Usually a separate principal, e.g. HTTP/resourcemanager@REALM, is created for HTTP.
- Uses org.ietf.jgss.GSSManager and, once a security context has been established, creates and returns an AuthenticationToken (of type “kerberos” if left to the default).
- KerberosAuthenticationHandler.managementOperation() always returns true.
AltKerberosAuthenticationHandler
- An abstract class that extends KerberosAuthenticationHandler and behaves exactly the same when the user-agent of the request is not a browser.
- When the user-agent is a browser, alternateAuthenticate() is called. alternateAuthenticate() is a method that must be overridden in concrete subclasses.
- Returns tokens of the type “alt-kerberos”.
- The list of non-browser user-agents is configurable but defaults to “java,curl,wget,perl”.
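The browser / non-browser decision can be sketched as follows. This is a model under assumptions rather than the handler's actual code: I assume a case-insensitive substring match against the list above and that a missing User-Agent header counts as "not a browser"; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Simplified model of the user-agent check (NOT AltKerberosAuthenticationHandler itself).
final class BrowserCheckSketch {
    // The default list quoted above.
    static final List<String> NON_BROWSER_AGENTS = Arrays.asList("java", "curl", "wget", "perl");

    // Assumption: anything that does not look like a known non-browser is a browser.
    static boolean isBrowser(String userAgent) {
        if (userAgent == null) {
            return false; // assumption: no header means a programmatic client
        }
        String ua = userAgent.toLowerCase(Locale.ENGLISH);
        for (String marker : NON_BROWSER_AGENTS) {
            if (ua.contains(marker)) {
                return false; // Kerberos SPNEGO path
            }
        }
        return true; // alternateAuthenticate() path
    }
}
```

With this model, "curl/7.54.0" and "Java/1.8.0_131" take the Kerberos path, while a typical Firefox or Chrome user-agent string is routed to alternateAuthenticate().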
DelegationTokenAuthenticationHandler
- An abstract class that implements the Kerberos SPNEGO mechanism for HTTP and supports Delegation Token functionality.
- Allows setting an external SecretManager.
- Provides a way to obtain, renew and cancel delegation tokens via HTTP (by setting the parameter op=GETDELEGATIONTOKEN, op=RENEWDELEGATIONTOKEN or op=CANCELDELEGATIONTOKEN).
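The op parameter ties in with the managementOperation() contract described earlier: return true to let the request continue, false once the response has been written. Below is a minimal model of that dispatch, assuming the handler answers the three delegation-token operations itself and lets every other request through. The names are hypothetical, and the StringBuilder stands in for the HTTP response.

```java
import java.util.Locale;

// Simplified model of delegation-token dispatch (NOT DelegationTokenAuthenticationHandler itself).
final class DelegationOpSketch {
    enum Op { GETDELEGATIONTOKEN, RENEWDELEGATIONTOKEN, CANCELDELEGATIONTOKEN }

    // Returns the delegation-token operation named by the "op" query
    // parameter, or null if there is none or it is some other operation.
    static Op parseOp(String queryString) {
        if (queryString == null) {
            return null;
        }
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("op")) {
                try {
                    return Op.valueOf(pair.substring(eq + 1).toUpperCase(Locale.ENGLISH));
                } catch (IllegalArgumentException notADelegationOp) {
                    return null;
                }
            }
        }
        return null;
    }

    // true  = not a delegation-token request, continue normal processing;
    // false = the response has already been populated here.
    static boolean managementOperation(String queryString, StringBuilder response) {
        Op op = parseOp(queryString);
        if (op == null) {
            return true;
        }
        response.append("handled ").append(op);
        return false;
    }
}
```

In this model a request with op=GETDELEGATIONTOKEN is answered by the handler, while a request with any other op value falls through to the rest of the filter chain.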
PseudoAuthenticationHandler
- Allows easy testing. Returns the identity passed in as a query parameter (“user.name”).
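As a rough illustration of how little "pseudo" authentication does, this sketch just reads the identity out of the query string. It is a model, not the handler's code: the class and method names are hypothetical, and I am assuming the parameter is called user.name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Simplified model of pseudo authentication (NOT PseudoAuthenticationHandler itself).
final class PseudoAuthSketch {
    static final String USER_NAME = "user.name"; // assumed parameter name

    // Returns the claimed user, or null if the request does not name one.
    static String userFrom(String queryString) {
        if (queryString == null) {
            return null;
        }
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(USER_NAME)) {
                try {
                    return URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
                } catch (UnsupportedEncodingException e) {
                    throw new IllegalStateException(e); // UTF-8 is always available
                }
            }
        }
        return null;
    }
}
```

Nothing is verified: whoever sends user.name=alice is treated as alice, which is why this is only suitable for testing.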
Allen pointed out that HADOOP-12082 added, in Hadoop-2.8.0, a new (and possibly better) way to configure the kind of authentication that earlier required the AltKerberosAuthenticationHandler, but I haven’t gotten around to using it. The contributors of this feature wrote this article explaining it.
All content on this website is licensed as Creative Commons-Attribution-ShareAlike 4.0 License. Opinions expressed are solely my own.