1. Install the Hadoop environment
2. Install Kerberos
3. Integrate HDFS with Kerberos
4. Start the cluster
192.168.2.2
192.168.0.2
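Two company servers are used as the cluster here; the same steps apply to more machines.
192.168.2.2 is the master and 192.168.0.2 is the slave. Note that 192.168.2.2 acts as both the management node and a data node.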
hadoop-3.3.1.tar.gz
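Official site: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
Note: this user is mainly for starting and stopping HDFS. Skip this step if you use root; it has not been done on the current servers.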
groupadd hadoop
adduser -g hadoop hadoop
Note: the first hadoop is the user group, the second is the user name.
passwd hadoop  # enter and confirm the password when prompted
vi /etc/hosts
192.168.2.2 store01
192.168.0.2 store02
Note: host names must not contain ".", "/", or "_"; otherwise Hadoop cannot recognize them at startup.
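The naming rule above is easy to check before touching /etc/hosts. The following is a minimal sketch that only encodes the three characters named in the note; the function names are made up for illustration and are not part of any Hadoop tooling.

```python
# Characters that the note above says must not appear in a Hadoop host name.
FORBIDDEN_CHARS = {".", "/", "_"}


def invalid_chars(hostname: str) -> set:
    """Return the forbidden characters that occur in hostname."""
    return {ch for ch in hostname if ch in FORBIDDEN_CHARS}


def is_valid_hostname(hostname: str) -> bool:
    """A name passes if it is non-empty and has no forbidden character."""
    return bool(hostname) and not invalid_chars(hostname)


for name in ("store01", "store_02", "store.03"):
    print(name, "ok" if is_valid_hostname(name) else "rejected")
```

Running it against the planned names before calling hostnamectl catches a bad name early.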
hostnamectl set-hostname store01
su -
Note: the step succeeded once the prompt shows root@store01.
scp /etc/hosts root@192.168.0.2:/etc/hosts
ssh 192.168.0.2
Note: enter the password when prompted. After logging in, you can run ip addr to check the IP of the machine you are on.
hostnamectl set-hostname store02
exit
Perform this step as root.
ssh-keygen
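Note: the prompts ask in turn for the file in which to store the key and for a passphrase that protects it. Set these as you see fit; here the default file and an empty passphrase are used, so just press Enter at every prompt.
ssh-copy-id root@store01
ssh-copy-id root@store02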
ssh store02
Note: the step succeeded if you can log in to store02 without entering a password.
cd /home/hadoop
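tar -zxvf hadoop-3.3.1.tar.gz
mkdir -p /home/hadoop/hadoop-3.3.1/data/tmp
In Hadoop 3.x all configuration files are in the hadoop-3.3.1/etc/hadoop/ directory. The 3.x and 2.x configurations differ; this example follows 3.x, and 7 files need to be modified in total: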
workers
Note: in 2.x the file to modify is slaves.
hadoop-env.sh
yarn-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
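The workers file configures the worker nodes of the cluster. Add the following 2 lines to it and save:
store01
store02
hadoop-env.sh configures Hadoop's basic runtime environment; the main change is the JDK path. Configure it as follows: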
export JAVA_HOME=/home/hadoop/jdk1.8.0_321
yarn-env.sh configures YARN's basic runtime environment; the main change is the JDK path. Configure it as follows:
export JAVA_HOME=/home/hadoop/jdk1.8.0_321
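core-site.xml is Hadoop's core configuration file and mainly sets addresses and ports. Add the following between the <configuration> tags in the file:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-3.3.1/data/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://store01:9000</value>
  </property>
</configuration>
hadoop.tmp.dir: data storage path
fs.defaultFS: address and port for HDFS file access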
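Every Hadoop *-site.xml file uses the same name/value property layout, so the files can be generated instead of edited by hand. The following is a small sketch using only the Python standard library; render_hadoop_config is a hypothetical helper, not a Hadoop API, and the values are the ones from this guide.

```python
import xml.etree.ElementTree as ET


def render_hadoop_config(properties: dict) -> str:
    """Build a <configuration> document with one <property> per dict entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


core_site = {
    "hadoop.tmp.dir": "/home/hadoop/hadoop-3.3.1/data/tmp",
    "fs.defaultFS": "hdfs://store01:9000",
}
print(render_hadoop_config(core_site))
```

Generating the file from one dictionary keeps master and slave configurations identical by construction.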
hdfs-site.xml configures the HDFS environment. Add the following between the <configuration> tags in the file:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/install/hadoop/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/install/hadoop/tmp/dfs/data</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
mapred-site.xml configures the MapReduce environment. The default configuration only needs to name YARN as the framework. Add the following between the <configuration> tags in the file:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml configures the YARN framework. Add the following between the <configuration> tags in the file:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>store01</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
The slave uses the same configuration files as the master, so it is enough to sync the hadoop folder to the slave:
scp -r /home/hadoop root@store02:/home/hadoop
Note: if a user in a user group needs to operate on the Hadoop files, grant permissions on them. I use root here, so this step is skipped.
chmod 777 -R /home/hadoop/
Edit the /etc/profile file:
vi /etc/profile
Append the following to the end of the file and save it. It must of course include the JDK path.
# set jdk
export JAVA_HOME=/home/hadoop/jdk1.8.0_321
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
# set hadoop
export HADOOP_HOME=/home/hadoop/hadoop-3.3.1
export PATH=$PATH:$HADOOP_HOME/bin
source /etc/profile
hdfs namenode -format
cd /home/hadoop/hadoop-3.3.1/sbin
./start-dfs.sh
./start-yarn.sh
jps
On the master, the output includes ResourceManager, NameNode, SecondaryNameNode, and DataNode.
On the slave, the output includes DataNode.
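The jps check can be automated by comparing the process names against the daemons listed above. This is a sketch that assumes jps prints one "<pid> <name>" pair per line; the EXPECTED table and the function names are illustrative only.

```python
# Daemons this guide expects to see in `jps` output on each kind of node.
EXPECTED = {
    "master": {"ResourceManager", "NameNode", "SecondaryNameNode", "DataNode"},
    "slave": {"DataNode"},
}


def running_daemons(jps_output: str) -> set:
    """Extract process names from lines of the form '<pid> <name>'."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(role: str, jps_output: str) -> set:
    """Return the expected daemons for role that are absent from the output."""
    return EXPECTED[role] - running_daemons(jps_output)


sample = "46102 DataNode\n6665 Jps\n46459 SecondaryNameNode\n45887 NameNode\n"
print(missing_daemons("master", sample))
```

An empty result means every expected daemon is up; anything else names the daemons whose logs to inspect first.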
To view the cluster status, open in a browser: http://192.168.2.2:8088/cluster/nodes
To view HDFS information: http://192.168.2.2:9870/
Note: the HDFS web port is 50070 in 2.x and 9870 in 3.x.
cd /home/hadoop/hadoop-3.3.1/sbin
./stop-dfs.sh
./stop-yarn.sh
192.168.2.2: install the master KDC
192.168.0.2: install the Kerberos client
Install the master KDC on the master host of the HDFS cluster built above, and install the Kerberos client on the slave machine.
yum install krb5-server krb5-libs krb5-workstation krb5-devel -y
vim /var/kerberos/krb5kdc/kdc.conf

[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 HADOOP.COM = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  max_renewable_life = 7d
  supported_enctypes = aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
 }

HADOOP.COM: the realm being defined. The name is arbitrary. Kerberos supports multiple realms, and realm names are conventionally all uppercase.
master_key_type: this setting and supported_enctypes default to aes256-cts. Because Java needs extra jar files installed to use aes256-cts, it is not used here.
acl_file: defines the permissions of admin users; wildcards are supported.
admin_keytab: the keytab the KDC uses for verification.
supported_enctypes: the supported encryption types. Be sure to remove aes256-cts.
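Since leaving aes256-cts in supported_enctypes is an easy mistake, a quick check of the file can help. The sketch below assumes the entries are whitespace-separated "enctype:salttype" pairs on a single line, as in the kdc.conf above; it is not a full krb5 configuration parser and the function names are hypothetical.

```python
def supported_enctypes(kdc_conf_text: str) -> list:
    """Return the enctype names from the first active supported_enctypes line."""
    for raw in kdc_conf_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines without a key/value pair
        key, _, value = line.partition("=")
        if key.strip() == "supported_enctypes":
            # Each entry looks like "enctype:salttype"; keep the enctype part.
            return [item.split(":")[0] for item in value.split()]
    return []


def uses_aes256(kdc_conf_text: str) -> bool:
    """True if any active supported enctype starts with aes256."""
    return any(e.startswith("aes256") for e in supported_enctypes(kdc_conf_text))


conf = """
[realms]
 HADOOP.COM = {
  #master_key_type = aes256-cts
  supported_enctypes = aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal
 }
"""
print(supported_enctypes(conf), uses_aes256(conf))
```

The commented-out master_key_type line is ignored, so only the active setting decides the result.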
vim /etc/krb5.conf

# Configuration snippets may be placed in this directory as well
includedir /etc/krb5.conf.d/

[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = HADOOP.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 clockskew = 120
 udp_preference_limit = 1

[realms]
 HADOOP.COM = {
  kdc = store01
  admin_server = store01
 }

[domain_realm]
 .hadoop.com = HADOOP.COM
 hadoop.com = HADOOP.COM

Notes:
[logging]: where the server-side logs are written.
udp_preference_limit = 1: disabling UDP avoids a known error in Hadoop.
ticket_lifetime: how long a ticket is valid, usually 24 hours.
renew_lifetime: the maximum period over which a ticket can be renewed, usually one week. Once a ticket has expired, further access to Kerberos-secured services fails.
clockskew: the tolerance, in seconds, for ticket timestamps that do not exactly match the host's system clock. Tickets outside this tolerance are rejected.
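To make the clockskew rule concrete, here is a simplified illustration of the tolerance test using the value of 120 seconds from the configuration above. It only models the comparison described in the note and is not how the Kerberos libraries are implemented; within_clockskew is a made-up name.

```python
from datetime import datetime, timedelta

CLOCKSKEW_SECONDS = 120  # the clockskew value set in krb5.conf above


def within_clockskew(ticket_time: datetime, host_time: datetime,
                     tolerance_seconds: int = CLOCKSKEW_SECONDS) -> bool:
    """True if the two clocks differ by no more than the tolerance."""
    drift = abs((ticket_time - host_time).total_seconds())
    return drift <= tolerance_seconds


host_now = datetime(2022, 1, 1, 12, 0, 0)
print(within_clockskew(host_now + timedelta(seconds=90), host_now))   # inside the tolerance
print(within_clockskew(host_now - timedelta(seconds=300), host_now))  # outside the tolerance
```

In practice this is why the clocks of all cluster nodes should be kept synchronized before Kerberos is enabled.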
kdb5_util create -s -r HADOOP.COM
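Notes: [-s] generates a stash file and stores the master server key (krb5kdc) in it. [-r] specifies a realm name, which is only necessary when krb5.conf defines more than one realm.
vim /var/kerberos/krb5kdc/kadm5.acl
*/admin@HADOOP.COM *
For more on the kadm5.acl file, see: http://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/kadm5_acl.html
There are two ways to manage the KDC database. One runs directly on the KDC host and needs no password; the other requires an account and password. They are:
kadmin.local: must be run on the KDC server; manages the database without a password
kadmin: can be run on any system in the KDC realm, but requires the administrator password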
service krb5kdc start
service kadmin start
chkconfig krb5kdc on
chkconfig kadmin on
yum install krb5-libs krb5-workstation krb5-devel -y
# copy krb5.conf from the master host to the slave host
scp /etc/krb5.conf store02:/etc/krb5.conf
kadmin.local
kadmin.local: addprinc hdfs/store01@HADOOP.COM
kadmin.local: addprinc hdfs/store02@HADOOP.COM
kadmin.local: addprinc HTTP/store01@HADOOP.COM
kadmin.local: addprinc HTTP/store02@HADOOP.COM
kadmin.local: addprinc chenchen@HADOOP.COM
kadmin.local: addprinc wangwang@HADOOP.COM
Note: the host part of hdfs/store01@HADOOP.COM, hdfs/store02@HADOOP.COM, HTTP/store01@HADOOP.COM, and HTTP/store02@HADOOP.COM must match the host names in /etc/hosts on the two servers. This prepares for the HDFS Kerberos configuration later.
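Because each service principal follows the pattern service/host@REALM, the list can be derived from the host names rather than typed by hand, which avoids typos in the realm separator. A minimal sketch, with hypothetical helper names and the hosts and realm used in this guide:

```python
def service_principals(hosts, realm, services=("hdfs", "HTTP")):
    """Return 'service/host@REALM' for every service and host combination."""
    return [f"{svc}/{host}@{realm}" for svc in services for host in hosts]


def addprinc_commands(principals):
    """Build the kadmin.local 'addprinc' lines for the given principals."""
    return [f"addprinc {p}" for p in principals]


principals = service_principals(["store01", "store02"], "HADOOP.COM")
for command in addprinc_commands(principals):
    print(command)
```

The host names passed in should be exactly the ones in /etc/hosts, since Hadoop later substitutes the local host name for _HOST in its principal settings.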
Exit kadmin.local:
exit
kinit chenchen
View the current user:
klist
kadmin.local -q "xst -k hdfs.keytab -norandkey hdfs/store01@HADOOP.COM"
kadmin.local -q "xst -k hdfs.keytab -norandkey hdfs/store02@HADOOP.COM"
kadmin.local -q "xst -k HTTP.keytab -norandkey HTTP/store01@HADOOP.COM"
kadmin.local -q "xst -k HTTP.keytab -norandkey HTTP/store02@HADOOP.COM"
kadmin.local -q "xst -k user.keytab -norandkey chenchen@HADOOP.COM"
kadmin.local -q "xst -k user.keytab -norandkey wangwang@HADOOP.COM"
Note: always pass -norandkey. Without it the key is re-randomized, the password is reset, and you can no longer log in.
At this point all generated keytabs are in the root user's home directory.
ktutil
rkt hdfs.keytab
rkt HTTP.keytab
rkt user.keytab
wkt hadoop.keytab
Copy the hadoop.keytab file to the /home/hadoop/hadoop-3.3.1/etc/hadoop directory on every node.
Add to core-site.xml:
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
Add to hdfs-site.xml:
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>700</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/home/hadoop/hadoop-3.3.1/etc/hadoop/hadoop.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@HADOOP.COM</value>
</property>
<property>
  <name>dfs.namenode.kerberos.https.principal</name>
  <value>HTTP/_HOST@HADOOP.COM</value>
</property>
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:61004</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:61006</value>
</property>
<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
<property>
  <name>dfs.data.transfer.protection</name>
  <value>integrity</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/home/hadoop/hadoop-3.3.1/etc/hadoop/hadoop.keytab</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hdfs/_HOST@HADOOP.COM</value>
</property>
<property>
  <name>dfs.datanode.kerberos.https.principal</name>
  <value>HTTP/_HOST@HADOOP.COM</value>
</property>
<property>
  <name>dfs.journalnode.keytab.file</name>
  <value>/home/hadoop/hadoop-3.3.1/etc/hadoop/hadoop.keytab</value>
</property>
<property>
  <name>dfs.journalnode.kerberos.principal</name>
  <value>hdfs/_HOST@HADOOP.COM</value>
</property>
<property>
  <name>dfs.journalnode.kerberos.internal.spnego.principal</name>
  <value>HTTP/_HOST@HADOOP.COM</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@HADOOP.COM</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value>/home/hadoop/hadoop-3.3.1/etc/hadoop/hadoop.keytab</value>
</property>
<property>
  <name>dfs.secondary.namenode.keytab.file</name>
  <value>/home/hadoop/hadoop-3.3.1/etc/hadoop/hadoop.keytab</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@HADOOP.COM</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/_HOST@HADOOP.COM</value>
</property>
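Note: make sure the keytab file paths are correct, and change the other settings to the hdfs and HTTP principals accordingly.
Set dfs.http.policy to "HTTPS_ONLY".
With HTTPS, the default WebHDFS port in 3.x is 9871.
With HTTP, the default WebHDFS port in 3.x is 9870.
For details, see https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml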
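The port rules scattered through this guide can be collected in one lookup. The sketch below only covers the combinations mentioned here (2.x HTTP, 3.x HTTP, 3.x HTTPS) and the two single-scheme policies; namenode_web_url is a hypothetical helper and the table is not read from Hadoop itself.

```python
# Default NameNode web ports mentioned in this guide, keyed by (major version, scheme).
DEFAULT_WEB_PORTS = {
    (2, "http"): 50070,
    (3, "http"): 9870,
    (3, "https"): 9871,
}

# Schemes implied by the dfs.http.policy values handled in this sketch.
POLICY_SCHEMES = {"HTTP_ONLY": "http", "HTTPS_ONLY": "https"}


def namenode_web_url(host: str, major_version: int = 3,
                     policy: str = "HTTPS_ONLY") -> str:
    """Build the NameNode web UI URL for a host, version, and HTTP policy."""
    if policy not in POLICY_SCHEMES:
        raise ValueError(f"unsupported policy in this sketch: {policy}")
    scheme = POLICY_SCHEMES[policy]
    port = DEFAULT_WEB_PORTS[(major_version, scheme)]
    return f"{scheme}://{host}:{port}/"


print(namenode_web_url("store01"))
print(namenode_web_url("store01", policy="HTTP_ONLY"))
```

If the ports are overridden in hdfs-site.xml, the configured values take precedence over these defaults.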
Generate the CA on store01 and copy it to store02:
cd /etc/https
openssl req -new -x509 -keyout hdfs_ca_key -out hdfs_ca_cert -days 9999 -subj '/C=CN/ST=beijing/L=chaoyang/O=lecloud/OU=dt/CN=jenkin.com'
scp hdfs_ca_key hdfs_ca_cert store02:/etc/https/
On every machine, generate a keystore and a truststore (passwords are requested along the way; I set them all to 123456):
# generate the keystore
keytool -keystore keystore -alias localhost -validity 9999 -genkey -keyalg RSA -keysize 2048 -dname "CN=${fqdn}, OU=DT, O=DT, L=CY, ST=BJ, C=CN"
# add the CA to the truststore
keytool -keystore truststore -alias CARoot -import -file hdfs_ca_cert
# export a certificate request from the keystore
keytool -certreq -alias localhost -keystore keystore -file cert
# sign the cert with the CA
openssl x509 -req -CA hdfs_ca_cert -CAkey hdfs_ca_key -in cert -out cert_signed -days 9999 -CAcreateserial
# import the CA cert and the CA-signed cert into the keystore
keytool -keystore keystore -alias CARoot -import -file hdfs_ca_cert
keytool -keystore keystore -alias localhost -import -file cert_signed
Put the final keystore and truststore in a suitable directory and add the .jks suffix:
cp keystore /etc/https/keystore.jks
cp truststore /etc/https/truststore.jks
Configure ssl-client.xml: set the locations of keystore.jks and truststore.jks. All passwords are 123456, chosen to be easy to remember.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
  <property>
    <name>ssl.client.truststore.location</name>
    <value>/etc/https/truststore.jks</value>
    <description>Truststore to be used by clients like distcp. Must be specified.</description>
  </property>
  <property>
    <name>ssl.client.truststore.password</name>
    <value>123456</value>
    <description>Optional. Default value is "".</description>
  </property>
  <property>
    <name>ssl.client.truststore.type</name>
    <value>jks</value>
    <description>Optional. The keystore file format, default value is "jks".</description>
  </property>
  <property>
    <name>ssl.client.truststore.reload.interval</name>
    <value>10000</value>
    <description>Truststore reload check interval, in milliseconds. Default value is 10000 (10 seconds).</description>
  </property>
  <property>
    <name>ssl.client.keystore.location</name>
    <value>/etc/https/keystore.jks</value>
    <description>Keystore to be used by clients like distcp. Must be specified.</description>
  </property>
  <property>
    <name>ssl.client.keystore.password</name>
    <value>123456</value>
    <description>Optional. Default value is "".</description>
  </property>
  <property>
    <name>ssl.client.keystore.keypassword</name>
    <value>123456</value>
    <description>Optional. Default value is "".</description>
  </property>
  <property>
    <name>ssl.client.keystore.type</name>
    <value>jks</value>
    <description>Optional. The keystore file format, default value is "jks".</description>
  </property>
</configuration>
Configure ssl-server.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
  <property>
    <name>ssl.server.truststore.location</name>
    <value>/etc/https/truststore.jks</value>
    <description>Truststore to be used by NN and DN. Must be specified.</description>
  </property>
  <property>
    <name>ssl.server.truststore.password</name>
    <value>123456</value>
    <description>Optional. Default value is "".</description>
  </property>
  <property>
    <name>ssl.server.truststore.type</name>
    <value>jks</value>
    <description>Optional. The keystore file format, default value is "jks".</description>
  </property>
  <property>
    <name>ssl.server.truststore.reload.interval</name>
    <value>10000</value>
    <description>Truststore reload check interval, in milliseconds. Default value is 10000 (10 seconds).</description>
  </property>
  <property>
    <name>ssl.server.keystore.location</name>
    <value>/etc/https/keystore.jks</value>
    <description>Keystore to be used by NN and DN. Must be specified.</description>
  </property>
  <property>
    <name>ssl.server.keystore.password</name>
    <value>123456</value>
    <description>Must be specified.</description>
  </property>
  <property>
    <name>ssl.server.keystore.keypassword</name>
    <value>123456</value>
    <description>Must be specified.</description>
  </property>
  <property>
    <name>ssl.server.keystore.type</name>
    <value>jks</value>
    <description>Optional. The keystore file format, default value is "jks".</description>
  </property>
  <property>
    <name>ssl.server.exclude.cipher.list</name>
    <value>TLS_ECDHE_RSA_WITH_RC4_128_SHA,SSL_DHE_RSA_EXPORT_WITH_DES40_CBC_SHA,
    SSL_RSA_WITH_DES_CBC_SHA,SSL_DHE_RSA_WITH_DES_CBC_SHA,
    SSL_RSA_EXPORT_WITH_RC4_40_MD5,SSL_RSA_EXPORT_WITH_DES40_CBC_SHA,
    SSL_RSA_WITH_RC4_128_MD5</value>
    <description>Optional. The weak security cipher suites that you want excluded from SSL communication.</description>
  </property>
</configuration>
#### 2.4 Copy the modified core-site.xml, hdfs-site.xml, ssl-client.xml, and ssl-server.xml to store02
cd /home/hadoop/hadoop-3.3.1/sbin
./start-dfs.sh
./start-yarn.sh
jps
46102 DataNode
6665 Jps
46459 SecondaryNameNode
45887 NameNode
kinit chenchen
Password for chenchen@HADOOP.COM:
hadoop fs -ls /
The web UI at https://192.168.2.2:9871/ opens successfully, but https://192.168.2.2:9871/explorer.html#/ reports that access is not permitted, which shows that Kerberos is configured correctly.
import subprocess
import sys

import requests
from hdfs import InsecureClient

keytab_file = '/xx/xx/hadoop.keytab'  # location of the keytab
principal = 'chenchen@HADOOP.COM'

session = requests.Session()
session.verify = False  # skip TLS certificate verification (self-signed CA)


class KerberosHdfsClient(object):
    def __init__(self, keytab_path, principal, *args, **kwargs):
        # Authenticate the Kerberos user with kinit; the ticket is valid for 24 hours
        status = subprocess.call(['kinit', '-kt', keytab_path, principal])
        if status != 0:
            print("kinit ERROR:", status)
            sys.exit(1)
        # Replace <server-hostname> with the NameNode host name
        self.client = self.generate_client("https://<server-hostname>:9871")

    def generate_client(self, hdfs_address):
        client = InsecureClient(url=hdfs_address, session=session)
        print(client.list("/"))
        # client = KerberosClient(hdfs_address, hostname_override=hostname_override)
        return client


client = KerberosHdfsClient(keytab_file, principal)