Hive 2.3.6 Installation Guide
This post covers the following:
Hive 2.3.6 installation guide
1. Preparation
Before installing Hive, install and start compatible versions of Hadoop and MySQL.
1.1 Install Hadoop
First confirm that your Hadoop version is compatible with the Hive version; compatibility can be checked via the link below:
For setting up a Hadoop cluster, see the notes I took when building mine, linked here:
1.2 Install MySQL
For MySQL installation, see my previous post:
With the above preparation done, we can begin installing Hive.
2. Installing Hive
Note: perform the following steps as the Hadoop user;
otherwise you will hit a pile of permission-related problems later.
2.1 Download and extract Hive
Download:
A domestic (Chinese) open-source mirror is recommended; here is the Tsinghua University mirror link:
Upload and extract:
Upload the archive to /opt/module (adjust the directory to your own setup):
tar -zxvf apache-hive-2.3.6-bin.tar.gz # extract
# the extracted directory is /opt/module/apache-hive-2.3.6-bin
mv /opt/module/apache-hive-2.3.6-bin /opt/module/hive # rename the directory to hive
# the final Hive directory is /opt/module/hive
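The tar flags used above can be tried risk-free on a throwaway archive first (the file and directory names in this sketch are made up for the demo):

```shell
# Demo of the tar flags: -z (gzip), -c (create), -x (extract),
# -v (verbose listing), -f (archive file name).
mkdir -p demo_src
echo hello > demo_src/a.txt
tar -zcf demo.tar.gz demo_src   # create a small gzipped archive to practice on
rm -rf demo_src                 # remove the original...
tar -zxvf demo.tar.gz           # ...and recover it by extracting
cat demo_src/a.txt              # hello
```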
2.2 Configure Hive environment variables
You can set the environment variables system-wide (as root) or per user; either one is enough.
System-wide environment variables:
vi /etc/profile
# append the following at the end, then save and exit
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
source /etc/profile # apply the change immediately
User-level environment variables:
vi ~/.bash_profile
# append the following at the end, then save and exit
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
source ~/.bash_profile # apply the change immediately
If you see output like the following, the environment variables are configured successfully!
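The effect of the two export lines can also be checked in a throwaway subshell, without editing any profile file (paths assume the /opt/module/hive location used above):

```shell
# Append $HIVE_HOME/bin to PATH exactly as the profile snippet does,
# then confirm it landed at the end of the search path.
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
echo "$PATH" | tr ':' '\n' | tail -n 1   # /opt/module/hive/bin
```

Once Hive is unpacked in that location, `which hive` should resolve to /opt/module/hive/bin/hive.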
2.3 Create the HDFS directories Hive needs
If $HADOOP_HOME/bin is not on your PATH, run the following commands from $HADOOP_HOME/bin; otherwise they can be run from any directory:
hadoop fs -mkdir -p /tmp
hadoop fs -mkdir -p /user/hive/warehouse # -p avoids errors when parent directories do not exist
hadoop fs -chmod g+w /tmp # grant the directory's group write permission
hadoop fs -chmod g+w /user/hive/warehouse
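What `g+w` actually does can be seen on a throwaway local directory; `hadoop fs -chmod` applies the same symbolic-mode semantics to HDFS paths (the demo path below is made up):

```shell
# `g+w` adds write permission for the group; on a 755 directory the
# resulting mode is 775.
mkdir -p /tmp/warehouse_demo
chmod 755 /tmp/warehouse_demo       # rwxr-xr-x
chmod g+w /tmp/warehouse_demo       # rwxrwxr-x
stat -c '%a' /tmp/warehouse_demo    # 775
```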
2.4 Configure MySQL
Download and extract the connector jar
Connector version link:
Connector jar download link; be sure to pick the version matching your MySQL:
In my case the file is mysql-connector-java-5.1.48.tar.gz.
Put the file in Hive's lib directory and extract it:
cd $HIVE_HOME/lib
tar -zxvf mysql-connector-java-5.1.48.tar.gz
# copy mysql-connector-java-5.1.48.jar into the lib directory itself
cp /opt/module/hive/lib/mysql-connector-java-5.1.48/mysql-connector-java-5.1.48.jar /opt/module/hive/lib/
Create the database and configure a user and privileges
mysql -u username -p # log in to MySQL; enter the password at the prompt
Run the following commands:
create database metastore; # create the metastore database
# create a MySQL user named hive with password hive and grant it privileges
grant all on metastore.* to hive@'%' identified by 'hive';
grant all on metastore.* to hive@'localhost' identified by 'hive';
flush privileges; # reload the privilege tables
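As a sanity check (assuming the hive/hive credentials created above), you can log back in as the new user with `mysql -u hive -p` and confirm the grants took effect:

```sql
-- Run as the new hive user:
SHOW DATABASES;                      -- should list `metastore`
SHOW GRANTS FOR 'hive'@'localhost';  -- should show ALL PRIVILEGES ON `metastore`.*
```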
2.5 Configure Hive
Copy the template files to the configuration files Hive expects:
cd $HIVE_HOME/conf
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml
cp hive-log4j2.properties.template hive-log4j2.properties
cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
Configure hive-env.sh
vi hive-env.sh
# point to your Java and Hadoop installations; adjust the paths to your own setup
export JAVA_HOME=/home/master/bigdata/jdk1.8.0_171
export HADOOP_HOME=/home/master/bigdata/hadoop-2.7.6
Configure hive-site.xml
Note: the entries below are modified in place, not appended!
hive-site.xml is long, so search by key name and change the following entries.
Make sure the MySQL username and password match those configured in 2.4.
In the values below, /opt/module/hive is my $HIVE_HOME.
# use absolute paths here, or Hive fails at startup with an error like this:
FAILED: IllegalArgumentException java.net.URISyntaxException: Relative path in absolute URI: file:./$HIVE_HOME/local/41198459-2cf3-4ac7-bc74-ed3c32aa5a61/hive_2020-04-21_17-34-13_530_6862709781968853739-1
<property>
<name>hive.exec.scratchdir</name>
<value>/opt/module/hive/tmp</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/opt/module/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/opt/module/hive/log</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/opt/module/hive/local</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/opt/module/hive/download</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
Configure hive-log4j2.properties
# edit Hive's logging configuration; set log.dir so problems can be located quickly
vi hive-log4j2.properties
property.hive.log.dir = /opt/module/hive/log
# when troubleshooting, check /opt/module/hive/log/hive.log
2.6 Initialize Hive
cd $HIVE_HOME/bin
./schematool -dbType mysql -initSchema # the connection user/password are read from hive-site.xml
If you see output like the following, initialization succeeded.
If initialization fails, act on the console output and the log messages.
2.7 Start Hive
cd $HIVE_HOME/bin # with PATH configured, these can be run from any directory
nohup ./hive --service metastore > metastore.out & # background start; listens on port 9083; output goes to metastore.out for troubleshooting
nohup ./hive --service hiveserver2 > hiveserver2.out & # background start; listens on port 10000
The logs look like this:
ps -ef | grep hive # check the Hive processes; output like the following means Hive started successfully
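The launch pattern used for both services can be sketched with a harmless stand-in command: `nohup` detaches the process from the terminal, `>` captures its output in a file, and `&` puts it in the background.

```shell
# `sleep` stands in for a Hive service in this demo.
nohup sleep 0.2 > service.out 2>&1 &
pid=$!                         # $! holds the PID of the background job
echo "started background pid $pid"
wait "$pid"                    # only for the demo; the real services keep running
echo "exit status: $?"         # exit status: 0
```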
2.8 Check that Hive works
cd $HIVE_HOME/bin
./hive # start the Hive CLI; with PATH configured you can simply run `hive` from any directory
show tables; # run any SQL statement to see whether it succeeds
If you see output like the following, Hive is installed and working!
3. Problems Encountered During Installation
3.1 Hive initialization error
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:232)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:117)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:82)
... 59 more
Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:58)
at org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:213)
... 61 more
Cause:
The MySQL driver jar is missing from $HIVE_HOME/lib.
Fix: put the matching mysql-connector-java-5.1.48.jar into $HIVE_HOME/lib.
As shown below:
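Jars are zip archives, so you can list one to confirm the driver class is really inside. The sketch below builds a throwaway jar so it can be run anywhere; on a real install, run the listing command against $HIVE_HOME/lib/mysql-connector-java-5.1.48.jar instead.

```shell
# Build a fake connector jar containing the expected class path...
mkdir -p com/mysql/jdbc
touch com/mysql/jdbc/Driver.class          # stand-in for the real class file
python3 -m zipfile -c demo-connector.jar com/
# ...then list it and look for the driver class.
python3 -m zipfile -l demo-connector.jar | grep Driver.class
```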
3.2 Hive startup script error
[master@linux01 bin]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/master/bigdata/hadoop-2.7.6/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/opt/module/hive/conf/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:663)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:586)
at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:553)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:750)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:202)
... 12 more
Cause: some entries in hive-site.xml are misconfigured.
cd $HIVE_HOME/conf
vi hive-site.xml
Modify the following entries and restart Hive:
<property>
<name>hive.querylog.location</name>
<value>/opt/module/hive/log</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/opt/module/hive/local</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/opt/module/hive/download</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>