Anuj Vaghani
Posted on April 1, 2022
Install and Run Hive
Install Apache Hive on Windows Subsystem for Linux (WSL)
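To configure Apache Hive, first download and unpack the Hive release. Then set the Hive environment variables, edit hive-config.sh, create the Hive directories in HDFS, optionally adjust hive-site.xml, and initialize the metastore, as described in the steps below. The steps assume Hadoop is already installed and HDFS is running in the same WSL environment.
Step 1
Download and Untar Hive
Open the Ubuntu command line and download the compressed Hive files using the wget command followed by the download path: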
wget https://downloads.apache.org/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
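Apache keeps only current releases on downloads.apache.org and moves older ones to https://archive.apache.org/dist/hive/, so if the link above no longer works, download the same file from https://archive.apache.org/dist/hive/hive-3.1.2/.
Once the download completes, untar the compressed Hive package: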
tar xzf apache-hive-3.1.2-bin.tar.gz
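Before unpacking, you may want to confirm the download is intact. Apache publishes checksum files next to its release artifacts (typically a .sha256 file in the same directory); the sketch below assumes you copy the expected digest from that file by hand. The function name verify_sha256 is hypothetical and not part of any Hive tooling:

```shell
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Usage: verify_sha256 <file> <expected-hex-digest>
verify_sha256() {
    file=$1
    # Normalise the expected digest to lower case, as printed by sha256sum
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # sha256sum prints "<digest>  <filename>"; keep only the digest field
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
        return 0
    fi
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
}
```

For example, run verify_sha256 apache-hive-3.1.2-bin.tar.gz followed by the digest from the checksum file before the tar command. The comparison ignores case because some checksum files use upper-case hex.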
Step 2
Configure Hive Environment Variables (~/.bashrc)
The $HIVE_HOME environment variable needs to point to the apache-hive-3.1.2-bin directory. Edit the .bashrc shell configuration file using a text editor of your choice (we will be using vim):
vim ~/.bashrc
Append the following Hive environment variables to the .bashrc file:
export HIVE_HOME="/home/anuj/hadoop/apache-hive-3.1.2-bin"
export PATH=$PATH:$HIVE_HOME/bin
Save and exit the .bashrc file once you add the Hive variables. Apply the changes to the current environment with the following command:
source ~/.bashrc
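If you would rather script this step, the same two lines can be appended idempotently. A minimal sketch; add_hive_env is a hypothetical helper, and it only looks for an existing export HIVE_HOME= line, so other ways of setting the variable are not detected:

```shell
# Hypothetical helper: append the Hive variables to a shell rc file only once,
# so that re-running the setup does not duplicate the export lines.
# Usage: add_hive_env <rc-file> <hive-home>
add_hive_env() {
    rc=$1
    hive_home=$2
    # Leave the file untouched if HIVE_HOME is already exported there
    if grep -q '^export HIVE_HOME=' "$rc" 2>/dev/null; then
        echo "HIVE_HOME already set in $rc"
        return 0
    fi
    {
        echo "export HIVE_HOME=\"$hive_home\""
        # Single quotes keep $PATH and $HIVE_HOME literal, so they expand when the rc file is sourced
        echo 'export PATH=$PATH:$HIVE_HOME/bin'
    } >> "$rc"
}
```

For example, run add_hive_env ~/.bashrc /home/anuj/hadoop/apache-hive-3.1.2-bin, then source ~/.bashrc and check the result with echo "$HIVE_HOME" and command -v hive.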
Step 3
Edit hive-config.sh file
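Apache Hive needs to be able to interact with the Hadoop Distributed File System, so it has to know where Hadoop is installed. Open the hive-config.sh file using the previously created $HIVE_HOME variable:
vim $HIVE_HOME/bin/hive-config.sh
Add a line that exports HADOOP_HOME with the full path to your Hadoop directory (export HADOOP_HOME=<path to your Hadoop directory>), then save and exit the file.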
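The edit itself is a single export line. A minimal sketch that appends it from the shell instead of an editor, assuming the second argument is your Hadoop installation directory; set_hadoop_home is a hypothetical helper name:

```shell
# Hypothetical helper: make hive-config.sh export HADOOP_HOME.
# Usage: set_hadoop_home <path-to-hive-config.sh> <hadoop-home>
set_hadoop_home() {
    conf=$1
    hadoop_home=$2
    # Refuse to continue if the Hadoop directory does not exist
    if [ ! -d "$hadoop_home" ]; then
        echo "No such directory: $hadoop_home" >&2
        return 1
    fi
    # Append only once; an existing HADOOP_HOME export is left as-is.
    # The leading newline guards against a last line without a trailing newline.
    if ! grep -q '^export HADOOP_HOME=' "$conf"; then
        printf '\nexport HADOOP_HOME="%s"\n' "$hadoop_home" >> "$conf"
    fi
}
```

For example, set_hadoop_home "$HIVE_HOME/bin/hive-config.sh" /path/to/your/hadoop, where the second argument is a placeholder for your own Hadoop directory.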
Step 4
Create Hive Directories in HDFS
The tmp directory stores the intermediate results of Hive processes.
The warehouse directory stores Hive tables.
Create tmp Directory
Create a tmp directory within the HDFS storage layer. This directory stores the intermediate data Hive writes to HDFS:
hdfs dfs -mkdir /tmp
Give tmp group members write permission:
hdfs dfs -chmod g+w /tmp
Check if the permission was added correctly:
hdfs dfs -ls /
In the output, the /tmp entry should now show the group write bit (for example drwxrwxr-x).
You can also run hadoop fs -ls /, which produces the same listing.
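To check the permission from a script rather than by eye, you can test the group write flag in the permission string that starts each listing line. A minimal sketch; has_group_write is a hypothetical helper and assumes the usual permission format where character 6 is the group write flag (for example drwxrwxr-x):

```shell
# Hypothetical helper: succeed if a single `hdfs dfs -ls` listing line
# has the group write bit set. The first field is the permission string;
# its 6th character is the group write flag ("w" or "-").
has_group_write() {
    flag=$(printf '%s\n' "$1" | awk '{ print substr($1, 6, 1) }')
    [ "$flag" = "w" ]
}
```

For example, has_group_write "$(hdfs dfs -ls -d /tmp)" && echo "group can write to /tmp". The -d flag lists the directory itself instead of its contents, so the helper receives exactly one line.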
Create warehouse Directory
Create the warehouse directory within the /user/hive/ parent directory:
hdfs dfs -mkdir -p /user/hive/warehouse
Give warehouse group members write permission:
hdfs dfs -chmod g+w /user/hive/warehouse
Check if the permission was added correctly:
hdfs dfs -ls /user/hive
In the output, the warehouse entry should now show the group write bit.
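Both directories can also be created in one pass. A minimal sketch, assuming the hdfs command is on your PATH and HDFS is running; create_hive_dirs is a hypothetical helper that repeats the mkdir and chmod commands above for each directory:

```shell
# Hypothetical helper: create the two HDFS directories Hive expects and
# give their group write permission, as in the manual steps.
# Requires the hdfs command on PATH and a running HDFS.
create_hive_dirs() {
    for dir in /tmp /user/hive/warehouse; do
        # -p creates missing parents and does not fail if the directory exists
        hdfs dfs -mkdir -p "$dir" || return 1
        hdfs dfs -chmod g+w "$dir" || return 1
    done
}
```

Using mkdir -p makes the function safe to re-run, since it does not fail when a directory already exists.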
Step 5
Configure hive-site.xml File (Optional)
Apache Hive distributions contain template configuration files by default. The template files are located within the Hive conf directory and outline default Hive settings.
Change to the Hive conf directory, where the template files are located:
cd $HIVE_HOME/conf
List the files contained in the folder using the ls command.
Use the hive-default.xml.template to create the hive-site.xml file:
cp hive-default.xml.template hive-site.xml
Open the hive-site.xml file in a text editor (vim here) and adjust the settings you need:
vim hive-site.xml
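Copying the full template works, but the 3.1.2 template is widely reported to contain an invalid character reference (&#8;) and ${system:...} placeholders that can stop Hive from starting until they are edited out. Some setups therefore use a minimal hive-site.xml with only the properties they change. A sketch of that alternative, assuming the defaults are fine for everything else: write_min_hive_site is a hypothetical helper, the three property names are standard Hive settings, and the values point at the directories from Step 4 and the embedded Derby metastore. It overwrites the target file.

```shell
# Hypothetical helper: write a minimal hive-site.xml (overwrites the target file).
# Usage: write_min_hive_site <path-to-hive-site.xml>
write_min_hive_site() {
    # Quoted 'EOF' stops the shell from expanding anything inside the XML
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- HDFS directory for managed tables (created in Step 4) -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- HDFS scratch space for intermediate data, under /tmp -->
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
  </property>
  <!-- Embedded Derby metastore, created relative to the working directory -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  </property>
</configuration>
EOF
}
```

For example, write_min_hive_site "$HIVE_HOME/conf/hive-site.xml" replaces the copied template with this shorter file.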
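Step 6
Initialize the Derby Metastore
Apache Hive uses a Derby database to store metadata by default. Initialize it with the schematool command. With the default embedded Derby settings, the metastore_db directory is created in the current working directory, so run the command from the directory you will later start Hive from (this guide uses the Hive bin directory):
cd $HIVE_HOME/bin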
$HIVE_HOME/bin/schematool -dbType derby -initSchema
The process can take a few moments to complete.
Derby is the default metadata store for Hive. If you plan to use a different database, such as MySQL or PostgreSQL, configure its connection settings in the hive-site.xml file and pass the matching -dbType value (for example mysql or postgres) to schematool.
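Because embedded Derby keeps metastore_db in the directory you run from, and -initSchema is meant to run once against an empty database, a small guard helps avoid re-initializing by accident. A minimal sketch; init_derby_metastore is a hypothetical helper and assumes the default embedded Derby connection URL:

```shell
# Hypothetical helper: initialize the embedded Derby metastore once.
# Embedded Derby creates metastore_db in the current working directory,
# so run this from the directory you will later start hive from.
init_derby_metastore() {
    if [ -d metastore_db ]; then
        echo "metastore_db already exists in $(pwd); skipping initSchema"
        return 0
    fi
    "$HIVE_HOME/bin/schematool" -dbType derby -initSchema
}
```

For example, cd $HIVE_HOME/bin && init_derby_metastore runs schematool only if no metastore_db directory is present there yet.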
Step 7
Launch Hive Client Shell on Ubuntu
Start the Hive command-line interface using the following commands:
cd $HIVE_HOME/bin
hive
You can now issue SQL-like HiveQL commands that read and write data stored in HDFS.
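To confirm the installation without opening the interactive shell, you can run a single statement with hive -e (the -S flag suppresses informational output). A minimal sketch; hive_smoke_test is a hypothetical helper that succeeds if the built-in default database is listed:

```shell
# Hypothetical helper: non-interactive smoke test of the Hive CLI.
# hive -S -e runs the quoted HiveQL in silent mode and exits;
# the test passes if a line reading exactly "default" is printed.
hive_smoke_test() {
    hive -S -e 'SHOW DATABASES;' | grep -q '^default$'
}
```

Run it from the same directory you started Hive from, so the embedded Derby metastore is found.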