Introduction to HDFS Server Configuration in GBase 8a MPP Cluster

congcong

Cong Li

Posted on September 19, 2024

This article walks through the configuration of an HDFS server, from preparing the hosts to starting and stopping the HDFS daemons. The previous articles in this series can serve as background reading.

Setting Up an HDFS Server Using Apache Hadoop 2.6.0

1) Preparing the Hadoop Cluster Environment

  • Operating System User: gbase
  • SSH trust has been established between all cluster nodes.
  • The C3 tool is already configured in the cluster.
  • Open Source Product Versions:
    • Apache Hadoop 2.6.0
    • JVM Version 1.6 or 1.7

Example Configuration:

IP               Hostname    Role
192.168.10.114   ch-10-114   NameNode, DataNode
192.168.10.115   ch-10-115   DataNode
192.168.10.116   ch-10-116   DataNode

2) Configuring Hostnames

Each node needs correct hostname mappings in its /etc/hosts file. For example, on node 192.168.10.114 the file should look as follows. The other nodes can copy this configuration directly.

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1     localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.10.114 ch-10-114
192.168.10.115 ch-10-115
192.168.10.116 ch-10-116

Note: If the first line also maps the node's own hostname to 127.0.0.1, as shown below, the Hadoop DataNodes cannot connect to the NameNode after installation.

127.0.0.1   ch-10-114 localhost localhost.localdomain localhost4 localhost4.localdomain4

If the cluster has no DNS server that resolves the hostnames of Hadoop's NameNode and DataNodes, configure the /etc/hosts file on every coordinator node that executes load tasks and on every data node in the cluster. Add the IP-to-hostname mappings of the NameNode and DataNodes as shown above. Without these entries, loading files from the HDFS server fails with an error like “Couldn't resolve hostname”.
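
The hosts-file requirement above can be checked before the first load task is run. The following is a minimal sketch rather than a GBase or Hadoop tool: missing_hosts is a hypothetical helper name, and it assumes the usual hosts-file layout of an IP address followed by one or more names.

```shell
#!/bin/sh
# Hypothetical helper (not part of GBase or Hadoop): print every hostname
# given on the command line that has no entry in the hosts file.
# Usage: missing_hosts HOSTS_FILE HOSTNAME...
missing_hosts() {
    _hosts_file=$1
    shift
    for _name in "$@"; do
        # Strip comments, then look for the name in any column after the IP.
        if ! awk -v n="$_name" '
                { sub(/#.*/, "") }
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit (found ? 0 : 1) }
             ' "$_hosts_file"; then
            echo "$_name"
        fi
    done
}

# Example (run on each coordinator node and data node):
#   missing_hosts /etc/hosts ch-10-114 ch-10-115 ch-10-116
```

An empty result means every listed hostname has a mapping. Any name that is printed still needs an /etc/hosts entry on that node.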

Check Method:

Use the jps command to confirm that the DataNode process is running. If the DataNode has started but its log shows repeated attempts to connect to port 9000 on the NameNode (the HDFS RPC port), run netstat -an on the NameNode. With the faulty configuration, the output looks like this:

$ netstat -an | grep 9000
tcp  0  0 127.0.0.1:9000    0.0.0.0:*        LISTEN  

Error Reason: The TCP listener is bound to 127.0.0.1, so only the local machine can connect to port 9000. This is caused by an incorrect /etc/hosts file on the NameNode.

Solution: Remove the hostname ch-10-114 from the 127.0.0.1 line, or move that line below the lines that map the cluster IP addresses.

Correct configuration:

192.168.10.114 ch-10-114
192.168.10.115 ch-10-115
192.168.10.116 ch-10-116
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1     localhost localhost.localdomain localhost6 localhost6.localdomain6
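
The faulty mapping described in this section can also be detected mechanically. The sketch below assumes that any name on a loopback line that does not start with "localhost" is a mistake. loopback_hostnames is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print every name that a hosts file maps to a loopback
# address (127.x.x.x or ::1) and that is not a localhost alias.
# Usage: loopback_hostnames HOSTS_FILE
loopback_hostnames() {
    awk '
        { sub(/#.*/, "") }                      # ignore comments
        $1 ~ /^127\./ || $1 == "::1" {
            for (i = 2; i <= NF; i++)
                if ($i !~ /^localhost/) print $i
        }
    ' "$1"
}

# Example (run on the NameNode):
#   loopback_hostnames /etc/hosts
```

No output means no cluster hostname is bound to loopback. If ch-10-114 is printed on the NameNode, the hosts file still contains the faulty line.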

Restart HDFS and check again with netstat -an | grep 9000. The listener should now be bound to the node's cluster IP address instead of 127.0.0.1:

$ netstat -an | grep 9000
tcp  0  0 192.168.10.114:9000    0.0.0.0:*        LISTEN  
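
The manual netstat check can be scripted so that it fits into a post-install test. This is a sketch under the assumption that netstat -an prints the local address in the fourth column and the state in the last column, as in the output shown in this article. The function names are hypothetical, and only IPv4 loopback addresses are recognized.

```shell
#!/bin/sh
# Read `netstat -an` output on stdin and print the local addresses of all
# sockets that are listening on PORT.
# Usage: netstat -an | listen_addrs PORT
listen_addrs() {
    awk -v port="$1" '
        $NF == "LISTEN" {
            n = split($4, parts, ":")       # the port is the last ":" field
            if (parts[n] == port) print $4
        }
    '
}

# Succeed only if at least one listener on PORT is bound to an address
# other than 127.x.x.x.
# Usage: netstat -an | reachable_listener PORT
reachable_listener() {
    listen_addrs "$1" | grep -qv '^127\.'
}

# Example (run on the NameNode):
#   netstat -an | reachable_listener 9000 || echo "port 9000 is loopback-only" >&2
```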

3) Directory Planning

Directory          Purpose
/home/gbase/bin    Stores the Hadoop ecosystem, including Hadoop itself
/home/gbase/hdfs   Stores HDFS files, including tmp, name, and data

Add the environment variable ${HADOOP_HOME}:

$ echo "export HADOOP_HOME=/home/gbase/bin/hadoop-2.6.0" >> ~/.bashrc
$ . ~/.bashrc

Note: ${HADOOP_HOME} refers to /home/gbase/bin/hadoop-2.6.0 below.
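
Appending with >> adds a duplicate line each time the setup is repeated. A possible way to make the step safe to re-run is sketched below. append_once is a hypothetical helper name, and the path in the usage comment is only an example.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE unless an identical line is
# already present, so the setup step can be repeated safely.
# Usage: append_once FILE LINE
append_once() {
    _file=$1
    _line=$2
    touch "$_file"
    # -x matches whole lines only, -F treats the line as a fixed string.
    grep -qxF -- "$_line" "$_file" || printf '%s\n' "$_line" >> "$_file"
}

# Example:
#   append_once ~/.bashrc 'export HADOOP_HOME=/home/gbase/bin/hadoop-2.6.0'
#   . ~/.bashrc
```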

4) Preparing Hadoop 2.6.0

Extract hadoop-2.6.0.tar.gz to /home/gbase/bin on each node.

$ tar xfz hadoop-2.6.0.tar.gz -C /home/gbase/bin

5) Configuring hadoop-env.sh

File path: ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh

$ cd ${HADOOP_HOME}
$ vi etc/hadoop/hadoop-env.sh

Configure both NameNode and DataNode as follows.

Change export JAVA_HOME=$JAVA_HOME to:

export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64

Change export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"} to:

export HADOOP_CONF_DIR=/home/gbase/bin/hadoop-2.6.0/etc/hadoop
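
Editing hadoop-env.sh by hand on every node is error-prone, so the two changes can also be applied non-interactively. The following is a minimal sketch: set_export is a hypothetical helper, and it assumes each setting sits on a line that begins with "export NAME=".

```shell
#!/bin/sh
# Hypothetical helper: replace the line starting with "export NAME=" in FILE
# with "export NAME=VALUE". The export is appended if no such line exists.
# Usage: set_export FILE NAME VALUE
set_export() {
    _file=$1
    _name=$2
    _value=$3
    _tmp=$(mktemp) || return 1
    awk -v n="$_name" -v v="$_value" '
        index($0, "export " n "=") == 1 { print "export " n "=" v; done = 1; next }
        { print }
        END { if (!done) print "export " n "=" v }
    ' "$_file" > "$_tmp" && cat "$_tmp" > "$_file"   # cat keeps file permissions
    rm -f "$_tmp"
}

# Example:
#   set_export "${HADOOP_HOME}/etc/hadoop/hadoop-env.sh" JAVA_HOME /usr/lib/jvm/jre-1.6.0-openjdk.x86_64
#   set_export "${HADOOP_HOME}/etc/hadoop/hadoop-env.sh" HADOOP_CONF_DIR /home/gbase/bin/hadoop-2.6.0/etc/hadoop
```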

6) Configuring core-site.xml

File path: ${HADOOP_HOME}/etc/hadoop/core-site.xml

$ cd ${HADOOP_HOME}
$ vi etc/hadoop/core-site.xml

Configure both NameNode and DataNode as follows:

<configuration>
   <property>
       <name>fs.default.name</name>
       <value>hdfs://ch-10-114:9000</value>
   </property>
   <property>
       <name>hadoop.tmp.dir</name>
       <value>file:/home/gbase/hdfs/tmp</value>
   </property>
</configuration>
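
Because the same core-site.xml goes on every node, it can be convenient to generate it from a few parameters instead of typing it repeatedly. The sketch below reproduces the file shown above. render_core_site is a hypothetical helper name. Hadoop 2.x also accepts fs.defaultFS as the newer name for fs.default.name, but the property name from this article is kept here.

```shell
#!/bin/sh
# Hypothetical helper: print a core-site.xml for the given NameNode host,
# RPC port, and temporary directory.
# Usage: render_core_site HOST PORT TMP_DIR
render_core_site() {
    cat <<EOF
<configuration>
   <property>
       <name>fs.default.name</name>
       <value>hdfs://$1:$2</value>
   </property>
   <property>
       <name>hadoop.tmp.dir</name>
       <value>file:$3</value>
   </property>
</configuration>
EOF
}

# Example:
#   render_core_site ch-10-114 9000 /home/gbase/hdfs/tmp \
#       > "${HADOOP_HOME}/etc/hadoop/core-site.xml"
```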

7) Configuring hdfs-site.xml

File path: ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml

$ cd ${HADOOP_HOME}
$ vi etc/hadoop/hdfs-site.xml

NameNode Configuration:

<configuration>
   <property>
       <name>dfs.replication</name>
       <value>2</value>
   </property>
   <property>
       <name>dfs.name.dir</name>
       <value>file:/home/gbase/hdfs/name</value>
       <description>name node dir </description>
   </property>
   <property>
       <name>dfs.permissions</name>
       <value>false</value>
   </property>
</configuration>

DataNode Configuration:

<configuration>
   <property>
       <name>dfs.data.dir</name>
       <value>file:/home/gbase/hdfs/data</value>
       <description>data node dir</description>
   </property>
</configuration>
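
A replication factor larger than the number of DataNodes leaves blocks under-replicated, so it is worth comparing dfs.replication with the list of DataNode hosts. The following is a rough sketch with hypothetical helper names. It assumes that <name> and <value> each sit on their own line, as in the files above, and that the host list is a plain text file with one hostname per line.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> of property NAME in a Hadoop XML FILE.
# Usage: get_property NAME FILE
get_property() {
    awk -v name="$1" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && match($0, /<value>[^<]*<\/value>/) {
            # Skip the 7-character <value> tag; both tags total 15 characters.
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$2"
}

# Succeed if dfs.replication in HDFS_SITE does not exceed the number of
# non-blank lines in HOST_LIST.
# Usage: replication_fits HDFS_SITE HOST_LIST
replication_fits() {
    _rep=$(get_property dfs.replication "$1")
    _nodes=$(grep -c '[^[:space:]]' "$2")
    [ -n "$_rep" ] && [ "$_rep" -le "$_nodes" ]
}

# Example (run on the NameNode):
#   replication_fits "${HADOOP_HOME}/etc/hadoop/hdfs-site.xml" \
#       "${HADOOP_HOME}/etc/hadoop/slaves" || echo "dfs.replication is too high" >&2
```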

8) Configuring Masters and Slaves

File paths:

  • ${HADOOP_HOME}/etc/hadoop/masters
  • ${HADOOP_HOME}/etc/hadoop/slaves

These two files only need to be configured on the NameNode.

$ cd ${HADOOP_HOME}
$ vi etc/hadoop/masters

Contents of ${HADOOP_HOME}/etc/hadoop/masters:

ch-10-114

$ cd ${HADOOP_HOME}
$ vi etc/hadoop/slaves

Contents of ${HADOOP_HOME}/etc/hadoop/slaves:

ch-10-114
ch-10-115
ch-10-116

9) Formatting the NameNode

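The NameNode must be formatted before HDFS is started for the first time. Note that the first command below deletes any existing HDFS data under /home/gbase/hdfs on all nodes.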

$ cexec rm -fr /home/gbase/hdfs/*
$ cd ${HADOOP_HOME}
$ bin/hdfs namenode -format
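
Formatting is destructive, so a guard against formatting an already initialized NameNode can be useful. The sketch below relies on the fact that a formatted name directory contains a current/VERSION file. is_formatted is a hypothetical helper name, and the directory in the usage comment is the one configured as dfs.name.dir in this article.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NAME_DIR already holds a formatted
# NameNode image (a formatted directory contains current/VERSION).
# Usage: is_formatted NAME_DIR
is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Example:
#   if is_formatted /home/gbase/hdfs/name; then
#       echo "NameNode is already formatted, skipping" >&2
#   else
#       "${HADOOP_HOME}/bin/hdfs" namenode -format
#   fi
```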

10) Starting HDFS

$ cd ${HADOOP_HOME}
$ sbin/start-dfs.sh

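After starting, use the jps command to check the processes on each node. NameNode and SecondaryNameNode should appear on the NameNode host, and a DataNode process should appear on each host listed in the slaves file. Output like the following indicates a successful startup: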

$ cexec jps
************************* test *************************
--------- 192.168.10.114---------
31318 SecondaryNameNode
31133 NameNode
31554 Jps
--------- 192.168.10.115---------
10835 DataNode
11000 Jps
--------- 192.168.10.116---------
10145 DataNode
10317 Jps
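
Reading the cexec jps output by eye gets tedious on larger clusters. One way to condense it is sketched below. It assumes the output format shown above, where a dashed line carries the node IP and each process line is "PID NAME". The function names are hypothetical.

```shell
#!/bin/sh
# Read `cexec jps` output on stdin and print one "NODE DAEMON" line per Java
# process, skipping the Jps tool itself.
# Usage: cexec jps | node_daemons
node_daemons() {
    awk '
        /^-+ *[0-9.]+ *-+$/ { gsub(/[- ]/, ""); node = $0; next }   # node header
        node != "" && NF == 2 && $2 != "Jps" { print node, $2 }
    '
}

# Succeed if DAEMON is reported for NODE in the `cexec jps` output on stdin.
# Usage: cexec jps | has_daemon NODE DAEMON
has_daemon() {
    node_daemons | grep -qxF -- "$1 $2"
}

# Example:
#   cexec jps | has_daemon 192.168.10.115 DataNode || echo "DataNode is missing" >&2
```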

11) Stopping HDFS

$ cd ${HADOOP_HOME}
$ sbin/stop-dfs.sh