Saturday, August 9, 2014

Installing Hadoop 1.2.1 (Single Node) on Ubuntu 14.04 with openjdk-7

Procedure for installing Hadoop (Single Node) on Ubuntu 14.04 with openjdk-7

In this tutorial, I explain the steps to install and configure Hadoop 1.2.1 on Ubuntu 14.04 with openjdk-7.

Step 1: Prerequisites

1. Download Hadoop 1.2.1

Hadoop 1.2.1 can be downloaded from here. Please download a stable version of Hadoop.

2. Install openjdk-7

Hadoop needs a working Java installation. To install openjdk-7, open a new terminal and run the following commands.
  • bimal@bimal:~$ sudo apt-get update
  • bimal@bimal:~$ sudo apt-get upgrade
  • bimal@bimal:~$ sudo apt-get install openjdk-7-jdk
After installing Java, we need to add JAVA_HOME to the Ubuntu environment. For that, edit /etc/environment.
  • bimal@bimal:~$ sudo gedit /etc/environment
Then append the following line to the file.
  • JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
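The /etc/environment file is read at login, so log out and back in (or reboot) for the change to take effect. As a quick sanity check afterwards (not part of the original steps), you can verify the Java setup:
  • bimal@bimal:~$ echo $JAVA_HOME
  • bimal@bimal:~$ java -version
The first command should print /usr/lib/jvm/java-7-openjdk-amd64 and the second should report an OpenJDK 1.7 runtime.
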
3. Add a dedicated user

We need to add a dedicated user (hduser) and a hadoop group because, during the installation and configuration of Hadoop, we want to keep it separated from other users.
  • bimal@bimal:~$ sudo addgroup hadoop
  • bimal@bimal:~$ sudo adduser --ingroup hadoop hduser


4. Installing and configuring the SSH server

Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it, so you need to configure an SSH server. First, install openssh-server using the following command.
  • bimal@bimal:~$ sudo apt-get install openssh-server


After installation, open a new terminal and switch to the hduser user with the following command.
  • bimal@bimal:~$ su - hduser

After switching to hduser, create an SSH key using the following command.

  • hduser@bimal:~$ ssh-keygen -t rsa -P ""

The above command creates an RSA key pair with an empty passphrase, so Hadoop can connect over SSH without prompting for a password each time.

Then we need to enable SSH access to our local machine with this newly created key. The command below appends the public key to authorized_keys.
  • hduser@bimal:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
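If the ssh localhost step below still asks for a password, the usual cause is file permissions on the key material; tightening them (an extra precaution that is sometimes needed, not covered in the original steps) generally resolves it:
  • hduser@bimal:~$ chmod 700 $HOME/.ssh
  • hduser@bimal:~$ chmod 600 $HOME/.ssh/authorized_keys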
The last and final step is to connect to our local machine as hduser. This step is also needed to save our local machine's host key fingerprint to the hduser user's known_hosts file.
  • hduser@bimal:~$ ssh localhost

If you get an error during the above step, try reinstalling openssh-server.

5. Disable IPv6

Before disabling IPv6, type exit (repeatedly, if needed) to leave the SSH session, or open a new terminal, and edit the file /etc/sysctl.conf using the following command.
  • hduser@bimal:~$ sudo gedit /etc/sysctl.conf
Add the following lines at the end of the file.
  • net.ipv6.conf.all.disable_ipv6 = 1
  • net.ipv6.conf.default.disable_ipv6 = 1
  • net.ipv6.conf.lo.disable_ipv6 = 1

You need to restart your system for the change to take effect.
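Alternatively, the settings from /etc/sysctl.conf can usually be applied without a full reboot by reloading them:
  • hduser@bimal:~$ sudo sysctl -p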

You can check whether IPv6 is disabled by using the following command.
  • hduser@bimal:~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6

If it returns 0, IPv6 is not disabled; if it returns 1, IPv6 is disabled.

Step 2: Install Hadoop

1. Extract and Modify Permissions

First, move the Hadoop package to /usr/local and change into that directory. Extract the package with the tar command, rename the extracted directory to hadoop, and then change the owner of the hadoop directory and of all files and directories in it to hduser. The following commands are used for this purpose.
  • sudo mv /home/bimal/Downloads/hadoop-1.2.1.tar.gz /usr/local
  • cd /usr/local
  • sudo tar xzf hadoop-1.2.1.tar.gz
  • sudo mv hadoop-1.2.1 hadoop
  • sudo chown -R hduser:hadoop hadoop
Note: If the -R option gives an error, just use the --recursive option instead.

2. Update ‘$HOME/.bashrc’ of hduser

First, open $HOME/.bashrc of hduser:
  • hduser@bimal:~$ sudo gedit /home/hduser/.bashrc
After opening the file, append the following lines to the end of it.

# Set Hadoop-related environment variables
export HADOOP_PREFIX=/usr/local/hadoop

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and compress
# job outputs with LZOP (not covered in this tutorial): conveniently inspect
# an LZOP compressed file from the command line; run via:
#
#   $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_PREFIX/bin
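
The new variables take effect the next time hduser opens a shell. To apply them in the current terminal and confirm that the hadoop command is found (a quick check, not in the original steps), you can run:
  • hduser@bimal:~$ source $HOME/.bashrc
  • hduser@bimal:~$ hadoop version
The second command should report Hadoop 1.2.1.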


Step 3: Configuring Hadoop

Now we have to configure the directory where Hadoop will store its data files, the network ports it listens to, etc.

1. Assigning working directory

We will use the directory /app/hadoop/tmp. Hadoop's default configuration uses hadoop.tmp.dir as the base temporary directory, both for the local file system and for HDFS. Now we create the directory and set the required ownership and permissions.

  • hduser@bimal:~$ sudo mkdir -p /app/hadoop/tmp
  • hduser@bimal:~$ sudo chown hduser:hadoop /app/hadoop/tmp
  • hduser@bimal:~$ sudo chmod 750 /app/hadoop/tmp
If you forget to set the required ownerships and permissions, you will see a java.io.IOException when you try to format the name node in the next section.
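To double-check the directory before moving on (an optional verification, not in the original write-up):
  • hduser@bimal:~$ ls -ld /app/hadoop/tmp
The listing should show hduser as the owner, hadoop as the group, and drwxr-x--- permissions.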

2. Configuring Hadoop setup files

I. hadoop-env.sh

The only required environment variable we have to configure for Hadoop itself is JAVA_HOME. Open the file /usr/local/hadoop/conf/hadoop-env.sh and replace
# The java implementation to use. Required.
#export JAVA_HOME=/usr/lib/jvm/
With
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64


II. core-site.xml


Open the file core-site.xml and add the following lines between <configuration>...</configuration>.

  • hduser@bimal:~$ sudo gedit /usr/local/hadoop/conf/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
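
For reference, after this edit the whole file should look roughly like the sketch below; the two header lines are the ones Hadoop's stock configuration files normally start with, and the property descriptions are omitted here for brevity, so keep whatever your copy already contains:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>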

III. mapred-site.xml

Open the file /usr/local/hadoop/conf/mapred-site.xml and append the following code between <configuration>...</configuration>.
  • hduser@bimal:~$ sudo gedit /usr/local/hadoop/conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>

IV. hdfs-site.xml

Open up the file /usr/local/hadoop/conf/hdfs-site.xml
  • hduser@bimal:~$ sudo gedit /usr/local/hadoop/conf/hdfs-site.xml
Add the following lines between <configuration>...</configuration>.

<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>


Step 4: Formatting the HDFS Filesystem via the Namenode


Open a new terminal and switch to hduser, then format the HDFS filesystem.
  • bimal@bimal:~$ su - hduser
  • hduser@bimal:~$ /usr/local/hadoop/bin/hadoop namenode -format
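Formatting writes a fresh HDFS metadata directory under the hadoop.tmp.dir we configured earlier. Only run it once during initial setup, because reformatting erases all data stored in HDFS. As an optional check (the path below follows from Hadoop 1.x keeping the name node data in ${hadoop.tmp.dir}/dfs/name by default), you can list the newly created directory:
  • hduser@bimal:~$ ls /app/hadoop/tmp/dfs/name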
Step 5: Starting your Single-Node Cluster

  • hduser@bimal:~$ /usr/local/hadoop/bin/start-all.sh
After the above command completes, run the command below to check which Hadoop daemons have started.
  • hduser@bimal:/usr/local/hadoop$ jps
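If everything started correctly, the jps listing should show the five Hadoop 1.x daemons alongside Jps itself, similar to the illustrative output below (the process IDs will differ on your machine):
2054 NameNode
2266 DataNode
2478 SecondaryNameNode
2566 JobTracker
2778 TaskTracker
2893 Jps
If one of the daemons is missing, check the log files under /usr/local/hadoop/logs for the reason.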


You can also check if Hadoop is listening on the configured ports. Open a new terminal and run
  • sudo netstat -plten | grep java
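Hadoop 1.x also exposes web interfaces for the daemons on their default ports; as an additional check you can open the NameNode UI at http://localhost:50070 and the JobTracker UI at http://localhost:50030 in a browser.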


Step 6: Stopping your Single-Node Cluster

To stop all the daemons running on your machine, run the following command.
  • hduser@bimal:~$ /usr/local/hadoop/bin/stop-all.sh