In this tutorial I explain the steps to install and configure Hadoop 1.2.1 on Ubuntu 14.04 with OpenJDK 7.
Step 1: Prerequisites
1. Download Hadoop 1.2.1
2. Install openjdk-7
Hadoop needs a working Java installation. To install it, open a new terminal and run the following commands.
- bimal@bimal:~$ sudo apt-get update
- bimal@bimal:~$ sudo apt-get upgrade
- bimal@bimal:~$ sudo apt-get install openjdk-7-jdk
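You can verify that Java installed correctly by checking the version; on this setup it should report an OpenJDK 1.7 runtime.
- bimal@bimal:~$ java -version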
After the installation of Java we need to add JAVA_HOME to the Ubuntu environment. For that you need to edit /etc/environment.
- bimal@bimal:~$ sudo gedit /etc/environment
Then append the following line to the file (the path assumes the amd64 OpenJDK 7 package installed above and matches the JAVA_HOME used later in this tutorial).
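- JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"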
3. Add a dedicated user
We need to add a dedicated user (hduser) and create a group (hadoop), because during the installation and configuration of Hadoop we want it kept separate from the other users on the system.
- bimal@bimal:~$ sudo addgroup hadoop
- bimal@bimal:~$ sudo adduser --ingroup hadoop hduser
4. Installing and configuring the SSH server
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it, so you need to configure an SSH server. Use the following commands.
First, install the OpenSSH server using the following command.
- bimal@bimal:~$ sudo apt-get install openssh-server
After the installation, open a new terminal and switch to the hduser user using the following command.
- bimal@bimal:~$ su - hduser
After switching to hduser, create an SSH key using the following command.
- hduser@bimal:~$ ssh-keygen -t rsa -P ""
The above command generates an RSA key pair with an empty passphrase (-P ""), so Hadoop can connect to its nodes without prompting for a passphrase.
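For reference, the key-generation output looks roughly like this (paths, fingerprint, and the random-art image will differ on your machine):
Generating public/private rsa key pair.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
...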
Next we need to enable SSH access to the local machine with the newly created key. The command below appends the public key to authorized_keys.
- hduser@bimal:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
The last and final step is to connect to our local machine as hduser. This step is also needed to save our local machine's host key fingerprint to the hduser user's known_hosts file.
- hduser@bimal:~$ ssh localhost
If you get an error during the above step, try reinstalling openssh-server.
5. Disable IPv6
Before disabling IPv6, type exit to leave the hduser session, open a new terminal, and edit the file /etc/sysctl.conf using the following command.
- hduser@bimal:~$ sudo gedit /etc/sysctl.conf
Add the following lines at the end of the file:
- net.ipv6.conf.all.disable_ipv6 = 1
- net.ipv6.conf.default.disable_ipv6 = 1
- net.ipv6.conf.lo.disable_ipv6 = 1
You need to restart your system for the change to take effect.
You can check whether IPv6 is disabled by using the following command:
- hduser@bimal:~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
If it returns 0, IPv6 is not disabled; if it returns 1, IPv6 is disabled.
Step 2: Install Hadoop
1. Extract and Modify Permissions
First move the Hadoop package to /usr/local and change to that directory. Then extract the package using tar, rename the extracted directory to hadoop, and change the owner of the hadoop directory and of all files and directories inside it to hduser. The following commands are used for this purpose.
- sudo mv /home/bimal/Downloads/hadoop-1.2.1.tar.gz /usr/local
- cd /usr/local
- sudo tar xzf hadoop-1.2.1.tar.gz
- sudo mv hadoop-1.2.1 hadoop
- sudo chown -R hduser:hadoop hadoop
Note: If the -R option gives an error, use --recursive instead.
2. Update ‘$HOME/.bashrc’ of hduser
First open $HOME/.bashrc of hduser:
- hduser@bimal:~$ sudo gedit /home/hduser/.bashrc
After opening the file, append the following lines to the end of the file.
# Set Hadoop-related environment variables
export HADOOP_PREFIX=/usr/local/hadoop

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_PREFIX/bin
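For the new variables and aliases to take effect, open a new terminal as hduser, or reload the file in the current session:
- hduser@bimal:~$ source $HOME/.bashrc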
Step 3: Configuring Hadoop
Now we have to configure the directory where Hadoop will store its data files, the network ports it listens to, etc.
1. Assigning working directory
We will use the directory /app/hadoop/tmp. Hadoop’s default configurations use hadoop.tmp.dir as the base temporary directory both for the local file system and HDFS. Now we create the directory and set the required ownerships and permissions.
- hduser@bimal:~$ sudo mkdir -p /app/hadoop/tmp
- hduser@bimal:~$ sudo chown hduser:hadoop /app/hadoop/tmp
- hduser@bimal:~$ sudo chmod 750 /app/hadoop/tmp
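You can confirm the ownership and permissions with ls; the directory should be owned by hduser:hadoop with mode drwxr-x--- (750).
- hduser@bimal:~$ ls -ld /app/hadoop/tmp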
If you forget to set the required ownerships and permissions, you will see a java.io.IOException when you try to format the name node in the next section.
2. Configuring Hadoop setup files
I. hadoop-env.sh
The only required environment variable we have to configure for Hadoop is JAVA_HOME.
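hadoop-env.sh lives in the same conf directory as the XML files below, so (assuming the install path used in Step 2) you can open it with:
- hduser@bimal:~$ sudo gedit /usr/local/hadoop/conf/hadoop-env.sh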
Replace
# The java implementation to use. Required.
#export JAVA_HOME=/usr/lib/jvm/
with
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
II. core-site.xml
Open the file core-site.xml and add the following lines of code between <configuration>...</configuration>.
- hduser@bimal:~$ sudo gedit /usr/local/hadoop/conf/core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
III. mapred-site.xml
Open the file /usr/local/hadoop/conf/mapred-site.xml and append the following code between <configuration>...</configuration>.
- hduser@bimal:~$ sudo gedit /usr/local/hadoop/conf/mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local",
  then jobs are run in-process as a single map and reduce task.</description>
</property>
IV. hdfs-site.xml
Open the file /usr/local/hadoop/conf/hdfs-site.xml.
- hduser@bimal:~$ sudo gedit /usr/local/hadoop/conf/hdfs-site.xml
Add the following lines of code between <configuration>...</configuration>.
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be
  specified when the file is created. The default is used if replication
  is not specified in create time.</description>
</property>
Step 4: Formatting the HDFS Filesystem via the Namenode
Open a new terminal and switch to hduser.
- bimal@bimal:~$ su - hduser
- hduser@bimal:~$ /usr/local/hadoop/bin/hadoop namenode -format
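Note that formatting erases any data already stored in HDFS, so it should only be done when setting up the cluster. If formatting succeeds, the log output should end with a line saying the storage directory has been formatted; with the hadoop.tmp.dir configured above it will look roughly like this (timestamps and log prefixes will differ):
INFO common.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.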
Step 5: Starting your Single-Node Cluster
- hduser@bimal:~$ /usr/local/hadoop/bin/start-all.sh
After the above command completes, run the command below to check which of the nodes have started.
- hduser@bimal:/usr/local/hadoop$ jps
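If all daemons started correctly, jps should list the NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker processes, along with Jps itself; the PIDs below are only illustrative.
2287 NameNode
2499 DataNode
2714 SecondaryNameNode
2791 JobTracker
3003 TaskTracker
3128 Jps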
You can also check whether Hadoop is listening on the configured ports. Open a new terminal and run:
- sudo netstat -plten | grep java
The output should show java processes listening on the configured ports, e.g. 54310 and 54311.
Step 6: Stopping your Single-Node Cluster
To stop all the daemons running on your machine, run this command:
- hduser@bimal:~$ /usr/local/hadoop/bin/stop-all.sh
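After it finishes, you can run jps again to confirm that the Hadoop daemons have stopped; only the Jps process itself should remain.
- hduser@bimal:~$ jps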