Installing Cloudera Hadoop CDH3 on Mac OS X 10.6 was pretty straightforward; however, I did run into a couple of barriers and spent time researching them. This post outlines the installation steps and possible solutions to the known issues.
- Step #1: Download Hadoop from the Cloudera website (hadoop-0.20.2-cdh3u0.tar.gz)
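For example, from a terminal (a minimal sketch; the download URL is an assumption based on the Cloudera archive layout at the time and may have moved, so grab the link from the Cloudera download page if it no longer works):
# download the CDH3 tarball (URL is an assumption - see the Cloudera download page)
curl -O http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u0.tar.gz
# unpack it into your working directory
tar -xzf hadoop-0.20.2-cdh3u0.tar.gz
cd hadoop-0.20.2-cdh3u0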
- Step #2: After unzipping the tar file into your working directory, make the following changes to the configuration files:
- core-site.xml file (under the /conf directory)
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/lib/hadoop-0.20/cache/Kumar</value>
  </property>
</configuration>
- hdfs-site.xml file (under the /conf directory)
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- specify this so that running 'hadoop namenode -format' formats the right dir -->
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop-0.20/cache/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/lib/hadoop-0.20/cache/hadoop/dfs/data</value>
  </property>
</configuration>
Make sure read/write permissions are enabled on these directories (a sketch of creating them with the right permissions follows the mapred-site.xml block below).
- mapred-site.xml file (under the /conf directory)
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
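A minimal sketch of creating the directories referenced in the configs above and making them writable (assumes the paths from core-site.xml and hdfs-site.xml; substitute your own username for Kumar):
# create the cache, name and data directories used in the configuration above
sudo mkdir -p /var/lib/hadoop-0.20/cache/Kumar /var/lib/hadoop-0.20/cache/hadoop/dfs/name /var/lib/hadoop-0.20/cache/hadoop/dfs/data
# make the current user the owner so the Hadoop daemons can read and write them
sudo chown -R $(whoami) /var/lib/hadoop-0.20
chmod -R u+rwx /var/lib/hadoop-0.20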
- Step #3: Format the namenode:
./bin/hadoop namenode -format
- Step #4: Start your Hadoop cluster:
./bin/start-all.sh
- Step #5: Test HDFS by copying a file into it:
//creates a directory named census-original-files under /user/$username/
./bin/hadoop fs -mkdir census-original-files
//list the directory and you should see the above directory
./bin/hadoop fs -ls
//copy a file from local to hdfs
./bin/hadoop fs -copyFromLocal README.txt /user/Kumar/census-original-files/
//check the content of the file in hdfs
./bin/hadoop fs -cat /user/Kumar/census-original-files/README.txt
//check whether all the mapreduce java tasks have started:
jps
expected java processes (the leading numbers are process IDs and will differ on your machine):
6722 SecondaryNameNode
6571 NameNode
6779 JobTracker
6647 DataNode
6855 TaskTracker
If any of the above Java processes is missing, check the logs directory. (Most often the configuration files are not set right or directory write permissions are missing.)
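For example (a quick sketch; the exact log file names depend on your username and hostname):
# each daemon writes its own log file under the logs/ directory
ls logs/
# look at the tail of the log for the daemon that did not start, e.g. the TaskTracker
tail -50 logs/hadoop-*-tasktracker-*.log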
- Step #6: To shut down the cluster:
./bin/stop-all.sh
Issues encountered:
Issue #1: You encounter errors like the one below while copying content into HDFS. A possible solution is to physically delete all files under dfs.name.dir and dfs.data.dir, then run the format command again (Step #3).
If that does not solve the issue, validate the configuration files under the /conf directory. (Note: the Cloudera distribution bundles example conf files under the /example-confs directory.)
could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1469)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
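Concretely, the recovery looks roughly like this (a sketch, assuming the dfs.name.dir and dfs.data.dir values from hdfs-site.xml above):
# stop the cluster, wipe the name and data directories, re-format and restart
./bin/stop-all.sh
rm -rf /var/lib/hadoop-0.20/cache/hadoop/dfs/name/*
rm -rf /var/lib/hadoop-0.20/cache/hadoop/dfs/data/*
./bin/hadoop namenode -format
./bin/start-all.sh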
Issue #2: The JobTracker or TaskTracker does not start. This is most likely because mapred.job.tracker is missing from mapred-site.xml; it then defaults to 'local', which produces the 'Not a host:port pair: local' error below. Set it to localhost:8021 as shown in Step #2.
FATAL org.apache.hadoop.mapred.JobTracker: java.lang.RuntimeException: Not a host:port pair: local
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:140)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:124)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2427)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2050)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2043)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:294)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:286)
at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4767)
Issue #3: Give read/write permission to all the directories configured for DFS and MapReduce; otherwise the TaskTracker fails with the error below.
ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because org.apache.hadoop.util.DiskChecker$DiskErrorException: all local directories are not writable
at org.apache.hadoop.mapred.TaskTracker.checkLocalDirs(TaskTracker.java:3495)
at org.apache.hadoop.mapred.TaskTracker.initializeDirectories(TaskTracker.java:659)
at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:734)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1431)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3521)
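In a default 0.20 setup the TaskTracker's local directories (mapred.local.dir) fall under hadoop.tmp.dir, so a quick check and fix looks like this (a sketch, assuming the hadoop.tmp.dir value from core-site.xml above):
# verify ownership and permissions of the tmp/cache directory
ls -ld /var/lib/hadoop-0.20/cache/Kumar
# make it writable by the user running the TaskTracker
sudo chown -R $(whoami) /var/lib/hadoop-0.20/cache/Kumar
chmod -R u+rwx /var/lib/hadoop-0.20/cache/Kumar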
Hope these steps save you time.