Installing Cloudera Hadoop CDH3 on Mac OS X 10.6 was pretty straightforward; however, I did run into a couple of barriers and spent time researching them. This post outlines the installation steps and possible solutions to the known issues.
- Step #1: Download Hadoop from the Cloudera website (hadoop-0.20.2-cdh3u0.tar.gz)
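For example, from a terminal (a minimal sketch; the download URL is an assumption based on the Cloudera archive layout at the time and may have moved, so grab the link from the Cloudera download page if it no longer works):
# download the CDH3 tarball (URL is an assumption - see the Cloudera download page)
curl -O http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u0.tar.gz
# unpack it into your working directory
tar -xzf hadoop-0.20.2-cdh3u0.tar.gz
cd hadoop-0.20.2-cdh3u0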
- Step #2: After unzipping the tar file into your working directory, make the following changes to the configuration files:
- core-site.xml file (under the /conf directory)
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/lib/hadoop-0.20/cache/Kumar</value>
  </property>
</configuration>
- hdfs-site.xml file (under the /conf directory)
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- specify this so that running 'hadoop namenode -format' formats the right dir -->
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop-0.20/cache/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/lib/hadoop-0.20/cache/hadoop/dfs/data</value>
  </property>
</configuration>
Make sure read/write permissions are enabled on these directories (a sketch of creating them with the right permissions follows the mapred-site.xml block below).
- mapred-site.xml file (under the /conf directory)
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
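A minimal sketch of creating the directories referenced in the configs above and making them writable (assumes the paths from core-site.xml and hdfs-site.xml; substitute your own username for Kumar):
# create the cache, name and data directories used in the configuration above
sudo mkdir -p /var/lib/hadoop-0.20/cache/Kumar /var/lib/hadoop-0.20/cache/hadoop/dfs/name /var/lib/hadoop-0.20/cache/hadoop/dfs/data
# make the current user the owner so the Hadoop daemons can read and write them
sudo chown -R $(whoami) /var/lib/hadoop-0.20
chmod -R u+rwx /var/lib/hadoop-0.20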
- Step #3: Format the namenode:
./bin/hadoop namenode -format
- Step #4: Start your Hadoop cluster:
./bin/start-all.sh
- Step #5: Test HDFS by copying a file into it:
//creates a directory named census-original-files under /user/$username/
./bin/hadoop fs -mkdir census-original-files
//list the directory and you should see the above directory
./bin/hadoop fs -ls
//copy a file from local to hdfs
./bin/hadoop fs -copyFromLocal README.txt /user/Kumar/census-original-files/
//check the content of the file in hdfs
./bin/hadoop fs -cat /user/Kumar/census-original-files/README.txt
//check whether all the mapreduce java tasks have started:
jps
expected java processes (the leading numbers are process IDs and will differ on your machine):
6722 SecondaryNameNode
6571 NameNode
6779 JobTracker
6647 DataNode
6855 TaskTracker
If any of the above Java processes is missing, check the logs directory. (Most often the configuration files are not set right or directory write permissions are missing.)
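For example (a quick sketch; the exact log file names depend on your username and hostname):
# each daemon writes its own log file under the logs/ directory
ls logs/
# look at the tail of the log for the daemon that did not start, e.g. the TaskTracker
tail -50 logs/hadoop-*-tasktracker-*.log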
- Step #6: To shut down the cluster:
./bin/stop-all.sh
Issues encountered:
Issue #1: You encounter errors like the one below while copying content into HDFS. A possible solution is to physically delete all files under dfs.name.dir and dfs.data.dir, then run the format command again (Step #3).
If that does not solve the issue, validate the configuration files under the /conf directory. (Note: the Cloudera distribution bundles example conf files under the /example-confs directory.)
could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1469)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
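Concretely, the recovery looks roughly like this (a sketch, assuming the dfs.name.dir and dfs.data.dir values from hdfs-site.xml above):
# stop the cluster, wipe the name and data directories, re-format and restart
./bin/stop-all.sh
rm -rf /var/lib/hadoop-0.20/cache/hadoop/dfs/name/*
rm -rf /var/lib/hadoop-0.20/cache/hadoop/dfs/data/*
./bin/hadoop namenode -format
./bin/start-all.sh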
Issue #2: The JobTracker or TaskTracker does not start. This is most likely because mapred.job.tracker is missing from mapred-site.xml; it then defaults to 'local', which produces the 'Not a host:port pair: local' error below. Set it to localhost:8021 as shown in Step #2.
FATAL org.apache.hadoop.mapred.JobTracker: java.lang.RuntimeException: Not a host:port pair: local
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:140)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:124)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2427)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2050)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2043)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:294)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:286)
at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4767)
Issue #3: Give read/write permission to all the directories configured for DFS and MapReduce; otherwise the TaskTracker fails with the error below.
ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because org.apache.hadoop.util.DiskChecker$DiskErrorException: all local directories are not writable
at org.apache.hadoop.mapred.TaskTracker.checkLocalDirs(TaskTracker.java:3495)
at org.apache.hadoop.mapred.TaskTracker.initializeDirectories(TaskTracker.java:659)
at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:734)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1431)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3521)
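In a default 0.20 setup the TaskTracker's local directories (mapred.local.dir) fall under hadoop.tmp.dir, so a quick check and fix looks like this (a sketch, assuming the hadoop.tmp.dir value from core-site.xml above):
# verify ownership and permissions of the tmp/cache directory
ls -ld /var/lib/hadoop-0.20/cache/Kumar
# make it writable by the user running the TaskTracker
sudo chown -R $(whoami) /var/lib/hadoop-0.20/cache/Kumar
chmod -R u+rwx /var/lib/hadoop-0.20/cache/Kumar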
Hope these steps save you time.