I am happy to share another blog post, this time on how to use MapReduce to find the maximum and minimum temperature in a data set.
I have a data set file (say temperaturedata.txt) that I would like to analyze using MapReduce.
I have written a MapReduce program to find the maximum and minimum temperature of the year.
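To make the idea concrete, here is a minimal plain-Java sketch of the map and reduce steps behind such a program. It does not use any Hadoop classes and is not the actual MaxMinTemp code. It assumes each input line looks like "<year> <temperature>" (the real layout of temperaturedata.txt may differ), and the class name MaxMinTempSketch and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map and reduce steps; no Hadoop classes involved.
final class MaxMinTempSketch {

    // Map step: parse one line and group its temperature under its year.
    // ASSUMPTION: a line looks like "<year> <temperature>", e.g. "1950 22".
    static void map(String line, Map<String, List<Integer>> grouped) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 2) {
            return; // skip blank or incomplete lines
        }
        int temperature = Integer.parseInt(fields[1]);
        grouped.computeIfAbsent(fields[0], year -> new ArrayList<>()).add(temperature);
    }

    // Reduce step: collapse all temperatures of one year into {max, min}.
    static int[] reduce(List<Integer> temperatures) {
        int max = Integer.MIN_VALUE;
        int min = Integer.MAX_VALUE;
        for (int t : temperatures) {
            max = Math.max(max, t);
            min = Math.min(min, t);
        }
        return new int[] {max, min};
    }

    // Runs the map step over every line, then the reduce step over every year.
    static Map<String, int[]> maxMinByYear(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            map(line, grouped);
        }
        Map<String, int[]> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample records, not taken from temperaturedata.txt.
        List<String> sample = Arrays.asList("1950 22", "1950 -11", "1951 30", "1951 4");
        maxMinByYear(sample).forEach((year, maxMin) ->
            System.out.println(year + "\tmax=" + maxMin[0] + "\tmin=" + maxMin[1]));
    }
}
```

In a real Hadoop job the grouping by year is done by the framework between the map and reduce phases; here a TreeMap stands in for that step.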
The data set is on my Linux file system, so I first need to copy it to the Hadoop file system (HDFS).
hdfs dfs -copyFromLocal /home/cloudera/Downloads/temperaturedata.txt /jpraveen/temperaturedata.txt
The file is now in the HDFS directory; you can check this with hdfs dfs -ls /jpraveen/
Next, package your MapReduce program as a jar file and run it using the command below.
hadoop jar MaxMinTemp.jar /jpraveen/temperaturedata.txt ~/output1
The following calls in the main method of your program pick up the two arguments from that command:
FileInputFormat.setInputPaths(job, new Path(args[0])) takes the first argument, /jpraveen/temperaturedata.txt, as the input path.
FileOutputFormat.setOutputPath(job, new Path(args[1])) takes the second argument, ~/output1, as the output path.
JobClient.runJob(job) submits the MapReduce job and starts it.
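Since the driver reads the input path from the first command-line argument and the output path from the second, it helps to check the argument count before handing them to Hadoop. The sketch below shows one way to do that in plain Java. The class name DriverArgsSketch, the method parsePaths, and the usage message are hypothetical and are not part of the MaxMinTemp program.

```java
// Hypothetical helper showing how the two command-line arguments map to paths.
final class DriverArgsSketch {

    // Returns {inputPath, outputPath}, or fails fast if the argument count is wrong.
    static String[] parsePaths(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "Usage: hadoop jar MaxMinTemp.jar <input path> <output path>");
        }
        String inputPath = args[0];  // e.g. /jpraveen/temperaturedata.txt
        String outputPath = args[1]; // e.g. the output directory for the job
        return new String[] {inputPath, outputPath};
    }

    public static void main(String[] args) {
        String[] paths = parsePaths(args);
        System.out.println("input  = " + paths[0]);
        System.out.println("output = " + paths[1]);
    }
}
```

In the real driver, the two returned strings would be wrapped in new Path(...) and passed to FileInputFormat.setInputPaths and FileOutputFormat.setOutputPath.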
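You can also track the job status using the information you get while the MapReduce job is running.
The output of the above job can be seen using the command below or through the file browser. Note that the shell expands ~ to the local home directory (here /root) before Hadoop sees it, which is why the output ends up under /root/output1.
hdfs dfs -cat /root/output1/part-00000
(or) http://quickstart.cloudera:50070/explorer.html#/root/output5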