Table of Contents
1. MapReduce job workflow
2. Practice
  2.1 Start Hadoop
  2.2 Create a Java project
  2.3 MapReduce shell
  2.4 MapReduce Web UI
3. MapReduce programming practice: counting attributes of objects

Reference book: 《Hadoop大数据原理与应用》 (Hadoop Big Data: Principles and Applications)

1. MapReduce job workflow

2. Practice

2.1 Start Hadoop
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
# The third command above is deprecated; use the following instead:
mapred --daemon start historyserver

2.2 Create a Java project

WordCountMapper.java
package com.michael.mapreduce;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Self-defined map method
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] words = line.split(" ");
        for (String word : words) {
            // context.write() hands the pair to the next stage: shuffle
            context.write(new Text(word), new IntWritable(1));
        }
    }
}

WordCountReducer.java
package com.michael.mapreduce;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Self-defined reduce method
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

WordCountDriver.java — the driver class that configures this job
package com.michael.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountDriver {
    // args: input and output file paths
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Enable compression of the map-stage output
        conf.setBoolean("mapreduce.map.output.compress", true);
        // Specify the compression codec
        conf.setClass("mapreduce.map.output.compress.codec", BZip2Codec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "word count diy");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        // Use the reducer as a combiner
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);

        // Types of the map output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Types of the reduce output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input and output file paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Compress the job output; the codec must match the one configured above
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);

        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}

Export wordcount_diy.jar, then submit it to Hadoop:
hadoop jar /home/dnn/eclipse-workspace/HDFS_example/wordcount_diy.jar com.michael.mapreduce.WordCountDriver /InputDataTest /OutputDataTest1

View the result:

hdfs dfs -cat /OutputDataTest1/part-r-00000.bz2

The output looks garbled because it is bzip2-compressed; download the file and decompress it before viewing.
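The map → combine → reduce data flow of the job above can be sanity-checked with a small plain-Java simulation. This is only an illustration of the logic (split lines into words, sum a count of 1 per word, grouped by key), not how Hadoop actually executes it; the class and method names are invented for this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountSim {
    // Simulates the pipeline in one pass: the "map" step splits each input
    // record into words, and the "reduce" step sums the per-word counts.
    public static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {                      // map: one call per record
            for (String word : line.split(" ")) {
                counts.merge(word, 1, Integer::sum);     // reduce: sum values per key
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(List.of("hello world", "hello hadoop"));
        System.out.println(counts.get("hello")); // 2
    }
}
```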
Download:

hdfs dfs -get /OutputDataTest1/part-r-00000.bz2 /home/dnn/eclipse-workspace/HDFS_example/part-r-00000.bz2

View:

bzcat /home/dnn/eclipse-workspace/HDFS_example/part-r-00000.bz2

2.3 MapReduce shell
Check the job status:

mapred job -status job_1615849408082_0001

[dnn@master Desktop]$ mapred job -status job_1615849408082_0001
WARNING: HADOOP_MAPRED_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of HADOOP_MAPRED_PID_DIR.
2021-03-26 04:25:14,881 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at master/192.168.253.130:8032
2021-03-26 04:25:15,939 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
Job: job_1615849408082_0001
Job File: hdfs://192.168.253.130:9000/tmp/hadoop-yarn/staging/history/done/2021/03/24/000000/job_1615849408082_0001_conf.xml
Job Tracking URL : http://master:19888/jobhistory/job/job_1615849408082_0001
Uber job : false
Number of maps: 3
Number of reduces: 1
map() completion: 1.0
reduce() completion: 1.0
Job state: SUCCEEDED
retired: false
reason for failure:
Counters: 54
	File System Counters
		FILE: Number of bytes read=6640
		FILE: Number of bytes written=1072644
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=25631
		HDFS: Number of bytes written=4967
		HDFS: Number of read operations=14
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
		HDFS: Number of bytes read erasure-coded=0
	Job Counters
		Launched map tasks=3
		Launched reduce tasks=1
		Data-local map tasks=3
		Total time spent by all maps in occupied slots (ms)=43801
		Total time spent by all reduces in occupied slots (ms)=5037
		Total time spent by all map tasks (ms)=43801
		Total time spent by all reduce tasks (ms)=5037
		Total vcore-milliseconds taken by all map tasks=43801
		Total vcore-milliseconds taken by all reduce tasks=5037
		Total megabyte-milliseconds taken by all map tasks=44852224
		Total megabyte-milliseconds taken by all reduce tasks=5157888
	Map-Reduce Framework
		Map input records=667
		Map output records=3833
		Map output bytes=40605
		Map output materialized bytes=8455
		Input split bytes=358
		Combine input records=3833
		Combine output records=1264
		Reduce input groups=913
		Reduce shuffle bytes=8455
		Reduce input records=1264
		Reduce output records=913
		Spilled Records=2528
		Shuffled Maps=3
		Failed Shuffles=0
		Merged Map outputs=3
		GC time elapsed (ms)=818
		CPU time spent (ms)=3140
		Physical memory (bytes) snapshot=599461888
		Virtual memory (bytes) snapshot=10950950912
		Total committed heap usage (bytes)=385351680
		Peak Map Physical memory (bytes)=167784448
		Peak Map Virtual memory (bytes)=2735529984
		Peak Reduce Physical memory (bytes)=96972800
		Peak Reduce Virtual memory (bytes)=2744360960
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=25273
	File Output Format Counters
		Bytes Written=4967

2.4 MapReduce Web UI
http://192.168.253.130:19888/jobhistory

3. MapReduce programming practice: counting attributes of objects
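The post ends at this section title without a worked example. As a hypothetical illustration of the idea (the names, record format, and scenario below are all invented for this sketch): suppose each input record describes an object such as "name,score", a mapper emits (name, score) pairs, and a reducer sums the scores per name. The reduce-side arithmetic can be sketched in plain Java, with both steps collapsed into one pass:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AttributeCountSim {
    // Hypothetical record format: "name,score". A real job would emit
    // (name, score) from the mapper and sum per key in the reducer;
    // here the grouping and summing happen in a single pass.
    public static Map<String, Integer> totalScorePerName(List<String> records) {
        Map<String, Integer> totals = new HashMap<>();
        for (String record : records) {
            String[] fields = record.split(",");
            totals.merge(fields[0], Integer.parseInt(fields[1]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> records = List.of("alice,80", "bob,90", "alice,70");
        System.out.println(totalScorePerName(records).get("alice")); // 150
    }
}
```

In a real Hadoop version, the value type would typically be IntWritable and the record parsing would live in the mapper, mirroring the WordCount classes above.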