1. Environment Setup

1.1 Installing IDEA on Ubuntu

As we know, in an HDFS-based distributed system the MapReduce program is something users develop themselves. We install IDEA on Ubuntu precisely to develop the Map and Reduce programs that wordcount needs, then package them into a jar and run them against HDFS.
The tutorial I followed for installing IDEA on Ubuntu is this one: Ubuntu 中安装 IDEA.
1.2 Downloading Maven
What is Maven? Maven is a project build and management tool; it provides facilities for managing builds, documentation, reporting, dependencies, SCMs, releases, and distribution.
For installing and configuring Maven, see: Ubuntu20.04下配置Maven（IDEA配置）.
2. Implementing MapReduce
2.1 Configuring Maven in IDEA
Note: make sure you do this from IDEA's welcome screen. This step matters because it determines whether your configuration is global. If you currently have a project open, choose File > Close Project to return to the welcome screen first.
Step 1: open Settings from the welcome screen. Click All Settings, or use the shortcut Ctrl+Alt+S.

Step 2: locate the Maven settings. Type "maven" into the search box in the top-left corner and press Enter.

Step 3: edit the Maven settings as follows.
The Maven home path is Maven's installation folder; the user settings file is the settings.xml we configured while downloading Maven above; the local repository is a folder we create ourselves, where every jar downloaded from the central repository is stored. If you followed the setup in section 1.2, the three paths are:

- Maven home path: /opt/maven/apache-maven-3.9.7
- User settings file: /opt/maven/apache-maven-3.9.7/conf/settings.xml
- Local repository: /opt/maven/repository (stores plugins and jars downloaded from remote or central repositories; when a project needs a plugin or jar, it looks in the local repository first)

Note: restart IDEA after this, or the settings do not take effect.

2.2 Create an empty Maven project named wordCount

2.3 Add the dependencies

```xml
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.3.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>3.3.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>3.3.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.3.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        <version>3.3.5</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-common</artifactId>
        <version>3.3.5</version>
    </dependency>
</dependencies>
```

2.4 Create the WordCount class

```java
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.example;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: tokenize each input line and emit a (word, 1) pair per token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sum the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // All arguments except the last are input paths; the last is the output path.
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
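To make the data flow concrete: the mapper tokenizes each line and emits a (word, 1) pair per token, and the reducer sums those pairs per word. The same logic, stripped of Hadoop and run on one in-memory string, looks like this (a standalone sketch; the class name WordCountLocalDemo is illustrative):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Illustrative only: mimics the map phase (tokenize, emit (word, 1))
// and the reduce phase (sum per word) on a single in-memory string.
public class WordCountLocalDemo {
    public static void main(String[] args) {
        String text = "Hello Yaoyao Hello Hadoop";
        Map<String, Integer> counts = new TreeMap<>(); // sorted by key, like the job output
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            // "map" emits (token, 1); "reduce" sums per key -- merged into one step here.
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```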
2.5 Initialize the input files

In the project root, create an input folder and add two files:

input
- file1.txt, containing: Hello Yaoyao
- file2.txt, containing: Hello Hadoop

2.6 Set up the run configuration

Create a run configuration for WordCount whose program arguments are the input folder and a not-yet-existing output folder, e.g. input output (the same arguments used with java -jar in section 3.3). Once configured, click Apply, then OK.
2.7 Run

After the run completes, the output folder contains:
- part-r-00000, the result file
- _SUCCESS, the success marker
- _logs, the log directory
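If you want to post-process the result rather than just view it, the counts are plain tab-separated text. A minimal sketch for reading the local result file (plain JDK I/O; the class name ReadWordCountOutput is illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ReadWordCountOutput {
    public static void main(String[] args) throws IOException {
        // Each line of the result file is "<word>\t<count>".
        try (Stream<String> lines = Files.lines(Paths.get("output/part-r-00000"))) {
            lines.forEach(line -> {
                String[] parts = line.split("\t");
                System.out.println(parts[0] + " appears " + parts[1] + " time(s)");
            });
        }
    }
}
```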
3. Packaging and running on the server with hadoop jar
3.1 Add a packaging plugin to pom.xml

```xml
<build>
    <plugins>
        <!-- Specify the main class and bundle all dependencies -->
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.6.0</version>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>org.example.WordCount</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```
3.2 Build with Maven

Run this directly in a terminal:

```bash
mvn clean install
```

This produces the jar:

/home/hadoop/JavaProject/wordCount/target/wordCount-1.0-SNAPSHOT-jar-with-dependencies.jar

A jar is a jar file (Java Archive). You can think of it as similar to a zip archive; unlike a zip file, though, a jar is used not only for compression and distribution but also for deployment and for packaging libraries, components, and plugins, and it can be consumed directly by tools such as compilers and the JVM.

3.3 Run with java -jar

Initialize an input folder in the directory that contains the executable jar, then open a terminal there and run the command below; an output folder containing the results is generated in the current directory. (Note: Hadoop must be started first.)

```bash
java -jar wordCount-1.0-SNAPSHOT-jar-with-dependencies.jar input output
```
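Before shipping the jar to the cluster, you can verify that the assembly plugin really set the entry point and bundled the dependencies. A small sketch using the JDK's java.util.jar API (the class name InspectJar is illustrative; run it in the target directory):

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class InspectJar {
    public static void main(String[] args) throws IOException {
        try (JarFile jar = new JarFile(
                "wordCount-1.0-SNAPSHOT-jar-with-dependencies.jar")) {
            // Main-Class should be org.example.WordCount, as set in the assembly plugin.
            System.out.println("Main-Class: "
                    + jar.getManifest().getMainAttributes().getValue("Main-Class"));
            // A jar-with-dependencies bundles the Hadoop classes too, so this is large.
            System.out.println("Entries: " + jar.stream().count());
        }
    }
}
```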
3.4 Upload the files to HDFS and run with hadoop

1. Create the wordcount input folder on HDFS and upload the local file1 and file2:
```bash
hadoop fs -mkdir -p /yaoyao/wordcount/input
# hadoop fs -mkdir -p /yaoyao/wordcount/output
# hadoop fs -chmod 777 /yaoyao/wordcount/output
```

The two commented-out commands are deliberately not run: Hadoop generates the output directory automatically, and it must not be created in advance or the job reports an error. (The chmod 777 would give everyone write permission, which matters when you use a server cluster directly from a local machine and your user has no write permission on the cluster's files.)

```bash
hadoop fs -copyFromLocal file1.txt /yaoyao/wordcount/input
hadoop fs -copyFromLocal file2.txt /yaoyao/wordcount/input
```

Check the input content uploaded to HDFS:
```bash
hadoop fs -cat /yaoyao/wordcount/input/file1.txt
hadoop fs -cat /yaoyao/wordcount/input/file2.txt
```
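Incidentally, the same preparation can be scripted in Java through Hadoop's FileSystem API instead of the hadoop fs shell. A sketch under the assumption that fs.defaultFS in the classpath configuration points at the cluster (e.g. hdfs://localhost:9000); the class name UploadInput is illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            Path input = new Path("/yaoyao/wordcount/input");
            fs.mkdirs(input);                                   // like: hadoop fs -mkdir -p
            fs.copyFromLocalFile(new Path("file1.txt"), input); // like: hadoop fs -copyFromLocal
            fs.copyFromLocalFile(new Path("file2.txt"), input);
        }
    }
}
```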
2. Run the jar with the hadoop command:

```bash
hadoop jar wordCount-1.0-SNAPSHOT-jar-with-dependencies.jar /yaoyao/wordcount/input /yaoyao/wordcount/output
```

This fails with:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/yaoyao/wordcount/output already exists

Cause: the output directory must not exist when a Hadoop job runs, because Hadoop creates it itself; this guarantees there are no naming conflicts or concurrent-write problems during the job.
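If you rerun the job often, one option is to clear a stale output directory programmatically before submitting the job. A minimal sketch with the standard FileSystem API; the OutputCleaner class and its deleteIfExists helper are illustrative additions, not part of the WordCount code above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Optional helper: call before job.waitForCompletion(...) to remove a stale output dir.
public class OutputCleaner {
    public static void deleteIfExists(Configuration conf, String dir) throws Exception {
        Path output = new Path(dir);
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(output)) {
            fs.delete(output, true); // recursive, like: hadoop fs -rm -r
        }
    }
}
```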
Otherwise, simply delete the output directory by hand:

```bash
hadoop fs -rm -r /yaoyao/wordcount/output
```

Running again, however, produces another error:

Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Analysis: the log contains the line "Please check whether your HADOOP_HOME/etc/hadoop/mapred-site.xml contains the below configuration", which clearly points at an incomplete configuration file. Complete mapred-site.xml as the message suggests:
```bash
cd /usr/local/hadoop/etc/hadoop/
sudo gedit mapred-site.xml
```
```xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>
```
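To sanity-check that a client picks up the new values, you can load the configuration from Java; note that a plain Configuration only loads core-*.xml, while JobConf also pulls in mapred-site.xml. A sketch (the class name CheckMapredConf is illustrative; it assumes the Hadoop configuration directory is on the classpath, as it is when run via hadoop jar):

```java
import org.apache.hadoop.mapred.JobConf;

public class CheckMapredConf {
    public static void main(String[] args) {
        // JobConf's static initializer registers mapred-site.xml as a default resource.
        JobConf conf = new JobConf();
        // Should print "yarn" once mapred-site.xml is complete.
        System.out.println("mapreduce.framework.name = "
                + conf.get("mapreduce.framework.name"));
    }
}
```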
Then restart the YARN service:

```bash
# stop it first
/usr/local/hadoop/sbin/stop-yarn.sh
# then start it again
/usr/local/hadoop/sbin/start-yarn.sh
```

Go back to the directory containing the jar and rerun the job; this time it succeeds. Check the result:
```bash
hadoop fs -cat /yaoyao/wordcount/output/part-r-00000
```

```
Hadoop  1
Hello   2
World   1
```