Original article: http://my.oschina.net/lanzp/blog/309078

Contents
1. Development environment
2. Hadoop server configuration (Master node)
3. Eclipse-based Hadoop 2.x development environment
4. Running the Hadoop program and viewing its logs

1. Development environment

Development machine: Windows 7 64-bit, Eclipse Kepler Service Release 2
Server: Ubuntu Server 14.04.1 LTS (64-bit only)
Helper tools: WinSCP, PuTTY
Hadoop version: 2.5.0
Hadoop Eclipse plugin (works with the 2.x line): http://pan.baidu.com/s/1eQy49sm
Server-side JDK: OpenJDK 7

Please download and install all of the tools above yourself.

2. Hadoop server configuration (Master node)

I have been working through the Hadoop 2 configuration recently. Hadoop 2 adjusted some of the original framework APIs, but it remains compatible with the old ones, configuration included. Since I like trying new things, and there are still few tutorials online for the new version, I am sharing my own hands-on notes below; corrections are welcome. :)

Assume Ubuntu Server, OpenJDK and SSH are already installed (if not, please find a tutorial and install them first); here I will only describe setting up passwordless SSH login. First test whether it already works:

$ ssh localhost

If it is not set up, the system will ask for a password. The following commands enable passwordless login (look up the details of ssh-keygen if you are curious about how it works):

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Next is the Hadoop installation. Assuming OpenJDK is installed and its environment variables are configured, download Hadoop 2.5.0 from the official site (a tar.gz of roughly 100+ MB) and extract it wherever you prefer. I put it under /usr/mywind, so the full path of the Hadoop home directory is /usr/mywind/hadoop.

After extracting, open etc/hadoop/hadoop-env.sh under the Hadoop home directory and append the following at the end:

# set to the root of your Java installation
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# Assuming your installation directory is /usr/mywind/hadoop
export HADOOP_PREFIX=/usr/mywind/hadoop

For convenience I also recommend adding Hadoop's bin and sbin directories to the environment variables. I did it by editing Ubuntu's /etc/environment file:

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/jvm/java-7-openjdk-amd64/bin:/usr/mywind/hadoop/bin:/usr/mywind/hadoop/sbin"
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

You can achieve the same thing through your profile; it is a matter of habit. Once all of that is done, try the hadoop command on the command line (the original post shows a screenshot of the expected output). If you see it, congratulations, Hadoop is installed. Next comes the pseudo-distributed configuration (Hadoop running on a single node in pseudo-distributed mode).

Four files under /usr/mywind/hadoop/etc/hadoop need to be edited: yarn-site.xml, mapred-site.xml, hdfs-site.xml and core-site.xml. Note that in this version yarn-site.xml is not present by default; there is a yarn-site.xml.properties file instead, so simply rename the suffix. For the new YARN features, see the official site or this article: http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-yarn/.

First core-site.xml, which sets the HDFS address and the temporary directory (the default temporary directory is wiped on restart):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.8.184:9000</value>
    <description>same as fs.default.name</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/mywind/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

Then hdfs-site.xml, which sets the replication count and some optional paths such as the NameNode and DataNode directories:

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/mywind/name</value>
    <description>same as dfs.name.dir</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/mywind/data</value>
    <description>same as dfs.data.dir</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>same as the old framework; recommended to match the number of DataNode hosts in the cluster</description>
  </property>
</configuration>

Next mapred-site.xml, which enables the YARN framework:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Finally yarn-site.xml, which configures the NodeManager:

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Note that older tutorials on the web may write this value as mapreduce.shuffle; pay special attention here, the correct value is mapreduce_shuffle. At this point all four configuration files are done.
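A quick aside that is not part of the original walkthrough: the fs.defaultFS address in core-site.xml above is exactly what a Java client uses to locate the NameNode. The minimal sketch below assumes the Hadoop client jars are on the classpath and reuses the 192.168.8.184:9000 address from this setup (the class name HdfsSmokeTest is just an illustrative choice); it lists the HDFS root directory, which is a handy way to confirm the daemons are reachable once they are started in the next step.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal smoke test: connect to the NameNode configured in core-site.xml and list "/".
public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        // Same value as fs.defaultFS above; change it to your own server address.
        String nameNode = "hdfs://192.168.8.184:9000";
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", nameNode);
        FileSystem fs = FileSystem.get(URI.create(nameNode), conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}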
Now format the HDFS filesystem:

$ hdfs namenode -format

Then start the NameNode and DataNode processes, together with the YARN daemons:

$ start-dfs.sh
$ start-yarn.sh

Then create the HDFS user directories:

$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/a01513

Note that a01513 is my username on Ubuntu. It is best to keep this directory name consistent with the system username; a mismatch reportedly causes all sorts of permission problems. I tried a different name before and got errors, so save yourself the trouble and match the system user.

Then put the test input file into the filesystem:

$ hdfs dfs -put /usr/mywind/psa input

The file contains data for Hadoop's classic weather (max temperature) example:

12345679867623119010123456798676231190101234567986762311901012345679867623119010123456001212345678903456
12345679867623119010123456798676231190101234567986762311901012345679867623119010123456011212345678903456
12345679867623119010123456798676231190101234567986762311901012345679867623119010123456021212345678903456
12345679867623119010123456798676231190101234567986762311901012345679867623119010123456003212345678903456
12345679867623119010123456798676231190201234567986762311901012345679867623119010123456004212345678903456
12345679867623119010123456798676231190201234567986762311901012345679867623119010123456010212345678903456
12345679867623119010123456798676231190201234567986762311901012345679867623119010123456011212345678903456
12345679867623119010123456798676231190501234567986762311901012345679867623119010123456041212345678903456
12345679867623119010123456798676231190501234567986762311901012345679867623119010123456008212345678903456

After copying the file into HDFS, you can check the files and some status information in a browser:

http://192.168.8.184:50070/

Use your actual Hadoop server address here, of course.

With all the Hadoop services running and the data prepared, it is time to write the MapReduce program, but first the development environment needs to be set up.

3. Eclipse-based Hadoop 2.x development environment

I will not describe installing the JDK and Eclipse again; anyone playing with Hadoop is more than familiar with that kind of setup (if not, search for a tutorial). Assuming you have downloaded the Hadoop Eclipse plugin, unpack it, put the jar file into Eclipse's plugins folder, and restart Eclipse.

Then install Hadoop on the Windows 7 machine as well. I will not go into detail either, since it is much like installing the JDK; in this example I installed it to E:\hadoop.

Start Eclipse, open Window → Preferences → Hadoop Map/Reduce, set Hadoop Installation Directory to the Hadoop home directory on the development machine, and click OK.

The development environment is now ready, so we can create a test Hadoop project: right-click, New → Other..., choose Map/Reduce Project, enter a project name and click Finish. After the project is created you can see the generated directory structure; under src create the packages and classes below (com.my.hadoop.mapper.TestMapper, com.my.hadoop.reducer.TestReducer and com.my.hadoop.test.main.TestHadoop). The code follows.

TestMapper.java:

package com.my.hadoop.mapper;

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class TestMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;
    private static final Log LOG = LogFactory.getLog(TestMapper.class);

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
            Reporter reporter) throws IOException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        LOG.info("loki:" + airTemperature);
        String quality = line.substring(92, 93);
        LOG.info("loki2:" + quality);
        if (airTemperature != MISSING && quality.matches("[012459]")) {
            LOG.info("loki3:" + quality);
            output.collect(new Text(year), new IntWritable(airTemperature));
        }
    }

}
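A note on the API, not part of the original project: TestMapper above uses the old org.apache.hadoop.mapred API (MapReduceBase, OutputCollector, Reporter), which Hadoop 2 still supports, as mentioned at the start. For comparison, the same logic written against the newer org.apache.hadoop.mapreduce API would look roughly like the sketch below; Context replaces OutputCollector and Reporter, and a driver for it would use the Job class instead of JobConf. The class name NewApiTestMapper is only illustrative.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch only: the same parsing logic as TestMapper, ported to the new-style Mapper.
public class NewApiTestMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[012459]")) {
            // context.write replaces OutputCollector.collect from the old API
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}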
TestReducer.java:

package com.my.hadoop.reducer;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.Reducer;

public class TestReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
            Reporter reporter) throws IOException {
        int maxValue = Integer.MIN_VALUE;
        while (values.hasNext()) {
            maxValue = Math.max(maxValue, values.next().get());
        }
        output.collect(key, new IntWritable(maxValue));
    }

}

TestHadoop.java:

package com.my.hadoop.test.main;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

import com.my.hadoop.mapper.TestMapper;
import com.my.hadoop.reducer.TestReducer;

public class TestHadoop {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        JobConf job = new JobConf(TestHadoop.class);
        job.setJobName("Max temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(TestMapper.class);
        job.setReducerClass(TestReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        JobClient.runJob(job);
    }

}

To make it easier to work with the HDFS filesystem, you can create a connection to Hadoop in Eclipse's Map/Reduce Locations view: simply right-click and create a new Hadoop location, fill in the connection settings (shown in the original post's screenshot), and click Finish. Once the location is created, the HDFS directory tree appears in the panel on the left, where you can not only browse the structure but also delete and add files and directories, which is very convenient.

When all of that is done, export the project as a jar file to run on the Hadoop server. Click Finish in the export wizard, then upload the resulting testt.jar to the Hadoop server at 192.168.8.184, into the directory below (you could really put it in any directory you like):

/usr/mywind/hadoop/share/hadoop/mapreduce

4. Running the Hadoop program and viewing its logs

With all of the above prepared, running our own Hadoop program is simple:

$ hadoop jar /usr/mywind/hadoop/share/hadoop/mapreduce/testt.jar com.my.hadoop.test.main.TestHadoop input output

Note that the output folder name must not already exist. Once you have run the job, an output folder is generated automatically in HDFS; before running it again, either delete that folder first ($ hdfs dfs -rmr /user/a01513/output) or change "output" in the command to another name such as output1, output2, and so on.
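If deleting the old output directory by hand gets tedious, the driver can clear it before submitting the job. The small helper below is not part of the original TestHadoop class; it is a sketch of one common convenience built on the FileSystem API, assuming it would be called as OutputCleaner.deleteIfExists(job, args[1]) right before JobClient.runJob(job). Be aware that it silently removes the previous results.

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// Optional helper: drop the previous output directory so a rerun does not fail.
public class OutputCleaner {

    public static void deleteIfExists(JobConf job, String output) throws Exception {
        Path outputPath = new Path(output);
        FileSystem fs = FileSystem.get(job);   // resolves fs.defaultFS from the job configuration
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);       // true = delete recursively
        }
    }

}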
If you see output like the following, the run succeeded:

a01513@hadoop:~$ hadoop jar /usr/mywind/hadoop/share/hadoop/mapreduce/testt.jar com.my.hadoop.test.main.TestHadoop input output
14/09/02 11:14:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/09/02 11:14:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/09/02 11:14:04 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/09/02 11:14:04 INFO mapred.FileInputFormat: Total input paths to process : 1
14/09/02 11:14:04 INFO mapreduce.JobSubmitter: number of splits:2
14/09/02 11:14:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1409386620927_0015
14/09/02 11:14:05 INFO impl.YarnClientImpl: Submitted application application_1409386620927_0015
14/09/02 11:14:05 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1409386620927_0015/
14/09/02 11:14:05 INFO mapreduce.Job: Running job: job_1409386620927_0015
14/09/02 11:14:12 INFO mapreduce.Job: Job job_1409386620927_0015 running in uber mode : false
14/09/02 11:14:12 INFO mapreduce.Job:  map 0% reduce 0%
14/09/02 11:14:21 INFO mapreduce.Job:  map 100% reduce 0%
14/09/02 11:14:28 INFO mapreduce.Job:  map 100% reduce 100%
14/09/02 11:14:28 INFO mapreduce.Job: Job job_1409386620927_0015 completed successfully
14/09/02 11:14:29 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=105
                FILE: Number of bytes written=289816
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1638
                HDFS: Number of bytes written=10
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=14817
                Total time spent by all reduces in occupied slots (ms)=4500
                Total time spent by all map tasks (ms)=14817
                Total time spent by all reduce tasks (ms)=4500
                Total vcore-seconds taken by all map tasks=14817
                Total vcore-seconds taken by all reduce tasks=4500
                Total megabyte-seconds taken by all map tasks=15172608
                Total megabyte-seconds taken by all reduce tasks=4608000
        Map-Reduce Framework
                Map input records=9
                Map output records=9
                Map output bytes=81
                Map output materialized bytes=111
                Input split bytes=208
                Combine input records=0
                Combine output records=0
                Reduce input groups=1
                Reduce shuffle bytes=111
                Reduce input records=9
                Reduce output records=1
                Spilled Records=18
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=115
                CPU time spent (ms)=1990
                Physical memory (bytes) snapshot=655314944
                Virtual memory (bytes) snapshot=2480295936
                Total committed heap usage (bytes)=466616320
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1430
        File Output Format Counters
                Bytes Written=10
a01513@hadoop:~$

You can view the result either in Eclipse (through the HDFS tree described above) or from the command line:

$ hdfs dfs -cat output/part-00000

If the result turns out to be empty, check the corresponding log.info output in the log directory; the logs live under /usr/mywind/hadoop/logs/userlogs.

Well, I am not a big fan of typing this much, so that is the whole process. You are welcome to study it and point out any mistakes.

Reposted from: https://www.cnblogs.com/AloneSword/p/3955935.html