Some jobs in production were failing with OOM errors. Since the errors came from map tasks, the first suspicion was that there were too few maps, causing the OOM, so the parameters were adjusted to increase the number of maps; the problem persisted, so it was clearly not related to the number of maps.

Using the job id, the corresponding logs were looked up in the JobHistory server to locate the failing task id and its host, and from the task logs the problematic container id was identified. Since containers are allocated by the ResourceManager, the RM log shows how the container was allocated, for example:

2014-05-06 16:00:00,632 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_1399267192386_43455_01_000037 of capacity <memory:1536, vCores:1> on host xxxx:44614, which currently has 4 containers, <memory:6144, vCores:4> used and <memory:79872, vCores:42> available

This shows the container id, the host, and the memory and CPU allocated to the container. Looking further at the NodeManager log for that container shows its full lifecycle (a command-line sketch for walking from the job id to these logs follows the log excerpts below):

2014-05-05 10:14:47,001 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1399203487215_21532_01_000035 by user hdfs
2014-05-05 10:14:47,001 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hdfs IP=10.201.203.111 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1399203487215_21532 CONTAINERID=container_1399203487215_21532_01_000035
2014-05-05 10:14:47,001 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1399203487215_21532_01_000035 to application application_1399203487215_21532
2014-05-05 10:14:47,055 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1399203487215_21532_01_000035 transitioned from NEW to LOCALIZING
2014-05-05 10:14:47,058 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1399203487215_21532_01_000035
2014-05-05 10:14:47,060 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /home/vipshop/hard_disk/10/yarn/local/nmPrivate/container_1399203487215_21532_01_000035.tokens. Credentials list:
2014-05-05 10:14:47,412 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1399203487215_21532_01_000035 transitioned from LOCALIZING to LOCALIZED
2014-05-05 10:14:47,454 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1399203487215_21532_01_000035 transitioned from LOCALIZED to RUNNING
2014-05-05 10:14:47,493 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /home/vipshop/hard_disk/6/yarn/local/usercache/hdfs/appcache/application_1399203487215_21532/container_1399203487215_21532_01_000035/default_container_executor.sh]
2014-05-05 10:14:48,827 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /home/vipshop/hard_disk/10/yarn/local/nmPrivate/container_1399203487215_21532_01_000035.tokens to /home/vipshop/hard_disk/11/yarn/local/usercache/hdfs/appcache/application_1399203487215_21532/container_1399203487215_21532_01_000035.tokens
2014-05-05 10:14:49,169 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1399203487215_21532_01_000035
2014-05-05 10:14:49,305 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 21209 for container-id container_1399203487215_21532_01_000035: 66.7 MB of 1.5 GB physical memory used; 2.1 GB of 3.1 GB virtual memory used
2014-05-05 10:14:53,063 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 21209 for container-id container_1399203487215_21532_01_000035: 984.1 MB of 1.5 GB physical memory used; 2.1 GB of 3.1 GB virtual memory used
2014-05-05 10:14:56,379 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 21209 for container-id container_1399203487215_21532_01_000035: 984.5 MB of 1.5 GB physical memory used; 2.1 GB of 3.1 GB virtual memory used
.......
2014-05-05 10:19:26,823 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 21209 for container-id container_1399203487215_21532_01_000035: 1.1 GB of 1.5 GB physical memory used; 2.1 GB of 3.1 GB virtual memory used
2014-05-05 10:19:27,459 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hdfs IP=10.201.203.111 OPERATION=Stop Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1399203487215_21532 CONTAINERID=container_1399203487215_21532_01_000035
2014-05-05 10:19:27,459 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1399203487215_21532_01_000035 transitioned from RUNNING to KILLING
2014-05-05 10:19:27,459 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1399203487215_21532_01_000035
2014-05-05 10:19:27,800 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1399203487215_21532_01_000035 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL

Although the container was allocated 1.5 GB, the task was killed when it had used only 1.1 GB ("1.1 GB of 1.5 GB physical memory used"), with more than 400 MB still free. So the problem does not look like the task's overall memory allocation being too small; it looks more like a PermGen issue (the default MaxPermSize is only 64 MB). The mapred settings were therefore updated as follows:

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1280m -Xms1280m -Xmn256m -XX:SurvivorRatio=6 -XX:MaxPermSize=128m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1280m -Xms1280m -Xmn256m -XX:SurvivorRatio=6 -XX:MaxPermSize=128m</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1280m -Xms1280m -Xmn256m -XX:SurvivorRatio=6 -XX:MaxPermSize=128m</value>
  <final>true</final>
</property>

After rerunning, the job succeeded. For Java OOM problems in general, the best approach is to print GC information and dump the heap on OOM, then analyze the dump with a tool such as MAT; a sketch of the corresponding JVM options is shown below.

Reposted from: https://blog.51cto.com/caiguangguang/1407424
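For reference, the walk from job id to container logs described above might look like the following shell sketch. The log file paths, the daemon log file name patterns, and the use of `yarn logs` (which requires log aggregation to be enabled) are assumptions, not part of the original post; adjust them to your cluster layout.

```bash
# Assumed ids taken from the excerpts above; paths are illustrative.
APP_ID=application_1399203487215_21532
CONTAINER_ID=container_1399203487215_21532_01_000035

# 1. With log aggregation enabled, dump the aggregated task logs directly.
yarn logs -applicationId "$APP_ID" | less

# 2. Find how the container was allocated in the ResourceManager log
#    (log directory and file name pattern depend on the deployment).
grep "$CONTAINER_ID" /var/log/hadoop-yarn/yarn-*-resourcemanager-*.log

# 3. On the host reported by the RM, follow the container's lifecycle
#    in the NodeManager log (Start request, state transitions, memory usage).
grep "$CONTAINER_ID" /var/log/hadoop-yarn/yarn-*-nodemanager-*.log
```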
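Following the closing advice about GC logs, heap dumps, and MAT, a minimal sketch of how the task JVM options could be extended to capture that information. The -verbose:gc, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps, and -XX:+HeapDumpOnOutOfMemoryError flags are standard HotSpot options for this JDK 7 era setup; the heap sizes repeat the post's values, while the dump path /tmp is an illustrative assumption (the same flags could be added to the reduce opts as well).

```xml
<!-- Sketch only: adds GC logging and an OOM heap dump to the map task opts shown above. -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1280m -Xms1280m -Xmn256m -XX:SurvivorRatio=6 -XX:MaxPermSize=128m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp</value>
</property>
```

The resulting .hprof file can then be copied off the NodeManager host and opened in MAT to see which objects or class metadata dominate the heap.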