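Contents
I. How the HDFS trash mechanism works
  1. Basic logic
  2. An example
II. Configuration and testing
  1. Configuration
  2. Trash-related commands
III. Other issues
  1. API deletes bypass the trash
  2. NameNode becomes unresponsive because of the trash configuration

I. How the HDFS trash mechanism works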
1. Basic logic

If trash configuration is enabled, files removed by FS Shell is not immediately removed from HDFS. Instead, HDFS moves it to a trash directory (each user has its own trash directory under /user/<username>/.Trash). The file can be restored quickly as long as it remains in trash. Most recent deleted files are moved to the current trash directory (/user/<username>/.Trash/Current), and in a configurable interval, HDFS creates checkpoints (under /user/<username>/.Trash/<date>) for files in current trash directory and deletes old checkpoints when they are expired. See expunge command of FS shell about checkpointing of trash. After the expiry of its life in trash, the NameNode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.

In short:
- Once trash is enabled, a deleted file is moved to the trash instead of being removed. Every user has a trash directory at /user/<username>/.Trash, and a file can be restored at any time while it is still there.
- Deleted files first land in /user/<username>/.Trash/Current. At the configured interval HDFS turns the contents of Current into a checkpoint under /user/<username>/.Trash/ and deletes old checkpoints once they expire.
- When a file's time in the trash runs out, the NameNode removes it from the namespace and the blocks associated with it are freed. The increase in free disk space therefore lags behind the delete.
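The lifecycle above can be sketched as a small simulation. This is a simplified model, not Hadoop's code. It assumes that the emptier runs at exact multiples of the checkpoint interval, that each run turns Current into a checkpoint stamped with the run time, that a checkpoint expires once its age reaches the retention interval, and that nobody runs expunge by hand. The function name `trash_removal_time` and the sample values are made up for illustration.

```python
import math


def trash_removal_time(deleted_at: int, interval: int, checkpoint_interval: int = 0) -> int:
    """Return the minute at which a file deleted at `deleted_at` leaves the trash.

    All arguments are in minutes. Simplified model (an assumption, see lead-in):
    the emptier runs at every multiple of `checkpoint_interval`.
    """
    if interval <= 0:
        raise ValueError("trash is disabled when fs.trash.interval is 0")
    if checkpoint_interval <= 0:
        # Documented rule: 0 means "use the value of fs.trash.interval".
        checkpoint_interval = interval

    # First emptier run at or after the delete: Current becomes a checkpoint.
    checkpointed_at = math.ceil(deleted_at / checkpoint_interval) * checkpoint_interval

    # Later runs remove the checkpoint once its age has reached `interval`.
    run = checkpointed_at + checkpoint_interval
    while run - checkpointed_at < interval:
        run += checkpoint_interval
    return run


if __name__ == "__main__":
    # File deleted at minute 10, retention 360 minutes, checkpoint every 60 minutes.
    print(trash_removal_time(10, 360, 60))  # prints 420
```

In this model a file stays in the trash for at least fs.trash.interval minutes and for less than fs.trash.interval plus one checkpoint interval, which is one reason the freed space shows up later than the delete.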
Reference: File Deletes and Undeletes

2. An example

With fs.trash.interval set to 4 days and fs.trash.checkpoint.interval set to 12 hours, deleted files are moved to the trash and retained there for 4 days. Every 12 hours the trash is scanned, and files that have been in it for more than 4 days are deleted ("expunged").

II. Configuration and testing
1. Configuration

Trash is disabled by default in HDFS. To enable it, add the following to Hadoop's core-site.xml configuration file:
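<property>
  <name>fs.trash.interval</name>
  <value>360</value>
  <description>Number of minutes after which the checkpoint gets deleted. If zero, the trash feature is disabled. This option may be configured both on the server and the client. If trash is disabled server side then the client side configuration is checked. If trash is enabled on the server side then the value configured on the server is used and the client configuration value is ignored.</description>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>0</value>
  <description>Number of minutes between trash checkpoints. Should be smaller or equal to fs.trash.interval. If zero, the value is set to the value of fs.trash.interval. Every time the checkpointer runs it creates a new checkpoint out of current and removes checkpoints created more than fs.trash.interval minutes ago.</description>
</property>

No restart is required; the commands below can be run right away.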
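Both properties are expressed in minutes, so a retention period stated in days or hours has to be converted first. The following Python sketch does the arithmetic for the 4-day / 12-hour example from section I and renders the matching property fragment. The helper name `trash_properties` is hypothetical; it only builds a string and does not read or write any Hadoop configuration file.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR


def trash_properties(retention_days: float, checkpoint_hours: float) -> str:
    """Render core-site.xml properties for the given retention settings."""
    interval = int(retention_days * MINUTES_PER_DAY)
    checkpoint = int(checkpoint_hours * MINUTES_PER_HOUR)
    if checkpoint > interval:
        # The property description asks for checkpoint <= interval.
        raise ValueError("fs.trash.checkpoint.interval should not exceed fs.trash.interval")

    props = {
        "fs.trash.interval": interval,
        "fs.trash.checkpoint.interval": checkpoint,
    }
    lines = []
    for name, value in props.items():
        lines.append("<property>")
        lines.append(f"  <name>{name}</name>")
        lines.append(f"  <value>{value}</value>")
        lines.append("</property>")
    return "\n".join(lines)


if __name__ == "__main__":
    # 4 days -> 5760 minutes, 12 hours -> 720 minutes
    print(trash_properties(4, 12))
```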
2. Trash-related commands

# Delete a file
bin/hdfs dfs -rm /conf.tar.gz
2023-12-05 14:54:43,989 INFO fs.TrashPolicyDefault: Moved: hdfs://xxx/conf.tar.gz to trash at: hdfs://xmanhdfs3/user/taiyi/.Trash/Current/conf.tar.gz

# List the file in the trash
bin/hdfs dfs -ls hdfs://xxx/user/taiyi/.Trash/Current/conf.tar.gz
-rw-r--r-- 3 taiyi supergroup 7605 2023-12-05 14:54 hdfs://xxx/user/taiyi/.Trash/Current/conf.tar.gz

# Restore the file by moving it out of the trash
bin/hdfs dfs -mv hdfs://xxx/user/taiyi/.Trash/Current/conf.tar.gz /

# Expunge the trash: delete checkpoints older than the retention threshold and create a new checkpoint
bin/hdfs dfs -expunge

# Delete directly, bypassing the trash
hdfs dfs -rm -r -skipTrash /user/root/123123

III. Other issues
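1. API deletes bypass the trash

Deleting through the Hadoop delete API directly does not go through the trash by default. If snapshots are not configured either, the blocks that belong to the file start being physically deleted from the underlying file system, so the decision whether to attempt a recovery has to be made quickly.
Recovery requires stopping the data services (NameNode and DataNodes), so the value of the lost data has to be weighed against the impact of the downtime on production services.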
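One way for a script to keep the trash as a safety net is to delete through the FS shell rather than through a raw delete call, since the shell is the path that honours the trash settings. The Python sketch below only assembles and runs the `hdfs dfs -rm` command shown in section II. The function names are hypothetical, and the protection applies only when trash is enabled in the configuration. Java code can get a similar effect with `org.apache.hadoop.fs.Trash.moveToAppropriateTrash`, which is not shown here.

```python
import subprocess
from typing import List


def build_rm_command(path: str, recursive: bool = False, hdfs_bin: str = "hdfs") -> List[str]:
    """Build an FS shell delete that never bypasses the trash."""
    if not path or path.startswith("-"):
        # Reject empty paths and anything that would be parsed as an option,
        # for example "-skipTrash".
        raise ValueError(f"refusing suspicious path: {path!r}")
    cmd = [hdfs_bin, "dfs", "-rm"]
    if recursive:
        cmd.append("-r")
    cmd.append(path)  # -skipTrash is deliberately never added
    return cmd


def delete_via_trash(path: str, recursive: bool = False) -> None:
    """Run the delete; raises CalledProcessError if the shell command fails."""
    subprocess.run(build_rm_command(path, recursive), check=True)
```

Calling `delete_via_trash("/conf.tar.gz")` on a host with the Hadoop client installed would run `hdfs dfs -rm /conf.tar.gz`.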
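References: "Recovering data" (恢复数据); "How to effectively recover mistakenly deleted HDFS files" (如何有效恢复误删的HDFS文件)

2. NameNode becomes unresponsive because of the trash configuration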
Hadoop NameNode becomes un-responsive due to Trash configuration

Resolving The Problem

In order to prevent the NameNode having to perform an extreme amount of file to block map maintenance (which will also impact the DataNode(s)), the settings for fs.trash.interval and fs.trash.checkpoint.interval should be set so that the amount of data to be expunged at a single point of time is within the capability of the environment; a suggestion being under 10GB.

In other words, the amount of trash data the NameNode deletes in a single pass should stay under 10 GB.
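A rough way to check a trash directory against that figure is to look at the summary line printed by `hdfs dfs -du -s <trash dir>`. The Python sketch below assumes that the first field of that line is the size in bytes (the layout when `-h` is not used) and treats 10 GB as 10 * 1024^3 bytes. Both are assumptions, and the sample lines in the usage example are invented. The whole trash directory is an upper bound, since a single pass removes only expired checkpoints.

```python
GIB = 1024 ** 3
DEFAULT_BUDGET = 10 * GIB  # assumption: "10GB" read as 10 GiB


def trash_bytes(du_summary_line: str) -> int:
    """Extract the size in bytes from one line of `hdfs dfs -du -s` output."""
    fields = du_summary_line.split()
    if not fields or not fields[0].isdigit():
        raise ValueError(f"unexpected du output: {du_summary_line!r}")
    return int(fields[0])


def within_expunge_budget(du_summary_line: str, budget_bytes: int = DEFAULT_BUDGET) -> bool:
    """True if the reported size is under the suggested budget."""
    return trash_bytes(du_summary_line) < budget_bytes


if __name__ == "__main__":
    # Invented sample lines: size, space consumed with replication, path.
    small = "7605  22815  /user/taiyi/.Trash"
    large = "21474836480  64424509440  /user/taiyi/.Trash"
    print(within_expunge_budget(small))  # prints True
    print(within_expunge_budget(large))  # prints False (20 GiB)
```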