
Linux Exploration: The Fastest Way to Delete a Million Files at Once


Initial benchmark

Yesterday I came across a very interesting way to delete a huge number of files from a single directory. It comes from Zhenyu Lee's answer at http://www.quora.com/How-can-someone-rapidly-delete-400-000-files.

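Instead of using find or xargs, he made creative use of rsync: rsync --delete syncs an empty directory onto the target directory, replacing its contents with nothing. I then ran an experiment to compare the different methods, and to my surprise Lee's method was far faster than all the others. Here is my benchmark.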
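A minimal sketch of the trick, assuming a Linux box with rsync on PATH. The helper name empty_dir_with_rsync is hypothetical, and it creates a throwaway empty directory with mktemp -d instead of relying on a pre-made empty/ directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: empty a directory by syncing an empty directory onto it.
# Assumes rsync is on PATH. The target directory itself is kept.
empty_dir_with_rsync() {
    local target=$1 empty status
    empty=$(mktemp -d) || return 1
    # Trailing slashes matter: "$empty"/ means "the contents of $empty", so
    # rsync makes the target's contents match it (i.e. nothing) instead of
    # creating a subdirectory inside the target.
    # --delete removes every file and subdirectory that exists only on the
    # receiving side. Note that -a also copies the source directory's
    # permissions (0700, from mktemp) onto the target directory.
    rsync -a --delete "$empty"/ "$target"/
    status=$?
    rmdir "$empty"          # clean up the temporary empty directory
    return "$status"        # report rsync's exit status
}
```

Usage: empty_dir_with_rsync s1 leaves s1/ in place but empty, which is also why this approach suits cases where the directory must survive.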

Environment:

  • CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz

  • MEM: 4G

  • HD: ST3250318AS: 250G/7200RPM

Results:

    Method                                      Files      Deletion time
    rsync -a --delete empty/ s1/                1000000    6m50.638s
    find s2/ -type f -delete                    1000000    87m38.826s
    find s3/ -type f | xargs -L 100 rm          1000000    83m36.851s
    find s4/ -type f | xargs -L 100 -P 100 rm   1000000    78m4.658s
    rm -rf s5                                   1000000    80m33.434s

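Combining --delete with --exclude also lets you delete selectively: files matching an exclude pattern are kept, and everything else is removed. Another advantage is that the target directory itself survives, so this approach is ideal when you still need the directory for something else afterwards.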
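A sketch of selective deletion under the same assumptions (rsync on PATH; the helper name delete_all_except and the *.log pattern are hypothetical). According to the rsync documentation, files excluded from the transfer are also excluded from deletion unless --delete-excluded is given, so --exclude works as a keep-list here. The example assumes a flat directory like the ones in the benchmark.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete everything in a directory except files whose
# names match a shell-style pattern. Assumes rsync is on PATH.
delete_all_except() {
    local target=$1 keep_pattern=$2 empty status
    empty=$(mktemp -d) || return 1
    # Receiver-side files matching --exclude are protected from --delete
    # (unless --delete-excluded is added); everything else is removed.
    rsync -a --delete --exclude="$keep_pattern" "$empty"/ "$target"/
    status=$?
    rmdir "$empty"          # clean up the temporary empty directory
    return "$status"
}
```

Usage: delete_all_except s1 '*.log' removes everything in s1/ except the .log files. Quote the pattern so the shell does not expand it first.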

Second benchmark

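A few days ago, Keith-Winstein replied on the Quora thread that my earlier benchmark could not be reproduced, because the deletions took far too long. To clarify: the numbers may be inflated because my computer had done a lot of work over the past few years, and the file system may have had some errors during the benchmark, but I am not sure that was the cause. So I got hold of a newer computer and ran the benchmark again, this time using /usr/bin/time, which reports more detailed information. Here are the new results.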

(Each run deletes 1,000,000 files, each 0 bytes in size.)

    Command                                     Elapsed (s)  System time (s)  %CPU  Context switches (vol/invol)
    rsync -a --delete empty/ a/                 12.42        10.60            95    106/22
    find b/ -type f -delete                     28.51        14.46            52    14849/11
    find c/ -type f | xargs -L 100 rm           41.69        20.60            54    37048/15074
    find d/ -type f | xargs -L 100 -P 100 rm    34.32        27.82            89    929897/21720
    rm -rf f                                    31.29        14.80            47    15134/11
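If you want to repeat the comparison on your own machine, a small-scale harness along these lines may help. It is only a sketch under stated assumptions: bash with GNU find, xargs and seq, plus rsync; test files created with seq | xargs touch (the article does not say how its files were made); and timings from bash's time keyword rather than /usr/bin/time -v. All function names are hypothetical, and the default of 10,000 files is far below the article's 1,000,000, so absolute numbers will not be comparable.

```shell
#!/usr/bin/env bash
# Hypothetical small-scale harness for the five deletion methods.
# Assumes bash, GNU find/xargs/seq, and rsync on PATH.

make_files() {                       # create $2 empty (0-byte) files in dir $1
    mkdir -p "$1" && (cd "$1" && seq 1 "$2" | xargs touch)
}

# Each method takes the directory to clear as $1.
method_rsync() {                     # sync an empty dir onto the target
    local e
    e=$(mktemp -d) || return 1
    rsync -a --delete "$e"/ "$1"/
    rmdir "$e"
}
method_find_delete() { find "$1" -type f -delete; }
# Like the article, the xargs variants assume file names without whitespace.
method_xargs()       { find "$1" -type f | xargs -L 100 rm; }
method_xargs_par()   { find "$1" -type f | xargs -L 100 -P 100 rm; }
method_rm_rf()       { rm -rf "$1"; }

run_benchmark() {
    local n=${1:-10000} root m
    local TIMEFORMAT='%R s elapsed, %S s system'   # output format of bash's `time`
    root=$(mktemp -d) || return 1
    for m in method_rsync method_find_delete method_xargs method_xargs_par method_rm_rf; do
        make_files "$root/$m" "$n"
        sync                         # flush pending writes before timing
        printf '%-20s ' "$m"
        time "$m" "$root/$m"         # the keyword times the whole function, pipes included
    done
    rm -rf "$root"
}
```

For example, run_benchmark 100000 prints one timing line per method. Bash's time keyword covers an entire pipeline, whereas /usr/bin/time runs and times a single command; that is presumably why the raw output wraps the find | xargs pipelines in ./delete.sh.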

Raw output

    # method 1
    ~/test $ /usr/bin/time -v  rsync -a --delete empty/ a/
            Command being timed: "rsync -a --delete empty/ a/"
            User time (seconds): 1.31
            System time (seconds): 10.60
            Percent of CPU this job got: 95%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:12.42
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 0
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 24378
            Voluntary context switches: 106
            Involuntary context switches: 22
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0
    
    # method 2
            Command being timed: "find b/ -type f -delete"
            User time (seconds): 0.41
            System time (seconds): 14.46
            Percent of CPU this job got: 52%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:28.51
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 0
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 11749
            Voluntary context switches: 14849
            Involuntary context switches: 11
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0
    # method 3 (./delete.sh contains the following pipeline)
    find c/ -type f | xargs -L 100 rm
    ~/test $ /usr/bin/time -v ./delete.sh
            Command being timed: "./delete.sh"
            User time (seconds): 2.06
            System time (seconds): 20.60
            Percent of CPU this job got: 54%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:41.69
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 0
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 1764225
            Voluntary context switches: 37048
            Involuntary context switches: 15074
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0
    
    # method 4 (./delete.sh contains the following pipeline)
    find d/ -type f | xargs -L 100 -P 100 rm
    ~/test $ /usr/bin/time -v ./delete.sh
            Command being timed: "./delete.sh"
            User time (seconds): 2.86
            System time (seconds): 27.82
            Percent of CPU this job got: 89%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:34.32
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 0
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 1764278
            Voluntary context switches: 929897
            Involuntary context switches: 21720
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0
    
    # method 5
    ~/test $ /usr/bin/time -v rm -rf f
            Command being timed: "rm -rf f"
            User time (seconds): 0.20
            System time (seconds): 14.80
            Percent of CPU this job got: 47%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:31.29
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 0
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 176
            Voluntary context switches: 15134
            Involuntary context switches: 11
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0
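The summary table can be rebuilt from raw /usr/bin/time -v output with a small awk filter. This is a sketch that assumes GNU time's -v field labels exactly as printed above; the function name summarize_time_v is hypothetical. It prints elapsed time, system time, %CPU and voluntary/involuntary context switches, in the same order as the table columns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: condense GNU `time -v` output (read on stdin) into one
# line: elapsed, system time, %CPU, voluntary/involuntary context switches.
summarize_time_v() {
    # Split on ": " so the last field is the value. The "h:mm:ss" inside the
    # Elapsed label has no space after its colons, so it is not split there.
    # awk regexes are case-sensitive, so "Voluntary" does not match the
    # "Involuntary" line.
    awk -F': ' '
        /Elapsed \(wall clock\) time/  { elapsed = $NF }
        /System time \(seconds\)/      { sys = $NF }
        /Percent of CPU this job got/  { cpu = $NF }
        /Voluntary context switches/   { vol = $NF }
        /Involuntary context switches/ { invol = $NF }
        END { printf "%s %s %s %s/%s\n", elapsed, sys, cpu, vol, invol }
    '
}
```

Usage: /usr/bin/time -v -o rm.log rm -rf f, then summarize_time_v < rm.log. The -o option makes GNU time write its report to a file instead of stderr, so it does not mix with the command's own error messages.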

I am really curious why Lee's method is faster than the others, even faster than rm -rf. If anyone knows, please leave a comment below. Many thanks.


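This article was first published by 外刊IT评论网 (www.aqee.net) as "Linux技巧:一次删除一百万个文件的最快方法" (Linux Tip: The Fastest Way to Delete a Million Files at Once). [English original: A faster way to delete millions of files in a directory]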

