
Linux Exploration: The Fastest Way to Delete One Million Files at Once

Posted by 外刊IT评论, 2013-06-17 23:57:16

Initial benchmark

Yesterday I came across a very interesting way to delete a huge number of files from a directory. It comes from Zhenyu Lee's answer at http://www.quora.com/How-can-someone-rapidly-delete-400-000-files.

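Instead of using find or xargs, he made creative use of rsync: running rsync --delete with an empty directory as the source removes everything in the target directory that the (empty) source does not contain, which is everything. I then ran an experiment to compare the various methods, and to my surprise Lee's method was much faster than the others. Here are my results.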

Environment:

  • CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz

  • MEM: 4 GB

  • HD: ST3250318AS, 250 GB, 7200 RPM

    Method                                       Files      Deletion time
    rsync -a --delete empty/ s1/                 1000000    6m50.638s
    find s2/ -type f -delete                     1000000    87m38.826s
    find s3/ -type f | xargs -L 100 rm           1000000    83m36.851s
    find s4/ -type f | xargs -L 100 -P 100 rm    1000000    78m4.658s
    rm -rf s5                                    1000000    80m33.434s

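Combining --delete with --exclude lets you delete selectively: files matching an --exclude pattern are left in place. This method is also ideal when you need to keep the directory itself for other purposes, because rsync empties the target directory without removing it.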

Re-running the benchmark

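A few days ago, Keith Winstein replied to the Quora thread saying that my earlier results could not be reproduced, because the operations took far too long. To clarify: the numbers may be inflated because my computer had done a great deal of work over the past few years, and the test may have been affected by file system errors, but I am not sure those were the causes. So I got a newer computer and ran the benchmark again. This time I used /usr/bin/time, which reports more detailed information. Here are the new results.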

(Each run deletes 1,000,000 files, each 0 bytes in size.)

    Command                                      Elapsed (s)  System time (s)  %CPU  Context switches (vol/invol)
    rsync -a --delete empty/ a/                  12.42        10.60            95    106/22
    find b/ -type f -delete                      28.51        14.46            52    14849/11
    find c/ -type f | xargs -L 100 rm            41.69        20.60            54    37048/15074
    find d/ -type f | xargs -L 100 -P 100 rm     34.32        27.82            89    929897/21720
    rm -rf f                                     31.29        14.80            47    15134/11

Raw output

    # method 1
    ~/test $ /usr/bin/time -v  rsync -a --delete empty/ a/
            Command being timed: "rsync -a --delete empty/ a/"
            User time (seconds): 1.31
            System time (seconds): 10.60
            Percent of CPU this job got: 95%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:12.42
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 0
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 24378
            Voluntary context switches: 106
            Involuntary context switches: 22
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0
    
    # method 2
            Command being timed: "find b/ -type f -delete"
            User time (seconds): 0.41
            System time (seconds): 14.46
            Percent of CPU this job got: 52%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:28.51
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 0
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 11749
            Voluntary context switches: 14849
            Involuntary context switches: 11
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0
    # method 3
    find c/ -type f | xargs -L 100 rm
    ~/test $ /usr/bin/time -v ./delete.sh
            Command being timed: "./delete.sh"
            User time (seconds): 2.06
            System time (seconds): 20.60
            Percent of CPU this job got: 54%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:41.69
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 0
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 1764225
            Voluntary context switches: 37048
            Involuntary context switches: 15074
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0
    
    # method 4
    find d/ -type f | xargs -L 100 -P 100 rm
    ~/test $ /usr/bin/time -v ./delete.sh
            Command being timed: "./delete.sh"
            User time (seconds): 2.86
            System time (seconds): 27.82
            Percent of CPU this job got: 89%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:34.32
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 0
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 1764278
            Voluntary context switches: 929897
            Involuntary context switches: 21720
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0
    
    # method 5
    ~/test $ /usr/bin/time -v rm -rf f
            Command being timed: "rm -rf f"
            User time (seconds): 0.20
            System time (seconds): 14.80
            Percent of CPU this job got: 47%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:31.29
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 0
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 176
            Voluntary context switches: 15134
            Involuntary context switches: 11
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0
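Methods 3 and 4 are timed through `./delete.sh`, apparently a script wrapping the pipeline listed just above each run. One likely reason: in `/usr/bin/time -v find c/ -type f | xargs -L 100 rm` the shell splits the pipeline first, so only `find` would be timed. Wrapping the pipeline in one script lets GNU time report the script together with the children it waited for (find, xargs and every rm). The script body below is a hypothetical reconstruction, not the author's file.

```shell
#!/usr/bin/env bash
# Sketch: time a whole find | xargs pipeline by wrapping it in a script.
# Assumptions: delete.sh's contents are hypothetical; GNU time is optional.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir c
for i in $(seq 1 3000); do : > "c/file_$i"; done

# Hypothetical delete.sh for method 3.
cat > delete.sh <<'EOF'
#!/bin/sh
find c/ -type f | xargs -L 100 rm
EOF
chmod +x delete.sh

# GNU time reports the script's resource usage including the children it
# waited for, so the whole pipeline is covered.
if [ -x /usr/bin/time ]; then
  /usr/bin/time -v ./delete.sh
else
  ./delete.sh   # GNU time not installed: run the script untimed
fi

echo "files left: $(find c -type f | wc -l)"
```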

I am really curious why Lee's method is faster than the others, even faster than rm -rf. If anyone knows, please leave a comment below. Many thanks.


This article was originally published by 外刊IT评论 (www.aqee.net) under the title "Linux tip: the fastest way to delete one million files at once". [English original: A faster way to delete millions of files in a directory]
