The impact of Erlang R15's delayed dealloc memory feature on message-intensive programs

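On newer NUMA architectures, each CPU has its own local memory. Accessing memory that belongs to another CPU counts as a remote access and has to go through the QPI link between the CPUs, which typically costs about 40% in speed.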

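For multithreaded programs, this hardware change has a large effect on the software as well. In a multithreaded program, one thread usually allocates memory for an object, hands the object to other threads to use, and some other thread finally frees the memory. Those two threads may be running on different CPUs. This scenario is very common; Erlang's message passing is one example. If objects are instead freed by whoever created them, message-intensive programs benefit considerably.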
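The hand-off pattern described above can be sketched at the Erlang level as one process that builds terms and another that receives and then drops them. This is only an illustration: the module name `handoff` and its functions are made up for this example, and whether the two processes actually end up on different schedulers or NUMA nodes is decided by the runtime, not by this code.

```erlang
-module(handoff).
-export([run/1]).

%% Hypothetical illustration: the calling process produces N messages,
%% a spawned consumer receives them and lets them become garbage.
%% Returns the number of messages the consumer reports having received.
run(N) when is_integer(N), N >= 0 ->
    Parent = self(),
    Consumer = spawn(fun() -> consume(N, 0, Parent) end),
    produce(N, Consumer),
    receive
        {done, Consumer, Received} -> Received
    end.

%% Build a small term for every message and send it to the consumer.
produce(0, _Consumer) ->
    ok;
produce(N, Consumer) ->
    Consumer ! {payload, N, lists:seq(1, 16)},
    produce(N - 1, Consumer).

%% Receive each payload and discard it; report back once all arrived.
consume(0, Received, Parent) ->
    Parent ! {done, self(), Received};
consume(Left, Received, Parent) ->
    receive
        {payload, _Seq, _Data} ->
            consume(Left - 1, Received + 1, Parent)
    end.
```

Calling `handoff:run(1000)` sends a thousand payloads across the process boundary, which is the kind of traffic the allocator changes are aimed at.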

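The biggest runtime optimization in R15 is this commit: https://github.com/erlang/otp/commit/a67e91e658bdbba24fcc3c79b06fdf10ff830bc9

This is the delay dealloc feature that was announced earlier.

Compare it with the OTP team's earlier plan:

Items 1, 2, 3, and 4 of that plan have all been implemented in R15.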

Optimize memory allocation

A number of memory allocation optimizations have been implemented. Most optimizations reduce contention caused by synchronization between threads during allocation and deallocation of memory. Most notably:

* Synchronization of memory management in scheduler specific allocator instances has been rewritten to use lock-free synchronization.
* Synchronization of memory management in scheduler specific pre-allocators has been rewritten to use lock-free synchronization.
* The 'mseg_alloc' memory segment allocator now use scheduler specific instances instead of one instance. Apart from reducing contention this also ensures that memory allocators always create memory segments on the local NUMA node on a NUMA system.
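To get a feel for the "scheduler specific instances" mentioned in the commit message, the running emulator can be asked how many instances of an allocator it has through `erlang:system_info/1`. The following is a minimal sketch: the module name `alloc_peek` is made up, and because the detailed contents of the returned information differ between releases, the code only counts the `{instance, _, _}` tuples and ignores everything else.

```erlang
-module(alloc_peek).
-export([instances/1, report/0]).

%% Count the {instance, No, Info} tuples reported for one allocator.
%% Anything that is not a list (for example 'false' for a disabled
%% allocator or 'undefined' for an unknown one) is counted as 0.
instances(Alloc) ->
    case erlang:system_info({allocator, Alloc}) of
        L when is_list(L) ->
            length([I || {instance, _, _} = I <- L]);
        _ ->
            0
    end.

%% Scheduler count next to the instance counts of a few allocators.
%% The choice of allocators here is arbitrary, for illustration only.
report() ->
    [{schedulers, erlang:system_info(schedulers)} |
     [{A, instances(A)} || A <- [mseg_alloc, binary_alloc, ets_alloc]]].
```

Running `alloc_peek:report()` once in a default SMP emulator and once with `-smp disable` is a quick way to see whether the instance counts follow the number of schedulers on a given build.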

Let's try it out. First we need a message-intensive program: this one passes a token around a ring of processes.

$ cat threadring.erl
%%% The Computer Language Benchmarks Game
%%% http://shootout.alioth.debian.org/
%%% Contributed by Jiri Isa
%%% optimized run time options by shun shino

-module(threadring).
-export([main/1, roundtrip/2]).

-define(RING, 503).

start(Token) ->
   H = lists:foldl(
      fun(Id, Pid) -> spawn(threadring, roundtrip, [Id, Pid]) end,
      self(),
      lists:seq(?RING, 2, -1)),
   H ! Token,
   roundtrip(1, H).

roundtrip(Id, Pid) ->
   receive
      1 ->
         io:fwrite("~b~n", [Id]),
         erlang:halt();
      Token ->
         Pid ! Token - 1,
         roundtrip(Id, Pid)
   end.

main([Arg]) ->
   Token = list_to_integer(Arg),
   start(Token).
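As a sanity check on the listing above, the Id it prints can be predicted without running the ring. This sketch relies on my reading of the code: the token is first sent to process 2, every hop subtracts 1, and the process that receives the value 1 prints its Id. The module `ring_check` and the function `final_id/2` are names invented for this example.

```erlang
-module(ring_check).
-export([final_id/2]).

%% The token enters the ring at process 2 carrying Token and loses 1
%% per hop, so the value 1 arrives after Token - 1 hops from process 2.
%% With Ids numbered 1..Ring, that process is (Token rem Ring) + 1.
final_id(Token, Ring)
  when is_integer(Token), Token >= 1, is_integer(Ring), Ring >= 2 ->
    (Token rem Ring) + 1.
```

`ring_check:final_id(500000000, 503)` evaluates to 396, which is the Id printed by the timed runs in this post.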

$ erlc threadring.erl

$ time erl -smp disable -noshell -run +t 8192 +ec +K true +P 1000 +hmbs 1 +hms 4 +sss 4 threadring main 500000000
396

real    1m46.900s
user    0m35.010s
sys     1m11.870s

$ time otp/bin/erl -smp disable -noshell -run +t 8192 +ec +K true +P 1000 +hmbs 1 +hms 4 +sss 4 threadring main 500000000
396

real    1m37.884s
user    0m5.350s
sys     1m32.513s

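This simple comparison shows the wall-clock time dropping from 1m46.900s with the first erl to 1m37.884s with the build under otp/bin, a performance improvement of roughly 8%. In more complex programs the improvement should be larger.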
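The roughly 8% figure comes from the `real` (wall-clock) times of the two runs. A small helper reproduces the arithmetic; the module name `speedup` is hypothetical, and the times are simply the two `real` values above converted to seconds.

```erlang
-module(speedup).
-export([percent_faster/2]).

%% Relative reduction in elapsed time, in percent, going from
%% OldSecs to NewSecs. Both arguments are wall-clock seconds.
percent_faster(OldSecs, NewSecs) when OldSecs > 0 ->
    (OldSecs - NewSecs) / OldSecs * 100.
```

`speedup:percent_faster(106.900, 97.884)` comes out at about 8.4. Note that this looks only at elapsed time; in the transcripts the split between `user` and `sys` time also shifts noticeably between the two runs.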

Have fun.

Article information

Author: Yu Feng. Source: Erlang非业余研究
Tags: dealloc Erlang
Published: 2011-11-24 00:01:55
