Thu, 14 Dec 2006

tcmalloc and the C++ issue

From Michael Still's blog (via Planet Linux Australia) I found a reference to "tcmalloc", a faster-than-glibc-malloc threaded malloc implementation out of Google. Reading the paper, it sounded good.

But one of the "Caveats" caught my eye: "In particular, at startup TCMalloc allocates approximately 6 MB of memory. It would be easy to roll a specialized version that trades a little bit of speed for more space efficiency."

Why should it use that much memory? Since the arrays involved should be zero until used, surely it wouldn't take a 6MB hit until they were fully utilized?
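To make that expectation concrete, here is a rough Linux-only sketch (not anything from tcmalloc) that maps 6MB-style anonymous zeroed memory and asks the kernel how many pages are actually resident. The helper names map_zeroed and resident_pages are made up for illustration; mmap(2) and mincore(2) are the real calls underneath.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Map `len` bytes of anonymous, zero-filled memory without touching it.
// Returns nullptr on failure. (Hypothetical helper, for illustration only.)
void* map_zeroed(std::size_t len) {
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Count the pages of [addr, addr + len) that the kernel reports as resident,
// using mincore(2). `addr` must be page-aligned, which mmap results are.
// Returns -1 if mincore fails.
long resident_pages(void* addr, std::size_t len) {
    assert(len > 0);
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    std::vector<unsigned char> vec((len + page - 1) / page);
    if (mincore(addr, len, vec.data()) != 0)
        return -1;
    long n = 0;
    for (unsigned char v : vec)
        n += v & 1;  // low bit set means the page is in core
    return n;
}
```

Calling resident_pages on a fresh 6MB mapping, writing a single byte, and calling it again should show the count going up only by what was touched, not by the whole mapping. Transparent huge pages or a sandboxed kernel can report something different, so treat the numbers as indicative.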

So, I downloaded it and took a look. Erk. It's in C++! It astounds me that someone would write a low-level library like this in C++. Here's a simple "1-byte malloc then spin" program with glibc:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 7078 rusty     21   0  1524  344  272 R 95.3  0.1   0:02.73 sleeptest          

And linked against tcmalloc:

 5946 rusty     24   0  4688 1456  908 R 80.2  0.3   0:02.77 sleeptest          

Now 1.1MB of extra resident memory (1456k versus 344k) isn't quite the 6MB quoted, but it's not good. Here's what size says:

$ size /usr/local/lib/libtcmalloc.so.0.0.0 
   text    data     bss     dec     hex filename
 125231    1840  267624  394695   605c7 /usr/local/lib/libtcmalloc.so.0.0.0

125k of text seems excessive to me, but the 267k of BSS caught my eye. "static TCMalloc_Central_FreeListPadded central_cache[kNumClasses];" is 250k (kNumClasses is 170). In addition, initializing the library causes four 128k allocs and a single 1MB alloc. I'm guessing at least some of these are being helpfully initialized by C++ constructors, using up memory even if it's initialized to zero.
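The constructor guess is easy to illustrate in isolation. This is only a sketch of the general mechanism: PaddedList and central_cache_sketch are hypothetical stand-ins, not tcmalloc's real types, and the 1500 bytes of padding is picked just to land in the same ballpark as the 250k above. The point is that a static array with a non-trivial constructor sits in BSS in the file, yet every element gets written before main() runs.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kClasses = 170;  // kNumClasses, per the figure quoted above

// Zero-initialized statically, so it is valid before any constructor runs.
static std::size_t g_constructed = 0;

// Hypothetical stand-in for a padded per-size-class free list.
struct PaddedList {
    PaddedList* head;   // an empty circular list points at itself
    char pad[1500];     // illustrative padding, not tcmalloc's real layout
    PaddedList() : head(this) { ++g_constructed; }  // non-zero store into the object
};

// Counted as BSS by size(1), but each element is written during startup,
// so the pages it covers are touched before main() is entered.
static PaddedList central_cache_sketch[kClasses];

static_assert(sizeof(central_cache_sketch) > 200 * 1024,
              "sketch should be roughly the size discussed in the post");

// How many elements the runtime constructed during static initialization.
std::size_t constructed_count() {
    assert(g_constructed >= kClasses);  // the array above is built before main()
    return g_constructed;
}

// True if every element carries the value its constructor stored.
bool all_elements_touched() {
    for (const PaddedList& e : central_cache_sketch)
        if (e.head != &e)
            return false;
    return true;
}
```

In plain C, the equivalent "static struct padded_list cache[170];" would stay untouched zero pages until first use, which is the behaviour I was expecting from the library.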

But because it was C++, my tolerance limit was reached and I stopped poking. My initial casual thought of integrating talloc has faded. But then, Andrew Tridgell once said "you never need a reason to rewrite something"!

