War of allocators: hoard or hoards?
I found Emery Berger's allocator called hoard. Hoard's homepage promises some appealing general qualities (fast, scalable, and memory-efficient). Do we need more than this? I tried out how it performs in WebKit.
Methanol
Methanol is still our live browsing simulation benchmark. It loads and renders popular web pages locally, one by one (currently 9 pages, 5 times). The times are measured by Methanol's own JavaScript.
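To make the shape of such a measurement concrete, here is a minimal sketch of a page-by-page timing loop. It is not Methanol's code (Methanol measures in JavaScript inside the browser). The names `run_benchmark` and `load_page`, the placeholder page names, and the choice to average the repeats per page are all assumptions made for illustration, since the post does not say how the runs are aggregated.

```python
import time
from statistics import mean


def run_benchmark(pages, load_page, repeats=5, clock=time.perf_counter):
    """Load each page `repeats` times; return the mean load time per page.

    `load_page` is a hypothetical callable that loads and renders one page
    and returns when it is done. `clock` is injectable so the loop can be
    tested without real timing.
    """
    samples = {page: [] for page in pages}
    for _ in range(repeats):
        for page in pages:  # pages are visited one by one
            start = clock()
            load_page(page)
            samples[page].append(clock() - start)
    return {page: mean(times) for page, times in samples.items()}


if __name__ == "__main__":
    # Placeholder page names and a stand-in loader; not the real page set.
    pages = ["page-%d.html" % i for i in range(1, 10)]
    results = run_benchmark(pages, load_page=lambda page: time.sleep(0.001))
    print("sum of per-page means: %.4f s" % sum(results.values()))
```

Running the same loop once per allocator build would give one number per allocator to compare.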
SunSpider in QtLauncher
This test runs SunSpider inside QtLauncher. It does only minimal rendering and puts the emphasis on JavaScript execution. In terms of performance, hoard is slower than TCmalloc by 11.4%, and it consumes 3.7% more memory. It is slower and consumes more memory... Sounds bad...
V8 in QtLauncher
The V8 benchmark shows the same picture as SunSpider above: hoard is slower than TCmalloc by 4.9% and consumes 7.9% more memory. Note that the V8 benchmark consumes ~151 megabytes; this number will be interesting later.
WindScorpion in QtLauncher
WindScorpion is our collection of real-life JavaScript programs, and it works like the SunSpider and V8 benchmarks. These were all single-threaded benchmarks... But as hoard's web site says, "it can dramatically improve application performance, especially for multi-threaded programs running on multiprocessors"...
Workers - the multi-threaded benchmarks
With the help of JavaScript workers, we can run JavaScript applications simultaneously. Let's see how our new multi-threaded allocator performs with workers.
Two SunSpider workers in QtLauncher
The columns represent the slower worker's result. In terms of performance, TCmalloc is 39% faster than hoard. On the memory consumption side, hoard consumes 2.9% more memory.
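For readers who want to reproduce this kind of comparison, the sketch below shows one way to get from per-worker timings to a percentage. The timings are made-up illustrative numbers, not the measured ones, and the function names are hypothetical. Note that "X% slower" and "X% faster" are different ratios over the same two numbers, and the post does not state which convention its figures use.

```python
def slower_worker(times_ms):
    """With several workers running in parallel, report the slowest one."""
    return max(times_ms)


def percent_slower(candidate_ms, baseline_ms):
    """How much longer the candidate takes, relative to the baseline."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


def percent_less_time(candidate_ms, baseline_ms):
    """How much less time the candidate needs, relative to the baseline."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


if __name__ == "__main__":
    # Hypothetical per-worker run times in milliseconds (NOT measured data).
    tcmalloc_workers = [1000.0, 1100.0]
    hoard_workers = [1400.0, 1540.0]

    tcmalloc = slower_worker(tcmalloc_workers)
    hoard = slower_worker(hoard_workers)

    print("hoard takes %.1f%% longer than TCmalloc"
          % percent_slower(hoard, tcmalloc))
    print("TCmalloc needs %.1f%% less time than hoard"
          % percent_less_time(tcmalloc, hoard))
```

The same two helpers work for memory figures if the inputs are megabytes instead of milliseconds.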
Two V8 workers in QtLauncher
As the chart shows, hoard is 44% slower than TCmalloc and it consumes 28% more memory. In the plain V8 benchmark above, TCmalloc consumes ~150 megabytes; with 2 V8 workers it consumes 334 megabytes.
Summary
I expected hoard to perform better, but as you can see, TCmalloc still shows the best values.
Thomas Fletcher (not verified) - 03/18/2010 - 15:21
These results don't surprise me at all. The hoard allocator has a lot of noise around it, but my experience on embedded and multi-core systems has demonstrated that you need to have a really big churning code base for the allocator to start to hit its prime. On the whole, WebKit is a pretty frugal and well-behaved application and doesn't do wild amounts of cross-thread allocations and de-allocations etc.
One allocator that you might be interested in trying out, now that you've been disappointed with Hoard, is TLSF (http://rtportal.upv.es/rtmalloc/ or http://tlsf.baisoku.org/).
It generally performs about the same as dlmalloc in my experience, but its big win is its deterministic behaviour, which may be a win for WebKit.
Thomas
www.cranksoftware.com
zoltan.horvath - 03/18/2010 - 23:56
Thanks for your response!
Sounds good, I'll check these allocator implementations, thanks for the tip!
I might do benchmarking on ARM... I'm curious, are the results valid for ARM?