The effect of TCmalloc in the QtWebKit port - stage 1: Performance

A lot of time has passed since we started working on the custom allocation framework with Paul Pedriana. The core of the solution (the FastAllocBase class, bug #20422) landed in trunk half a year ago.

After that check-in we started working on the JavaScriptCore class hierarchy, because every class that is instantiated with operator new needs to inherit from FastAllocBase. Now, half a year later, almost every JavaScriptCore class that needs it inherits from FastAllocBase.
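
To illustrate why inheritance is the mechanism here, the following is a minimal sketch of a base class that routes operator new and operator delete to a custom allocator. It is not the actual FastAllocBase code: the names AllocBase, customMalloc, customFree, Node and allocationsAreRouted are hypothetical, and the allocator simply forwards to the system malloc while counting calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace sketch {

// Call counters, kept in function-local statics so the sketch stays header-only.
inline std::size_t& allocCount() { static std::size_t n = 0; return n; }
inline std::size_t& freeCount() { static std::size_t n = 0; return n; }

// Hypothetical allocator entry points. A real port would forward these to
// TCmalloc or another allocator; here they wrap the system malloc/free.
inline void* customMalloc(std::size_t size)
{
    void* p = std::malloc(size);
    if (!p)
        throw std::bad_alloc();
    ++allocCount();
    return p;
}

inline void customFree(void* p)
{
    if (!p)
        return;
    ++freeCount();
    std::free(p);
}

// Base class whose class-specific operator new/delete use the custom allocator.
class AllocBase {
public:
    static void* operator new(std::size_t size) { return customMalloc(size); }
    static void operator delete(void* p) { customFree(p); }
    static void* operator new[](std::size_t size) { return customMalloc(size); }
    static void operator delete[](void* p) { customFree(p); }
};

// Any class derived from AllocBase picks up the overloads automatically.
struct Node : AllocBase {
    int value = 0;
};

// Returns true if one new/delete pair on Node went through the custom allocator.
inline bool allocationsAreRouted()
{
    const std::size_t allocsBefore = allocCount();
    const std::size_t freesBefore = freeCount();
    Node* n = new Node;
    n->value = 42;
    assert(n->value == 42);
    delete n;
    return allocCount() == allocsBefore + 1 && freeCount() == freesBefore + 1;
}

} // namespace sketch
```

A class that does not derive from such a base keeps using the global operator new, which is why each class created with new has to be touched individually.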

These changes made it possible to enable TCmalloc on the Qt-related WebKit ports.

Results for x86

 

The following table shows the speed results of QtWebKit on x86 Linux (with JIT):

QtWebKit x86-Linux

Benchmark      System malloc   TCmalloc    Difference   Improvement
SunSpider      774 ms          743 ms      -31 ms       ~4.0% faster
V8             3560 ms         3492 ms     -68 ms       ~2.0% faster
WindScorpion   19524 ms        17875 ms    -1649 ms     ~8.4% faster
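
The "faster" percentages for SunSpider and V8 are consistent with reading them as the relative reduction in run time, (system malloc time - TCmalloc time) / system malloc time. The sketch below shows that arithmetic; the function names are hypothetical and the formula is an assumption inferred from the published numbers, not something stated in the measurements themselves.

```cpp
#include <cassert>
#include <cmath>

// Relative reduction in run time, in percent.
// baselineMs: time with the system malloc; candidateMs: time with TCmalloc.
inline double improvementPercent(double baselineMs, double candidateMs)
{
    assert(baselineMs > 0.0);
    return (baselineMs - candidateMs) / baselineMs * 100.0;
}

// Same value rounded to one decimal place, as it is quoted in the results.
inline double roundedImprovement(double baselineMs, double candidateMs)
{
    return std::round(improvementPercent(baselineMs, candidateMs) * 10.0) / 10.0;
}
```

For SunSpider on x86, improvementPercent(774, 743) is 31 / 774 * 100, which is roughly 4.0.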

 

Results for ARM

 

We also benchmark on ARM hardware. The effect of enabling TCmalloc on ARM (with JIT) is as follows:

QtWebKit ARM-Linux

Benchmark      System malloc   TCmalloc    Difference   Improvement
SunSpider      10967 ms        10480 ms    -487 ms      ~4.4% faster
V8             24172 ms        22788 ms    -1384 ms     ~5.7% faster
WindScorpion   281195 ms       269435 ms   -11760 ms    ~4.1% faster

Future

The integration of the custom allocation framework into WebCore is still in progress, so I cannot show performance results for WebKit as a whole yet.

After all...

As the results show, we achieved a measurable performance improvement by enabling TCmalloc on the Qt port of WebKit. However, there is always another side of the coin... I'll talk about the memory costs in another post. :-)

Anonymous (not verified) - 10/08/2009 - 21:08

Do the performance gains justify taking on the maintenance of something that's provided natively by the system, as well as the memory costs you hint at?

akos.kiss - 10/08/2009 - 21:23

The memory cost does not come from the framework but from TCmalloc itself. So, if you use the custom allocation framework to plug in a VeryMemoryFriendlyMalloc instead of TCmalloc or the system allocator, you will observe a drop in memory consumption. But then there will be another side of the coin, too (i.e., a performance loss).

The key is that now you are *able* to choose which allocator you want to use. And this is *not* provided by the system. At least not on Qt Linux platforms. As far as I remember, the background is described in the bug report referenced in the blog post.

Anonymous (not verified) - 10/09/2009 - 12:16

Hi,

What about other allocators, like jemalloc :-?

akos.kiss - 10/09/2009 - 12:48

Actually, JEmalloc is very much in our focus. That topic will be covered in another post. (It was easier for us to start with TCmalloc since it is already a part of WebKit - see http://trac.webkit.org/browser/trunk/JavaScriptCore/wtf/FastMalloc.cpp )

quickembed (not verified) - 05/15/2012 - 16:35

ARM embedded system R&D.
We are porting Qt WebKit V8 to ARM Linux.
