TL;DR: On Linux, if you have Transparent Huge Pages enabled, you can try to compile your Rust code with the environment variable
MALLOC_CONF="thp:always,metadata_thp:always" set, for a potential ~5% speed boost.
…Sorry for the clickbait title, I just couldn’t help myself.
One approach that hasn’t been explored much so far is the use of huge pages for memory allocated by the compiler. Linux normally uses 4 KiB pages for virtual memory paging, which can be quite wasteful for some workloads these days, as such small pages can cause a lot of unnecessary page faults and TLB misses.
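To get a feel for the difference, here is a back-of-the-envelope calculation of how many pages it takes to map a given amount of memory. The 2 MiB huge page size is an assumption (it is the usual size on x86-64, other architectures differ), and the 1 GiB working set is just an illustrative number, not a measurement of rustc. The pages_needed helper is a name made up for this sketch.

```shell
#!/bin/sh
# Number of pages needed to map a given amount of memory (rounded up).
# Usage: pages_needed <bytes> <page_size_in_bytes>
pages_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

working_set=$((1024 * 1024 * 1024))   # 1 GiB, illustrative only
small_page=$((4 * 1024))              # 4 KiB, the normal Linux page size
huge_page=$((2 * 1024 * 1024))        # 2 MiB, assumed huge page size (typical on x86-64)

echo "4 KiB pages: $(pages_needed "$working_set" "$small_page")"
echo "2 MiB pages: $(pages_needed "$working_set" "$huge_page")"
```

Fewer pages means fewer entries that the TLB has to cover and fewer first-touch page faults, which is where the potential speedup comes from.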
I have been trying to figure out whether supporting huge pages could help the performance of the Rust compiler. Sadly, configuring huge pages is a mess, and for some time I wasn’t able to figure out anything useful. That is, until valarauca14 posted a helpful guide about using huge pages on Reddit, in response to one of my posts. I noticed something interesting in the guide that I hadn’t seen before: jemalloc (the memory allocator currently used by rustc on Linux) can be configured to use (transparent) huge pages (THP), but it doesn’t use THP by default. I wonder what would happen if we changed that?
After a bit of trial and error, I managed to enable THP support in jemalloc
and was amazed by the results!
On average, there was a ~5% reduction in compilation wall-time across the board, and a reduction in page faults, with some benchmarks showing up to a ~90% page fault reduction!
I really didn’t expect it to have such an effect. Sadly, it also increases the memory usage of the compiler by ~15% on average, and by up to ~35% for some benchmarks. I guess that there’s no free lunch, as usual.
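If you want to check numbers like these on your own machine, one rough way (this is not the benchmark suite behind the results above) is to run a build under GNU time and compare the report with and without MALLOC_CONF set. The sketch below assumes that GNU time is installed at /usr/bin/time and that its -v report uses the usual "Elapsed (wall clock) time", "Maximum resident set size" and "page faults" labels. The summarize_time_report helper and the file names are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: keep only the wall-time, peak memory and page fault
# lines of a `/usr/bin/time -v` report read from stdin.
summarize_time_report() {
    grep -E 'Elapsed \(wall clock\) time|Maximum resident set size|page faults' \
        | sed 's/^[[:space:]]*//'
}

# Example usage (commented out so that the snippet has no side effects):
#   cargo clean
#   /usr/bin/time -v -o baseline.txt cargo build
#   cargo clean
#   MALLOC_CONF="thp:always,metadata_thp:always" /usr/bin/time -v -o thp.txt cargo build
#   summarize_time_report < baseline.txt
#   summarize_time_report < thp.txt
```

A single run is noisy, so treat any difference from one build as a hint rather than a result.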
Because huge page configuration inherently depends on the operating system and the specific configuration used by each Rust developer, it will not be straightforward to enable huge page support across the board. THP is also a finicky beast, and it’s not always a win to enable it. Furthermore, the memory usage regressions might be a blocker for enabling this option by default. That being said, I will definitely ask around to see if it would be possible to enable it.
However, in the meantime, if you’re on Linux, you can try to benefit from this performance increase on your own. To do that, you have to:
- Enable transparent huge pages, using either the always or the madvise mode. On Ubuntu, you can do that with the following command:
$ echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
Note that you might need superuser rights (sudo) to execute this command, and that the setting might be reset after a restart, depending on your system (for me, it was set to madvise by default and it stays that way).
- Use the environment variable MALLOC_CONF="thp:always,metadata_thp:always" when compiling Rust code:
$ export MALLOC_CONF="thp:always,metadata_thp:always"
$ cargo build
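The two steps above can be combined into a small script that only sets MALLOC_CONF when THP is in one of the two modes mentioned. This is a minimal sketch: the kernel marks the active mode in the enabled file with square brackets (for example "always [madvise] never"), and the thp_mode and thp_usable helpers, as well as the optional path argument, are names invented here for illustration.

```shell
#!/bin/sh
# Print the active THP mode ("always", "madvise" or "never").
# The optional argument (hypothetical, handy for testing) overrides the sysfs path.
# Prints nothing if the file does not exist, e.g. on a kernel without THP.
thp_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' \
        "${1:-/sys/kernel/mm/transparent_hugepage/enabled}" 2>/dev/null
}

# Succeeds if the given mode is one of the two that the steps above ask for.
thp_usable() {
    case "$1" in
        always|madvise) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (commented out so that the snippet has no side effects):
#   if thp_usable "$(thp_mode)"; then
#       MALLOC_CONF="thp:always,metadata_thp:always" cargo build
#   else
#       echo "THP is not enabled, building without it" >&2
#       cargo build
#   fi
```

Setting the variable only for the single cargo invocation, instead of exporting it, keeps it from leaking into other jemalloc-based programs started from the same shell.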
There is still a lot to do and experiment with regarding the use of huge pages within the compiler, but until that work is done, this quick trick can be used for a small compilation speed boost.
By the way, this “trick” should work for any program that uses jemalloc. You can find more configuration options in the jemalloc documentation.
If you have any comments or questions, or you have hints about using huge pages, please let me know on Reddit.