TL;DR: On Linux, if you have Transparent Huge Pages enabled, you can try to run cargo with the environment variable MALLOC_CONF="thp:always,metadata_thp:always" for a potential ~5% speed boost.

…Sorry for the clickbait title, I just couldn’t help myself.

I regularly look for ways to speed up the Rust compiler without necessarily changing its code, just by modifying its configuration.

One approach that hasn’t been explored much so far is using huge pages for the memory allocated by the compiler. Linux normally uses 4KiB pages for virtual memory paging, which can be quite wasteful for some workloads these days, as such small pages might cause a lot of unnecessary page faults and TLB misses.
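To get a feeling for the difference in scale, here is a small back-of-the-envelope sketch. It assumes a huge page size of 2MiB (typical on x86-64, but your system may differ), and `pages_needed` is just a helper name made up for this illustration.

```shell
#!/bin/sh
# pages_needed BYTES PAGE_SIZE: number of pages needed to map BYTES
# (rounded up to whole pages).
pages_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

GIB=$((1024 * 1024 * 1024))
SMALL_PAGE=$((4 * 1024))          # the default 4KiB page
HUGE_PAGE=$((2 * 1024 * 1024))    # assumption: 2MiB huge pages

echo "4KiB pages for 1 GiB: $(pages_needed "$GIB" "$SMALL_PAGE")"   # 262144
echo "2MiB pages for 1 GiB: $(pages_needed "$GIB" "$HUGE_PAGE")"    # 512
```

Fewer pages for the same amount of memory means fewer page faults when the memory is first touched, and fewer TLB entries needed to cover it.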

I have been trying to figure out whether huge pages could help the performance of the Rust compiler. Sadly, configuring huge pages is a mess, and for some time I wasn’t able to figure out anything useful. That is, until valarauca14 posted a helpful guide about using huge pages on Reddit, in response to one of my posts. I noticed something interesting in the guide that I hadn’t seen before: jemalloc (the memory allocator currently used by rustc on Linux) can be configured to use transparent huge pages (THP), but it doesn’t do so by default. I wondered what would happen if we changed that.

After a bit of trial and error, I managed to enable THP support in jemalloc in rustc, and was amazed by the results! On average, compilation wall-time went down by ~5% across the board and page faults went down by ~60%, with some benchmarks showing up to a ~90% page fault reduction! I really didn’t expect it to have such an effect. Sadly, it also increases the memory usage of the compiler by ~15% on average, and by up to ~35% for some benchmarks. I guess that there’s no free lunch, as usual.

Because huge page configuration inherently depends on the operating system and the specific configuration used by each Rust developer, it will not be straightforward to enable huge page support across the board. THP is also a finicky beast, and it’s not always a win to enable it. Furthermore, the memory usage regressions might be a blocker for enabling this option by default [1]. That being said, I will definitely ask around to see if it would be possible to enable it.

However, in the meantime, if you’re on Linux, you can try to benefit from this performance increase on your own. To do that, you have to:

  • Enable transparent huge pages, using either the always or the madvise mode. On Ubuntu, you can do that with the following command:
    $ echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
    

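    Note that you will likely need superuser rights to execute this command. Simply prefixing it with sudo is not enough, because the redirection is performed by your own unprivileged shell; either run it in a root shell, or pipe it through tee (echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled). The setting might also be reset after a restart, depending on your system; for me it was set to madvise by default and it stays there.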

  • Use the environment variable MALLOC_CONF="thp:always,metadata_thp:always" when compiling Rust code:
    $ export MALLOC_CONF="thp:always,metadata_thp:always"
    $ cargo build
    

There is still a lot to do and experiment with regarding the use of huge pages within the compiler, but until that work is done, this quick trick can be used for a small compilation speed boost.
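Before trying the steps above, it can be useful to check which THP mode is currently active. The kernel marks the active mode with square brackets in the sysfs file (e.g. `always [madvise] never`). The following is a minimal sketch of such a check; `thp_mode` is a helper name invented for this example, and it takes the file path as a parameter only so that it can also be tried on a copy of the file.

```shell
#!/bin/sh
# thp_mode FILE: print the active THP mode, i.e. the word in square brackets.
thp_mode() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$1"
}

THP_FILE=/sys/kernel/mm/transparent_hugepage/enabled

if [ -r "$THP_FILE" ]; then
    mode=$(thp_mode "$THP_FILE")
    case "$mode" in
        always|madvise)
            echo "THP mode is '$mode': the MALLOC_CONF setting can take effect" ;;
        *)
            echo "THP mode is '$mode': switch to always or madvise first" ;;
    esac
else
    # The file is missing, e.g. on kernels built without THP support.
    echo "no THP sysfs entry found at $THP_FILE"
fi
```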

By the way, this “trick” should work for any program that uses jemalloc. You can find more configuration options in the jemalloc documentation.
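If you don’t want to export the variable for your whole shell session, you can set it for a single command only. Here is a small sketch of a wrapper that does that; the name `with_thp` is made up for this example. Keep in mind that a program which builds jemalloc with a symbol prefix may read a differently named variable, so treat this as a starting point rather than a guarantee.

```shell
#!/bin/sh
# with_thp COMMAND [ARGS...]: run one command with jemalloc's THP options
# set only for that command, leaving the rest of the session untouched.
with_thp() {
    env MALLOC_CONF="thp:always,metadata_thp:always" "$@"
}

# Demonstration: the child process sees the variable.
with_thp sh -c 'echo "MALLOC_CONF=$MALLOC_CONF"'
```

With this in place, `with_thp cargo build` runs a single build with the THP options, which makes it easy to compare against a plain `cargo build`.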

Conclusion

If you have any comments or questions, or you have hints about using huge pages, please let me know on Reddit.

  1. It has already been a blocker in the past, e.g. when we tried to switch from jemalloc to mimalloc, which resulted in a ~5% compilation time reduction, but sadly also in a memory usage increase of up to 35%.