Speeding up the Rust compiler without changing its code
…Yes, I know that the title looks like clickbait. But it’s actually not that far from the truth :)
I have started contributing to rustc this year as a part of #wg-compiler-performance, which is focused on making the Rust compiler as fast as possible. This post describes some things that I and several other rustc developers have been working on for the last several months in order to help fulfill this goal.
What’s possibly a bit unusual is that most of my efforts weren’t focused on improving the source code of rustc itself, but rather on improving the way we compile/build rustc, in order to make sure that it’s as efficient as possible. There is a tracking issue for rustc build optimizations here.
I’ll try to talk about the individual changes in chronological order, but some of them were overlapping, so it might not match up perfectly. I will also add links to PRs, performance results and mean/max. improvements to each section where applicable. The efforts mentioned here were done with the help and guidance of many other rustc developers (@nnethercote, @lqd, @Mark-Simulacrum, @bjorn3 and many others). I’m very thankful for their continued help and assistance.
How was rustc built at the start of 2022?
Before we begin talking about the improvements that were made this year, I’ll try to describe how rustc was built for Linux targets on CI at the beginning of 2022. We will see at the end of the blog post that the pipeline has been modified since then, but it’s still quite similar. This build pipeline is used for building the actual release artifacts that are distributed via e.g. rustup and eventually find their way to Rust end-users (on Linux).
1. LLVM is built with PGO instrumentation (LLVM build #1).
2. rustc that uses LLVM #1 is built.
3. PGO profiles for LLVM are gathered by compiling libcore.
4. LLVM is built (LLVM build #2).
5. rustc that uses LLVM #2 is built with PGO instrumentation.
6. PGO profiles for rustc are gathered by compiling some popular/big crates, like cargo, plus some stress tests. The compilation is performed using the rustc-perf performance suite.
7. LLVM is built with the PGO profiles gathered in step 3 (LLVM build #3).
8. rustc that uses LLVM #3 is built with the PGO profiles gathered in step 6.
What is PGO?
Profile-guided optimization is a technique for optimizing programs. It observes how a program behaves when it executes some common workload, and then uses this information to better optimize the program.
The PGO workflow usually works like this (a minimal command-line sketch follows the list):
- You compile the program with “instrumentation”, which is additional code injected by the compiler.
- You run the program on some workload that represents its common usage. When the program is executed, the injected code takes note of the program’s behavior. What functions are called the most? What parts of functions are almost never executed? Which branches are often taken? The injected code records and stores this data in a profile (a file or a set of files).
- You re-compile the program again, but this time you provide the gathered profile, so that the compiler can optimize the program in a better way.
- You enjoy a faster, more efficient program.
- ????
- Profit!!!
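To make this more concrete, here is a minimal sketch of how you could apply PGO to an ordinary Rust program using rustc’s built-in support. The file names, the output directory and the workload arguments are just placeholders:

```bash
# 1) Build with PGO instrumentation; raw profiles will land in ./pgo-data.
rustc -O -Cprofile-generate=./pgo-data main.rs -o main-instrumented

# 2) Run the instrumented binary on a representative workload
#    (placeholder arguments).
./main-instrumented --some-typical-workload

# 3) Merge the raw profiles into a single file that the compiler can consume
#    (llvm-profdata comes with LLVM, e.g. via the llvm-tools-preview rustup
#    component).
llvm-profdata merge -o ./pgo-data/merged.profdata ./pgo-data

# 4) Re-compile with the merged profile to get the PGO-optimized binary.
rustc -O -Cprofile-use=./pgo-data/merged.profdata main.rs -o main-optimized
```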
This build pipeline is implemented in the pgo.sh bash script (although I’m planning to rewrite it in Python soon). It has to recompile both LLVM and rustc several times, therefore it takes quite a lot of time (about 1.5 hours). The PGO gathering steps are performed separately for LLVM and rustc, which prolongs the build (originally these steps were performed at once, but there were some issues with this approach).
Now that we have a high-level overview of how the build pipeline works, let’s take a look at how we have tried to improve it this year and how it has evolved in the process.
Expanding LLVM PGO profiling (#94704)
While reading the build pipeline above, maybe you have wondered why we gather the PGO profile for LLVM only on the libcore crate, while rustc enjoys being profiled on real-world crates. I have wondered about this too, so I have tried to change it in this PR.
It took some back and forth to experiment with the set of crates that made sense to profile before we started hitting diminishing returns, but otherwise it was a pretty simple change, and it provided very nice gains! Note that when we gather LLVM PGO profiles, we always perform codegen (we do not profile check builds), since we only care about profiling the LLVM codegen backend, not the rustc frontend.
Final results (instruction counts):
- Mean improvement: -2.48%
- Max improvement: -5.56%
Updating LLVM (#95171)
Sometimes it helps just to bump dependencies. On CI, we use a host compiler (LLVM) to compile a target compiler (also LLVM), which is then used by rustc to compile Rust programs.
In #95171, I have updated the host LLVM from 13.0.0 to 14.0.2. It didn’t help much with compilation speed, but it had a great effect on max RSS (a metric that measures the peak memory usage of rustc during crate compilation).
Why did this help? We have compiled LLVM with a newer compiler, which reduced the memory usage of LLVM when it was used by rustc, thus also reducing the memory usage of compiling Rust crates.
- Mean improvement: -4.75%
- Max improvement: -12.34%
Updating the PGO benchmark suite (#95724)
This year, there has been an effort to update the benchmarks in the rustc-perf benchmark suite, which is used both for measuring the performance of rustc after each merged commit (the perf.rlo dashboard) and for local experiments and profiling done by rustc developers. Old benchmarks have been deprecated, similar benchmarks were merged, new benchmarks were introduced and some popular crate benchmarks were updated to newer versions of said crates.
Even though the suite itself was updated, the CI build script was still using an old version from May 2021 for gathering the PGO profiles. Therefore, in #95724 I have updated the version of rustc-perf used in CI, along with the set of crates that we compile to gather PGO profiles.
This change resulted in performance wins, but it was also a kind of self-fulfilling prophecy, as we were now PGO profiling rustc on an exact subset of crates that were also later used to measure the performance of rustc. That being said, I think that it makes sense to update the PGO crates regularly. It’s better if rustc compiles modern, heavily used (versions of) crates as efficiently as possible, rather than being super optimized for older crate versions that are no longer being used.
Final results (instruction counts)
- Mean improvement: -0.75%
- Max improvement: -2.71%
Trying out PGO on macOS (#96732)
At this point in time, PGO was only being performed for Linux, so Windows and macOS (and of course also other targets!) were missing out on the 10-20% performance improvements that PGO can potentially provide.
In #96732, I have tried to apply PGO also on macOS to see what happens. It seemed like it could work, but the problem turned out to be the duration of the CI builds. The macOS runners on GitHub are incredibly slow. To put this into perspective, at this point in time, the Linux builds took about 1.5 hours, and that was with the full build pipeline described above, which rebuilds both LLVM and rustc three times! Meanwhile, the macOS builds took about two hours, with LLVM pre-built and rustc being built only once [1].
It seems like there might be some improvements to macOS CI rustc performance on the horizon, but for now, this was an unsuccessful experiment.
Another idea that I tried was to download the PGO profiles from earlier Linux builds and try to use them for optimizing macOS rustc builds (thus skipping the slow PGO gathering and LLVM/rustc rebuild steps), but it seemed that it’s not so trivial to reuse PGO artifacts between platforms (which was kind of to be expected, to be honest), mainly because of symbol issues.
Improving precision of rustc PGO (#97110)
When you use a PGO-instrumented rustc, it will write the gathered profiles to a single file by default. When @lqd was trying to port the PGO pipeline to Windows, he noticed that this file was sometimes being overwritten when rustc was being invoked concurrently. While this didn’t seem to cause any issues on Linux, I wondered if this overwriting was affecting the profiles in any way.
That’s why in #97110, I have tried to tell rustc to use a separate profile file for each rustc process (basically, I added the PID to the filename).
And it has helped! Probably now that the file wasn’t being overwritten, the profiles were more precise and therefore rustc could use them to better optimize itself.
As a slight disadvantage, creating so many profiles resulted in 20+ GiB of disk space being used, but that wasn’t a large problem, since the profile files are created on CI, and they are temporary anyway, as we then merge them to a single, rather small (tens/hundreds of MiB) file after the profiling is done.
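For illustration, LLVM’s profiling runtime (which rustc’s PGO support builds on) can expand a %p pattern in the profile path to the process ID, so the change boils down to something along these lines. This is a simplified sketch, not the exact code used on CI:

```bash
# Each concurrently running rustc process writes its own .profraw file,
# thanks to the %p (PID) pattern understood by LLVM's profiling runtime.
export LLVM_PROFILE_FILE="/tmp/rustc-pgo/rustc-%p.profraw"

# ... run the PGO gathering workload (compiling the rustc-perf crates) ...

# Afterwards, the many per-process profiles are merged into one small file.
llvm-profdata merge -o /tmp/rustc-pgo/merged.profdata /tmp/rustc-pgo/*.profraw
```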
Final results (instruction counts)
- Mean improvement: -0.34%
- Max improvement: -1.23%
Applying PGO to libstd (#97038)
Although we use PGO for LLVM and rustc, I was thinking that we could also use it for libstd. So far, I haven’t been able to get it working properly, but it’s still definitely on my TODO list.
Updating rustc PGO benchmark list (#97120)
In #97120, I have further updated the set of crates that is used to gather PGO profiles for rustc, as the old set was missing trait-heavy crates like diesel. Although this change helped the most on the crates that were added to the PGO set (as expected), it also seemed to help across the board, so we assumed that we weren’t overfitting too much to the PGO crates, and decided to merge this change.
Final results (instruction counts)
- Mean improvement: -0.51%
- Max improvement: -1.29%
Improving precision of LLVM PGO (#97137)
This was basically the same thing that I have described above (adding the PID to PGO profile filenames), but this time for LLVM rather than rustc. This turned out to be a bit more involved because LLVM uses a different approach for setting the profile paths than rustc, but it wasn’t that complicated in the end, and it has resulted in a small, but nice win.
Final results (instruction counts)
- Mean improvement: -0.63%
- Max improvement: -1.03%
Trying out Call-Site aware PGO for LLVM (#97153)
In my quest to find all the possible compiler knobs and techniques that could be used to optimize rustc, I stumbled upon something called Call-site aware PGO. Up to that time, I had no idea that something like that even existed, so naturally I immediately tried to apply it to rustc’s LLVM :)
The results were quite mixed. Some attempts did not result in any improvements at all, some had quite nice cycle wins. But the problematic part was that CS-PGO is layered on top of the normal PGO workflow, so we needed yet another rebuild of LLVM + rustc, which of course made the CI build even slower.
Since it did not bring clear benefits, I have decided to abandon CS-PGO for now. But I think that this definitely warrants another attempt some time in the future.
Static linking of rustc (#97154)
While trying to examine the state of compiler knobs used for building rustc, I was a bit confused by the apparent lack of LTO, a crucial compiler optimization technique. I refused to believe that it was not being used for compiling rustc, but I couldn’t find any trace of it.
As a short summary: when rustc compiles a crate, it splits the crate (and also its dependencies) into multiple codegen units (CGUs), pieces of code that are optimized separately by the codegen backend (usually LLVM). This makes the compilation parallelizable, but it can also miss out on some optimizations, because the codegen units are not optimized together, but each on their own. LTO can also optimize across these codegen units, and thus potentially generate better code, at the cost of longer compilation time.
When you compile Rust programs with rustc, you can choose amongst three LTO modes (in addition to turning LTO off completely); a short sketch of the corresponding compiler flags follows the list:
- crate-local thin LTO - LTO is used to optimize across codegen units, but always only within a single crate. This mode is used by default for release builds.
- thin LTO - LTO is used to also optimize across crates.
- fat LTO - same as thin LTO, but it can in theory optimize even better. However, it also typically makes compilation much slower than thin LTO.
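For reference, this is roughly how the individual modes map to rustc’s -C lto flag when compiling a crate by hand (a hedged sketch; in a Cargo project you would set the lto key in the profile section of Cargo.toml instead):

```bash
# Default for optimized builds with multiple codegen units:
# crate-local thin LTO, i.e. LTO only within this crate.
rustc -O main.rs

# Thin LTO: also optimize across the crate's dependencies.
rustc -O -C lto=thin main.rs

# Fat LTO: potentially better optimization, noticeably slower compilation.
rustc -O -C lto=fat main.rs

# LTO turned off completely.
rustc -O -C lto=off main.rs
```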
After examining the situation, I learned that rustc was being compiled with the default crate-local thin LTO. Since the compiler consists of 50+ crates and even more external dependencies, it seemed to me that this very basic form of LTO wasn’t doing it justice, and we needed at least thin LTO to achieve better performance.
I wasn’t able to apply thin LTO to rustc at that time (more on that in a later section below), but I was able to force rustc to link statically and then use LTO for it. Normally, the compiler is built in two parts: a dynamic shared library (dylib) called librustc_driver.so (which exports all the important functions offered by the compiler, and which is used by other tools like rustdoc or clippy), and rustc, a very thin binary that mostly just links to librustc_driver.so and then calls its entry function. In #97154, I have tried to change this by “inlining” the dynamic library directly into rustc via static linking, and then applying thin LTO to it.
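You can see this two-part structure on any Linux Rust toolchain, for example by listing the dynamic dependencies of the rustc binary inside the sysroot. The output below is trimmed and the hashes in the library names are placeholders:

```bash
# The `rustc` on PATH is usually a rustup shim, so look up the real binary
# inside the active toolchain's sysroot and inspect its dynamic dependencies.
ldd "$(rustc --print sysroot)/bin/rustc"
#   librustc_driver-<hash>.so => .../lib/librustc_driver-<hash>.so
#   libstd-<hash>.so          => .../lib/libstd-<hash>.so
#   ...
```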
This showed me a couple of interesting things:
- Using static linking, even without LTO, helps performance (probably by removing dynamic function resolving), which is especially good for tiny crates, where this takes a relatively large part of the whole compilation process (since it’s so short when the crate is small).
- Using (thin) LTO helps. A lot! Although this perf. result also contains the wins gained by static linking, it still contains some pretty impressive results. This gave me hope that LTO could be really useful, so I continued trying to get it to work.
- Using fat LTO vs thin LTO does not seem like an obvious (big) win, but with fat LTO it takes much more time to compile rustc, so it might not be worth using.
So, static linking it is? Well, not really. As I have learned by asking around, it would not really be practical to link rustc statically, among other things because of the way codegen backends are used. They are loaded into the rustc process directly via dlopen, and they expect that they will be able to use the symbols from librustc_driver.so. Doing this with a statically linked binary that would have to export these symbols seemed to be way too complicated. But it is something that might get revisited in the future.
Using PGO on Windows (#96978)
As a part of porting our optimized build pipeline to platforms other than Linux, @lqd has managed, in a heroic effort, to port PGO to Windows.
I don’t even want to imagine how much time and effort it must have taken, but the results speak for themselves :)
Sadly, we don’t currently do any performance measurements on CI for platforms other than Linux, so the results had to be verified manually on a Windows system.
Using ICF for rustc (#99062)
While reading the Rust for Rustaceans book by the one and only @jonhoo, I was reminded of the polymorphization initiative, an experimental rustc optimization that tries to merge identical generic functions in order to reduce the number of monomorphized functions that we send to LLVM and thus (potentially) greatly reduce compilation times.
While polymorphization is sadly still probably far from being completed, I wondered whether we could do a poor man’s version of polymorphization by merging together identical compiled functions on the binary level. For example, I noticed that functions like core::ptr::read<u32> and core::ptr::read<i32> were both stored in the final rustc binary, even though they had identical instructions. It turns out that there’s a name for this technique - Identical Code Folding (ICF).
To use this technique, I had to switch the linker that we use on CI to lld, the linker from the LLVM suite. With that, I was able to use the --icf=all flag to (very aggressively) prune the duplicated functions during linking.
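If you want to experiment with something similar on your own binaries, the combination roughly corresponds to linker flags like these (a hedged sketch; it assumes lld is installed and that your platform’s linker driver understands -fuse-ld):

```bash
# Link with lld and let it aggressively fold functions that compile down to
# identical machine code.
RUSTFLAGS="-Clink-arg=-fuse-ld=lld -Clink-arg=-Wl,--icf=all" \
    cargo build --release
```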
I had hoped that ICF would decrease the binary size of librustc_driver.so, and it did, but only very slightly (~1%), probably because switching from ld to lld had actually increased the binary size by ~3%, so the baseline was different.
Using ICF didn’t improve instruction counts (expectedly), but it had very nice wins for cycle counts, probably because of better instruction cache utilization. Before, the duplicated functions were located across different places in the binary, but now, after deduplication, they should be in the same place, which should in turn improve i-cache utilization.
When examining the results in more detail later (after my PR had broken a few things), it turned out that most of the wins actually came from using lld as the linker, and not from ICF itself. Well, I don’t really care, as long as the compiler gets faster :) A win is a win.
- Mean improvement: -3.06%
- Max improvement: -6.11%
Using BOLT to optimize LLVM (#94381)
I’m writing about BOLT as one of the last sections of this blog post, but it was actually one of the first things that I have tried this year. It just took a long time to finish :)
BOLT was originally a research project by Facebook (now Meta). It is a compiler optimization technique, which is partly similar to PGO,
although it works on the level of compiled binaries rather than on the level of compiler IR. In short,
it can gather profiles from a binary that is being executed, and then re-optimize said binary with
the gathered profiles. It can do all that without actually needing to re-compile the binary, which
is quite cool. By smartly moving stuff inside the binary around, it can improve i-cache utilization
and thus improve the performance, even if the binary was previously already optimized with LTO and/or
PGO.
Originally, BOLT has lived in a separate repository (basically a fork of LLVM), but since LLVM 14 it has been integrated into the mainline LLVM repo. This was a signal to me that BOLT is stable enough so that we can try to use it to optimize the Rust compiler (narrator voice: it wasn’t stable enough).
I had some troubles with applying BOLT to rustc itself (after all, it was mostly meant for C/C++ binaries), but someone (I don’t remember who, maybe @Mark-Simulacrum) gave me the idea to try it on the LLVM that we use in rustc. After all, it’s a C++ program, and I found several demo examples on GitHub that were using BOLT to optimize clang.
There are two ways BOLT can gather the profiles: either by instrumenting (modifying the code of a binary by injecting code that records execution metadata) or by sampling (gathering performance counters during the execution of an unmodified binary). The sampling approach is much simpler, since we don’t have to instrument the binary, but sadly it was a no-go, since the performance counters required for sampling (LBR) weren’t available on GitHub CI machines (and they aren’t even available on AMD CPUs, which is what my local machine has).
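The instrumentation-based workflow looks roughly like this (a simplified sketch of the usual llvm-bolt commands; the exact file names and the additional optimization flags used on CI are omitted):

```bash
# 1) Create an instrumented copy of the library; executing code from it
#    will write raw profile data to .fdata files.
llvm-bolt libLLVM.so -instrument -o libLLVM-instrumented.so

# 2) Use the instrumented libLLVM.so while compiling some representative
#    crates (on CI, the library used by rustc is swapped out temporarily).

# 3) Merge the gathered profiles into a single file.
merge-fdata *.fdata > merged.fdata

# 4) Re-optimize the original (non-instrumented) library with the profile.
llvm-bolt libLLVM.so -o libLLVM-bolted.so -data=merged.fdata \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort
```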
Therefore, I had to resort to instrumentation. Sadly, this is where the problems began. The instrumented LLVM library was nondeterministically segfaulting when being used by rustc. It was very difficult to debug. I created and monitored several LLVM issues that were dealing with a similar instability of BOLT, but that was all I could do at the time (without delving really deep into LLVM). I was also trying to run the whole CI pipeline locally, which was sometimes working, sometimes not. This “segfault non-determinism” didn’t give me a lot of hope that this could be resolved without some upstream BOLT patches, so I just had to wait until they arrived.
My original attempt was in February, since then I have repeatedly (~every month) tried to apply BOLT with the newest trunk LLVM version, but the results were mostly the same. This resulted in the poor PR having several tens of CI run attempts and almost three hundred comments (most of them probably generated automatically by bots).
I think that the instability problems were caused by the fact that I was using BOLT in a slightly non-standard way. Instrumentation probably wasn’t supported as well as sampling, and I was also trying to optimize a shared library, which was still quite experimental in BOLT, since it was mostly focused on optimizing actual executable binaries.
Things finally started to turn better when LLVM 15 was released. With the updated LLVM that contained a set of BOLT patches, the instrumented libLLVM.so started working without crashes, and hopefully it will stay that way.
Once it was working, I was finally able to implement support for BOLT properly and perform performance experiments. And the results weren’t bad at all! Up to 10% max RSS improvements and up to 5% cycle improvements on real-world crates. It seems like it was definitely worth it to “persevere” and wait until BOLT gets stable enough so that we can use it.
The cost of this was of course yet another rebuild of LLVM and rustc on CI, which increased the CI time by around 15-20 minutes. I think that in this case it was worth it, but the CI times will probably need to be optimized soon, before they get out of hand (more on that below).
- Mean improvement: -3.97%
- Max improvement: -10.25%
- Mean improvement: -3.67%
- Max improvement: -6.60%
Using linker plugin LTO (#101524)
During the year, I was occasionally returning to my quest of using LTO for rustc. The results from static linking were great, but I knew that static linking wasn’t actually a usable solution, so I needed to try something else. I still couldn’t figure out how to use LTO for the librustc_driver.so dylib, but I remembered that there is another way that LTO could be done, using so-called Linker-plugin-based LTO.
This approach delegates the LTO work to the linker itself, rather than implementing it within the compiler (in simplified terms). From the example given in the Rust documentation, I thought that I had to use clang as the linker to get it to work, which led me down a path of futile attempts to get clang working as a linker on CI. Then I decided to scrap all that and just add a single line (-Clinker-plugin-lto) to the rustc build system… and lo and behold, it worked! With a single line, I was able to achieve up to ~10% improvements on real-world crates, which was quite exciting.
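Outside of the rustc build system, this is the same mechanism that the rustc book describes for ordinary crates. For a single crate it looks roughly like this (a sketch that assumes a clang/lld toolchain whose LLVM version is compatible with the one used by rustc):

```bash
# Emit LLVM bitcode into the object files and let the linker's LLVM plugin
# perform (thin) LTO across all of them during the final link.
rustc -O -Clinker-plugin-lto -Clinker=clang -Clink-arg=-fuse-ld=lld main.rs
```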
There was also the possibility that rustc code could be cross-optimized with LLVM (this is one of the motivations behind linker-plugin LTO), which sounds quite cool. But it would also require pretty large changes to the rustc build process (I think), so I haven’t attempted that (yet).
Sadly, after the initial excitement, I realized that this approach, while being quite helpful in terms of rustc performance, had some issues, much like the static linking mentioned before:
- It basically required the usage of the lld linker, which might not be so easy to use on platforms other than Linux. It would be a shame if such a fundamental performance gain wasn’t readily available on other platforms.
- It only supported thin LTO, not fat LTO (AFAIK), which also seemed like a slight disadvantage.
- I haven’t been able to verify this yet, but I think that this could cause issues when we update LLVM. Once every ~6 months, a new version of LLVM is released. The rustc repository usually switches to the new version quite soon, months or weeks before the final version is released, to prepare for it by making changes in its usage of LLVM. I think that after the next switch, linker-plugin LTO would have a problem, since the LLVM used by rustc and the LLVM used to compile rustc would get desynchronized, which could spell trouble, because linker-plugin LTO requires LLVM IR compatibility. But I’m not sure if this would actually happen; it was just a potential problem that occurred to me.
Anyway, even though the wins were nice, in the meantime there was progress on the dylib LTO front (see below), so I have abandoned this approach.
Using LTO for optimizing librustc_driver.so (#101403)
I wasn’t really getting anywhere with my attempts to apply LTO to the librustc_driver.so dylib. I found a topic about using LTO for rustc on the Rust Zulip, so I tried to ask there. I got an answer from bjorn3, who is IMO one of the leading experts on rustc backends, codegen, and basically anything Rust :D His response wasn’t very optimistic (you can read it in the topic linked above), so I put LTO to rest for some time.
It was still nagging me that we don’t do LTO though, so after some time I reached out to bjorn3 directly and asked if he thought that there was any hope for dylib LTO. Around that time, I also tried static linking of rustc + LTO (as described above), which had very nice results, so it gave us additional motivation to try using LTO for the librustc_driver.so dylib. After a bit of nerd-sniping, bjorn3 conjectured a way that it could be done in theory. He then returned after an hour or so and wrote me that he had got it working.
This was the road that led to #101403. We needed to resolve some issues (mostly cosmetic), but in the end it was not at all that difficult to get it working (barring future issues, which tend to almost always come after such fundamental PRs :) ).
I volunteered to work on bjorn3’s groundwork, which turned out to be a very fun process, because I couldn’t push to bjorn3’s branch, so he always had to force-push a branch from my fork to his fork’s branch to update the PR. What a fun endeavour :)
Anyway, in the end the perf. results were pretty impressive (5-10% wins on real-world crates). Sadly, it has also caused some max RSS regressions, and the size of librustc_driver.so was slightly increased, but I think that this has been worth it for such a large performance boost.
Sometimes when a feature is missing, it’s not because it’s impossible or very difficult to implement it, but because nobody has tried to implement it yet :)
After merging the LTO PR, I have tried to reapply linker-plugin LTO to compare its performance to the dylib LTO approach. The results hinted that linker-plugin LTO wasn’t really more efficient, so I have decided not to pursue it further.
Final results (instruction counts)
- Mean improvement: -4.10%
- Max improvement: -9.62%
Conclusion
Thanks for reading up to this point! By now you have seen most of the rustc build pipeline optimizations that we have been working on this year (and that I am aware of). To summarize, here is the actual build pipeline of (Linux) rustc on CI, as of October 2022:
1. LLVM is built with PGO instrumentation (LLVM build #1).
2. rustc that uses LLVM #1 is built.
3. PGO profiles for LLVM are gathered by compiling several popular crates (serde, ripgrep, clap, etc.).
4. LLVM is built (LLVM build #2).
5. rustc that uses LLVM #2 is built with PGO instrumentation.
6. PGO profiles for rustc are gathered by compiling several popular crates (cargo, diesel), plus some stress tests.
7. LLVM is built with the profiles gathered in step 3 (LLVM build #3) and with BOLT instrumentation.
8. BOLT profiles for LLVM are gathered by compiling the same crates as in step 3.
9. LLVM is built with the PGO profiles gathered in step 3 and with the BOLT profiles gathered in step 8 (LLVM build #4).
10. rustc that uses LLVM #4 is built with the PGO profiles gathered in step 6.
Build systems are fun :)
One of the reasons why I was and still am interested in these techniques is that most of them should “scale”, looking into the future. If you find a bottleneck in the compiler and optimize it, you can get a really nice speedup. But if someone else comes later and rewrites or deletes that code, you might need to optimize the same code again, or find another place where a bottleneck happens. On the other hand, if you enable a build optimization technique, like PGO, LTO or BOLT, it should improve the performance of the compiler after any change to the compiler’s code, without us needing to do anything more. This is what I mean by “scaling”.
Of course, there’s a limit to the amount of build/compilation optimizations that one can apply (there’s just not that many of them). But after a year of trying, I still feel like we’re not at the end :) Unfortunately, most of the demonstrated improvements have helped only Linux users, so one of our goals going forward is to try to port these improvements to other (tier-1) platforms, like macOS and Windows.
There are also some next steps that we’re planning to explore (in no particular order):
- Use BOLT also for rustc (PR) - so far the results were lackluster.
- Use LTO for rustdoc (PR) - haven’t managed to get any speed-ups so far.
- Use LTO on more platforms (macOS, Windows).
- Use PGO on more platforms (macOS).
- Try to use fat LTO for librustc_driver.so (PR) - so far no real improvements, and the CI build times are incredibly slow with fat LTO.
- Port the whole build pipeline from bash to Python (PR) to get better statistics about the build process, and possibly also to speed it up by introducing some non-trivial caching. Having a slow CI build is annoying, because it makes it slower to merge PRs, test PRs (with so-called try builds that test integration of the PR) and also to measure the performance of PRs (because perf. runs on PRs always do a try build first).
- Revisit Call-site aware PGO (PR).
- Revisit libstd PGO (PR).
On an unrelated note, I have also been working on implementing support for runtime benchmarks in the Rust performance suite, so that we could not only measure the compilation speed of rustc, but also the performance of Rust programs compiled by it. But more on that later :)
Anyway, that’s all for now. If you have any ideas how to speed up rustc (but remember, without changing its source code!) or any other comments, let me know on Reddit.
[1] Well, we always do stage 2 builds, so each rustc “build” actually compiles the compiler two times, but that’s another story.