Benchmark: C compilers for the 6502 CPU


Adding a C compiler to your game's toolchain is not an easy task. Each has its own strengths and weaknesses. You may want speed, ease of use, compliance with the standard, freedom, or anything else. We'll compare some compilers here; maybe it will help you.

While choosing a compiler for my game's toolchain, speed was the least of my concerns, but discussions invariably drifted to the subject. This article will mainly focus on this question: which compiler produces blazing fast code?

Some code samples will be compared, each time with an explanation about the code, and why it may be useful to benchmark it.

That done, I'll talk about my experience with each compiler: their strong points, their weaknesses. This part is necessarily subjective, but I'll try my best to be factual.

Finally, a short "how to choose a compiler" guide will conclude the article. Hope it helps.

Let's go!

Introducing the contenders

cc65 is the most used compiler. It comes with an extensive toolsuite, is actively maintained, and has a big community of users. However, it is known to be slow... We'll see :)

vbcc is the cool kids' compiler. Less used than the king, it has the reputation of generating much better code.

KickC is the young promising project. Is it a toy for nerds or a real option?

6502-gcc is the mysterious one. Nobody really masters it, the project seems dead, installation instructions are cryptic, ... We'll dig in its dark secrets!

6502-gcc has two interesting optimization flags:
 * "-O3" optimize for speed
 * "-Os" optimize for code size

Run length decoding

This benchmark comes from real life. It is a function prototyped in C for Super Tilt Bro. before being converted to assembly for performance. The data used is also the original.

The function parses a compressed blob to extract some random bytes from it. It is pure loop logic, burning many cycles reading the memory.
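To give an idea of the kind of code being measured, here is a minimal sketch of a run-length lookup. It is not the Super Tilt Bro. function: the blob format ((count, value) pairs ended by a count of 0) and the name "rle_byte_at" are assumptions made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical format: the blob is a sequence of (count, value) pairs,
 * terminated by a pair whose count is 0.
 * Returns the byte found at position `index` of the uncompressed data,
 * or 0 if `index` is past the end.
 */
uint8_t rle_byte_at(const uint8_t *blob, uint16_t index)
{
    for (;;) {
        uint8_t count = blob[0];
        if (count == 0) {
            return 0; /* end marker reached: index is out of range */
        }
        if (index < count) {
            return blob[1]; /* the requested byte lies in this run */
        }
        index -= count; /* skip the whole run */
        blob += 2;      /* move to the next (count, value) pair */
    }
}
```

Nothing here is arithmetic-heavy: it is all loads, compares and branches, which is why such a function mostly exercises how well a compiler handles loops and pointers on the 6502.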


Red is execution time, blue is generated code's size.

First things first, cc65 lives up to its reputation of sluggishness here. It is a complete KO: the generated code is big and slow.

The winner is 6502-gcc, by a fair bit. KickC has a slight advantage over vbcc.

Fun fact: the more popular the compiler, the worse it performs in this test.

Memcopy

Here is the super common task of copying bytes in memory. It is important to compare compilers on what our programs will spend the most time doing.

The bench comes in two flavors:
 * The normal one directly copies bytes with a for-loop
   * the compiler knows both addresses and number of bytes to copy, food for optimization
 * The no-inline one is a "memcpy-like" function, copying at most 256 bytes at once, not inlineable
   * the compiler knows nothing, it has to implement a generic copy function
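The two flavors could look something like the sketch below. The names, the 200-byte size and the two global arrays are assumptions for illustration; on a real 6502 target the buffers would more likely be fixed memory addresses, and the generic function would live in its own source file to keep it from being inlined. Treating a count of 0 as 256 is also my own choice, to cover "at most 256 bytes" with an 8-bit counter.

```c
#include <stdint.h>

#define COPY_SIZE 200 /* illustrative size, known at compile time */

/* Stand-ins for two fixed memory areas. */
uint8_t source_area[COPY_SIZE];
uint8_t dest_area[COPY_SIZE];

/* "Normal" flavor: addresses and byte count are all known to the compiler. */
void copy_known(void)
{
    uint8_t i;
    for (i = 0; i < COPY_SIZE; ++i) {
        dest_area[i] = source_area[i];
    }
}

/*
 * "No-inline" flavor: a generic copy of up to 256 bytes.
 * Assumption: a count of 0 means 256, since the 8-bit index wraps around.
 */
void copy_generic(uint8_t *dst, const uint8_t *src, uint8_t count)
{
    uint8_t i = 0;
    do {
        dst[i] = src[i];
        ++i;
    } while (i != count);
}
```

In the first function a compiler is free to use absolute addressing and a single index register; in the second it has to go through pointers, which is much more awkward on this CPU.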


All graphs are sorted: slowest on the left, fastest on the right.

cc65 is still on the last spot.

KickC confirms that it is slightly better than vbcc. Note KickC's absence from the no-inline version: KickC seems to always work with the full source, so there is no way to forbid inlining.

6502-gcc takes a hit. Worse, extensive tests showed that results vary a lot when changing little things in the code.

Also, did you notice how "-O3" poo'd itself in the inline scenario? gcc (the "real" one as well as 6502-gcc) works in two steps: the frontend optimizes the C code, then the backend translates it to machine code. The frontend of 6502-gcc is the regular one, and it does wonders. Then comes the backend. The 6502 backend is seriously lacking love, and it shows here: the frontend detected that the code is equivalent to a "memcpy", and told the backend "copy 200 bytes from $0400 to $0200". The backend was expected to implement it in the most optimal way; instead it just called the standard "memcpy" function, which is far from optimal for an 8-bit CPU.
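Put in C terms, the rewrite described above amounts to roughly the following. This is only a sketch of the idea, with made-up function names and pointer parameters instead of the fixed addresses; it does not show what 6502-gcc literally produces.

```c
#include <stdint.h>
#include <string.h>

#define COPY_SIZE 200

/* What the programmer wrote: a plain byte-by-byte loop. */
void copy_with_loop(uint8_t *dst, const uint8_t *src)
{
    uint8_t i;
    for (i = 0; i < COPY_SIZE; ++i) {
        dst[i] = src[i];
    }
}

/*
 * What the backend is asked to implement once the loop has been
 * recognized as a block copy (valid when the two areas do not overlap).
 */
void copy_as_block(uint8_t *dst, const uint8_t *src)
{
    memcpy(dst, src, COPY_SIZE);
}
```

Both functions leave memory in the same state. The difference is only in who gets to choose the instructions: a good backend would expand the block copy inline, a lazy one falls back on a call to the generic library routine.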

RPG engine

This bench is deliberately designed to play to 6502-gcc's strength: its frontend.

It is an RPG engine with lots of abstraction. There are structs, functions making the player strike, functions for hitting monsters, the player is wielding a weapon, ...

The benched function initializes the game's state, plays one turn, then returns. So while there are functions updating structures, adding attack points, and subtracting hit points, the end result is just to set the memory to a particular state.
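The shape of such code can be sketched as follows. This is not the benchmark's source: every struct, field, function name and value below is invented to illustrate the idea.

```c
#include <stdint.h>

typedef struct { uint8_t attack; } Weapon;
typedef struct { uint8_t hit_points; uint8_t strength; Weapon weapon; } Player;
typedef struct { uint8_t hit_points; } Monster;

/* The game state: the only thing visible once the function returns. */
Player player;
Monster monster;

static void init_game(void)
{
    player.hit_points = 20;
    player.strength = 2;
    player.weapon.attack = 5;
    monster.hit_points = 12;
}

/* Attack points are the player's strength plus the weapon's bonus. */
static uint8_t attack_points(const Player *p)
{
    return (uint8_t)(p->strength + p->weapon.attack);
}

/* Subtract damage from the monster's hit points, never going below 0. */
static void hit_monster(Monster *m, uint8_t damage)
{
    m->hit_points = (damage >= m->hit_points) ? 0 : (uint8_t)(m->hit_points - damage);
}

static void player_strike(Player *p, Monster *m)
{
    hit_monster(m, attack_points(p));
}

/* The benched function: set up the state, play one turn, return. */
void bench(void)
{
    init_game();
    player_strike(&player, &monster);
}
```

Every input is a constant, so a compiler with strong high-level optimizations can, in principle, compute the whole turn at compile time and reduce "bench" to a handful of constant stores. A compiler without them has to perform each call, addition and comparison at run time.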


cc65... it is getting repetitive. Ok, this test is definitely not for cc65, which refuses to do high-level optimization. This benchmark actually tests the quality of the high-level optimizer.

KickC is a better vbcc performance-wise. That's a trend.

6502-gcc doesn't disappoint. Its frontend is world-class, and nails it perfectly. The generated code is a short list of "lda <constant> : sta <memory>", with even redundant LDAs pruned.

Code tailored for cc65

Did you read Ilmenit's great essay "Advanced optimizations in CC65"? Here it is: https://github.com/ilmenit/CC65-Advanced-Optimizations

Let's save cc65! Here we bench code specially tailored for cc65, made by an expert. That should do it, right?

The code itself is comparable to the RPG benchmark: it is a uselessly abstracted RPG engine. The difference: it loops 100 times and prints characters on screen each time (so there is no way to reduce it to a list of LDA+STA; computations have to be done). Then, Ilmenit applies various optimizations to make it really fast.

We'll bench two versions: the first one, without any optimization, and the last one, fully optimized.


KickC disappeared, and vbcc is half absent!

KickC had just too many limitations to compile the code: notably it refuses to output any runtime modulo or division.

vbcc generated an infinite loop on the unoptimized code, and suffered from my lack of experience when integrating it in the benchmarking tool (hence the missing code size).

6502-gcc exhibited a bad bug (or, once again, my lack of experience): it puts global variables in random segments. The generated assembly had to be fixed by hand.

So, even before looking at the graphs: cc65 is the winner! It compiled this code!

Interestingly, 6502-gcc does not perform well on the unoptimized version, after all that has been said about its magic high-level optimizations! The problem here is that it was not allowed to inline functions. In C, when you declare a function without preceding it with "static", it has to be accessible to other compilation units: it must be there and fully usable, passing parameters on the stack and all that. Of course, cc65 doesn't inline anyway, so it makes little difference to it.
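The difference "static" makes can be sketched like this. The function names are hypothetical, and whether a given compiler actually inlines either version depends on the compiler and its flags; the C language only settles which functions must remain callable from other files.

```c
#include <stdint.h>

/*
 * External linkage: another compilation unit may call this function,
 * so a complete, standalone version of it has to be generated.
 */
uint8_t apply_damage_public(uint8_t hit_points, uint8_t damage)
{
    return (damage >= hit_points) ? 0 : (uint8_t)(hit_points - damage);
}

/*
 * Internal linkage: only this file can call it. If the compiler inlines
 * every call, no standalone version needs to exist at all.
 */
static uint8_t apply_damage_private(uint8_t hit_points, uint8_t damage)
{
    return (damage >= hit_points) ? 0 : (uint8_t)(hit_points - damage);
}

/* Same logic, calling the externally visible helper. */
uint8_t turn_with_public_helper(uint8_t hit_points)
{
    return apply_damage_public(hit_points, 3);
}

/* Same logic, calling the file-local helper. */
uint8_t turn_with_private_helper(uint8_t hit_points)
{
    return apply_damage_private(hit_points, 3);
}
```

Both "turn" functions compute the same thing. With the static helper, the optimizer is free to fold the call into its caller and drop the helper entirely, which matters a lot on a CPU where passing parameters is expensive.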

On the optimized version vbcc seems to be lacking, and cc65 is slightly better than 6502-gcc. The code has been optimized by hand to be straightforward to compile, so the compilers do not have much freedom to improve things.

What about this "???" entry? If you haven't read "Advanced optimizations in CC65" by now (shame on you!), you don't know what a mess the optimized code is to read. All available tricks have been used; even the author does not recommend going that far in real life.

The "???" is "6502-gcc -Os" on a version with only two of the 12 optimizations plus the use of the "static" keyword. The resulting code is a lot like the unoptimized code, and the performance penalty is almost gone. That's why I think high-level optimizations are what matters: they allow writing in C without caring much about the generated assembly. Leave low-level tricks to the compiler, focus on the logic.

Aside from performance

Ok, performance-wise it seems that cc65 is seriously lacking, vbcc and KickC are almost on par, and 6502-gcc varies from excellent to trash.

Actually, performance should not matter much when choosing a C compiler. Be sure to learn assembly language, and you'll be able to get performance where it is needed. Here is a summary of the pros and cons of each compiler.

cc65
Pros
 * Rock-solid and battle-tested, it will not let you down.
 * Active development
 * Great resources available to learn
 * One of the most complete toolsuites out there
Cons
 * Performance (seriously, that's its only dark point)

vbcc
Pros
 * Acceptable performance
 * Active user base
 * Extensive documentation
 * Complete suite of tools (assembler, linker, versatile config files, ...)
Cons
 * Terrible licensing (like, I will recommend avoiding it just for that)
 * Buggy

KickC
Pros
 * Good performance
 * Active development (and a good base, may the future be bright!)
Cons
 * Hard to integrate with other tools (made to compile an entire project)
 * Very partial compliance with the C standard (even "const" is badly supported)
 * Compiling is slow
 * Terrible compilation errors (at times, you just get a stack trace)

6502-gcc
Pros
 * 100% compliance with the C standard
 * God-tier high-level optimizations
 * Generates assembly for ca65 (taking advantage of cc65's toolsuite)
 * Human readable and on-point warnings and compilation errors.
Cons
 * Development inactive
 * Buggy
 * Variable quality of the generated code

Ok, but which one to choose for my project?

As always, it depends! Who are you?

Are you an experienced developer accustomed to a particular one? Stay on it. You already learned to master your tool, others bring little benefit.

Is it your first C project for the 6502? Go for cc65, it is the most mature, and you'll find help.

You want high-level optimizations, and something that just works? KickC is made for you.

You want high-level optimizations, to rely on the fine print of the C standard, and are ready to build your own toolchain? 6502-gcc is the way.

No, I won't recommend vbcc. Its licensing is terrible: it is closed source and you cannot use it for "commercial purposes" (with no definition of the term, and no word on whether it applies to generated executables). Also, it incorporates various tools and libraries with various licenses. If you want to do things "the right way", you will have to check at least three licenses to know if you can do what you have in mind.

Last word

Benchmarks are a nice tool to see the general picture, but they can never be perfect. Especially these ones: it was my first experience with most of those compilers, and I may have gotten some details wrong. If you want to play with it yourself, the tool used is available here: https://github.com/sgadrat/6502-compilers-bench. It takes C files, and outputs speed metrics as well as the generated assembly.

Hope this little research can help somebody out there. Remember, in retro-development, the most important thing is to have fun!
