Human vs Compiler - Cycle-Level ARM Assembly Optimization
This video is a (very) deep dive into the main ROM-serving algorithm of the Software Defined Retro ROM (SDRR), running on an STM32F4 (Cortex-M4) MCU. It covers:
- the default hand-crafted assembly algorithm in detail, explaining each of the key performance optimizations
- the most optimized assembly GCC can produce from two different C implementations
- why I don't use interrupts (they simply aren't fast enough for this use case on this processor)
The results may surprise you.
If you think you can do better, post your code in the comments below!
Links:
Software Defined Retro ROM (SDRR) GitHub repo: https://piers.rocks/u/sdrr
SDRR intro: https://youtu.be/Jhe4LF5LrZ8
SDRR tech deep dive: https://youtu.be/pOZ2-W3dpZ8
Timestamps
00:00 Introduction
00:10 Test Methodology
01:50 Results
02:25 Hand Rolled Assembly
12:38 Naive C Implementation
16:15 Better C Implementation
26:35 Interrupts - Why Not?
27:53 Final Thoughts
Video content copyright (c) 2025 piers.media Limited. All rights reserved.