This video is a (very) deep dive into the main ROM serving algorithm from the Software Defined Retro ROM, running on an STM32F4 (Cortex-M4) MCU, covering

  • the default hand-crafted assembly algorithm, in detail, explaining all of the key performance optimizations
  • the most highly optimized assembly the GCC compiler can achieve from two different C implementations
  • why I don’t use interrupts (they just are not fast enough for this use case, on this processor)

The results may surprise you.

If you think you can do better - post your code in the comments below!

🔗 Links:
Software Defined Retro ROM (SDRR) GitHub repo: https://piers.rocks/u/sdrr
SDRR intro: https://youtu.be/Jhe4LF5LrZ8
SDRR tech deep dive: https://youtu.be/pOZ2-W3dpZ8

⏰ Timestamps
00:00 🎬 Introduction
00:10 🧪 Test Methodology
01:50 📊 Results
02:25 💪 Hand Rolled Assembly
12:38 🔧 Naive C Implementation
16:15 🚀 Better C Implementation
26:35 ⚡ Interrupts - Why Not?
27:53 💭 Final Thoughts

Video content copyright (c) 2025 piers.media Limited. All rights reserved.
