In the early days of personal computing CPU bugs were so rare as to be newsworthy.
-
@arclight timing degradation should not be visible outside of the highest-spec desktop CPUs, which are really pushing the envelope even when they're new. Embedded systems and even mid-range desktop CPUs will never fail because of it. What might become visible over time, though, is increased power consumption.
@arclight on the other hand watch out for memory errors. Those can crop up much sooner than CPU problems due to circuit degradation: https://fosstodon.org/@gabrielesvelto/112407741329145666
-
In the early days of personal computing CPU bugs were so rare as to be newsworthy. The infamous Pentium FDIV bug is remembered by many, and even earlier CPUs had their own issues (the 6502 comes to mind). Nowadays they've become so common that I encounter them routinely while triaging crash reports sent from Firefox users. Given the nature of CPUs you might wonder how these bugs arise, how they manifest and what can and can't be done about them. 🧵 1/31
@gabrielesvelto there was also no meaningful computer security, nor much need for it, in the days of the 6502. It's much different now that most computers are connected to the internet and can be infected with malware within seconds of connecting.
-
@gabrielesvelto but UEFI is already quite complex: it has to find block devices, read their partition tables, read FAT file systems, read directories and files, load data in memory and transfer execution. Wouldn't a patch applied after all that be too late?
@mdione yes, it's very complex, but motherboard firmware has a mechanism to load the new microcode right as the CPU is bootstrapped. That is even before the CPU is capable of accessing DRAM. All the rest of the UEFI machinery runs after that. Note that this early bootstrap mechanism usually involves a separate bootstrap processor, typically an embedded microcontroller whose task is to get the main x86 core up and running.
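As an aside, on Linux you can check which microcode revision actually ended up loaded. A minimal sketch, assuming an x86 system where the kernel exposes a "microcode" field in /proc/cpuinfo:

```python
# Print the microcode revision the kernel reports for the first core.
# Assumes Linux on x86, where /proc/cpuinfo carries a "microcode" field.
with open("/proc/cpuinfo") as cpuinfo:
    for line in cpuinfo:
        if line.startswith("microcode"):
            print(line.strip())  # e.g. "microcode : 0x12b"
            break  # every core usually reports the same revision
```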
-
All in all, modern CPUs are beasts of tremendous complexity and bugs have become inevitable. I wish the industry would spend more resources on addressing them, improving design and testing before CPUs ship to users, but alas most of the tech sector seems keener on playing with unreliable statistical toys than on ensuring that the hardware users pay good money for works correctly. 31/31
@gabrielesvelto I wonder if they could use said statistical toys as part of a large-scale fuzzing process to detect such bugs?
-
In the early days of personal computing CPU bugs were so rare as to be newsworthy. The infamous Pentium FDIV bug is remembered by many, and even earlier CPUs had their own issues (the 6502 comes to mind). Nowadays they've become so common that I encounter them routinely while triaging crash reports sent from Firefox users. Given the nature of CPUs you might wonder how these bugs arise, how they manifest and what can and can't be done about them. 🧵 1/31
Fascinating thread. Do you know if the same issues exist on low-power embedded CPUs like the ESP32, or is this something that mostly affects high-end stuff?
-
@gabrielesvelto that's the deep nerdy stuff I love about IT! Thanks a ton for sharing this!
@perpetuum_mobile @gabrielesvelto I even used to code in assembler on 8-bit platforms; for years I could not quite get my head round how modern CPUs worked until this thread (and now I know a bit more)
-
Bonus end-of-thread post: when you encounter these bugs try to cut the hardware designers some slack. They work on increasingly complex stuff, with increasingly pressing deadlines and under upper management who rarely understand what they're doing. Put the blame for these bugs where it's due: on executives who haven't allocated enough time, people and resources to make a quality product.
I don’t cut any slack for Intel producing two whole generations of CPUs with manufacturing flaws, then trying to cover it up and never really offering full restitution to any customers.
-
All in all, modern CPUs are beasts of tremendous complexity and bugs have become inevitable. I wish the industry would spend more resources on addressing them, improving design and testing before CPUs ship to users, but alas most of the tech sector seems keener on playing with unreliable statistical toys than on ensuring that the hardware users pay good money for works correctly. 31/31
@gabrielesvelto It was a very rich, exciting, interesting, and useful post! Thank you very much!
-
@perpetuum_mobile @gabrielesvelto I even used to code in assembler on 8-bit platforms; for years I could not quite get my head round how modern CPUs worked until this thread (and now I know a bit more)
@vfrmedia @perpetuum_mobile if you have some free time this is a good deep dive: https://cseweb.ucsd.edu/classes/fa14/cse240A-a/pdf/04/Gonzalez_Processor_Microarchitecture_2010_Claypool.pdf
While it doesn't cover some of the most recent advancements it captures 90% of what you need to know.
If you have a lot of free time and want to dive deeper there's this: https://www.agner.org/optimize/microarchitecture.pdf
-
In the early days of personal computing CPU bugs were so rare as to be newsworthy. The infamous Pentium FDIV bug is remembered by many, and even earlier CPUs had their own issues (the 6502 comes to mind). Nowadays they've become so common that I encounter them routinely while triaging crash reports sent from Firefox users. Given the nature of CPUs you might wonder how these bugs arise, how they manifest and what can and can't be done about them. 🧵 1/31
@gabrielesvelto The book 'Silicon' by Federico Faggin, the Italian engineer who designed the 4004, 8080 and Z80, is a most splendid read. Fascinating that he had to add optical decoys to confuse reverse engineering and minimise cloning by rivals.
-
@perpetuum_mobile @gabrielesvelto I even used to code in assembler on 8-bit platforms; for years I could not quite get my head round how modern CPUs worked until this thread (and now I know a bit more)
@vfrmedia @gabrielesvelto I did code a little bit in x86 asm when I was a teen. It was the only way to turn on SVGA modes in Turbo Pascal, and I wanted to make games back then.
I wrote a program which simulated a flame in real time, doing a per-pixel average of the surrounding pixels and adding random 255-intensity sparks at the bottom to make the flame move and look real.
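A minimal sketch of that effect in Python (my own reconstruction, not the original Turbo Pascal code): seed random full-intensity sparks on the bottom row, then let every pixel blend with the row below it, losing a little heat per frame:

```python
import random

WIDTH, HEIGHT = 80, 24
buf = [[0] * WIDTH for _ in range(HEIGHT)]  # heat values, 0..255

def step(buf):
    # Seed random full-intensity (255) sparks on the bottom row.
    for x in range(WIDTH):
        buf[HEIGHT - 1][x] = 255 if random.random() < 0.4 else 0
    # Propagate upwards: average each pixel with its neighbours on the
    # row below, subtracting a little so the flame cools with height.
    for y in range(HEIGHT - 2, -1, -1):
        below = buf[y + 1]
        for x in range(WIDTH):
            total = (below[x] + below[(x - 1) % WIDTH]
                     + below[(x + 1) % WIDTH] + buf[y][x])
            buf[y][x] = max(0, total // 4 - 1)

# Render one frame as ASCII art after letting the flame develop.
palette = " .:*#%@"
for _ in range(40):
    step(buf)
for row in buf:
    print("".join(palette[v * (len(palette) - 1) // 255] for v in row))
```
-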
@gabrielesvelto I wonder if they could use said statistical toys as part of a large-scale fuzzing process to detect such bugs?
@x0 hardware design already involves various forms of fuzzing, but the amount of state involved is so large that no amount of testing can be truly exhaustive
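To put a number on that, a back-of-the-envelope sketch: even a device holding just 1 KiB of internal state has vastly more distinct states than a lifetime of fuzzing could ever touch (for comparison, the observable universe contains roughly 10^80 atoms):

```python
import math

# Number of distinct states representable by 1 KiB of flip-flops.
bits = 1024 * 8
digits = int(bits * math.log10(2))  # decimal digits of 2**bits
print(f"2**{bits} is about 10**{digits} possible states")  # ~10**2466
```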
-
@x0 hardware design already involves various forms of fuzzing, but the amount of state involved is so large that no amount of testing can be truly exhaustive
@gabrielesvelto Makes sense. Especially when something like voltage/temperature fluctuations are involved.
-
@gabrielesvelto I went to a lecture in the early 1990's by Tim Leonard, the formal methods guy at DEC. His story was that DEC had as-built simulators for every CPU they designed, and they had correct-per-the-spec simulators for these CPUs.
At night, after the engineers went home, their workstations would fire up tools that generated random sequences of instructions, threw those sequences at both simulators, and compared the results. This took *lots* of machines, but, as Tim joked, Equipment was DEC's middle name.
And they'd find bugs - typically with longer sequences, and with weird corner cases of exceptions and interrupts - but real bugs in real products they'd already shipped.
But here was the banger: sure, they'd fix those bugs. But there were still more bugs to find, and it took longer and longer to find them.
Leonard's empirical conclusion was that there is no "last bug" to be found and fixed in real hardware. There's always one more bug out there, and it'll take you longer and longer (and cost more and more) to find it.
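The shape of that nightly loop is easy to sketch. A toy reconstruction (my illustration, not DEC's actual tooling): a tiny four-register "per-the-spec" model, an "as-built" model with a planted corner-case bug, and random instruction sequences thrown at both until their final states diverge:

```python
import random

OPS = ["add", "sub", "xor", "mov"]

def random_sequence(rng, max_len=64):
    # A random program: (opcode, destination register, source register).
    return [(rng.choice(OPS), rng.randrange(4), rng.randrange(4))
            for _ in range(rng.randrange(1, max_len))]

def spec_step(regs, insn):
    # The "correct per the spec" model: 16-bit wrapping arithmetic.
    op, dst, src = insn
    if op == "add":
        regs[dst] = (regs[dst] + regs[src]) & 0xFFFF
    elif op == "sub":
        regs[dst] = (regs[dst] - regs[src]) & 0xFFFF
    elif op == "xor":
        regs[dst] ^= regs[src]
    else:  # mov
        regs[dst] = regs[src]

def built_step(regs, insn):
    # The "as built" model, with a deliberately planted corner-case bug:
    # subtraction forgets the 16-bit mask and can go negative.
    op, dst, src = insn
    if op == "sub":
        regs[dst] = regs[dst] - regs[src]
    else:
        spec_step(regs, insn)

def fuzz(iterations=10_000):
    rng = random.Random(0)
    for _ in range(iterations):
        seq = random_sequence(rng)
        spec, built = [1, 2, 3, 4], [1, 2, 3, 4]
        for insn in seq:
            spec_step(spec, insn)
            built_step(built, insn)
        if spec != built:
            print(f"divergence on a {len(seq)}-instruction sequence: {seq[:4]}...")
            return

fuzz()
```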
@grumble209 @gabrielesvelto I remember that the UltraSPARC-II (Blackbird) CPU, over its lifetime (and to date, to boot), only had a single erratum, and that was an extremely unlikely timing issue, not a logic bug. Unfortunately, that overall goodness was offset by the widespread off-chip L2 cache issue circa Y2K.
-
@gsuberland thanks, I was playing a bit fast and loose with the terminology. As I was writing these toots I reminded myself that entire books have been written just to model transistor behavior and propagation delay, and my very crude wording would probably give their authors a heart attack.
@gabrielesvelto haha for sure

-
@gabrielesvelto haha for sure

@gabrielesvelto I actually keep meaning to find a decent reference text on FET construction and modelling. I've got plenty on SI/EMI, power delivery, etc. but everything I've found for FETs has been the sort of thing that presumes you're either someone with a deep background in semiconductor physics or a professional semiconductor/ASIC engineer just looking for a reference text. There's very little out there for EE folks who are coming at it from the practical side.
-
@gabrielesvelto there was also no meaningful computer security nor much need for it in the days of 6502. it's much different when most computers are now connected to the internet and can be infected with malware within seconds of connecting.
@burnitdown @gabrielesvelto in the '70s most issues were logic bugs; nowadays a larger proportion are timing/analogue issues.
-
@mdione yes, it's very complex, but motherboard firmware has a mechanism to load the new microcode right as the CPU is bootstrapped. That is even before the CPU is capable of accessing DRAM. All the rest of the UEFI machinery runs after that. Note that this early bootstrap mechanism usually involves a separate bootstrap processor, typically an embedded microcontroller whose task is to get the main x86 core up and running.
@gabrielesvelto wow, and where does it get the microcode from? Another computer within the computer? (turtles and all that)

-
In the early days of personal computing CPU bugs were so rare as to be newsworthy. The infamous Pentium FDIV bug is remembered by many, and even earlier CPUs had their own issues (the 6502 comes to mind). Nowadays they've become so common that I encounter them routinely while triaging crash reports sent from Firefox users. Given the nature of CPUs you might wonder how these bugs arise, how they manifest and what can and can't be done about them. 🧵 1/31
@gabrielesvelto I was just thinking about those bugs and something crept up my neck when I thought "now add hallucinating AIs to all that". Pretty sure they're already used in CPU development to deal with the increasing complexity.
So the problem will become much worse than it already is.
Thanks for the article. Loved it.
-
@gabrielesvelto I actually keep meaning to find a decent reference text on FET construction and modelling. I've got plenty on SI/EMI, power delivery, etc. but everything I've found for FETs has been the sort of thing that presumes you're either someone with a deep background in semiconductor physics or a professional semiconductor/ASIC engineer just looking for a reference text. There's very little out there for EE folks who are coming at it from the practical side.
@gsuberland @gabrielesvelto You mean you don’t want my college device physics textbook that starts with solving Schrödinger's equation for a hydrogen atom? They should not have allowed that class at 7am.
For the practical side, I really like Jacob Baker’s books.