Modern processors weren’t built simply as fast processors, but as fast emulators of a PDP-11. They expose the same programming model in order to keep compatibility.
This allowed C programmers to keep believing that their language is close to the underlying hardware. (C is no longer a low-level language)
The emulation comes down to two points:
Sequential execution. C standards before C11 have no notion of parallelism whatsoever: instructions are expected to execute one after another, and the result of each instruction is expected to be visible to the next one.
To run fast while preserving this model, modern processors rely on instruction-level parallelism (ILP): multiple instructions execute simultaneously as long as they are independent of each other. ILP is one of the most complex (and power-hungry) parts of a modern processor.
A modern Intel processor can have up to 180 instructions in flight at a time (Chisnall2018).
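A minimal C sketch of what "independent" means for ILP (the function names are hypothetical, not from the source). Both functions add up the same array, but in `sum_one_chain` every addition needs the result of the previous one, so the additions form a single dependency chain that cannot overlap. `sum_four_chains` keeps four independent accumulators, so an out-of-order core can have several additions in flight at once. Floating-point addition is not associative, so compilers generally won't make this rewrite on their own (unless allowed by flags such as `-ffast-math`), and for general inputs the two results may differ in the last bits.

```c
#include <stddef.h>

/* One accumulator: each addition depends on the previous one,
   so the processor has to execute them strictly in sequence. */
double sum_one_chain(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four accumulators: the four additions in each iteration do not depend
   on each other, so they can execute in parallel. */
double sum_four_chains(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++) /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

The C source of `sum_one_chain` still reads as a strictly sequential program; any overlap the hardware finds has to come from the independence visible in the instruction stream.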
Flat memory. This has not been true for decades: processors have multiple levels of cache between the registers and main memory. The caches are largely invisible to programmers and are not exposed in the programming model.
First, this requires caches to hold shared, mutable state that must be kept coherent across cores (The main complication for CPU cache is shared and mutable state).
Second, it prevents some useful optimizations (e.g., a cache-aware garbage collector might avoid copying dead objects into memory).
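A minimal C sketch of how the cache stays invisible in the language (the function names are hypothetical, not from the source). In C's flat-memory model, the two functions below are equivalent: they read the same elements of a row-major matrix and return the same sum (up to floating-point rounding order). The only difference is the access pattern. `sum_row_major` walks consecutive addresses and uses every cache line it fetches, while `sum_col_major` jumps `cols * sizeof(double)` bytes between accesses. For matrices larger than the caches, it is typically much slower, even though nothing in the language tells the two apart.

```c
#include <stddef.h>

/* Visits m[0][0], m[0][1], ...: consecutive addresses, cache-friendly. */
double sum_row_major(const double *m, size_t rows, size_t cols) {
    double s = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            s += m[i * cols + j];
    return s;
}

/* Visits m[0][0], m[1][0], ...: each access is a full row apart,
   so for large matrices most accesses touch a different cache line. */
double sum_col_major(const double *m, size_t rows, size_t cols) {
    double s = 0.0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            s += m[i * cols + j];
    return s;
}
```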