Yes, that's exactly it.
I've always heard that CRTs don't have a native resolution though.
Let me answer both points at once, because they're actually related. CRTs indeed don't have a native resolution, because the cathode ray tube operates independently of the number of physical pixels on the screen. It just fires 240 lines per frame (240p or 480i is the same in that regard) 60 times per second (or 288/576 lines at 50 Hz for PAL signals). The screen itself has its pixels distributed in a non-square grid, as shown in the image below:
And some CRTs have more physical pixels than others. Smaller sets typically have a lower maximum resolution simply because they have fewer of them, while still accepting and processing the same signals.
However, lines are, well, lines. This means a single scanline will inevitably end up passing through the middle of two vertically-adjacent physical pixels, affecting both. It also follows that every physical pixel will be lit by two different lines passing through it, so it will always show an average of two vertically-adjacent "pixels" in the signal. This is what produces the vertical component of the "light bleed" you describe.
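To make the vertical part concrete, here's a minimal sketch (plain Python, not a CRT simulation) of that averaging, assuming each physical row of the screen straddles two adjacent scanlines; the function name and the toy two-pixel rows are purely illustrative:

```
def vertical_blend(signal_rows):
    """signal_rows: list of scanlines, each a list of brightness values."""
    blended = []
    for upper, lower in zip(signal_rows, signal_rows[1:]):
        # Each physical row straddles two adjacent scanlines and shows their average.
        blended.append([(a + b) / 2 for a, b in zip(upper, lower)])
    return blended

# A bright scanline above a dark one smears into a mid-grey physical row.
print(vertical_blend([[1.0, 1.0], [0.0, 0.0]]))  # [[0.5, 0.5]]
```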
As for the horizontal component, the explanation lies in the fact that these lines are analog, not digital, so the number of distinct light/color points per line isn't fixed. The theoretical maximum number of horizontal points in an analog line is 720 (704 if you account for the usual amount of overscan). Imagining square pixels, that is enough for a 4:3 picture at 480i (640 points needed), but not at 576i (768 needed), so PAL signals always have some horizontal blur. 16:9 signals always use non-square pixels for the same reason (854 points would be needed at 480i, and 1024 at 576i). This gets even more confusing for video game consoles of the 240p era, since some used 256 horizontal pixels (the number of values a single byte can hold), while others used 320 to get proper square pixels at a 4:3 aspect ratio. The result is that the number of horizontal pixels on a TV will pretty much never match (or be a multiple of) the number of distinct color points in the signal it's being fed from a console. As was the case with the vertical mismatch, this physical/signal pixel-count mismatch results in blurring: not from light bleed, but from the need to average out the conflicting values that each physical pixel is getting.
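Here's a rough sketch of that horizontal mismatch, assuming simple linear interpolation between signal points (real TVs don't work exactly like this; it's only to illustrate the averaging). The 320-point line and the 540 dots per row are made-up numbers, chosen so the ratio isn't an integer:

```
def resample_line(points, num_dots):
    """Map a line of signal points onto num_dots physical dots by linear interpolation."""
    out = []
    for d in range(num_dots):
        # Position of this physical dot expressed in signal-point coordinates.
        x = d * (len(points) - 1) / (num_dots - 1)
        i = int(x)
        frac = x - i
        right = points[min(i + 1, len(points) - 1)]
        out.append(points[i] * (1 - frac) + right * frac)
    return out

line = [1.0, 0.0] * 160          # a 320-point line alternating bright/dark
dots = resample_line(line, 540)  # hypothetical number of phosphor dots per row
blends = sum(1 for v in dots if 0.0 < v < 1.0)
print(blends, "of", len(dots), "dots show a mix of two signal points")
```

Because 540 isn't a multiple of 320, almost every dot lands between two signal points and shows a weighted average of both, which is the horizontal blur in a nutshell.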
Furthermore, lower-quality connections such as composite or RF make the horizontal component even worse, due to interference from the other signal components (color, and audio in the case of RF) sharing the same cable as the video.
Now, you may say "that's all true, but the result is that I still see a blur and I don't like it". Well, yes and no.
First, because most games of that era were developed with that blur in mind, and thus need it for some intentional visual effects, like the waterfall transparency in Sonic mentioned above. I've seen both stripe and checkerboard patterns used for transparency. Games sometimes rely on the blur so much that simply switching to RGB cables is enough to expose these artifacts.
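Here's a tiny sketch of why those patterns depend on the blur: if each displayed point ends up roughly as the average of two adjacent signal pixels, an alternating water/background stripe collapses into an even 50/50 blend. The colors and the simple two-pixel average are assumptions for illustration, not a model of any specific console or TV:

```
water, background = (64, 128, 255), (40, 200, 90)   # made-up RGB colors
dithered = [water if i % 2 == 0 else background for i in range(8)]  # stripe pattern

# Each displayed point is taken as the average of two horizontally adjacent pixels.
blended = [
    tuple((a + b) // 2 for a, b in zip(dithered[i], dithered[i + 1]))
    for i in range(len(dithered) - 1)
]
print(blended[0])  # (52, 164, 172): halfway between the two colors,
                   # so the "water" reads as semi-transparent instead of striped
```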
Second, because the blur you get from trying to fit a 256×224 image (padded with black bars to a 256×240 signal) into a 1920×1080 HDTV, even if we assume a 1440×1080 area with black bars on the sides, is going to be a hell of a lot worse, due both to bilinear filtering and to the uneven multiplication factor (1080 / 240 = 4.5). For consoles that output 320×240, a proper upscaler such as the Framemeister, or a mod such as the UltraHDMI, can cleanly upscale this by 4× to 1280×960, sitting inside a true 1080p frame with small black bars at the top and bottom and big ones at the sides. But that's an edge case few people have access to, and it still won't work when the signal is 256×224, as is the case with the NES and SNES.
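For the scaling arithmetic, a quick back-of-the-envelope check (the frame sizes are the ones mentioned above; nothing console-specific beyond that is assumed):

```
def integer_fit(w, h, target_w=1920, target_h=1080):
    scale = min(target_w // w, target_h // h)   # biggest integer factor that still fits
    fills_1080 = (target_h % h == 0)            # would 1080 lines divide evenly?
    return scale, w * scale, h * scale, fills_1080

for w, h in [(256, 224), (256, 240), (320, 240)]:
    scale, sw, sh, fills = integer_fit(w, h)
    print(f"{w}x{h}: {scale}x -> {sw}x{sh}, 1080 lines divide evenly: {fills}")
```

As the output shows, 320×240 fits cleanly at 4× as 1280×960, but no integer factor fills the full 1080 lines, which is exactly where the uneven 4.5× stretch and its bilinear blur come from.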
So, in conclusion: yes, the blur you see does happen, but it's hard to find an alternative that isn't even worse outside of emulation. And even if you do find one, it may have unintended consequences, such as the Sonic waterfall case (admittedly an edge case, but there are other examples).
Sorry for the huge post