can't find the link, but i read somewhere that they reduced the code for grabbing input (touch + buttons) from 8 KB to 1 KB, or something along those lines.
i presume that after all these years people have figured out so much about the DS that they can heavily optimize existing techniques, but some of them...