1. Some literals can be built quickly (for example, A0400h could be the two-cycle sequence "MOV r0, #0xA0000; ORR r0, r0, #0x400"), but if building one would take more than three cycles (or two cycles for ITCM code on ARM9), you just use LDR.
2. By convention, you only need to save r4-r11/fp and...
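
As a rough illustration of the rule in note 1, here is a minimal C sketch that estimates how many instructions a MOV-then-ORR chain would need for a given constant. The function names (`arm_imm_encodable`, `arm_build_cost`, `prefer_ldr`) are hypothetical, and the sketch assumes each MOV/ORR with an immediate operand costs one cycle and ignores MVN, ADD/SUB and other tricks, so treat it as an approximation rather than what any particular assembler does.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by r bits (0 <= r < 32). */
static uint32_t rol32(uint32_t v, unsigned r)
{
    return r ? (v << r) | (v >> (32u - r)) : v;
}

/* Nonzero if v fits an ARM data-processing immediate:
   an 8-bit value rotated by an even number of bits. */
int arm_imm_encodable(uint32_t v)
{
    for (unsigned r = 0; r < 32; r += 2)
        if (rol32(v, r) <= 0xFFu)
            return 1;
    return 0;
}

/* Hypothetical helper: fewest instructions in a "MOV, then ORR..." chain
   that builds v. Tries every even rotation as the starting point and
   greedily covers the set bits with 8-bit chunks on even boundaries. */
int arm_build_cost(uint32_t v)
{
    int best = 16;              /* loose upper bound on the chunk count */

    if (v == 0)
        return 1;               /* MOV r0, #0 */

    for (unsigned s = 0; s < 32; s += 2) {
        uint32_t w = rol32(v, s);
        int n = 0;
        for (unsigned i = 0; i < 32; ) {
            if ((w >> i) & 3u) { n++; i += 8; }  /* chunk covers bits i..i+7 */
            else               { i += 2; }       /* skip an empty bit pair */
        }
        if (n < best)
            best = n;
    }
    return best;
}

/* Assumption: one cycle per MOV/ORR. max_cycles would be 3 in the general
   case from note 1, or 2 for ITCM code on ARM9. */
int prefer_ldr(uint32_t v, int max_cycles)
{
    return arm_build_cost(v) > max_cycles;
}
```

For the example in note 1, `arm_build_cost(0xA0400)` comes out as 2, so the MOV/ORR pair is kept, while a constant like 0x12345678 needs four chunks and would go through LDR.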