Hopefully the following will not be too much rambling and something nice will come of it.
I have already been there, and although I could bash machine translation (I certainly throw my hat in with those that consider it all but useless) it would serve little purpose at this point. However I have been thinking about writing a post or something about what level of Japanese a ROM hacker should know if they intend to work with Japanese ROMs. While I can do the hacking side of things I thought it best to also ask those that do translating to make sure I am not saying something truly silly. I am not aiming for the bare minimum (which is arguably next to nothing) but something that allows people to be reasonably functional.
To this end
Things one should appreciate about the Japanese language when playing ROM hacker (hacker side of the fence)
The types of Japanese characters and how they work-
Hiragana and Katakana, which are collectively known as the kana. They are the basic constructs of the language, with katakana usually being used for loanwords and foreign words and being somewhat more angular than the more freeform hiragana, which are used for native words. There are some fairly accepted ordering/sorting methods (I believe gojuon order is the name of the most popular- stuff like
http://www.romhackin...t/utilities/55/ should add it) and few games deviate here. Said ordering might well leave out some of the less common, possibly obsolete, kana the script writer might use, and the game might add entire separate characters for kana carrying marks (see Dakuten and Handakuten), so be aware of this when constructing tables and the fonts they match up with.
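As a rough illustration of the table building point, here is a minimal Python sketch that lays the basic hiragana out in gojuon order from a starting value. The starting value of 0x01, the function names and the "HEX=char" line format are assumptions for the sake of the example; a real game may start anywhere, interleave small kana or dakuten forms, or skip characters entirely.

```python
# Basic hiragana in gojuon order (no small kana, no dakuten/handakuten forms).
GOJUON = (
    "あいうえお" "かきくけこ" "さしすせそ" "たちつてと" "なにぬねの"
    "はひふへほ" "まみむめも" "やゆよ" "らりるれろ" "わをん"
)

def build_table(chars, start=0x01):
    """Map consecutive values, beginning at the assumed `start`, to `chars` in order."""
    return {start + i: ch for i, ch in enumerate(chars)}

def format_table(table):
    """Render the mapping as 'HEX=char' lines, one per entry."""
    return "\n".join(f"{value:02X}={ch}" for value, ch in sorted(table.items()))

if __name__ == "__main__":
    print(format_table(build_table(GOJUON)))
```

If the font in the ROM shows extra tiles between the rows (say が right after か) the character string needs adjusting to match before the table is any use.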
Kanji- the elaborate symbols that originally came out of China. There are many such symbols and most consider them to be the harder part of the language, to the point where translators tend to know a selection (there are various levels but there is not a set upper limit of kanji) and some works/fields are known for downplaying their use (most notably in the game/anime world for the likes of shounen manga and anime, which is nice as games aimed at those audiences are some of the most popular targets for translation). There is no universally accepted ordering of kanji, which makes relative searching, one of the most powerful yet simple tools available to the ROM hacker dealing with an unknown text encoding, tricky at best.
However there are lists of selected Kanji that are taught to people, but unless you also find yourself interested in Japanese education/orthographic history you probably do not need to delve into them as a ROM hacker (if I am not overstepping my mark then as a translator it might help, as games tend to be written by modern writers for a modern audience and knowing the way the modern language is taught/learned/graded can be a good thing*). More modern games do often share orderings or parts thereof between games so do give other games a look, especially ones from the same developer/publisher. Although the lists, the distinction between kokuji, kokkun and categories/moji (such that might still be considered to exist) might not be that important, you should know the terms radical and stroke count. The radical is the base component a kanji is filed under and the stroke count is quite literally how many strokes are needed to finish the deal; sorting by radical and then by stroke count is about as close as you might get to a proper ordering system (everything else is so much good luck if you happen to encounter it).
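For those that have not met it, a relative search looks for the differences between neighbouring bytes rather than the bytes themselves, so it only works when the character ordering is known or guessable. A minimal Python sketch, assuming a single byte encoding and kana stored in plain gojuon order (both assumptions, as are the function names and the sample bytes):

```python
# Basic hiragana in gojuon order; the index of each kana is its assumed ordinal.
GOJUON = (
    "あいうえお" "かきくけこ" "さしすせそ" "たちつてと" "なにぬねの"
    "はひふへほ" "まみむめも" "やゆよ" "らりるれろ" "わをん"
)

def ordinals_for(word):
    """Position of each character of `word` within the assumed ordering."""
    return [GOJUON.index(ch) for ch in word]

def relative_search(data, ordinals):
    """Return offsets where consecutive byte differences match those of `ordinals`."""
    diffs = [b - a for a, b in zip(ordinals, ordinals[1:])]
    hits = []
    for offset in range(len(data) - len(ordinals) + 1):
        window = data[offset:offset + len(ordinals)]
        if all(window[i + 1] - window[i] == d for i, d in enumerate(diffs)):
            hits.append(offset)
    return hits

if __name__ == "__main__":
    # Made-up data in which あ happens to be stored as 0x40.
    rom = bytes([0x00, 0xFF, 0x4A, 0x45, 0x54, 0x00])
    print(relative_search(rom, ordinals_for("さかな")))
```

With kanji there is no such ordering to lean on, which is exactly why the same trick falls over as soon as the word you are hunting for contains one.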
I do not know of any ROM hacking tools (or really much in the way of tools in general) that can add these lists like you might be able to add kana or Roman alphabets.
Equally know that games can and have picked and chosen only the ones they use (although they can be in say the order of the script/use in the game/use in the font) or a subset of common ones and added any extras to them.
*for instance a game writer will probably not be calling on some ultra obscure kanji in general text (if in a game as a "magic symbol" or decoration however then all bets are off) so there is probably no need to bust out your copy of Dai Kan-Wa Jiten for the help screen of the latest shounen tie in game.
All three can be intermingled in the text effectively at random; obviously there are situations in the language where things are expected to follow others but as far as your basic regex searches are concerned it is random. If however you want to change a bit of the font to allow for some basic translation and keep some Japanese intact it is probably better to axe a few kanji.
Furigana- ostensibly an optional pronunciation key for Kanji, however it can go well beyond that for certain games/authors. Officially the concept is designed to let those that might not know Kanji that well read things written in it (in Japan it is supposed to be for younger people, ones that might know the spoken word but not the symbol that represents it, but it often helps those with less familiarity with the language too). It was uncommon back on 16 bit and earlier systems but things changed for the DS (especially with the touch screen), which has seen a fair bit of it (the first Zelda I believe is a good example). This can often be the reason a sentence might be longer than it should be at first glance or pointers might be odd.
Japanese characters in general are fixed width, unlike the Roman alphabet (think ijlt vs WQMK and such, to say nothing of punctuation), and tend to sit within lines (think jypqf), unlike the characters of Roman alphabet languages. This leads to one of the harder aspects of text hacking: variable width fonts (not necessary for Japanese so they tend not to be added in to the game code) and, if you are really flash, true line handling code. This cuts the other way as well but I am not sure this is the venue to deal with half width and full width characters/encodings. I will go so far as to accuse it of largely being a historical quirk (indeed one might argue some of the insane stuff the older consoles did was to work around these sorts of issues) and not something that warrants more than a passing mention for most hacking work.
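To give an idea of what variable width and line handling involve, here is a minimal Python sketch that measures text in pixels rather than characters and wraps it to a box. The width values, the default width and the function names are all made up for illustration; in a real hack the widths come from your font and the logic has to be written in the console's own assembly.

```python
# Hypothetical per-character pixel widths; a real hack would read these from its font.
WIDTHS = {"i": 2, "j": 3, "l": 2, "t": 4, "W": 8, "Q": 7, "M": 8, "K": 6, " ": 3}
DEFAULT_WIDTH = 5  # assumed width for any character not listed above

def pixel_width(text):
    """Total width of `text` in pixels under the assumed width table."""
    return sum(WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text)

def wrap(text, box_width):
    """Greedy word wrap measured in pixels instead of characters.

    A single word wider than the box is still placed on a line of its own.
    """
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and pixel_width(candidate) > box_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines

if __name__ == "__main__":
    print(pixel_width("ijlt"), pixel_width("WQMK"))
    print(wrap("ill WQMK", 20))
```

Note that the wrap relies on spaces to find break points, which is fine for the English you are inserting but is not something the original Japanese code would have needed.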
Japanese text by virtue of Kanji but often in general tends to be somewhat shorter in terms of actual characters (the debate still rages when it comes to speech) so expect issues when it comes to fixed width menus (if you have hacked the font then you might consider adding two or more characters into a single tile as a crude but effective workaround) and to have to deal with pointers a lot.
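The two characters in a single tile workaround can be sketched as follows in Python. The pair list, the codes 0x80 and 0x81 and the use of plain ASCII values for single letters are assumptions for the example; in practice you would draw the pairs into spare font tiles and use whatever values your table assigns them.

```python
def encode_with_pairs(text, pair_codes):
    """Encode `text`, greedily swapping known two letter pairs for one code each."""
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in pair_codes:
            out.append(pair_codes[pair])  # one tile now shows two letters
            i += 2
        else:
            out.append(ord(text[i]))      # assumed: single letters keep ASCII values
            i += 1
    return out

if __name__ == "__main__":
    pairs = {"th": 0x80, "er": 0x81}  # hypothetical spare tile codes
    print([hex(code) for code in encode_with_pairs("there", pairs)])
```

Five letters end up as three codes here, which is the sort of saving that can get a menu entry back under its limit.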
Japanese technically does not have spaces between words with any you see being more aesthetic thanservingavitalfunctioninthelanguage, so some of the more cute sentence structure hacking methods will not carry over. When playing with a relative search tool on an English game I quite often assume there will be a sentence using the word the with a space either side, and such a search often yields an entire basic table by itself. Equally if you assume the most common character in text is space and work from there you can get far. The same goes for pointer inferences (if you are unfortunate enough to be dealing with word level pointers so be it, but I am thinking more of a variation on the space being the most common/coming in given locations).
More importantly you should consider this when making a script dump if you are heading down that path and are using such things as a boundary, and at least be prepared to be a bit flexible for your translator (I said aesthetic but fitting things in a text box can still be a problem for those using Japanese).
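The "most common character is probably space" trick amounts to a simple frequency count. A minimal Python sketch, with the function name and sample data being mine rather than anything from a real tool:

```python
from collections import Counter

def most_common_bytes(data, count=5):
    """Return the `count` most frequent byte values in `data` with their tallies."""
    return Counter(data).most_common(count)

if __name__ == "__main__":
    # Stand-in for a block of text pulled from an English game.
    block = b"the cat sat on the mat"
    for value, tally in most_common_bytes(block, 3):
        print(f"{value:02X} appears {tally} times")
```

Run against a Japanese script the top entry may well turn out to be a common kana or a control code instead, so treat the result as a lead rather than an answer.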
However Japanese is not my forte so, to the translators: short of true language skills, what would you like any hackers you work with to have an appreciation of? If you can tie it into a hacking concept like I did for some above even better, but that is certainly not necessary. Equally some of my things to know might be given too much emphasis (I cannot think of any real occasions where furigana has troubled me as a ROM hacker and I mainly mentioned it as it can be a nice place to stick a note or two) or too little (I cannot say I have examined custom or standard encodings to see if there is much in the way of kanji ordering vis a vis the ?-moji or schoolboy versions of kyoiku/gakushu vs joyo, and if it would help to at least scan over such when dealing with custom kanji encoding setups then yeah). The only things I think I would want to add/expand upon are something on the various Japanese encoding methods (Shift JIS, EUC-JP and such) and how they might fall short, but I have yet to be bored enough to vet the encodings against the language (did it once for Arabic and some of the lesser encoding systems there and pretty much vowed never again); something on the vertical vs horizontal writings, but intro sequences and graphics aside I am not sure I have seen a game use tategaki (vertical); and maybe something on romaji, but that is more of an IME problem than something that worries hackers (if truly necessary I would imagine it would be kicked to OS level, a nice library or to a programmer that knows Japanese).
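On the standard encodings, a quick way to see that Shift JIS and EUC-JP store the same characters as different bytes is to let Python's built in codecs do the work. This is only a sketch to poke at the encodings with (the helper name and sample text are mine) and says nothing about any custom encoding a game might use.

```python
def encoded_hex(text, codec):
    """Encode `text` with a standard Python codec and return the bytes as hex pairs."""
    return " ".join(f"{byte:02x}" for byte in text.encode(codec))

if __name__ == "__main__":
    sample = "あア亜"  # one hiragana, one katakana, one kanji
    for codec in ("shift_jis", "euc_jp"):
        print(codec, encoded_hex(sample, codec))
    # A half width katakana takes a single byte in Shift JIS.
    print("half width:", encoded_hex("ｱ", "shift_jis"))
```

If a dump of a later game turns into readable text under one of these codecs you can skip table building altogether, so it is worth a try before anything more elaborate.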
On a different note entirely it has come up once or twice in recent months in various conversations so I figured it might be worth mentioning- translators official and otherwise have on occasion not shown the Japanese language much reverence (one of the many reasons several games have got retranslations over the years) and it does cut the other way so should something come into a Japanese game from another source (quite often Chinese and European history but certainly not limited to it) and have got butchered along the way then you can "revert" it to how it is/"should be". A less than brilliant example might be some of the names of creatures/concepts in Final Fantasy but if you prefer an analogy it is a bit like treating your DVD or some such as a perfect source for your video- the people charged with the initial encoding might still have hosed it up/phoned it in.
Also as for nice links I linked it up in the past but it was not here so
http://www.loekaliza...m/mistakes.html might be worth a scan through and to save me digging up the thread I also linked up
http://www.joelonsof...es/Unicode.html .