Densetsu's Translation Toolbox

Discussion in 'NDS - ROM Hacking and Translations' started by Densetsu, Oct 18, 2011.

Oct 18, 2011

Densetsu's Translation Toolbox by Densetsu at 9:40 AM (33,710 Views / 10 Likes) 33 replies

  1. Phoenix Goddess

    Member Phoenix Goddess The Ninja's Protégée

    Joined:
    Apr 25, 2009
    Messages:
    3,810
    Location:
    Away from civilization.
    Country:
    United States
    And this is why Google Translate (and machine translators in general) is always a bad idea.
    Not to mention that machine translators can't help you with verbs, particles, direct objects, or much of anything else that would make the sentences sound less like gibberish and more like accurate sentences. They will almost always get them wrong.
     
    4 people like this.


  2. Densetsu
    OP

    Former Staff Densetsu Pubic Ninja

    Joined:
    Feb 2, 2008
    Messages:
    3,435
    Location:
    Wouldn't YOU like to know?
    Country:
    United States
    Correct Translations
    So if I were being generous, I'd say you got 2 out of 3, or a 67% translation accuracy rate, doing your best. And those sentences weren't even that hard. Now for the fun stuff:
    Correct Translations
    So you see, Google Translate is utter garbage when it comes to translating. You can get barely acceptable translations of textbook Japanese, but when the Japanese involves puns and vernacular spoken language (which is more common than textbook Japanese), you're going to get it wrong if you rely on Google.

    And these are just a few examples. Imagine doing this for an entire game, which may contain anywhere from hundreds to thousands of lines of Japanese text.

    As a related aside, I also get annoyed when I see "translation" projects where someone is listed as a "translator" in the credits. Unless you can actually read Japanese, understand it, and put it into another language in a way that is meaningful to the target audience, you didn't "translate" anything. Google did. And poorly, I might add. You should list yourself in the credits as a "Google translation editor," or even "Google translation guesstimator." The title of "translator" should be reserved only for translators.

    /rant
     
  3. FAST6191

    Reporter FAST6191 Techromancer

    Joined:
    Nov 21, 2005
    Messages:
    22,353
    Country:
    United Kingdom
    Hopefully the following will not be too much rambling and something nice will come of it.
    I already went and although I could bash machine translation (I certainly throw my hat in with those that consider it all but useless) it would serve little purpose at this point. However, I have been thinking about writing a post or something about what level of Japanese a ROM hacker should know if they intend to work with Japanese ROMs. While I can do the hacking side of things I thought it best to also ask those that do translating to make sure I am not saying something truly silly. I am not aiming for the bare minimum (which is arguably next to nothing) but something that allows people to be reasonably functional.

    To this end
    Things one should appreciate about the Japanese language when playing ROM hacker (hacker side of the fence)
    The types of Japanese characters and how they work-
    Hiragana and Katakana, which are collectively known as the kana. They are the basic constructs of the language, with katakana usually being used for loanwords and foreign words and being somewhat more angular than the more freeform hiragana, which are used for native words. There are some fairly accepted ordering/sorting methods (I believe gojuon order is the name of the most popular; stuff like http://www.romhackin...t/utilities/55/ should add it) and few games deviate here. However, said ordering might well leave out some of the less common, possibly obsolete, ones the script writer might use, and the game might add entire separate characters for kana carrying the voicing marks (see Dakuten and Handakuten), so be aware of this when constructing tables and the fonts they match up with.
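
As a rough illustration of the table building mentioned above, here is a minimal Python sketch that writes out a gojuon-ordered kana table in the common "HEX=character" layout. It assumes a single-byte encoding in which the 46 basic kana are stored consecutively; the starting value 0x01 and the function names are made up for the example, and a real game may well start elsewhere, skip values, or add separate entries for the dakuten/handakuten forms.

```python
# Basic gojuon order (46 hiragana). Dakuten/handakuten forms and small kana
# are deliberately left out; games often store those elsewhere in the font.
GOJUON_HIRAGANA = (
    "あいうえお"
    "かきくけこ"
    "さしすせそ"
    "たちつてと"
    "なにぬねの"
    "はひふへほ"
    "まみむめも"
    "やゆよ"
    "らりるれろ"
    "わをん"
)


def to_katakana(hiragana):
    """Map basic hiragana to katakana (the Unicode blocks are 0x60 apart)."""
    return "".join(chr(ord(ch) + 0x60) for ch in hiragana)


def build_table(chars, start=0x01):
    """Return 'HEX=char' lines, ASSUMING the game stores `chars`
    consecutively from the (hypothetical) byte value `start`."""
    return [f"{start + i:02X}={ch}" for i, ch in enumerate(chars)]


if __name__ == "__main__":
    for line in build_table(GOJUON_HIRAGANA):
        print(line)
```

If the font shows katakana straight after the hiragana, something like build_table(to_katakana(GOJUON_HIRAGANA), start=0x2F) would continue the same table, again only if the game really does lay them out that way.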

    Kanji- the elaborate symbols that originally came out of China. There are many such symbols and most consider them to be the harder part of the language, to the point where translators tend to know a selection (there are various levels but there is no set upper limit on the number of kanji) and some works/fields are known for downplaying their use (most notably in the game/anime world the likes of shounen manga and anime, which is nice as games aimed at those audiences are some of the most popular targets for translation). There is no universally accepted ordering of kanji, which makes relative searching, one of the most powerful yet simple tools available to the ROM hacker dealing with an unknown text encoding, tricky at best.
    However there are lists of selected kanji that are taught to people, but unless you also find yourself interested in Japanese education/orthographic history you probably do not need to delve into them as a ROM hacker (if I am not overstepping my mark, as a translator it might help, as games tend to be written by modern writers for a modern audience and knowing the way the modern language is taught/learned/graded can be a good thing*). More modern games do often share orderings, or parts thereof, between games so do have a look at other games, especially ones from the same developer/publisher. Although the lists, and the distinction between kokuji, kokkun and categories/moji (such as might still be considered to exist), might not be that important, you should know the terms radical and stroke count: radical refers to the base component a kanji is filed under, and stroke count is quite literally how many strokes are needed to finish the deal, which is about as close as you might get to a proper ordering system (everything else is so much good luck if you happen to encounter it).
    I do not know of any ROM hacking tools (or really much in the way of tools in general) that can add these lists like you might be able to add kana or Roman alphabets.
    Equally know that games can and have picked and chosen only the ones they use (although they can be in say the order of the script/use in the game/use in the font) or a subset of common ones and added any extras to them.

    *for instance a game writer will probably not be calling on some ultra obscure kanji in general text (if in a game as a "magic symbol" or decoration however then all bets are off) so there is probably no need to bust out your copy of Dai Kan-Wa Jiten for the help screen of the latest shounen tie in game.

    All three can be intermingled in the text effectively at random; obviously there are situations in the language where things are expected to follow others but as far as your basic regex searches are concerned it is random. If however you want to change a bit of the font to allow for some basic translation and keep some Japanese intact it is probably better to axe a few kanji.

    Furigana- ostensibly an optional pronunciation key for kanji, however it can go well beyond that for certain games/authors. Officially the concept is designed to let those that might not know kanji that well read things (in Japan it is supposed to be for younger people, ones that might know the spoken word but not the symbol that represents it, but it often helps those with less familiarity with the language too). It was uncommon back on 16 bit and earlier systems, but things changed for the DS (especially with the touch screen) and it has seen a fair bit of it (the first Zelda I believe is a good example). This can often be the reason a sentence might be longer than it should be at first glance or pointers might be odd.

    Japanese characters in general are fixed width, unlike the Roman alphabet (think ijlt vs WQMK and such, to say nothing of punctuation), and tend to sit within lines (think jypqf), unlike Roman character sporting languages. This leads to one of the harder aspects of text hacking: variable width fonts (not necessary for Japanese so they tend not to be added to the game code) and, if you are really flash, true line handling code. This cuts the other way as well but I am not sure this is the venue to deal with half width and full width characters/encodings, not to mention I will go so far as to accuse it of largely being a historical quirk (indeed one might argue some of the insane stuff the older consoles did was to work around these sorts of issues) and not something that warrants more than a passing mention for most hacking work.

    Japanese text by virtue of Kanji but often in general tends to be somewhat shorter in terms of actual characters (the debate still rages when it comes to speech) so expect issues when it comes to fixed width menus (if you have hacked the font then you might consider adding two or more characters into a single tile as a crude but effective workaround) and to have to deal with pointers a lot.

    Japanese technically does not have spaces between words with any you see being more aesthetic thanservingavitalfunctioninthelanguage, so some of the more cute sentence structure hacking methods will not carry over. (When playing with a relative search tool on an English game I quite often assume there will be a sentence using the word the with a space either side, and such a search often yields an entire basic table by itself. Equally if you assume the most common character in text is space and work from there you can get far.) The same goes for pointer inferences (if you are unfortunate enough to be dealing with word level pointers so be it, but I am thinking more of a variation on the space being the most common character/coming in given locations).
    More importantly you should consider this when making a script dump, if you are heading down that path and are using such things as a boundary, and at least be prepared to be a bit flexible for your translator (I said aesthetic but fitting things in a text box can still be a problem for those using Japanese).
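
For anyone who has not met relative searching, the English-game trick described above can be sketched in a few lines of Python. This is only a minimal illustration: it assumes a single-byte encoding that keeps the lowercase letters in alphabetical order, and the function name and sample bytes are invented for the example. It looks for runs of bytes whose differences match the differences between the letters of the search word, so the actual byte values do not need to be known in advance.

```python
def relative_search(data, word):
    """Return offsets in `data` where consecutive byte differences match
    the differences between consecutive characters of `word`.

    Assumes one byte per character and letters stored in alphabetical
    order; `word` should be at least two characters long."""
    deltas = [ord(b) - ord(a) for a, b in zip(word, word[1:])]
    hits = []
    for off in range(len(data) - len(word) + 1):
        window = data[off:off + len(word)]
        if all(window[i + 1] - window[i] == d for i, d in enumerate(deltas)):
            hits.append(off)
    return hits


if __name__ == "__main__":
    # Hypothetical custom encoding where 'a' is stored as 0x80.
    sample = bytes([0x00, 0x93, 0x87, 0x84, 0xFF])  # ?, t, h, e, ?
    print(relative_search(sample, "the"))
```

A hit at some offset tells you which byte stands for t, and from there the rest of the lowercase alphabet falls out by counting. With kanji in the mix there is no such agreed ordering to count along, which is the problem noted above.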

    However Japanese is not my forte, so to the translators: short of true language skills, what would you like any hackers you work with to have an appreciation of? If you can tie it into a hacking concept like I did for some above, even better, but that is certainly not necessary. Equally some of my things to know might be given too much emphasis (I cannot think of any real occasions where furigana has troubled me as a ROM hacker and I mainly mentioned it as it can be a nice place to stick a note or two) or too little (I cannot say I have examined custom or standard encodings to see if there is much in the way of kanji ordering vis a vis the ?-moji or schoolboy versions of kyoiku/gakushu vs joyo, and if it would help to at least scan over such when dealing with custom kanji encoding setups then yeah). The only things I think I would want to add/expand upon are: something on the various Japanese encoding methods (Shift JIS, EUC-JP and such) and how they might fall short, but I have yet to be bored enough to vet the encodings against the language (did it once for Arabic and some of the lesser encoding systems there and pretty much vowed never again); something on vertical vs horizontal writing, but intro sequences and graphics aside I am not sure I have seen a game use tategaki (vertical); and maybe something on romaji, but that is more of an IME problem than something that worries hackers (if truly necessary I would imagine it would be kicked to OS level, a nice library or to a programmer that knows Japanese).

    On a different note entirely it has come up once or twice in recent months in various conversations so I figured it might be worth mentioning- translators official and otherwise have on occasion not shown the Japanese language much reverence (one of the many reasons several games have got retranslations over the years) and it does cut the other way so should something come into a Japanese game from another source (quite often Chinese and European history but certainly not limited to it) and have got butchered along the way then you can "revert" it to how it is/"should be". A less than brilliant example might be some of the names of creatures/concepts in Final Fantasy but if you prefer an analogy it is a bit like treating your DVD or some such as a perfect source for your video- the people charged with the initial encoding might still have hosed it up/phoned it in.

    Also as for nice links I linked it up in the past but it was not here so http://www.loekaliza...m/mistakes.html might be worth a scan through and to save me digging up the thread I also linked up http://www.joelonsof...es/Unicode.html .
     
    3 people like this.
  4. DS1

    Member DS1 伝説の雀士

    Joined:
    Feb 18, 2009
    Messages:
    1,245
    Location:
    Yes!
    Country:
    United States
    1 person likes this.
  5. Densetsu
    OP

    Former Staff Densetsu Pubic Ninja

    Joined:
    Feb 2, 2008
    Messages:
    3,435
    Location:
    Wouldn't YOU like to know?
    Country:
    United States
    This topic may contain more information relevant to your question:
    Japanese Programming Madness
     
  6. Densetsu
    OP

    Former Staff Densetsu Pubic Ninja

    Joined:
    Feb 2, 2008
    Messages:
    3,435
    Location:
    Wouldn't YOU like to know?
    Country:
    United States
    I'm just posting here because I had a random thought regarding translation and I wanted to get it out in writing.

    I contend that the name of the character Palutena (of Kid Icarus fame) was mis-translated from the original Japanese.

    In Japanese, her name is パルテナ (refer to the Nintendo Wiki link above). The direct transliteration of パルテナ is PA-RU-TE-NA.

    But the name パルテナ is taken from the Parthenon in Greece. The Japanese word for "Parthenon" is パルテノン (transliterated as PA-RU-TE-NO-N).

    パルテノン = Parutenon = Parthenon
    パルテナ = Parutena = Should have been translated as Parthena, not Palutena.

    It should also be noted that "Parthenon" is the English transliteration of the Greek "Παρθενών." The word "Parthenon" is as accurate as can be in capturing how the word is pronounced in Greek. But the Japanese language lacks the "th" sound, so it is pronounced as "Parutenon."


    My point in posting this to my Translation Toolbox is to get across the idea that when you translate, you really need to consider the source material from which a Japanese word or name is derived, rather than just directly transliterating because it's easy. I just so happened to know of the Parthenon, and I also happened to know that it was called パルテノン神殿 in Japanese, so I made the connection to Palutena (パルテナ).

    Perhaps the team responsible for localizing Kid Icarus (which, incidentally, is called 光神話 パルテナの鏡, lit. "Light Mythology: Palutena's Mirror" in Japan) didn't know about the Parthenon. Maybe Palutena would have come to be known as Parthena had someone on the localization team been more familiar with Greek history. Or maybe they did know, but decided to go with "Palutena" anyway.

    This issue came up once when someone pointed out the way we handled a translation in Blood of Bahamut:
    Warning: Spoilers inside!

    We arrived at this translation by holding a poll:
    Warning: Spoilers inside!

    The poll result prompted the following conversation:
    Warning: Spoilers inside!
    Aganar is right about the words "Bahamut" and "Behemoth" coming from the same Semitic root, but our translation "error" was propagated from SE's initial error of turning the two words into two separate things.

    We simply did not want to render the Japanese word kyojuu into "giant beast" because that would have sounded generic, so we had to come up with some candidate words to use instead. SE fans expected us to stick to terminology they're used to seeing in an SE game, so that's what we did. And correct though Aganar may be, the fact of the matter is that the vast majority of people wouldn't have cared about staying true to Semitic etymology.

    Which brings me back to Palutena. Although the etymologically correct translation of パルテナ would probably have been "Parthena," maybe the translators had their reasons for localizing it to Palutena instead. This is just one instance of the phrase "translation is an art."

    Go figure.
     
  7. Masquerade-Q
    This message by Masquerade-Q has been removed from public view by a moderator, Mar 30, 2017.
    May 6, 2012
  8. FAST6191

    Reporter FAST6191 Techromancer

    Joined:
    Nov 21, 2005
    Messages:
    22,353
    Country:
    United Kingdom
    A nice example and I get the reasoning, but while they might derive from the same work, and maybe even word, to my reading it is at best a Bahamut and Neo Bahamut situation.

    "Catastrophis - God of Catastrophes"

    Depending upon your translation more than a few Abrahamic, Greek, Roman, Norse... mythological figures lent their names to, or had their names derived from, phenomena, or at least spawned fairly recognised synonyms (possibly also antonyms and words that are related but take some explaining; vulcanisation, the process of treating rubber with sulphur, for instance). I agree it can look a bit odd but "Final chapter" tends to mean (whether it is a trope I will leave for a later debate) boss of all bosses, give or take bonus content, throughout fiction and even more so in games.

    Equally it might not have as much recognition as say samurai, ninja, shuriken or katana... but kyojuu/Kaiju (pushing it a bit perhaps) has nevertheless become something of a loanword in English (more or less referring to the monsters as seen in Godzilla films), with other connotations which may well be different to the Japanese (which seems to roughly translate to/have a similar range as beast- beast of burden, a mythological beast, "what a beast"/is that your beast?) but exist none the less.

    Anyway the talk of continent/world bearing monsters has now got me thinking of Discworld, so before I start pondering modern interpretations of mythology (it is a bit of a longer range but see the evolution and variations on elves) or indeed the mythology of modern fiction (certainly they have entered the lexicon- see superman) and have to beat down the meeja studies/philosophy part of me once more, I think I will leave with this: if you ever needed another reason why machine translation is not something you should ever use, this would be it. I can see machine translation getting somewhere, even to the point where it can use context, dialect and maybe a massive database, but this sort of thing is true AI or not at all.
     
  9. Densetsu
    OP

    Former Staff Densetsu Pubic Ninja

    Joined:
    Feb 2, 2008
    Messages:
    3,435
    Location:
    Wouldn't YOU like to know?
    Country:
    United States
    Agreed. After all, we don't translate words like "sushi." Sushi is sushi is sushi. The same is true for all the other words you mentioned. In my last post I forgot to mention that we probably could have just rendered 巨獣 as simply "kyojuu" and left it at that.
    Warning: Spoilers inside!
     
  10. FAST6191

    Reporter FAST6191 Techromancer

    Joined:
    Nov 21, 2005
    Messages:
    22,353
    Country:
    United Kingdom
    English suffix or portmanteau of something like shogun and senate? I guess it matters little in the end but I figured it should be pondered at least.

    Also it might just be a bit too much Total War back when, but shogun to me translated more closely to a lord in the classical English sense as opposed to the rather more narrow terms general and warlord, in that they were responsible for other things too (generals are not terribly useful in peacetime other than for readiness, but lords tend to do the actual running of stuff too), or am I conflating it with daimyo? Now this might be falling back to the samurai as more than a warrior thing but I sense this is getting off topic.

    Speaking of translating sushi though ignoring that I can (whether I should though...) get it from street vendors outside asiatown if I want and it is not surprising to see it in the chiller cabinet of a basic city centre supermarket ( http://moblog.net/media/e/u/p/euphro/travel-snack.jpg ) it has some implications of the exotic and moreover it is not quite translated precisely (how many people will still call it raw fish as opposed to the general method of preparation that it is?). This applies even more so for rice balls and some milk drinks so there have been occasions in translations where even sushi will get axed in favour of some random generic foodstuff even to the point where it cuts the other way and it will end up being called something like scampi (I just checked and it too has similar "mistranslations" to sushi or regional variations).

    Equally I also failed to acknowledge properly that I agreed entirely with the decision and that as it has become part of the S.E. mythology (both as part of the translated works and in general) to differ here would be a cause to raise an eyebrow.
     
  11. Inori

    Member Inori GBAtemp Regular

    Joined:
    Dec 13, 2008
    Messages:
    109
    Location:
    泡沫の夢
    Country:
    Australia

    I think the general consensus is that a shogun is the supreme general and a daimyo is a lord / vassal of the shogun. The daimyo themselves then had samurai underneath them.

    But the importation of such words does raise the question of how much of the original semantic value is imported along with it. To my knowledge, a shogun is defined as a "barbarian-quelling generalissimo", but I would probably equate him to a "commander-in-chief" instead. However, I usually see the term shogun equated with "general", and people seem pretty happy with that.

    You can draw a parallel between the "behemoth" issue and 和製英語 ("Japanese-made English"). A deconstruction of the term is enough to give it away: they should be treated as Japanese terms. The easiest example I can think of is テンション. More often than not, I see it being used to mean spirits (e.g. テンション高い), and not in the sense that native English speakers would use it ("There was a lot of tension between the two").


    Extending that argument when it comes to the translation of games: it's important to translate a game on the game's terms (such as how said game "imports" words or how they perceive / treat different terms). This is particularly true for a fantasy-genre game like Blood of Bahamut; real world knowledge is less important than specific lexical knowledge (where the lexicon is the Square Enix lexicon).
     
    1 person likes this.
  12. shadowmanwkp

    Member shadowmanwkp Your roms are on another rom site

    Joined:
    Apr 17, 2008
    Messages:
    486
    Location:
    Vleuten, The Netherlands
    Country:
    Netherlands
    Hey I can legitimately translate text but I can't understand that at all :<

    You know, I once watched Discovery on Dutch TV and they actually pulled this one off very nicely. Dubbing is extremely uncommon here, so 90% of everything that originates from abroad is in its native language (with a general exception being kids' series). Anyways, I was watching Dirty Jobs on Discovery, and it was about the preservation of the "common tern". Mike Rowe was churning out a lot of corny jokes about the name of the birds (like: why would you preserve a "common" bird).

    Sadly that doesn't translate to Dutch at all. In Dutch these birds are called "fish thieves". In a stroke of genius the Dutch translators made their own jokes about the name fish thieves, like: why the hell would you preserve a bird if they are stealing all of your fish away anyways? I listened to Mike and read the subs, and I actually liked the subs better xD.
     
  13. Densetsu
    OP

    Former Staff Densetsu Pubic Ninja

    Joined:
    Feb 2, 2008
    Messages:
    3,435
    Location:
    Wouldn't YOU like to know?
    Country:
    United States
  14. StorMyu

    Member StorMyu "I'm too old for this"

    Joined:
    Jan 2, 2010
    Messages:
    892
    Country:
    France
  15. FAST6191

    Reporter FAST6191 Techromancer

    Joined:
    Nov 21, 2005
    Messages:
    22,353
    Country:
    United Kingdom
    I stumbled across it when writing the little Japanese section of the ROM hacking docs, so can I also recommend at least http://www.sljfaq.org/afaq/symbol.html (and probably the site as a whole); you have http://kanji.sljfaq.org/draw.html as an offhand link but the main site is pretty good (it being a web distillation of a fairly old newsgroup after all) for both hacking* and translation/general language learning. I am sure people would have dropped back down to the main site but a push in that direction might be good too.

    *earlier (and for the most part it is still in effect) I advised against some regular expression style searches, but 「 and 」 being the equivalent of quotes would make them very good for sentence length and punctuation driven decoding, both of which are quite powerful techniques for figuring out a custom encoding.

    Also from the guide I grabbed a few tools and started a list of CAT tools.
    ----
    General Japanese capable text editors

    Although given an input method editor and an appropriately configured operating system just about anything will do, there are certain features that are useful to have in a text editor when you are editing Japanese. To this end, links to a couple of them:

    NJStar
    This is the most commonly used text editor for the Japanese language and has found favour among the translation teams working on ROM hacking. It is largely shareware/trial although paid options do exist.

    JWPCE
    An older freeware program that in many ways sits alongside NJStar above.

    Rom hacking tools

    A hex editor capable of reading tables is quite useful, but there are a couple of other tools worth having.

    Get My Hex
    Filetrip download
    Author homepage
    Does what it is named for and will return the hexadecimal equivalent of the input text for several common encoding methods.
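
If the tool is not to hand, the same sort of lookup can be approximated with Python's built-in codecs. This is only a minimal sketch and has nothing to do with how Get My Hex itself is written; the function name and the choice of encodings are mine.

```python
def text_to_hex(text, encodings=("shift_jis", "euc_jp", "utf-8", "utf-16-le")):
    """Return {encoding name: hex string} for `text`, with None where the
    text cannot be represented in that encoding."""
    result = {}
    for name in encodings:
        try:
            encoded = text.encode(name)
        except UnicodeEncodeError:
            result[name] = None
        else:
            result[name] = " ".join(f"{byte:02X}" for byte in encoded)
    return result


if __name__ == "__main__":
    for name, hex_string in text_to_hex("あ").items():
        print(f"{name}: {hex_string}")
```

For あ this gives 82 A0 in Shift JIS, A4 A2 in EUC-JP and E3 81 82 in UTF-8, which is the kind of byte string you would then search for in a hex editor. Games using a custom table will not match any of these, of course.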

    CAT tools

    Although this is not a language document there are things you can do as a ROM hacker to help projects along and one of those is Computer assisted translation (CAT). This is not the same as machine translation but a kind of lookup program and database for previous translations and helps to ensure consistency in terms and other such things; for instance if you are translating a massive RPG and you meet a concept three times in a game but translate it three different ways it is not going to look good.
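
To make the consistency point concrete, here is a very small Python sketch of the kind of check a term database allows. Real CAT tools do considerably more than this; the function name, the data layout and the sample lines are all invented for illustration and are not taken from any actual script or tool.

```python
def find_inconsistencies(segments, glossary):
    """Flag segments whose source contains a glossary term but whose
    translation lacks the agreed rendering (case-insensitive substring
    check, so it will miss inflected forms).

    segments: list of (source_text, translated_text) pairs
    glossary: dict mapping source term -> agreed translation"""
    problems = []
    for index, (source, target) in enumerate(segments):
        for term, approved in glossary.items():
            if term in source and approved.lower() not in target.lower():
                problems.append((index, term, approved))
    return problems


# Made-up sample data, not taken from any real game script.
glossary = {"勇者": "Hero"}
segments = [
    ("勇者が現れた", "The Hero appeared"),
    ("勇者を探せ", "Find the brave one"),
]

if __name__ == "__main__":
    for index, term, approved in find_inconsistencies(segments, glossary):
        print(f"line {index}: {term} should be rendered as {approved!r}")
```

Run over a whole script dump, even something this crude would catch the "same concept translated three different ways" problem before it reaches the ROM.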

    Free and Open source tools

    Although the professional field is dominated by a handful of pricey tools there are some freeware/open source tools

    Anaphraseus
    Project sourceforge page

    OmegaT
    Project sourceforge page
    A java based tool and one of the more popular open source programs.

    XLIFF Translator
    Project homepage
    XLIFF is about as close to an inter software conversion standard as it gets in the CAT tool world. The program itself is an MIT licensed hook for a piece of professional software but functions none the less.

    Commercial tools

    On the paid/commercial front there are other options. Being “industrial”/professional/industry specific software, though, the prices have a habit of getting rather high. There is a fairly well recognised/supported format known as XLIFF (an XML based format aimed specifically at translation) that many of the open source tools support, as well as some of the commercial ones.
    Still

    Trados
    Homepage
    Arguably the market leader in the professional CAT tools.

    memoQ
    Homepage
    Not as popular as the other two and similarly priced but rapidly gaining a following.

    Wordfast
    Homepage
    Various tools have been released under this branding and depending where you go the term wordfast can refer to any or all of them. Still a very popular series of CAT tools and related technologies.
     
    Pablitox and WiiUBricker like this.
