#
# KRADFILE - Unicode
# (kradfile-u)
#
# Radical Decomposition of 13,108 Japanese Characters
#
# A merger of kradfile, kradfile2, and 952 new decompositions.
#
# Copyright 2001 / 2007 / 2009:
#
#   Michael Raine
#   Jim Breen
#   The EDR&D Group at Monash University
#   Jim Rose
#   The KanjiCafe.com
#
# 952 JIS X 0213 kanji radical decompositions
# Copyright 2009 James Rose and the KanjiCafe.com
#
# The 5,801 JIS X 0212 kanji radical decompositions
# Copyright 2007 James Rose and the KanjiCafe.com
#
# The 6,355 JIS X 0208 kanji radical decompositions
# Copyright 2001/2007 Michael Raine, James Breen and the Electronic
# Dictionary Research & Development Group at Monash University.
#
# A Grant of License detailing legal use of this file can be found at:
# http://www.kanjicafe.com/kradfile_license.htm
#
# Jim Rose:
# In the CJK Unified Ideographs range of Unicode, Japanese Kanji were
# assigned code points corresponding to each of the characters in the
# 2,965 most common kanji of the JIS X 0208 Level 1, the 3,390 next most
# common kanji of the JIS X 0208 Level 2, and the 5,801 kanji of the
# JIS X 0212 standard intended to supplement and extend the JIS X 0208.
#
# CJK Unified Ideographs Extension B of Unicode version 3.2 allocated code
# points to the 3,695 kanji defined in the 2004 JIS X 0213, which was also
# intended to supplement and extend the JIS X 0208. 952 kanji defined in
# the JIS X 0213 do not occur in the JIS X 0212 (which I reckon means that
# 3,058 kanji in the JIS X 0212 were not included in the JIS X 0213).
#
# CJK Unified Ideographs Extension A added to its list of encoded Japanese
# characters the Unified Japanese IT Vendors Contemporary Ideographs. We do
# not believe these characters are of any practical value to users of current
# computing platforms and they are ignored.
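#
# The character counts quoted above can be cross-checked with a few lines of
# arithmetic. This is only a sketch: the variable names are made up for
# illustration, and the last step assumes (as the "I reckon" remark implies)
# that every JIS X 0213 kanji outside the 952 also occurs in the JIS X 0212.

```python
# Counts quoted in the header text.
JIS_X_0208_LEVEL_1 = 2965
JIS_X_0208_LEVEL_2 = 3390
JIS_X_0212 = 5801
JIS_X_0213 = 3695
JIS_X_0213_NOT_IN_0212 = 952

# Level 1 + Level 2 gives the JIS X 0208 kanji count.
jis_x_0208 = JIS_X_0208_LEVEL_1 + JIS_X_0208_LEVEL_2

# One line per kanji: JIS X 0208 + JIS X 0212 + the 952 new JIS X 0213 kanji.
total_lines = jis_x_0208 + JIS_X_0212 + JIS_X_0213_NOT_IN_0212

# Assumption: the remaining JIS X 0213 kanji are all shared with JIS X 0212.
shared_with_0212 = JIS_X_0213 - JIS_X_0213_NOT_IN_0212
only_in_0212 = JIS_X_0212 - shared_with_0212

print(jis_x_0208, total_lines, shared_with_0212, only_in_0212)
```

# Under that assumption the figures agree with the header: 6,355 JIS X 0208
# kanji, 13,108 lines in total, and 3,058 kanji found only in the JIS X 0212.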
#
# Personal Note:
# When I first created my version of a kanji selection by-multi-radical
# interface on the ICE MOCHA tool at Kanjicafe.com, I thought that the radical
# selection interface would make a good starting point for a tool to both glean
# errors and improve kradfile, and help build a bigger "kradfile" which added
# the JIS X 0212 kanji. Jim Breen had mentioned that he would like to see the
# JIS X 0213 set decomposed by radical, but my interface was designed to handle
# Extended Unix Code Japanese, which included the JIS X 0208 and JIS X 0212,
# but did not include the newer JIS X 0213 standard. Rather than deal with
# learning how to cope with Unicode, I plowed ahead and developed kradfile2 as
# a JIS X 0212 extension and companion to the JIS X 0208 based kradfile.
#
# But since we all know that we are slowly migrating our tools and systems to
# Unicode, the idea of at least converting my little radical decomposition tool
# over kept gnawing at me. So after pestering Jim Breen some more about it, on
# June 1, 2009 Professor Breen sent me a file with the 952 JIS X 0213 kanji
# needed to make a "complete" radical decomposition of all the Japanese kanji
# defined in the CJK section of Unicode, and I commenced to finish this project
# to a higher state of "doneness".
#
# Converting the tool over to Unicode opens up the possibility of using the same
# code base to develop radical decompositions in Chinese or Korean (the C & K of
# CJK), and if there is anyone interested in pursuing this, please contact me.
#
# There are some noteworthy changes to the new file vis-a-vis the kradfile
# and kradfile2 legacy data.
#
# 1) The encoding scheme in use is no longer EUC-JP, with its convenient
#    2 bytes for the JIS X 0208 and 3 bytes for the JIS X 0212.
#    The encoding of this file is now UTF-8, and as such, the byte length of each
#    character is highly variable. Processing Unicode properly requires that your
#    software does not rely on a fixed byte length.
#    The primary reason for the change of encoding method is that the
#    JIS X 0213 standard kanji are not defined in the Extended Unix Code
#    Japanese encoding scheme which predates it (EUC-JP).
#
# 2) UTF-8 is a Unicode encoding, but keep in mind that Unicode itself is not.
#    There may come a time and place when you are using Unicode, but not UTF-8.
#    I doubt it, but I thought I would just throw that out for clarity.
#
# 3) The original kradfile used JIS X 0208 kanji to represent radicals. In
#    several instances there were no JIS X 0208 kanji which were also
#    representative of the radical alone, so a JIS X 0208 kanji containing the
#    radical was used as a kind of radical "place holder". When I developed
#    kradfile2, I maintained this convention so that kradfile2 would be simple
#    to integrate with existing tools already using kradfile.
#
#    The following legacy JIS X 0208 kanji "place holders" are now replaced by
#    the radical/element itself:
#
#      化 which stood for ⺅
#      刈 which stood for ⺉
#      込 which stood for ⻌
#      汁 which stood for 氵
#      初 which stood for 衤
#      尚 which stood for ⺌
#      買 which stood for 罒
#      犯 which stood for 犭
#      忙 which stood for 忄
#      礼 which stood for 礻
#      个 which stood for 𠆢
#      老 which stood for ⺹
#      扎 which stood for 扌
#      杰 which stood for 灬
#      疔 which stood for 疒
#      禹 which stood for 禸
#      艾 which stood for ⺾
#      邦 which stood for ⻏ (2ECF)
#      阡 which stood for ⻖ (2ED6)
#
#    Unicode's inclusion of the JIS X 0212 and JIS X 0213 kanji allows us to
#    replace most of the "place holder" kanji with the actual radical. In
#    fact, Unicode also defines all 214 Kangxi radicals from Mei Yingzuo's Zihui,
#    or "Character Collection/Categorization" published in 1615, so we can do away
#    with all but two JIS X 0208 representative "place holder" characters. One
#    of these is a two stroke radical defined by Andrew Nelson in his 1962
#    "The Modern Reader's Japanese-English Character Dictionary".
#    I'm not sure where the other 11 stroke radical came from, but Jim can edit
#    this sentence for me. These are represented instead by 并 (5E76) and
#    滴 (6EF4).
#
# Other than the encoding change, the file is still in the same basic format
# as the legacy kradfile and kradfile2.
#
# Decomposition of the JIS X 0213:
# Two fonts were used in the decomposition of the JIS X 0213 so as to include
# as much variation in the appearance of the kanji as possible. There were
# several instances when one of the two fonts used (HiraMinPro-W3 and IPAMincho)
# showed a particular stroke more distinctly than the other, and vice versa.
#
# Thus despite the numerical paucity of fonts which reach into the JIS X 0213,
# using two fonts provided enough variety to add valuable clarity when
# distinguishing strokes and choosing radicals / elements.
#
# The usable portion of the file consists of 13,108 lines of text; one
# for each of the:
#
#   - 6,355 kanji defined in the JIS X 0208-1997 standard
#   - 5,801 kanji defined in the JIS X 0212-1990 standard
#   - 952 kanji defined in the JIS X 0213-2004 standard
#     and not found in the JIS X 0212
#
# Each line is as follows:
#   - the kanji itself,
#   - a space followed by a colon (:) followed by a space,
#   - one or more radicals/elements which can be seen in the kanji.
#   - the radicals/elements are themselves separated by a space
#
# The decomposition is based on what can be seen in typical kanji
# glyphs. Elements themselves can be further subdivided.
#
# You can contact Jim Rose at Jim(at)Kanjicafe.com.
#
# Jim Rose, Christiansted, United States Virgin Islands
# June 2009
###########################################################
#
# K R A D F I L E
#
# Copyright 2001/2007 Michael Raine, James Breen and the Electronic
# Dictionary Research & Development Group at Monash University.
# See: http://www.csse.monash.edu.au/~jwb/edrdg/licence.html
# for permissions for use and redistribution.
#
# This is the data file from which the "radkfile" is made, which in turn
# drives the multi-radical lookup method in XJDIC, WWWJDIC and possibly
# other dictionary and related software.
#
# The file is based on work done in 1994/1995 by Michael Raine in which he
# analyzed all the JIS1/2 kanji and identified the constituent radicals and
# other common elements, with the intention of facilitating the selection of
# kanji within a dictionary program by identifying multiple elements.
# The file was revised by Jim Breen in September 1995. Further revisions were
# done in 1998/9 at the suggestion of Wolfgang Conrath, then a revision was
# carried out in 2001 using suggestions from Yutaka Ohno based on a similar
# decomposition made by Kobayashi. Further amendments were made in July
# 2001 after suggestions from Hendrik.
#
# The file consists of 6,355 lines of text; one for each of the
# JIS X 0208-1997 kanji. Each line is as follows:
#   - the kanji itself,
#   - a space followed by a colon (:) followed by a space,
#   - one or more radicals/elements which can be seen in the kanji. These
#     are drawn from JIS X 0208-1997. Where the element alone is not in
#     JIS X 0208, a kanji which contains the element is used instead.
#
# The decomposition is based on what can be seen in typical kanji
# glyphs. Elements themselves can be further subdivided. For example,
# 舌 is an element and so is 口, so the elements in 話 are <口 舌 言>.
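#
# The line format described above, and the idea of a multi-radical lookup
# built on it, can be sketched in a few lines of Python. Everything here is
# illustrative: the function names are hypothetical, the index is not the
# actual "radkfile" format, and the sample data line is constructed from the
# 話 example rather than copied from the file.

```python
from collections import defaultdict


def parse_krad_line(line):
    """Return (kanji, [elements]) for a data line, or None for comment/blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Format: kanji, then " : ", then elements separated by single spaces.
    kanji, sep, rest = line.partition(" : ")
    if not sep:
        return None
    return kanji, rest.split()


def build_element_index(lines):
    """Invert the decomposition: map each element to the set of kanji containing it."""
    index = defaultdict(set)
    for line in lines:
        parsed = parse_krad_line(line)
        if parsed is None:
            continue
        kanji, elements = parsed
        for element in elements:
            index[element].add(kanji)
    return index


def lookup(index, elements):
    """Return the kanji that contain every one of the given elements."""
    sets = [index.get(element, set()) for element in elements]
    return set.intersection(*sets) if sets else set()


# Sample input constructed for illustration.
sample_lines = [
    "# comment lines are skipped",
    "話 : 口 舌 言",
]
index = build_element_index(sample_lines)
print(parse_krad_line(sample_lines[1]))
print(lookup(index, ["口", "言"]))
```

# When reading the UTF-8 version of the data, opening the file with
# open(path, encoding="utf-8") and working on decoded strings, as above,
# avoids any dependence on the byte length of a character.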
#
# Jim Breen, Tokyo, January 2001
#            Melbourne, July 2001
#            Melbourne, Dec 2004
#
###########################################################
# Nov 2004 - 八 replaced by ハ and 并
# Aug 2005 - added 斉; replaced 薺 with 齊
# Jan 2006 - added 一 to 今
# Apr 2006 - changed 坐, 座 and 挫 from 入 to 人
# Aug 2006 - added 卩 to 危 and 卵, dropped 刈 from 唖
# Sep 2006 - added 刀 and 氏 to 齊 and derivatives
# Nov 2006 - added 巛 as an indexer, replacing 川 for many kanji
# Jan 2007 - revised 春榛奏泰椿俸奉捧棒湊輳 adding 人 and removing ノ
# Sep 2007 - made sure all the 糸 indices also had 幺 and 小
# Apr 2008 - added 廾 to all cases of 齊
# Dec 2008 - added ハ to 詮,粉; 一 and | to 置; | and 丶 to 否
###########################################################
#
# K R A D F I L E - 2
#
# Copyright 2007 James Rose and the KanjiCafe.com.
#
# Special GRANT OF LICENSE is hereby given to James Breen and the
# Electronic Dictionary Research & Development Group at Monash
# University such that said licensees may maintain, modify, use,
# and redistribute this file. Derivatives should maintain this notice.
# All other rights reserved.
#
# A Grant of License detailing legal use of this file can be found at:
# http://www.kanjicafe.com/kradfile_license.htm
#
# Kradfile - 2 was created by James Rose by means of analysis of
# all 5,801 JIS X 0212 Kanji and identification of the constituent
# radicals and other common elements, with the goal of extending the
# capability of current kanji selection by-multi-radical tools in this range.
# Care has been exercised to maintain the same format as the original
# kradfile by Michael Raine and Jim Breen to aid in integration with
# existing electronic dictionary programs.
#
# Two fonts were used in decomposition so as to include as many glyphs as
# possible: one apparently based on the JIS X 0212 standard itself, and
# one based on Unicode.
# Each JIS X 0212 kanji is represented by 3 bytes
# in EUC-JP encoding, as opposed to the two bytes used in the JIS X 0208
# range, so adjust your software accordingly if necessary.
#
# The usable portion of the file consists of 5,801 lines; one for each of the
# JIS X 0212 kanji. Each line is as follows:
#   - the kanji itself,
#   - a space followed by a colon (:) followed by a space,
#   - one or more radicals/elements which can be seen in the kanji. These
#     are drawn from JIS X 0208-1997. Where the element alone is not in
#     JIS X 0208, a kanji which contains the element is used instead.
#
# The decomposition is based on what can be seen in typical kanji
# glyphs. Elements themselves can be further subdivided.
#
# You can contact Jim Rose at Jim(at)Kanjicafe.com.
#
# Jim Rose, Christiansted, United States Virgin Islands
# September 2007
###########################################################
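#
# Tools that still hold legacy kradfile / kradfile2 data use "place holder"
# kanji on the element side of each line, as described above. A minimal sketch
# of translating such a line to the radicals listed in the kradfile-u notes
# might look like the following. The function name and the sample lines are
# hypothetical, and the two place holders that the notes say remain
# (并 and 滴) are deliberately left untouched.

```python
# Legacy JIS X 0208 place holder -> radical/element, taken from the list
# in the kradfile-u header notes.
PLACEHOLDER_TO_RADICAL = {
    "化": "⺅", "刈": "⺉", "込": "⻌", "汁": "氵", "初": "衤",
    "尚": "⺌", "買": "罒", "犯": "犭", "忙": "忄", "礼": "礻",
    "个": "𠆢", "老": "⺹", "扎": "扌", "杰": "灬", "疔": "疒",
    "禹": "禸", "艾": "⺾", "邦": "⻏", "阡": "⻖",
}


def modernize_line(line):
    """Replace place-holder kanji in the element list of one data line.

    Only the elements after " : " are translated; the kanji being
    decomposed is kept as-is, even if it is itself a place-holder kanji.
    """
    kanji, sep, rest = line.strip().partition(" : ")
    if not sep or kanji.startswith("#"):
        return line  # comment or malformed line: leave unchanged
    elements = [PLACEHOLDER_TO_RADICAL.get(e, e) for e in rest.split()]
    return f"{kanji} : {' '.join(elements)}"


# Hypothetical legacy-style lines, for illustration only.
print(modernize_line("休 : 化 木"))
print(modernize_line("化 : 化 匕"))
```

# Translating only the element list matters because several place holders,
# such as 化, are also ordinary kanji with their own entries.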