USING THE JIS X 0212-1990 CHARACTER SET
=======================================

Jim Breen
jwb@dgs.monash.edu.au
21 June 1996

I am going to start with an extract from Ken Lunde's excellent "cjk.inf"
document:

    "2.1.3: JIS X 0212-1990

    This supplemental Japanese character set standard enumerates 6,067
    characters, 5,801 of which are kanji ordered by radical then total
    number of (remaining) strokes. All 5,801 kanji are unique when
    compared to those in JIS X 0208-1990 (see Section 2.1.2). The
    remaining 266 characters are categorized as non-kanji.

      o Row 2: 21 diacritics and symbols
      o Row 6: 21 Greek characters with diacritics
      o Row 7: 26 Eastern European characters
      o Rows 9 through 11: 198 alphabetic characters
      o Rows 16 through 77: 5,801 kanji (last is 77-67)

    Appendix C of UJIP provides a complete illustration of the JIS X
    0212-1990 character set standard."

[The UJIP Ken mentions is his "Understanding Japanese Information
Processing" book, published by O'Reilly. As well as that Appendix, pages
42-45 contain important information about the set.]

Use of the characters in the Supplementary Kanji set (which I will refer
to here as JIS212 for brevity) is not widespread. In fact, regular access
to these kanji is unlikely to become a feature of Japanese text
processing until the rather larger set of characters in the CJK portion
of the ISO 10646/Unicode set is implemented widely. Why? Well, they are
pretty obscure kanji, and in normal Japanese text handling it is very
rare to find an occasion when the 6,355 kanji in the JIS X 0208-1990 set
will not suffice.

The only specific JIS212 font file I am aware of is "jisksp16.bdf",
prepared by Koichi Yasuoka and others. My copy is Version 0.9 and is
dated April 25, 1995 (ANZAC Day!). I have used this file to create the
JIS21216.FNT binary bitmap file for use with DOS applications.

I have compiled an information file about the 5,801 kanji in the JIS212 set.
The file is called KANJD212, and is in the same format as the KANJIDIC
file covering the JIS208 set. See the KANJD212.DOC file for more
information.

I have started compiling an EDICT-format dictionary collection, called
edicth, which uses JIS212 kanji. The first (tiny) release has been made.
It can be used with JDIC 2.6 and xjdic V2.2.

As far as platforms supporting the JIS212 kanji are concerned, I have
been able to ascertain the following:

A. UNIX

(a) MULE

The MULE (Multilingual Emacs) system supports JIS212 codes. I am not an
emacs user, so I cannot comment much further. (I have been in touch with
Taichi Kawabata, who has been working on the bushu/stroke count files
which are to be used in the formal merger of MULE with (n)emacs. The data
files for these are derived from KANJIDIC & KANJD212.)

(b) kterm

The kterm (kanji xterm) can be made to support JIS212 by applying the
patch file "kterm-6.1.0-6.1.0.wd2.patch" to the X11R6 version of kterm.
I have successfully installed this within X11R5 under both Linux and DEC
Ultrix. This kterm version supports both JIS208 and JIS212 in JIS
(ISO 2022) and EUC codings, but does not support Shift-JIS at all. (I
also encountered a bug in the patch, which I was able to fix, and the fix
is now part of the formal patch.)

(c) WNN

From the existence of the "hojo_wnn.src" file, which contains over 1,000
WNN henkan file entries mapping into compounds of JIS212 kanji, I
presume that there are versions of WNN which support JIS212. I have never
got around to investigating them.

(d) jstevie

I am an old `vi' user, and desperately wanted to be able to edit files
containing JIS212 kanji using a vi-like editor. As I use jstevie under
both Linux and Ultrix, I asked Junn Ohta, who did much of the
kanjification of stevie, for some suggestions. Following his advice, I
produced a version of jstevie which will edit EUC files containing JIS212
(and JIS208) kanji.
Of course it will not handle Shift-JIS, and I haven't got around to doing
a JIS version, although only a few changes are needed for this. It also
needs the special version of kterm. I have released this version on
ftp://ftp.cc.monash.edu.au/pub/nihongo

(e) xjdic

Naturally, I needed to be able to use my own dictionary system with
JIS212 kanji, so I have extended the xjdic and xjdxgen software to handle
files with JIS212 kanji. This was released as part of V2.2. I considered
making xjdic handle KANJIDIC and KANJD212 as a single kanji dictionary
file, but I have postponed that for a while. As an interim step, the way
to use KANJD212 with xjdic is to treat it as one of the 9 possible
dictionary files. This allows reasonable searching capability.

B. Macintosh

I am not aware of any software which handles JIS212 kanji on Macs. Bear
in mind that the JLK and KanjiTalk software uses Shift-JIS, which cannot
support JIS212.

C. Windows

As for Macs. (I have suggested to Stephen Chung that he extend JWP to
support JIS212 kanji. JWP and WinJDic are among the very few pieces of
Windows software that use EUC coding internally.)

D. DOS

The Japanese version of DOS (DOS/V) and the NEC PC-9800 versions all use
Shift-JIS, and hence cannot support JIS212. I have modified my JDIC
dictionary and JREADER text reader programs to support files with JIS212
kanji. They both reference kanji in the KANJD212 file. This was part of
the V2.6 release. Following is an extract on the subject from the
JDIC26.DOC file.

APPENDIX E: - SUPPORT FOR JIS X 0212-1990 KANJI
===============================================

From V2.6, JDIC & JREADER support the 5,801 supplementary kanji of the
JIS X 0212-1990 standard. These are more rarely used kanji than the
common JIS X 0208 kanji, and JDIC and JREADER are the first PC software
to handle them at all. They can only be encoded using EUC (where each is
preceded by a 0x8F byte) or JIS/ISO 2022 coding; Shift-JIS cannot encode
these kanji.
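To make the byte-level difference concrete, here is a minimal Python sketch (not code taken from JDIC, xjdic or jstevie). It walks an EUC byte string, treating a 0x8F byte as the start of a three-byte JIS212 character and a byte in 0xA1-0xFE as the start of a two-byte JIS208 character, and then rewrites the text as a JIS (ISO 2022) stream. The escape sequences are assumed to be the usual ISO 2022 designations (ESC $ B for JIS X 0208, ESC $ ( D for JIS X 0212, ESC ( B for ASCII), and half-width katakana (the 0x8E prefix) is deliberately not handled.

```python
from typing import Iterator, Tuple

SS3 = 0x8F  # EUC single-shift 3: introduces a JIS X 0212 character

# Assumed ISO 2022 designation sequences for each character set.
ESCAPES = {
    "ascii": b"\x1b(B",
    "jis208": b"\x1b$B",
    "jis212": b"\x1b$(D",
}


def euc_chars(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (charset, code) pairs; code is the 7-bit JIS form of each character."""
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII byte.
            yield "ascii", bytes([b])
            i += 1
        elif b == SS3:
            # JIS212: 0x8F, then row and cell bytes with the high bit set.
            if i + 2 >= len(data):
                raise ValueError("truncated JIS X 0212 sequence")
            yield "jis212", bytes([data[i + 1] & 0x7F, data[i + 2] & 0x7F])
            i += 3
        elif 0xA1 <= b <= 0xFE:
            # JIS208: two bytes, both with the high bit set.
            if i + 1 >= len(data):
                raise ValueError("truncated JIS X 0208 sequence")
            yield "jis208", bytes([b & 0x7F, data[i + 1] & 0x7F])
            i += 2
        else:
            # Includes 0x8E (half-width katakana), which this sketch skips.
            raise ValueError(f"unsupported EUC byte 0x{b:02X} at offset {i}")


def euc_to_jis(data: bytes) -> bytes:
    """Re-encode EUC text as ISO 2022, escaping only when the set changes."""
    out = bytearray()
    current = "ascii"
    for charset, code in euc_chars(data):
        if charset != current:
            out += ESCAPES[charset]
            current = charset
        out += code
    if current != "ascii":
        out += ESCAPES["ascii"]  # return to ASCII at the end of the text
    return bytes(out)
```

For example, the EUC bytes 0xB0 0xA1 (JIS208 code 0x3021) followed by 0x8F 0xB0 0xA1 (JIS212 code 0x3021, the first supplementary kanji) come out as ESC $ B "0!" ESC $ ( D "0!". The same character-by-character walk is what lets an EUC-aware program step over a JIS212 kanji as one three-byte unit rather than one and a half two-byte ones.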
The support for these kanji is more limited than for the JIS X 0208
kanji, and what is provided is as follows:

(a) JDIC

For JDIC, JIS212 kanji can occur in ordinary dictionary files, and will
be displayed as normal. The kanji can be "selected" via the Alt-F10
function, and the history retrieved with the Alt-F4 function. For looking
up kanji in a kanji dictionary:

(i) instead of the KINFO.DAT file, the text KANJD212 file is used with a
simple index file KANJD212.IND. The only lookup available with this file
is a direct lookup via the Alt-F10 function, or entering the JIS code
preceded with a "1", e.g. "13021".

(ii) for more extended examination of the JIS212 kanji, a user can
generate a .JDX index for the KANJD212 file, and treat it as a normal
dictionary file. Thus by entering suitable keys, e.g. "B17", and possibly
using the Ctrl-F2 filters, searches can be made for individual kanji.

(b) JREADER

For JREADER the function is much the same, except that the display of
kanji following an "n" command is from the KANJD212 file.

Both JDIC and JREADER can continue to operate without either the KANJD212
file or the JIS21216.FNT font file.
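The "13021" key in (i) appears to be the four hex digits of the 7-bit JIS code with a "1" in front to flag the supplementary set; that reading is an assumption based on the example. As a minimal sketch with hypothetical function names (mine, not JDIC's), such a key can be turned into a row-cell position, matched against the row layout in Ken Lunde's extract at the start of this document, and converted to the three bytes that represent it in an EUC file.

```python
import string
from typing import Tuple

# (first_row, last_row, description), taken from the cjk.inf extract.
ROW_CATEGORIES = [
    (2, 2, "diacritics and symbols"),
    (6, 6, "Greek characters with diacritics"),
    (7, 7, "Eastern European characters"),
    (9, 11, "alphabetic characters"),
    (16, 77, "kanji"),
]


def parse_jdic_key(key: str) -> int:
    """Hypothetical: turn a key like "13021" into the JIS code 0x3021."""
    digits = key[1:]
    if not key.startswith("1") or len(digits) != 4 or not all(
        c in string.hexdigits for c in digits
    ):
        raise ValueError("expected '1' followed by four hex digits")
    return int(digits, 16)


def kuten(jis: int) -> Tuple[int, int]:
    """Split a 7-bit JIS code into (row, cell), each in 1..94."""
    row, cell = (jis >> 8) - 0x20, (jis & 0xFF) - 0x20
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"0x{jis:04X} is outside the 94x94 code space")
    return row, cell


def category(row: int) -> str:
    """Row category per the extract; other rows are not listed there."""
    for first, last, desc in ROW_CATEGORIES:
        if first <= row <= last:
            return desc
    return "not listed"


def euc_bytes(jis: int) -> bytes:
    """EUC form of a JIS212 code: 0x8F, then both bytes with the high bit set."""
    return bytes([0x8F, (jis >> 8) | 0x80, (jis & 0xFF) | 0x80])


def describe(key: str) -> str:
    jis = parse_jdic_key(key)
    row, cell = kuten(jis)
    euc = " ".join(f"{b:02X}" for b in euc_bytes(jis))
    return f"JIS212 {row:02d}-{cell:02d} ({category(row)}), EUC {euc}"
```

Here describe("13021") gives "JIS212 16-01 (kanji), EUC 8F B0 A1", i.e. the first kanji in the set, and kuten(0x6D63) gives (77, 67), the last kanji position mentioned in the extract.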