KANJD212

A Database of Information on the 5,801 Kanji in the JIS X 0212 Standard.

Copyright (C) 2003 The Electronic Dictionary Research and Development Group, Monash University.

INTRODUCTION

The KANJD212 file is a comprehensive collection of information about the 5,801 kanji in the JIS X 0212-1990 supplementary character set.

The basic philosophy and format of the KANJD212 file are the same as those of the original KANJIDIC file, which covers the kanji in the JIS X 0208-1990 set, and users are referred to the KANJIDIC.DOC file for further information. This document covers only the unique aspects of the KANJD212 file.

The KANJIDIC file is made available under the terms of the GNU General Public Licence (GPL), a copy of which is appended to this document. As the GPL is really concerned with software, and refers to source code, etc., a specific copyright statement based on the wording from the GNU Emacs Manual is applied instead.

FORMAT

As in the case of the KANJIDIC file, the KANJD212 file contains a mixture of ASCII characters and kana/kanji encoded in the EUC (Extended Unix Code) coding. The first three bytes of each record are the kanji itself, coded in the EUC-3 method, in which the first byte is 0x8F. This is the main method of encoding the JIS X 0212 characters in EUC. The remaining fields in each record are in the same coding and format as in the KANJIDIC file.
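
As a minimal sketch of the three-byte layout described above (not code from the KANJD212 distribution), the following Python fragment checks the leading 0x8F byte and derives the row and cell numbers of the kanji from the two bytes that follow. The function name `leading_kanji` and the sample record are hypothetical; the sketch also assumes that a space follows the kanji, as in KANJIDIC, and does not interpret the remaining fields.

```python
SS3 = 0x8F  # EUC "single shift 3" byte that introduces a JIS X 0212 character


def leading_kanji(record: bytes):
    """Return (kanji_bytes, row, cell, rest) for one record (hypothetical helper).

    Raises ValueError if the record does not begin with a three-byte
    EUC sequence for a JIS X 0212 character.
    """
    if len(record) < 3 or record[0] != SS3:
        raise ValueError("record does not start with 0x8F")
    b1, b2 = record[1], record[2]
    if not (0xA1 <= b1 <= 0xFE and 0xA1 <= b2 <= 0xFE):
        raise ValueError("bytes 2 and 3 are outside 0xA1..0xFE")
    # Subtracting 0xA0 from each byte gives the 1-based row and cell numbers.
    row, cell = b1 - 0xA0, b2 - 0xA0
    # Everything after the kanji is returned untouched apart from leading spaces.
    return record[:3], row, cell, record[3:].lstrip()


# Made-up record: three kanji bytes followed by placeholder ASCII fields.
sample = b"\x8f\xb0\xa1 field1 field2"
kanji, row, cell, rest = leading_kanji(sample)
print(kanji.hex(), row, cell, rest)
```

Working on bytes rather than decoded text keeps the sketch independent of whichever EUC decoder happens to be installed.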

CONTENTS

At the time of its current release, the KANJD212 file contains the following information about each kanji:

Further information is being added as it becomes available.

COMMENTS

Note that the bushu in this file are the "classical" versions, not the revised Nelson versions used in the KANJIDIC file.

COMPILATION

The KANJD212 file has been compiled from a number of sources:

The compilation of the KANJD212 file has really been made possible by the activities of a group of people collecting material for a Unicode dictionary, and the assembly of the file is largely a by-product of that activity. The group is led by Jack Halpern, and I would like to express my appreciation to Jack and the other members of the group, in particular Koichi Yasuoka, Ken Lunde, Martin Du"rst and Christian Wittern.

Jim Breen
(jwb@csse.monash.edu.au)
School of Computer Science & Software Engineering
Monash University, Victoria, Australia

APPENDIX A.

USING THE JIS X 0212-1990 CHARACTER SET

[Some comments by Jim Breen.]

Usage of the characters in the Supplementary Kanji set (which I will refer to here as JIS212, for brevity) is not widespread. In fact, regular access to these kanji is unlikely to become a feature of Japanese text processing until the rather larger set of characters in the CJK portion of the ISO 10646/Unicode set is implemented widely. Why? Well, they are pretty obscure kanji, and it is very rare to find an occasion in normal Japanese text-handling when the 6,355 kanji in the JIS X 0208-1990 set will not suffice.

The only specific JIS212 font file I am aware of is the "jisksp16.bdf", prepared by Koichi Yasuoka and others. My copy is Version 0.9 and is dated April 25, 1995 (ANZAC Day!). I have used this file to create the JIS21216.FNT binary bitmap file for use with DOS applications.

As for platforms supporting the JIS212 kanji, I have been able to ascertain the following:

A. UNIX

  1. MULE

    The MULE (Multilingual Emacs) system supports JIS212 codes. I am not an emacs user, so I cannot comment much further. (I have been in touch with Taichi Kawabata who has been working on the bushu/stroke count files which are to be used in the formal merger of mule with (n)emacs. The data files for these are derived from KANJIDIC & KANJD212.)

  2. kterm

    The kterm (kanji xterm) can be made to support JIS212 by applying a patch file "kterm-6.1.0-6.1.0.wd2.patch" to the X11R6 version of kterm. I have successfully installed this within X11R5 under both Linux and DEC Ultrix. This kterm version supports both JIS208 and JIS212 in JIS (ISO 2022) and EUC codings, but does not support Shift-JIS at all. (I also encountered a bug in the patch which I was able to fix, and the fix is now part of the formal patch.)

  3. WNN

    I presume from the existence of the "hojo_wnn.src" file, which contains over 1,000 WNN henkan file entries of mappings into compounds of JIS212 kanji, that there are versions of WNN which support JIS212. I have never got around to investigating them.

  4. jstevie

    I am an old `vi' user, and desperately wanted to be able to edit files containing JIS212 kanji using a vi-like editor. As I use jstevie under both Linux and Ultrix, I asked Junn Ohta, who did much of the kanjification of stevie, for some suggestions, and following his advice, I produced a version of jstevie which will edit EUC files containing JIS212 (and JIS208) kanji. Of course it will not handle Shift-JIS, and I haven't got around to doing a JIS version, although only a few changes are needed for this. Of course, it needs the special version of kterm.

    I'll get around to releasing this version someday, but if anyone wants a copy to try, please email me.

  5. xjdic

    Of course, I needed to be able to use my own dictionary system with JIS212 kanji, so I have extended the xjdic and xjdxgen software to handle files with JIS212 kanji. This was released as part of V2.2. In this version, the file was treated as another dictionary file.

    In 1998 V2.3 of xjdic was released, with support for the JIS X 0212 kanji as part of the main kanji dictionary file. In this mode, KANJIDIC and KANJD212 are merged to operate as a single file.

  6. yudit

    Gaspar Sinai's "yudit" editor handles JIS X 0212 kanji quite successfully, as can be expected because it uses Unicode internally.
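
To illustrate the relationship between the two codings mentioned under kterm above, JIS (ISO 2022) and EUC, here is a rough Python sketch that rewrites an EUC byte string in 7-bit form. It assumes the usual designating escape sequences (ESC $ B for JIS X 0208, ESC $ ( D for JIS X 0212, ESC ( B for ASCII) and that the text starts in ASCII; the function name `euc_to_iso2022` is hypothetical, and half-width katakana (lead byte 0x8E) are deliberately not handled.

```python
ESC_ASCII = b"\x1b(B"    # designate ASCII
ESC_JIS208 = b"\x1b$B"   # designate JIS X 0208
ESC_JIS212 = b"\x1b$(D"  # designate JIS X 0212-1990


def euc_to_iso2022(data: bytes) -> bytes:
    """Convert EUC text (ASCII, JIS208, JIS212) to 7-bit ISO 2022 form."""
    out = bytearray()
    current = ESC_ASCII  # assumption: the stream starts in ASCII
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # plain ASCII byte
            esc, chunk, step = ESC_ASCII, data[i:i + 1], 1
        elif b == 0x8F:                   # three-byte JIS212 sequence
            esc, chunk, step = ESC_JIS212, data[i + 1:i + 3], 3
        elif 0xA1 <= b <= 0xFE:           # two-byte JIS208 sequence
            esc, chunk, step = ESC_JIS208, data[i:i + 2], 2
        else:                             # e.g. 0x8E, not covered by this sketch
            raise ValueError("unsupported lead byte 0x%02X" % b)
        if b >= 0x80 and (len(chunk) != 2
                          or not all(0xA1 <= c <= 0xFE for c in chunk)):
            raise ValueError("malformed multi-byte sequence at offset %d" % i)
        if esc != current:                # switch character set only when needed
            out += esc
            current = esc
        out += bytes(c & 0x7F for c in chunk)  # clear the high bit of each byte
        i += step
    if current != ESC_ASCII:              # leave the stream in ASCII at the end
        out += ESC_ASCII
    return bytes(out)


print(euc_to_iso2022(b"A\x8f\xb0\xa1\xb0\xa1B"))
```

The only difference between the JIS208 and JIS212 cases is the escape sequence: both carry the same pair of code bytes with the high bit cleared.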

B. Macintosh

I am not aware of any software which handles JIS212 kanji on Macs. Bear in mind that the JLK and KanjiTalk software use Shift-JIS, which cannot support JIS212.

C. Windows

As for Macs. (I have suggested to Stephen Chung that he extend JWP to support JIS212 kanji. JWP and WinJDic are among the very few pieces of Windows software that use EUC coding internally.)

D. DOS

The Japanese version of DOS (DOS/V), and the NEC PC-9800 versions all use Shift-JIS, and hence cannot support JIS212.

I have modified my JREADER text reader program to support files with JIS212 kanji. It can also reference kanji in the KANJD212 file. This was released as part of V2.6. At the same time JDIC was extended to handle JIS212 kanji; however, it does this by treating the KANJD212 file as one of its dictionary files, not as part of the kanji database.

E. WWW Servers

As far as I know, the only WWW server supporting the JIS212 kanji is my own WWWJDIC dictionary server. It supports a combined JIS208 and JIS212 in the same way as xjdic (above). Initially I just treated the JIS212 kanji the same as the JIS208, as Netscape on a Sun at Monash (where I have the JIS212 fonts installed in the X11 server) displays these correctly. However, as most users do not have this capability, I created 5,801 transparent .gif images of the JIS212 kanji, and send them out instead.
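
The image substitution described above could be sketched roughly as follows. This is not the WWWJDIC code: the function name `substitute_jis212`, the `jis212` directory and the file naming scheme (the four hex digits of the JIS code plus ".gif") are all assumptions made purely for illustration.

```python
def substitute_jis212(data: bytes, img_dir: str = "jis212") -> bytes:
    """Replace each three-byte JIS212 sequence in EUC text with an <img> tag.

    The directory and file names are hypothetical; ASCII and JIS208
    bytes are copied through unchanged.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0x8F:
            code = data[i + 1:i + 3]
            if len(code) != 2:
                raise ValueError("truncated JIS212 sequence at offset %d" % i)
            # Clearing the high bits gives the 7-bit JIS code, e.g. 3021.
            jis = "%02X%02X" % (code[0] & 0x7F, code[1] & 0x7F)
            tag = '<img src="%s/%s.gif" alt="JIS212 %s">' % (img_dir, jis, jis)
            out += tag.encode("ascii")
            i += 3
        else:
            # In well-formed EUC, 0x8F occurs only as the JIS212 lead byte,
            # so every other byte can safely be copied as it stands.
            out.append(data[i])
            i += 1
    return bytes(out)


print(substitute_jis212(b"a\x8f\xb0\xa1\xb0\xa1"))
```

Because the trailing bytes of EUC characters all lie in the range 0xA1 to 0xFE, a byte-by-byte scan for 0x8F is enough here and no full decoder is needed.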

F. WWW Browsers

As described above, Netscape on a Unix system satisfactorily displays the JIS212 kanji, provided the fonts are installed in the X11 font server. Similarly, I have been able to display these kanji using the "lynx" text browser.

APPENDIX B.

KANJD212 COPYRIGHT STATEMENT

In March 2000, James William Breen assigned ownership of the copyright of the dictionary files assembled, coordinated and edited by him to The Electronic Dictionary Research and Development Group at Monash University.

Information about the formal usage arrangement for KANJD212 can be found on the Group's WWW page.

In summary, KANJD212 can be freely used provided satisfactory acknowledgement is made, and a number of other conditions are met.