## This is the October 10, 2008 Version of the Tanaka Corpus
## Edited slightly and TAB delimited by Charles Kelly, April 22, 2010.
##
## The Original URL for the Tanaka Corpus:
## http://www.csse.monash.edu.au/~jwb/tanakacorpus.html
## I downloaded the following file on April 22, 2010.
## ftp://ftp.monash.edu.au/pub/nihongo/examples_pd.gz
## This was the last public domain version from Jim Breen's website.
## (My edited version is also public domain.)
##
## Then I cleaned it up and edited it as follows:
##
## 1. Commented-out items were removed (the REJECTS).
## 2. The 8 lines with non-EUC characters were removed.
## 3. The data was tabbed for easy import into database programs.
## 4. The data has been sorted by the English.
##
## The numbers are ID numbers that were in the original file.
## The numbers used by the Tatoeba project no longer match these.
##
## This is the format of the data.
## ID NUMBER + TAB + ENGLISH + JAPANESE + SEARCH TAGS
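##
## A minimal sketch of how the data lines might be read in Python.
## It assumes every field (ID, English, Japanese, search tags) is
## separated by a TAB and that lines starting with "##" are comments.
## The names Entry and parse_lines are hypothetical, and the sample
## line in the usage note is made up for illustration, not taken
## from the corpus.

```python
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    """One corpus record (hypothetical name, for illustration only)."""
    id_number: int
    english: str
    japanese: str
    tags: str


def parse_lines(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield an Entry for each data line, skipping comments and bad lines.

    Assumption: fields are TAB-separated in the order
    ID, English, Japanese, search tags.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip blank lines and "##" comment lines such as this header.
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        # Require at least ID, English, and Japanese, with a numeric ID.
        if len(fields) < 3 or not fields[0].strip().isdecimal():
            continue
        # Anything after the third field is kept together as the tags.
        tags = "\t".join(fields[3:])
        yield Entry(int(fields[0]), fields[1], fields[2], tags)
```

## Usage, with a made-up line:
##   list(parse_lines(["123\tHello.\tこんにちは。\tこんにちは"]))
## When reading from a file, the encoding has to be chosen by the
## caller. The note about non-EUC characters suggests EUC-JP, but
## that is an assumption to verify against the actual file.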