Thursday, March 20, 2008

Python tool: traditional to simplified Chinese converter

I just wrote this script to convert traditional Chinese text to simplified Chinese. Because the mapping from traditional to simplified characters is many-to-one, the reverse direction is ambiguous, and I haven't yet decided whether to write a reverse conversion script.
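To illustrate why the reverse direction is harder, here is a minimal sketch. The three-entry dictionary and the function names simplify and invert are made up for this example and are not taken from simplify.py or utftable.txt.

```python
# Illustrative mapping only; these entries are NOT read from utftable.txt.
TRAD_TO_SIMP = {
    "發": "发",  # to send out, to emit
    "髮": "发",  # hair
    "體": "体",  # body
}


def simplify(text, table=TRAD_TO_SIMP):
    """Replace each traditional character with its simplified form, if listed."""
    return "".join(table.get(ch, ch) for ch in text)


def invert(table):
    """Group the traditional candidates under each simplified character."""
    candidates = {}
    for trad, simp in table.items():
        candidates.setdefault(simp, []).append(trad)
    return candidates


reverse = invert(TRAD_TO_SIMP)
# reverse["发"] holds two candidates, so a reverse converter would need
# context (for example, word-level lookup) to choose between them.
```

Going forward is a plain per-character lookup, while going backward leaves more than one candidate for some characters.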

It has been tested on my own files and can be downloaded here. Please report any bugs you find, and suggestions are welcome.

The package contains two files, simplify.py and utftable.txt. The Python script is the converter and utftable.txt is the character table. Both files must be placed in the same directory.

Usage:
python simplify.py input.txt >output.txt

Both the input and the output text files must be encoded in UTF-8.

Note that you can replace the character table file with your own, as long as the new file is in the same format as the original. This is useful if you find a more comprehensive table than this one.
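For readers who want a rough picture of how a table-driven converter of this kind can work, here is a sketch in Python 3. It is not the code in simplify.py. The table format is an assumption made for the example (one pair per line: traditional character, whitespace, simplified character), and the real utftable.txt may be laid out differently. The names load_table and convert are hypothetical.

```python
def load_table(path):
    """Read a traditional-to-simplified table from a UTF-8 text file.

    ASSUMED format (hypothetical, may differ from utftable.txt):
    each line holds a traditional character, whitespace, and its
    simplified form. Blank lines and lines starting with '#' are skipped.
    """
    table = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2 or fields[0].startswith("#"):
                continue
            table[fields[0]] = fields[1]
    return table


def convert(text, table):
    """Map each character through the table; unlisted characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)
```

With a layout like this, swapping in a larger table only means pointing load_table at a different file; the conversion step itself does not change.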

3 comments:

faceleg said...

This is an excellent tool, thank you very much!

Do you know who made the utf table?

Yue Zhang said...

The UTF table is from the web. I should have acknowledged the original source, but I can't remember it clearly. I remember consulting some Linux C code and some Perl code that I found by Googling. I think the table most likely came from a Perl script, though I may have added some relations myself. I believe the original source is free to use, but I'm sorry that I have forgotten where it came from!

Unknown said...

Thanks for sharing this informative blog about Simplified Chinese Translation