I just wrote this script to convert traditional Chinese text to simplified Chinese. Because the mapping from traditional to simplified characters is many-to-one, the reverse conversion is ambiguous, and I have not yet decided whether to write a script for it.
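To make the many-to-one point concrete, here is a small Python sketch. The four pairs below are well-known examples chosen for illustration and are not taken from utftable.txt; the names TRAD_TO_SIMP and invert are made up for this example.

```python
# Illustrative pairs only (an assumption, not the contents of utftable.txt).
# Each key is a traditional character, each value its simplified form.
TRAD_TO_SIMP = {
    "發": "发",  # "to emit, to send out"
    "髮": "发",  # "hair"
    "後": "后",  # "after, behind"
    "后": "后",  # "empress" (already the same in both scripts)
}

def invert(mapping):
    """Group traditional characters by the simplified character they map to."""
    reverse = {}
    for trad, simp in mapping.items():
        reverse.setdefault(simp, []).append(trad)
    return reverse

SIMP_TO_TRAD = invert(TRAD_TO_SIMP)

# A simplified character with more than one candidate cannot be converted
# back without looking at the surrounding words.
AMBIGUOUS = {s: t for s, t in SIMP_TO_TRAD.items() if len(t) > 1}
```

Going forward, every traditional character has exactly one answer, so a plain lookup is enough. Going backward, 发 could be either 發 or 髮, so a reverse converter would need context to choose.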
The script has been tested with my own files and can be downloaded here. Please report any bugs you find; suggestions are welcome too.
The package contains two files, simplify.py and utftable.txt. The Python script is the converter and utftable.txt is the character table. The two files must be in the same directory.
Usage:
python simplify.py input.txt >output.txt
Both the input and the output text files must be in UTF-8.
Note that you can replace the character table with your own file, as long as it is in the same format as the original, in case you find a more comprehensive table than this one.
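For readers curious how a table-driven converter of this kind can work, here is a minimal Python 3 sketch. It is not the code in simplify.py. It assumes a hypothetical table format in which each line holds one traditional character and its simplified form separated by whitespace; the real utftable.txt may be laid out differently. The function names load_table, simplify and convert_file are invented for this example.

```python
def load_table(lines):
    """Build a translation table from lines of the ASSUMED form
    '<traditional> <simplified>'. Lines that do not match are skipped."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and len(parts[0]) == 1 and len(parts[1]) == 1:
            # str.translate expects code points as keys.
            table[ord(parts[0])] = parts[1]
    return table

def simplify(text, table):
    """Replace every character found in the table; leave the rest as-is."""
    return text.translate(table)

def convert_file(input_path, table_path="utftable.txt"):
    """Read a UTF-8 text file and return its simplified form as a string."""
    with open(table_path, encoding="utf-8") as f:
        table = load_table(f)
    with open(input_path, encoding="utf-8") as f:
        return simplify(f.read(), table)
```

With a table in that assumed format, print(convert_file("input.txt")) would write the converted text to standard output, which can then be redirected to a file as in the usage line above. Because the lookup is character by character, swapping in a larger table changes the coverage without touching the code.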
Thursday, March 20, 2008
3 comments:
This is an excellent tool, thank you very much!
Do you know who made the utf table?
The UTF table is from the web. I should have acknowledged the original source, but I can't remember it clearly. I remember consulting some Linux C code and some Perl code that I found by Googling. I think the table most likely came from a Perl script, though I may have added some relations myself. I believe the original source is free to use, but I'm sorry that I have forgotten where it came from!
Thanks for sharing this informative blog about Simplified Chinese Translation