Sunday, March 30, 2008

Python note: the difference between __getattr__ and __getattribute__

I tried to override __setattr__ today so that, when a value is assigned to item.type, I could check whether it is one of the allowed choices. However, I made a mistake by defining a method named __setattribute__ instead of __setattr__.
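For reference, here is a minimal sketch of that kind of check written with __setattr__. The class name Item and the set ALLOWED_TYPES are made up for illustration; the real choices are not listed in this post.

```python
class Item(object):
    # Hypothetical set of allowed values, for illustration only.
    ALLOWED_TYPES = frozenset(["book", "music", "video"])

    def __setattr__(self, name, value):
        # Called for every assignment on the instance.
        if name == "type" and value not in Item.ALLOWED_TYPES:
            raise ValueError("invalid type: %r" % (value,))
        # Delegate to the default implementation to actually store the value.
        object.__setattr__(self, name, value)


item = Item()
item.type = "book"           # accepted
try:
    item.type = "spaceship"  # not in ALLOWED_TYPES
    rejected = False
except ValueError:
    rejected = True
```

Calling object.__setattr__ at the end matters: assigning through self.name = value inside __setattr__ would call the method again and recurse forever.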

There is no special method named __setattribute__, so Python never calls it. The methods that intercept attribute reads and writes are __getattr__, __getattribute__ and __setattr__ (plus __delattr__ for deletion). The difference between __getattr__ and __getattribute__ is that __getattr__ is called only when normal lookup fails, that is, when the attribute is found neither in the object's dictionary nor on its class, while __getattribute__ is called on every attribute access. Defining __getattribute__ therefore slows down every attribute access on the object. __setattr__ is triggered the same way as __getattribute__: it intercepts every assignment, whether or not the attribute already exists.

Another note about __getattr__: the overriding method must raise AttributeError itself for names it does not handle, or the program may produce unexpected results. For example, consider this __getattr__ method:

class Foo(object):
    def __getattr__(self, attr):
        if attr == "bar":
            return "bar"

foo = Foo()

Now when we read foo.barrrr, which doesn't exist, we get None instead of an AttributeError, because the method falls off the end and implicitly returns None. The code should be corrected to:

class Foo(object):
    def __getattr__(self, attr):
        if attr == "bar":
            return "bar"
        else:
            # The call form works in both Python 2 and Python 3.
            raise AttributeError(attr)
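To see why this matters in practice, the sketch below puts the two variants side by side. The names LenientFoo and StrictFoo are made up here so that both versions can live in one file.

```python
class LenientFoo(object):
    def __getattr__(self, attr):
        if attr == "bar":
            return "bar"
        # Falls through and implicitly returns None for every other name.


class StrictFoo(object):
    def __getattr__(self, attr):
        if attr == "bar":
            return "bar"
        raise AttributeError(attr)


lenient = LenientFoo()
strict = StrictFoo()

lenient_value = lenient.barrrr                   # None, no error raised
lenient_has = hasattr(lenient, "barrrr")         # True, which is misleading
strict_has = hasattr(strict, "barrrr")           # False, as expected
strict_default = getattr(strict, "barrrr", 42)   # the default is used
```

Without the explicit raise, hasattr and getattr with a default both believe that every attribute exists.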

Thursday, March 20, 2008

Python tool: traditional to simplified Chinese converter

I just wrote this script to convert traditional Chinese text to simplified Chinese. Since the mapping from traditional to simplified characters is many-to-one, the reverse conversion is ambiguous, and I have not decided whether to write a script for it.

It has been tested on my own files and can be downloaded here. Please report any bugs or suggestions.

The package contains two files, simplify.py and utftable.txt. The Python script is the converter and utftable.txt is the character table. The two files must be placed in the same directory.

Usage:
python simplify.py input.txt >output.txt

Both the input and the output text files must be in UTF-8.

Note that you can replace the character table with your own file, as long as it is in the same format as the original, in case you have a more comprehensive table than this one.
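The general idea of a table-driven converter can be sketched as follows. This is not the actual simplify.py: the table format (one "traditional simplified" pair per line, separated by whitespace) is an assumption made for illustration, the function names are hypothetical, and the sketch targets Python 3.

```python
def load_table(lines):
    """Build a translation table from lines assumed to look like
    '<traditional> <simplified>' (hypothetical format)."""
    table = {}
    for line in lines:
        parts = line.split()
        # Skip blank or malformed lines; only single-character keys are mapped.
        if len(parts) == 2 and len(parts[0]) == 1:
            table[ord(parts[0])] = parts[1]
    return table


def simplify(text, table):
    # str.translate replaces each character that has an entry in the table
    # and leaves all other characters unchanged.
    return text.translate(table)


# A tiny in-memory table standing in for the real table file.
sample_table = load_table(["體 体", "學 学", "語 语"])
result = simplify("學習語言", sample_table)
```

Characters missing from the table pass through unchanged, which is why a more comprehensive table gives better output.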

Tuesday, March 18, 2008

sqlite note: using the command line tool

The command line tool sqlite3 can be used to view the content of a database. One way to use it is to type "sqlite3 FILE", which opens the database in FILE and starts an interactive prompt for queries.

The command line tool sqlite3 can also run a query directly. For example, typing "sqlite3 FILE 'select * from Table1'" prints every row of table Table1. This is handy for large tables, because the output can be piped into a pager: "sqlite3 FILE 'select * from Table1' | more".

Monday, March 10, 2008

C++ note: static_cast from a reference to a value

Unless the target type is a reference, the result of a static_cast is an rvalue. It can't be used as an lvalue, and thus can't be assigned a new value.

Such misuse leads to compile errors, but the compiler's report can be misleading or quite hard to understand. For example, suppose we have a base class Base and a class Derived derived from Base. We want to overload the istream >> operator for Derived. The following way of overloading the operator does not work:

istream & operator >> (istream &is, Derived &derived) {
    // special processing
    is >> static_cast<Base>(derived);  // error: the cast yields a temporary Base
    return is;
}

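This is because static_cast<Base>(derived) creates a temporary Base object, and a temporary cannot be bound to the non-const reference parameter of operator >> for Base. The error reported by the compiler does not mention the cast, however; it simply says that there is no match for operator >> from ...

Note that casting pointers behaves the same way: the pointer returned by static_cast<Base*> is itself an rvalue, although dereferencing it yields an lvalue.

To avoid the above problem, use static_cast<Base&>, which yields an lvalue referring to the Base part of derived.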