Tuesday, November 14, 2006

C++ note: be careful with ifstream

The following is the correct and standard way of reading from a file stream:

string s;
while (file >> s) {    // the read itself is the loop test
    cout << s;
}

However, the following is wrong:

char t;
while (file.good()) {
    file.get(t);
    cout << t;
}

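It will normally print the last character of the file twice. The reason is that file.good() only becomes false after a get() call has already failed at the end of the file. So the loop body runs one extra time and prints the stale t. This is the correct way:

while (file.get(t)) {
    cout << t;
}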

file.get(t) returns the stream itself, which tests as false once a read fails. The loop therefore stops exactly when there is nothing left to read.

Wednesday, November 08, 2006

C++ note: TRACE macros

#ifdef DEBUG
#define TRACE(x) do { cout << x << endl; } while (0)   // endl also flushes cout
#else
#define TRACE(x)
#endif

#ifdef DEBUG
#define RUN_ON_DEBUG(x) x
#else
#define RUN_ON_DEBUG(x)
#endif

Tuesday, October 31, 2006

Python note: the value and the display of a unicode string

Sometimes it is necessary to view the Unicode code of a special character, or to display the character for a given code. Here is some example code.

print '\xe6\x8d\xae'.decode("utf-8")         # displays the character itself

print repr('\xe6\x8d\xae'.decode("utf-8"))   # displays its code: u'\u636e'
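Python 3 separates byte strings from text. Also, repr() of a printable character shows the character itself, so ascii() is the call that reveals the escape code. Here is a minimal Python 3 sketch of the same round trip, reusing the bytes from the example above:

```python
import unicodedata

raw = b'\xe6\x8d\xae'          # the UTF-8 bytes from the example above
ch = raw.decode("utf-8")       # bytes -> text: the character itself

print(ch)                      # displays the character
print(ascii(ch))               # displays its escape code: '\u636e'
print(hex(ord(ch)))            # the code point as a number: 0x636e
print(unicodedata.name(ch))    # CJK UNIFIED IDEOGRAPH-636E

# The other direction: from a code point to the character and its bytes.
same = chr(0x636E)
print(same.encode("utf-8"))    # b'\xe6\x8d\xae'
```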

Friday, September 29, 2006

wxPython note: about callbacks

Be careful about callbacks, for they can introduce unexpected logic. The flow of control is not written explicitly in the code, so it can set traps, much as in aspect-oriented programming. For example, I wrote a callback for the notebook page-change event. Then I switched notebook pages with some function calls, without noticing that those calls also triggered the callback, which messed up the logic. I fixed the problem by introducing a boolean variable bManual that records whether a page switch was made by the user or by a function call. Whenever the notebook page is switched by a function call, the variable marks it.
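Here is a plain-Python sketch of the flag technique, with no wxPython involved. Notebook is a hypothetical stand-in, not the wx API. Like a real notebook widget, it fires its page-changed callback for every switch, whether the switch came from a click or from code. The try/finally makes sure bManual is restored even if the switch raises.

```python
class Notebook:
    """Hypothetical stand-in: fires on_page_changed on every page switch."""

    def __init__(self):
        self.page = 0
        self.on_page_changed = None

    def set_page(self, n):
        self.page = n
        if self.on_page_changed:
            self.on_page_changed(n)      # fires for programmatic switches too


class Window:
    def __init__(self):
        self.notebook = Notebook()
        self.notebook.on_page_changed = self.on_page_changed
        self.bManual = True              # True unless our own code is switching
        self.user_switches = []

    def on_page_changed(self, n):
        if not self.bManual:
            return                       # switch came from a function call
        self.user_switches.append(n)     # logic meant only for user switches

    def show_page(self, n):
        """Switch pages from code without running the user-switch logic."""
        self.bManual = False
        try:
            self.notebook.set_page(n)
        finally:
            self.bManual = True


w = Window()
w.notebook.set_page(2)   # simulates the user clicking a tab
w.show_page(1)           # programmatic switch, ignored by the callback
print(w.user_switches)   # [2]
```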

Speaking of aspect-oriented programming, a program keeps gaining new features and grows larger over time. Apart from using SVN to track the changes, it is also possible to mark them with special end-of-line comments. For example, the code for a feature can be tagged with # [-functionality-]. Later, when editing, we simply search for the tag to find which code does the job. However, this approach must be used carefully, so that the code and the tags stay consistent.
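Checking the tags is easy to automate. This sketch assumes each tag is written exactly as # [-name-] at the end of a line. It simply lists where each tag appears, much as grep would.

```python
import re

# Assumed tag format, following the note: "# [-name-]" at the end of a line.
TAG = re.compile(r"#\s*\[-(.+?)-\]")

def find_tags(lines):
    """Return (line_number, tag_name) for every tagged line."""
    found = []
    for number, line in enumerate(lines, start=1):
        match = TAG.search(line)
        if match:
            found.append((number, match.group(1)))
    return found

source = [
    "def save(self):",
    "    self.backup()  # [-autosave-]",
    "    self.write()",
]
print(find_tags(source))   # [(2, 'autosave')]
```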

Thursday, September 28, 2006

Python note: the subprocess module can open an arbitrary file

To open an email attachment, an email client needs to start a new process that opens the corresponding file. This can be done with the subprocess module, which spawns the new process. With shell=True on Windows, the command shell opens the file with its associated program.

import subprocess
subprocess.Popen(filename, shell=True)
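That call relies on the Windows shell. On other systems the shell would try to execute the file itself. A more portable sketch picks an opener per platform: os.startfile on Windows, the open command on macOS, and xdg-open elsewhere. It assumes xdg-open is installed, which is usual on Linux desktops.

```python
import os
import subprocess
import sys

def open_command(path):
    """Return the command that opens path with its default program,
    or None on Windows, where os.startfile is used instead."""
    if sys.platform.startswith("win"):
        return None
    if sys.platform == "darwin":
        return ["open", path]          # macOS
    return ["xdg-open", path]          # most Linux/Unix desktops (assumed installed)

def open_with_default_app(path):
    command = open_command(path)
    if command is None:
        os.startfile(path)             # exists only on Windows
    else:
        # A list argument (no shell) keeps spaces in file names safe.
        subprocess.Popen(command)
```

For example, open_with_default_app("attachment.pdf") would hand the file to the desktop's PDF viewer.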

Sunday, September 03, 2006

C++ note: about strings

length - s.length() / s.size()
index - s[1] = 'x' / s[2]
slice - s.substr(1,4) means starting from index 1, length 4.
append - s.append(t)
replace - s.replace(1, 3, "x") replaces the 3 characters starting at index 1 with "x"
erase - s.erase(1,2) removes 2 characters starting at index 1
find - s.find("abc", 2) searches from index 2; returns the position, or string::npos if not found
reverse - reverse(s.begin(), s.end()) reverses s in place (needs <algorithm>)

It's fine to concatenate strings with +.
The constructor can also be used as s = string(t, start, length).

Monday, August 28, 2006

C++ note: int to string

#include <string>
#include <sstream>

using namespace std;

int main() {
    string s;
    int a = 1;
    stringstream ss;
    ss << a;
    ss >> s;      // s is now "1"
    return 0;
}

Also, use atoi or strtol (from <cstdlib>) to convert a string to an int.
Use s = c to assign a C string to a string directly, and s.c_str() to get a C string from a string.

Saturday, July 08, 2006

Python note: __hash__

__hash__ provides the hash code for an object. It is used in two circumstances: when the object is used as a dictionary key, or when the built-in hash() function is called.

There is one requirement for this method: two objects that compare equal (their __cmp__ returns zero) must have the same hash. Thus, when a class does not provide __cmp__ (or __eq__), it should not provide __hash__ either.
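Python 3 dropped __cmp__, so the rule is now stated in terms of __eq__: objects that compare equal must have equal hashes. In this minimal sketch, Point is a hypothetical example class. It hashes exactly the tuple of fields that __eq__ compares.

```python
class Point:
    """Hypothetical value class: equal points must hash equally."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares.
        return hash((self.x, self.y))

names = {Point(1, 2): "a"}
print(names[Point(1, 2)])   # a distinct but equal key finds the entry: a
```

In Python 3, a class that defines __eq__ but not __hash__ gets __hash__ set to None, so its instances cannot be used as dictionary keys at all.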

Friday, July 07, 2006

VIM note: tabs

In VIM 7, tab view is supported.

To open a new tab, use :tabnew
To navigate among tabs, use 'gt' (next tab) and 'gT' (previous tab)
To close a tab, use ':tabclose'

Tuesday, June 06, 2006

Python note: unicode example (convert gb2312 to utf-8)

This is a script to convert files from gb2312 to UTF-8.

# gb2utf - convert a text file from gb2312 to utf-8
# Yue Zhang 2006
import sys
iFile = open(sys.argv[1])
oFile = open(sys.argv[2], "w")
sLine = iFile.readline()
while sLine:
    try:
        uLine = sLine.decode("gb2312")
    except UnicodeDecodeError:
        sLine = iFile.readline()        # skip lines that are not valid gb2312
        continue
    oFile.write(uLine.encode("utf8"))   # note this: encode before writing
    sLine = iFile.readline()
iFile.close()
oFile.close()
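In Python 3, the encoding step moves into open() itself. This sketch keeps the original behaviour of skipping lines that are not valid gb2312. The function name gb2312_to_utf8 is just for illustration.

```python
def gb2312_to_utf8(src, dst):
    """Copy src (gb2312) to dst (utf-8), skipping undecodable lines.
    Returns the number of lines skipped."""
    skipped = 0
    with open(src, "rb") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for raw_line in fin:                     # bytes, split at b"\n"
            try:
                fout.write(raw_line.decode("gb2312"))
            except UnicodeDecodeError:
                skipped += 1                     # same choice as the script above
    return skipped
```

Calling gb2312_to_utf8(sys.argv[1], sys.argv[2]) behaves like the original script. If dropping lines is not acceptable, decode with errors="replace" instead; this substitutes U+FFFD for the bad bytes.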

Monday, May 29, 2006

Python note: get your ip address

Here is a small platform-independent script to get the IP address of the current machine.

import socket
print socket.gethostbyname(socket.gethostname())
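On many Linux machines, /etc/hosts maps the hostname to a loopback address, so the one-liner above may print 127.0.0.1 or 127.0.1.1. A common workaround, sketched here, asks the OS which local address it would use to reach an outside host. The target address 192.0.2.1 is an arbitrary assumption (a documentation-only address), and no packet is actually sent.

```python
import socket

def local_ip():
    """Best-guess IPv4 address of the interface used for outgoing traffic."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends nothing; it only selects a route.
        s.connect(("192.0.2.1", 80))
        return s.getsockname()[0]
    except OSError:
        # No route available: fall back to the hostname lookup shown above.
        try:
            return socket.gethostbyname(socket.gethostname())
        except OSError:
            return "127.0.0.1"
    finally:
        s.close()

print(local_ip())
```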

Sunday, May 28, 2006

Python note: SimpleHTTPServer note

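Things to notice about SimpleHTTPServer:

1. Read more of the source code! A SimpleHTTPRequestHandler is created for each request, does its job and then disappears, so nothing stored on the handler survives to the next request. To keep data between requests, the code must be changed, for example to store it on the server object.

2. Each HTTP request gets exactly one response. This is how the HTTP protocol works, and sending more responses will cause problems.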
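In Python 3 the module is http.server. This sketch illustrates both points. Every request gets a fresh handler instance, so the hit counter lives on the server object; hits is a hypothetical attribute added for this example. do_GET sends exactly one response.

```python
import http.server

class CountingHandler(http.server.SimpleHTTPRequestHandler):
    """A new instance of this class is created for every request."""

    def do_GET(self):
        self.server.hits += 1                   # shared state lives on the server
        body = str(self.server.hits).encode("ascii")
        self.send_response(200)                 # one request, one response
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass                                    # silence the per-request log line

def make_server(port=8000):
    server = http.server.HTTPServer(("127.0.0.1", port), CountingHandler)
    server.hits = 0                             # hypothetical attribute for this example
    return server
```

Running make_server().serve_forever() and reloading http://127.0.0.1:8000/ should show a growing number. Had the counter been stored on self, it would be reset for every request.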

Tuesday, May 23, 2006

C++ note: references vs pointers

Use references when the target is a fixed object:

a_class* const a_inst=&a; => a_class &a_inst=a;

References are ideal for passing parameters to a function, because the caller's object will not move during the call. However, when returning a reference we must make sure the referred object outlives the function: it must not be a local variable on the stack. A heap object, or a member of a longer-lived object, is fine. That is why returning a pointer often feels more natural.

Use pointers when iterating through a lot of objects; iteration is probably the best place to use them. Also, when the address itself is needed explicitly, a pointer is the only choice.

Saturday, May 13, 2006

VIM note: abbreviations

Command :abbr (:ab) sets abbreviations for strings. For example, with

:abbr #a hello

In INSERT mode, typing #a followed by [ENTER] then writes "hello". This is useful for writing comments.

In my python files, I usually use comment blocks like

# Function header

This can be done neatly by putting some abbr commands in the .vimrc file. Two things to note:

1. Abbreviation names must be # plus one letter.
2. Under Windows the vimrc file is called _vimrc, and you must set an environment variable VIM.

Wednesday, May 10, 2006

C++ note: virtual inheritance

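The syntax is:

class B : public virtual D { ... };

The main reason for virtual inheritance is the diamond structure: B and C both derive from D, and A derives from both B and C. Without the virtual keyword, an A object contains two separate D subobjects, one inside its B part and one inside its C part. D's data members are therefore duplicated, and an unqualified access to them is ambiguous. When B and C inherit virtually from D, A holds a single shared D.

In Python, multiple inheritance always behaves like the virtual case, because an instance keeps all its attributes in a single namespace, so D's attributes exist only once. Java does not have this problem at all, because it does not allow multiple inheritance of classes.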
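Here is a small Python sketch of that claim, with hypothetical classes named after the diamond above. All four classes write to the same single attribute d. Because super() follows the method resolution order, D.__init__ runs only once.

```python
class D:
    def __init__(self):
        self.d = 0

class B(D):
    def __init__(self):
        super().__init__()
        self.d += 1

class C(D):
    def __init__(self):
        super().__init__()
        self.d += 10

class A(B, C):
    pass

a = A()
print([k.__name__ for k in A.__mro__])   # ['A', 'B', 'C', 'D', 'object']
print(vars(a))                           # {'d': 11}: one slot, D initialised once
```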

Python note: when you get a lot of instances

Some classes have a great many instances, and it is important to reduce the memory these objects consume. Here are two ways of doing it in Python.

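The first is __slots__. Define it in the class body as a sequence of attribute names, normally a tuple. Avoid a plain string unless you want exactly one attribute, because a single string is taken as one name, not as a sequence. For example,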

class C(object):
    __slots__ = "foo", "bar"

Instances of this class can then only have the two attributes "foo" and "bar". Methods are unaffected and are defined in the normal way.

__slots__ saves memory because instances no longer need their own dictionary to store arbitrary attributes. This pays off when there is a large number of instances.
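This quick check of the two effects described above uses hypothetical classes Plain and Slotted. The slotted instance has no per-instance __dict__, and assigning an attribute that is not listed raises AttributeError. The actual memory saved depends on the Python version, so it is worth measuring before relying on it.

```python
class Plain(object):
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar

class Slotted(object):
    __slots__ = ("foo", "bar")        # a tuple of attribute names

    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar

p = Plain(1, 2)
s = Slotted(1, 2)
print(hasattr(p, "__dict__"))         # True: every instance carries a dict
print(hasattr(s, "__dict__"))         # False: values live in fixed slots

try:
    s.baz = 3                         # not listed in __slots__
except AttributeError as e:
    print("AttributeError:", e)
```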

The second way is the flyweight pattern. The idea of this pattern is to share existing objects instead of creating new ones.

For example, an email client maintains many messages, and each message can be tagged with "read", "flag" and so on. A client may hold a huge number of messages at a time. It is therefore wise to share one instance of each tag among all messages rather than create a new one per message.

Lastly, my view on patterns is that they are not tricks to be used for their own sake. Different problems must be solved in different ways, and applying a pattern blindly does no good. However, reading about patterns can give me hints for solving problems. And a by-product is that I know what people in the Java world are talking about ;-)

Wednesday, May 03, 2006

Python note: a file merger

Problem: I had two folders containing pictures from two Canon cameras, taken on the same day. Unfortunately, the two cameras gave the same names to their pictures. I wanted to merge the two folders with something smarter than a plain copy.

Script: this script takes two folders and moves the files from one folder to the other. When two files have the same name, it compares them. If they are true duplicates, only one copy is kept. If their contents differ, the moved file is renamed by adding a numeric suffix.

Code: the following Python source.
# merge files - merge two folders with no duplicate files
g_sWelcome = """
merge_files - merge files from two directories into one.

The files from the "from" directory will be moved to the "to" directory, while
duplicated files will be removed. If two files have the same name but are
different in content, the new one will be renamed.

Author: Yue Zhang, 2006
"""
import sys, os
import filecmp
import shutil

# Counters updated by fileid_alloc
nDuplicateName = 0
nDuplicateContent = 0

# Given a path, file name and extension, return a full path name without collision
def fileid_alloc(sPath, sPathFrom, sFileName, sExtension):
    global nDuplicateName, nDuplicateContent
    nIndex = 0
    sNewFileName = os.path.join(sPath, sFileName + sExtension)
    if os.path.exists(sNewFileName):
        nDuplicateName += 1
        # Same content: reuse the name, so the move simply replaces the copy
        if filecmp.cmp(sNewFileName, os.path.join(sPathFrom, sFileName + sExtension)):
            nDuplicateContent += 1
            return sNewFileName
    # Different content: add a numeric suffix until the name is free
    while os.path.exists(sNewFileName):
        nIndex += 1
        sNewFileName = os.path.join(sPath, sFileName + str(nIndex) + sExtension)
    return sNewFileName

# Main function
def merge_files(folder_from, folder_to):
    for sFullFileName in os.listdir(folder_from):
        sFileName, sExtension = os.path.splitext(sFullFileName)
        sNewFileName = fileid_alloc(folder_to, folder_from, sFileName, sExtension)
        shutil.move(os.path.join(folder_from, sFullFileName), sNewFileName)

# Main entry
if __name__ == '__main__':
    print g_sWelcome
    if len(sys.argv) != 3:
        print "Usage: merge_files.py folder_from folder_to"
        sys.exit(1)
    merge_files(sys.argv[1], sys.argv[2])
    print ("In all %d duplicate file names processed, among which %d duplicate "
           "contents are merged and the rest are allocated new names."
           % (nDuplicateName, nDuplicateContent))

Disclaimer: I give no warranty for any use of this code, though I have tested it with my own photos.

Sunday, April 30, 2006

VIM note: switch dos format to unix format

This is the same problem as in my last Python note, but now I want to solve it with vim.

It can be done with a global pattern substitution. The ^M in the pattern has to be typed in literal mode with the [Ctrl-v] key: in this case [Ctrl-v][CR], which shows as ^M. The whole command looks like :%s/^M$//.

Wednesday, April 26, 2006

Python note: SunOS file format switching

When I opened the boost-jam source under SunOS, I found that every line ended with ^M. This is chr(13), the carriage return, which means that the end-of-line sequence in these files is \r\n (0x0D 0x0A).

To remove the 0x0D from each line ending and convert the files to the format SunOS expects, I wrote this Python program:

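import os
for sFile in os.listdir("."):
    if not os.path.isfile(sFile):
        continue    # skip directories
    # chr(13) is the carriage return (^M); sed deletes every one of them
    os.system("sed 's/%s//g' '%s' > TEMP && mv TEMP '%s'"
              % (chr(13), sFile, sFile))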

In sed (and vi), 's/pattern/replacement/g' is the general form of a substitution. The trailing g replaces every match on the line, not just the first.
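The same job can be done in pure Python, without sed or a temporary file name that might clash. In this sketch the function names are hypothetical. It only rewrites CR LF pairs, so a stray CR in the middle of a line is left alone; the sed version deletes every CR.

```python
import os

def dos2unix(path):
    """Rewrite path in place, turning CRLF line endings into LF.
    Returns True if the file was changed."""
    with open(path, "rb") as f:
        data = f.read()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        with open(path, "wb") as f:
            f.write(fixed)
    return fixed != data

def dos2unix_dir(folder="."):
    """Convert every regular file directly inside folder; return changed names."""
    changed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and dos2unix(path):
            changed.append(name)
    return changed
```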

Tuesday, March 14, 2006

Python note: "?:" in Python

In C++ and Java there is a handy way of writing a conditional expression, such as x = y>1?1:0. However, there is no ?: operator in Python.

Several workarounds have been suggested. Some people use "a and b or c", while others use "(a and [b] or [c])[0]". However, I don't find them intuitive.

I use a dict for the expression instead. To express x = y>1 ? 1 : 0, I type

x = {True : 1, False : 0}[y>1]

which is also concise.
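Python 2.5 (PEP 308) later added a real conditional expression: x = 1 if y > 1 else 0. The sketch below compares it with the workarounds; the function names are only for illustration. The values are deliberately swapped to 0 and 1 to show why "a and b or c" is risky: it returns c whenever b is false, for example when b is 0. The dict version has a different cost. Both values are evaluated before the lookup, so it cannot guard an expression that would fail on the other branch.

```python
def cond_expr(y):
    return 0 if y > 1 else 1                  # conditional expression (Python 2.5+)

def and_or(y):
    return y > 1 and 0 or 1                   # buggy: 0 is false, so "or 1" wins

def list_trick(y):
    return (y > 1 and [0] or [1])[0]          # [0] is a non-empty list, so it is true

def dict_lookup(y):
    return {True: 0, False: 1}[y > 1]         # builds both values on every call

for y in (0, 5):
    print(y, cond_expr(y), and_or(y), list_trick(y), dict_lookup(y))
# 0 1 1 1 1
# 5 0 1 0 0
```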