Thursday, January 20, 2011

Python note: breaking a long statement to add end of line comments

Here is a simple tip for breaking a long line in order to add end-of-line comments. Python does not allow a statement to be broken at an arbitrary point, but it does allow line breaks inside parentheses, square brackets, or braces, where the continuation is unambiguous. The following code

if condition1 and # comment1
condition2: # comment2

does not work (it is a syntax error) because the line break falls in the middle of the if condition. However,

if (condition1 and  # comment1
    condition2):    # comment2

fixes the problem and keeps the code neat.
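To check the two forms without running them, the snippets can be handed to the built-in compile(), which only parses the source. This is a minimal sketch; the helper name compiles is made up for this note, and condition1 and condition2 are the placeholder names from above.

```python
def compiles(source):
    """Return True if Python accepts `source` as valid syntax."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Line break outside any bracket: the statement ends right after `and`.
broken = (
    "if condition1 and  # comment1\n"
    "   condition2:     # comment2\n"
    "    pass\n"
)

# Same condition wrapped in parentheses: the line break is allowed.
fixed = (
    "if (condition1 and  # comment1\n"
    "    condition2):    # comment2\n"
    "    pass\n"
)

print(compiles(broken))  # False
print(compiles(fixed))   # True
```

Since compile() never evaluates the names, condition1 and condition2 do not need to exist.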

Python note: set equality test

Given a class with an __eq__ method, will a Python set built from its instances hold only members that are unequal to one another? Not necessarily.

Consider the following class:

class C(object):
    def __init__(self, x):
        self.a = x
    def __eq__(self, o):
        return self.a == o.a
    def __str__(self):
        return str(self.a)

Its equality is controlled by the attribute a. However, the following code adds four members to the set, even though C(1) is added twice:

s = set()
a = C(1)
b = C(2)
c = C(3)
s.add(a)
s.add(b)
s.add(c)
s.add(C(1))

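The reason is that set is implemented as a hash table. Without a __hash__ method defined in class C, Python 2 hashes each object by its identity, so two equal instances land in different places and are never compared with each other. (In Python 3, a class that defines __eq__ without __hash__ is unhashable, so s.add(a) raises a TypeError instead.) Defining a __hash__ that agrees with __eq__ fixes the problem: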

class C(object):
    def __init__(self, x):
        self.a = x
    def __eq__(self, o):
        return self.a == o.a
    def __str__(self):
        return str(self.a)
    def __hash__(self):
        # hash() keeps this working when a is not an integer
        return hash(self.a)
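As a quick check, here is a minimal, self-contained sketch of the same idea. The class name Keyed is a stand-in invented for this note, and the isinstance guard is an extra assumption added so that comparing with unrelated objects does not raise an AttributeError.

```python
class Keyed(object):
    """Stand-in for class C, with __eq__ and __hash__ in agreement."""

    def __init__(self, x):
        self.a = x

    def __eq__(self, other):
        if not isinstance(other, Keyed):
            return NotImplemented
        return self.a == other.a

    def __hash__(self):
        # Equal objects must produce equal hashes.
        return hash(self.a)


s = set()
for value in (1, 2, 3, 1):
    s.add(Keyed(value))

print(len(s))         # 3
print(Keyed(1) in s)  # True
```

The second Keyed(1) hashes to the same place as the first, is compared with it via __eq__, and is therefore not added again.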

Tuesday, January 18, 2011

grep note: find all words that contain a particular substring

I used:
grep -h substring path/*.txt | tr ' ' '\n' | grep substring

grep substring path/*.txt
prints the matching lines, each prefixed with the name of the file it came from.

grep -h substring path/*.txt
suppresses the file names, leaving only the matching lines.

grep -h substring path/*.txt | tr ' ' '\n'
replaces every space with a newline, turning the matching lines into one word per line.

grep -h substring path/*.txt | tr ' ' '\n' | grep substring
then drops the words that do not contain the substring.

I also wanted a list of unique words containing the substring, so I piped the previous command into sort and uniq:
grep -h substring path/*.txt | tr ' ' '\n' | grep substring | sort | uniq
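For comparison, the same pipeline can be sketched in Python. This is only a rough equivalent under a couple of assumptions: the function name unique_words_containing is made up for this note, the substring is matched literally (grep would treat it as a regular expression), and the lines are passed in directly instead of being read from path/*.txt.

```python
def unique_words_containing(substring, lines):
    """Return the sorted, unique words in `lines` that contain `substring`."""
    found = set()
    for line in lines:
        if substring not in line:        # grep -h substring
            continue
        for word in line.split(" "):     # tr ' ' '\n'
            if substring in word:        # grep substring
                found.add(word)
    return sorted(found)                 # sort | uniq


sample = ["the cat sat", "concatenate a cat", "no match here"]
print(unique_words_containing("cat", sample))  # ['cat', 'concatenate']
```

Splitting on a single space mirrors tr ' ' '\n', so tabs and punctuation stay attached to the words exactly as they would in the shell version.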