Yue Zhang - Techlog

Saturday, June 18, 2011

C++ note: commas and brackets in a macro

Sometimes it's handy to use a macro to avoid repeated writing of similar code, especially when there is no direct way to modularize some apparently similar but essentially very different functions. In such uses of macros it often occurs that commas and brackets need to be included in the macro. Direct inclusion of commas and brackets may confuse the compiler.


#define example_macros(left_code, right_code)\
left_code a right_code\
left_code b right_code\
left_code c right_code\
...
left_code z right_code

void func_1() {
   example_macros( my_module1.call( , ); );
}

void func_2() {
   example_macros( cout<< , <<(endl); );
}

When the usage become more complex, particularly when there are more arguments to macro, commas and brackets may confuse the macro. The bets solution I have found is using extra macros.


#define id_comma ,
#define id_left_bracket (
#define id_right_bracket )

and then replace the usage of commas and brackets in macros


void func_1() {
   example_macros( my_module1.call id_left_bracket , id_right_brackt ; );
}

This will avoid any confusion.

Wednesday, March 02, 2011

Bash note: get the path for the running script

The following expression returns the path the currently running script is from (note that the script might not be from pwd):

$(cd `dirname $0` && pwd)

Thursday, January 20, 2011

Python note: breaking a long statement to add end of line comments

There is a very simple tip for breaking a long line in order to add end comments. Python doesn't recognize arbitrary broken lines, but it allows lines to be broken if the line is inside a bracket so it is unambiguous. The following code


if condition1 and # comment1
   condition2: # comment2

would not work because a line is broken in the if statement, however,


if (condition1 and #comment1
    condition2): # comment2

would fix the problem and maintain neat code.

Python note: set equality test

Given a class with an __eq__ function, would a python set contain only members that are equal? The answer is not necessarily.

Consider the following class:


class C(object):
   def __init__(self, x):
      self.a = x
   def __eq__(self, o):
      return self.a == o.a
   def __str__(self):
      return str(self.a)

It defines a member which controls its equality. However, the following code will add four members to the set


s = set()
a = C(1)
b = C(2)
c = C(3)
s.add(a)
s.add(b)
s.add(c)
s.add(C(1))

The main reason is that set is implemented as a hash map, and without a hash function defined in the class C, python will use the object identity itself for hashing, and members of the class will be hashed into different places. The following code will fix the problem.


class C(object):
   def __init__(self, x):
      self.a = x
   def __eq__(self, o):
      return self.a == o.a
   def __str__(self):
      return str(self.a)
   def __hash__(self):
      return self.a

Tuesday, January 18, 2011

grep note: find all words that contains a particular substring

I used:
grep -h substring path/*.txt | tr ' ' '\n' | grep substring

grep substring path/*.txt will list the files plus the matching lines.

grep -h substring path/*.txt will remove the names of the matching files, leaving only lines.

grep -h substring path/*.txt | tr ' ' '\n' will translate the matching lines into a word-per-line form.

grep -h substring path/*.txt | tr ' ' '\n' | grep substring will then filter the words that do not contain the pattern from the lines.

I also wanted a list of unique words that contained the substring, and therefore piped the previous command into sort and uniq.

Tuesday, November 09, 2010

C++ note: pure virtual method called

The first thing to check is whether virtual functions are called from constructor.
http://www.artima.com/cppsource/pure_virtual.html

Saturday, October 30, 2010

C++ note: static variable from inline member function

I was thinking about Meyers singleton, and wonder how compilers will handle the same static variable defined in difference modules due to inline function expansion. Here are some relevant discussions (it occurs safe not to use the kind of singleton(?)).

http://objectmix.com/c/250462-static-local-variable-inline-static-member-function.html

Wednesday, October 13, 2010

Sh note: adding a line at the front of files

The example script adds a new line to the beginning of each file in the directory, recursively, except those files that contain svn.

for file in `find . -type f | grep -v 'svn'`;
do
sed '1i\New line' -i $file
done

Tuesday, October 12, 2010

Sh note: reversing a file or a line

rev can be used to reverse a line character by character
tac can be used to reverse a file line by line
these two commands are sometimes handy

sh$ rev
abc
cba

sh$ tac
a
b
c^D
c
b
a

echo abcde | rev

a particular context I used tac
ls /corpora/extra/nivre/corpus/conll-x/data/turkish/metu_sabanci/*/*.conll | cut -d '/' -f8-11 | tac

Wednesday, March 17, 2010

C++ note: enabling and disabling assert

In C++, assert is by default compiled. To exclude it, the macro NDEBUG must be defined when assert.h is included. This can be done by using -DNDEBUG compiler option or #define NDEBUG in code.

gcc -DNDEBUG -c test.cpp -o test.o

Monday, November 09, 2009

LaTeX note: two columns

To make a new page with two columns:

\begin{twocolumns}
...
\end{twocolumns}

Two columnns at the same page

\begin{multicols}{2}
...
\end{multicols}

Monday, August 24, 2009

Linux note: recursive replacement

This is a script to recursively do pattern replacing with dir

-----------------------------------
if [ $# -ne 3 ]
then
echo 'Usage: replace.sh dir pattern1 pattern2'
exit 85
fi

for f in `grep -rl $2 $1`;
do
sed "s/$2/$3/g" -i $f
done
--------------------------------------

$1 is the directory, $2 is the pattern to be replaced and $3 is the new pattern. This script iterates through all files in directory. Filtering can be done by changing the for loop into a detailed list of files.

Tuesday, April 21, 2009

C++ note: using a functor to help organise a vector of pointers

Suppose that some large structures need to be managed by using a heap. There is no limitation to the number of items, and therefore a vector is used to maintain the memory. Because they are organised in a heap, direct heap operation on the memory vector is slow. Therefore, another vector is used to store the indice.

template<cnode> myClass{
vector<cnode> memory;
vector<unsigned> indice;
}

Now the indice vector must store indice (unsigned) rather than pointers (CNode*). This is because when the memory vector is reallocated, the original pointers to the memory are invalid. Using pointer vectors will lead to segmentation-fault.

Pushing and popping heap operations can now be carried out on the indice. They should rely on node comparison. The standard node comparison function should read like:

// in myClass
bool less(const unsigned x, const unsigned y) { return memory[x] < memory[y]; }

The above code does not compile, because push_back and pop_back in std require a function with the signature less(x, y). A member function doesn't work. A solution is the functor.

struct CNodeLess {
const vector<CNode> *memory;
CNodeLess(const vector<CNode> &bm): memory(&bm) {}
bool operator()(const unsigned &x, const unsigned &y) { return (memory[x] < memory[y]); }

Now the class can have a member in CNodeLess that caches the memory, and provide the less(x,y) signature.

Thursday, December 11, 2008

Linux note: grep command

To grep multiple patterns from a file

grep -E "patternone|patterntwo" file

The option -E is to enable extended posix regular-expression (|). The quotation marks are to prevent the shell from taking | for pipe.

Friday, September 26, 2008

Java note: toarray

Because of the implementation of containers, the contained type cannot be instantiated within them. An inconvenient consequence is toArray. There are two ways to call it

1.
Object arr[] = list.toArray();

But it's not easy to change Object[] to Type[].

2.
Type arr[] = new Type[m];
list.toArray(arr);

But it would be much better not to have to allocate arr separately.

The next version of Java should really reconsider the implementation of templates.

Wednesday, September 17, 2008

Java note: generics

In Java 1.5 (and 1.6), generics is used by containers to avoid runtime type cast. Generics are not implemented in the same way as C++ templates, and are much less flexible. A generic class is compiled into only one class file, instead of one for each template instance. A big disadvantage is that a generic container cannot make new instances of its elements.

Java note: assertion

The assert statement in Java are ignored from compile by default. To enable assert, add -ea when compiling.

javac -ea example.java

Thursday, July 10, 2008

Python note: swapping objects

The best way to swap two objects is:

a, b = b, a

It swaps the names of two object by the use of a tuple, without altering the objects.

The way to swap objects in many languages involves copying. However, this method can be tricky with Python. This is particularly because object assignment in Python does not copy objects. It simply links a variable name to the existing object. Suppose

class X(object):
__init__(self, x)
self.x = x

a = X([1, 2, 3])
b = X([])

Then we want this:

b = a
a.x = []

The above will not work, because b.x will become [] too. To avoid copying, use

a, b = b, a
a.x = []

In another occasion, if a will be kept while b becomes a copy of a's value, define a copying function or use the copy module.

With the new-style class, everything is an object. So the rule for object assignment applies to lists, dicts etc. To make a copy of the original list, use l2 = l1[:].

Monday, July 07, 2008

Python note: functional programming saves code when dealing with lists

Another note on using functional programming. It makes code with lists shorter and clearer sometimes.

For example, there is a list l. We want to add one to each element:

l = map(lambda x: x+1, l)

LIBSVM note: problems

1. Why does svm-scale run forever, while keeping writing the output file (.scale)?

It might be because the original training file contains [0, 1] range features, but the scale output requires [-1, 1] range. This is the default option. Add the option -l 0.

2. Why does svm-train run forever?

It might be because of the epsilon value. The default parameter (-e 0.001) sets this value. The smaller the value is, the more accurate will the trained model be, but the more iterations will be taken. Consider setting epsilon to 1 and try.

If there are a lot of features, consider trying LIBLINEAR instead. It does not use a kernel, but runs faster than LIBSVM for a linear model.

Another param, -m, sets the memory cache. Make it as large as possible within RAM.