Yue Zhang - Techlog: 2011

Saturday, July 16, 2011

SRILM note: the use of MACHINE_TYPE for customized compilation

SRILM supports compilation for different architectures through the MACHINE_TYPE variable. By default, MACHINE_TYPE is detected automatically, but it can be specified manually for special purposes such as 64-bit support. For a particular machine type x (e.g. i686, i686-m64, i686-gcc4), the corresponding compiler options are defined in common/Makefile.machine.x. Running 'make MACHINE_TYPE=x' creates a separate folder x under bin, lib, lm/obj, dstruct/obj and other folders.

This can be exploited to make a customized build of the code. For example, suppose that we want to compile a position-independent version of SRILM for sharing (e.g. for linking into a shared library). We copy one machine-specific makefile, such as Makefile.machine.i686, to Makefile.machine.foo, and then modify the compiler flags in foo, adding what we need; in this case, -fPIC. Then we compile SRILM with 'make MACHINE_TYPE=foo'. The resulting files, such as liboolm.a, are placed in lib/foo. This is handy because they do not clash with the existing libraries and binaries.

Friday, July 08, 2011

OpenOffice Note: selecting all footnotes

The current version of OpenOffice Writer does not directly support extracting all footnotes, but there are ways around this. First, footnotes can be selected through Edit | Find & Replace ...: choose More Options, tick Search for Styles below, choose Footnote as the style in the drop-down list above, and press the Find All button. All footnotes will be highlighted and can then be copied to the clipboard. Second, when the selected footnotes are pasted into a target file, it turns out that all the line breaks after the footnotes are gone. This can be solved with the following trick.

(1) In the original document from which footnotes are extracted, select all footnotes in the aforementioned way.

(2) (Assuming that each footnote ends with a period) remove trailing space characters (if any) using Find & Replace, ticking Current Selection Only and Regular Expression in More Options. Find the pattern [ ]+$ (note the space between [ and ]) and replace it with an empty string (by leaving the replacement textbox empty).

(3) Select all footnotes in that document again, and use the same method as in the previous step to replace the pattern \.$ with .ENDOFPARAGRAPH. The special word ENDOFPARAGRAPH serves as a placeholder for the newline.

(4) Select all footnotes once more in the same way, copy them, and paste them into a new document.

(5) In the new document, use Find & Replace to replace all ENDOFPARAGRAPH with \n, ticking Regular Expression under More Options.

That completes the extraction. Use Undo to remove the placeholder words from the original document, or simply replace them with an empty string.
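
The placeholder trick can be tried outside OpenOffice on plain strings. The following is only a sketch in Python: the sample footnotes are made up, and joining the strings stands in for the paste step that loses the paragraph breaks.

```python
import re

PLACEHOLDER = "ENDOFPARAGRAPH"

# Made-up footnotes; two of them carry trailing spaces.
footnotes = ["See Smith (2010).  ", "Ibid.", "Personal communication. "]

# Steps (2) and (3): strip trailing spaces, then mark each final period.
marked = []
for note in footnotes:
    note = re.sub(r"[ ]+$", "", note)
    note = re.sub(r"\.$", "." + PLACEHOLDER, note)
    marked.append(note)

# Step (4): simulate a paste that drops all paragraph breaks.
pasted = "".join(marked)

# Step (5): turn every placeholder back into a newline.
restored = pasted.replace(PLACEHOLDER, "\n")
print(restored)
```

Each footnote ends up on its own line again, because the placeholder survived the step that discarded the real line breaks.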

Saturday, June 18, 2011

C++ note: commas and brackets in a macro

Sometimes it is handy to use a macro to avoid writing similar code repeatedly, especially when there is no direct way to modularize functions that look similar but are essentially very different. In such macros, commas and brackets often need to appear in the macro arguments. Including them directly may confuse the preprocessor, because it splits arguments at top-level commas and expects the parentheses inside each argument to be balanced.


#define example_macros(left_code, right_code)\
left_code a right_code\
left_code b right_code\
left_code c right_code\
/* ... */\
left_code z right_code

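void func_1() {
   // Fails: the comma sits inside parentheses, so the preprocessor sees a single argument.
   example_macros( my_module1.call( , ); );
}

void func_2() {
   // Works: the comma is at the top level and the brackets around endl are balanced.
   example_macros( cout<< , <<(endl); );
}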

When the usage becomes more complex, particularly when the macro takes more arguments, commas and brackets may confuse the macro. The best solution I have found is to use extra macros.


#define id_comma ,
#define id_left_bracket (
#define id_right_bracket )

and then replace the usage of commas and brackets in macros


void func_1() {
   example_macros( my_module1.call id_left_bracket , id_right_bracket ; );
}

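This avoids the confusion: the preprocessor sees two well-formed arguments, and the real brackets appear only after the extra macros are expanded.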

Wednesday, March 02, 2011

Bash note: get the path for the running script

The following expression returns the directory that the currently running script is in (note that this may not be the current working directory). The quotes keep it working when the path contains spaces:

$(cd "$(dirname "$0")" && pwd)

Thursday, January 20, 2011

Python note: breaking a long statement to add end of line comments

There is a very simple tip for breaking a long line in order to add end-of-line comments. Python does not allow a statement to be broken at arbitrary points, but it does allow line breaks inside brackets, where the continuation is unambiguous. The following code


if condition1 and # comment1
   condition2: # comment2

would not work because the line break falls in the middle of the if statement. However,


if (condition1 and #comment1
    condition2): # comment2

would fix the problem and keep the code neat.
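
A minimal runnable sketch of the parenthesized form follows. The function name in_range and its arguments are made up for illustration.

```python
def in_range(value, low, high):
    # The parentheses let the condition span two lines,
    # so each part can carry its own end-of-line comment.
    if (value >= low and   # lower bound, inclusive
        value <= high):    # upper bound, inclusive
        return True
    return False


print(in_range(5, 1, 10))   # True
print(in_range(11, 1, 10))  # False
```

The same works inside square brackets and braces, so long lists and dictionaries can be commented line by line in the same way.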

Python note: set equality test

Given a class with an __eq__ method, will a Python set treat instances that compare equal as duplicates? Not necessarily.

Consider the following class:


class C(object):
   def __init__(self, x):
      self.a = x
   def __eq__(self, o):
      return self.a == o.a
   def __str__(self):
      return str(self.a)

It defines a member which controls its equality. However, the following code will add four members to the set


s = set()
a = C(1)
b = C(2)
c = C(3)
s.add(a)
s.add(b)
s.add(c)
s.add(C(1))

The main reason is that set is implemented as a hash table. Without a hash function defined in class C, Python 2 hashes by object identity, so instances that compare equal are hashed into different places. (In Python 3, a class that defines __eq__ without __hash__ becomes unhashable, and s.add raises a TypeError instead.) Defining a __hash__ that is consistent with __eq__ fixes the problem.


class C(object):
   def __init__(self, x):
      self.a = x
   def __eq__(self, o):
      return self.a == o.a
   def __str__(self):
      return str(self.a)
   def __hash__(self):
      return hash(self.a)
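
The effect can be checked with a short script. This is a sketch rather than the original test code: the class names Plain and Hashed are made up, and the try block is there only because Python 2 and Python 3 behave differently for the class without __hash__.

```python
class Plain(object):
    """Defines __eq__ only."""
    def __init__(self, x):
        self.a = x
    def __eq__(self, o):
        return self.a == o.a


class Hashed(Plain):
    """Adds a __hash__ that agrees with __eq__."""
    def __hash__(self):
        return hash(self.a)


s = set()
for value in (1, 2, 3, 1):
    s.add(Hashed(value))
print(len(s))  # 3: the second Hashed(1) is recognized as a duplicate

try:
    plain = set([Plain(1), Plain(1)])
    print(len(plain))          # Python 2: both equal objects are kept
except TypeError:
    print("Plain is unhashable")  # Python 3: __eq__ without __hash__
```

Delegating to hash(self.a) keeps __hash__ valid when the attribute is a string or a tuple, since __hash__ must return an integer.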

Tuesday, January 18, 2011

grep note: find all words that contain a particular substring

I used:
grep -h substring path/*.txt | tr ' ' '\n' | grep substring

grep substring path/*.txt lists the matching lines, each prefixed with the name of its file.

grep -h substring path/*.txt suppresses the file names, leaving only the lines.

grep -h substring path/*.txt | tr ' ' '\n' translates the matching lines into a word-per-line form.

grep -h substring path/*.txt | tr ' ' '\n' | grep substring then keeps only the words that contain the substring.

I also wanted a list of unique words that contained the substring, and therefore piped the previous command into sort and uniq.
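
For comparison, the same pipeline can be sketched in Python on lines that are already in memory. The function name words_containing and the sample lines are made up, and the substring is matched literally, whereas grep interprets it as a regular expression.

```python
def words_containing(lines, substring):
    """Return the sorted unique words that contain `substring`."""
    words = set()
    for line in lines:
        if substring in line:              # grep -h substring
            for word in line.split(" "):   # tr ' ' '\n'
                if substring in word:      # grep substring
                    words.add(word)
    return sorted(words)                   # sort and uniq


lines = ["the cat sat", "concatenate the lists", "a dog"]
print(words_containing(lines, "cat"))  # ['cat', 'concatenate']
```

The ordering here is by code point; the shell's sort may order words differently depending on the locale.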
