Tuesday, January 18, 2011

grep note: find all words that contain a particular substring

I used:
grep -h substring path/*.txt | tr ' ' '\n' | grep substring

grep substring path/*.txt
prints each matching line, prefixed with the name of the file it came from (grep adds this prefix whenever it searches more than one file).
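A small sketch to see those prefixes concretely. The temporary directory, the two files and the substring "cat" are hypothetical stand-ins for path/*.txt and substring:

```shell
#!/bin/sh
# Throwaway directory with two hypothetical sample files.
dir=$(mktemp -d)
printf 'the cat sat\nno match here\n' > "$dir/a.txt"
printf 'concatenate these words\n' > "$dir/b.txt"

# With several files, each matching line comes back as "file:line".
out=$(grep cat "$dir"/*.txt)
printf '%s\n' "$out"

# With a single file grep omits the name; -H forces the prefix back on.
one=$(grep -H cat "$dir/a.txt")

rm -r "$dir"
```

The output lines look like .../a.txt:the cat sat and .../b.txt:concatenate these words, with the temporary path in front.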

grep -h substring path/*.txt
suppresses the file-name prefixes, leaving only the matching lines themselves.
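The same kind of hypothetical sample, this time with -h (long form --no-filename), shows that only the line text survives:

```shell
#!/bin/sh
# Hypothetical sample files; "cat" stands in for the substring.
dir=$(mktemp -d)
printf 'the cat sat\nno match here\n' > "$dir/a.txt"
printf 'concatenate these words\n' > "$dir/b.txt"

# -h drops the "file:" prefix, so only the matching lines remain.
lines=$(grep -h cat "$dir"/*.txt)
printf '%s\n' "$lines"
# the cat sat
# concatenate these words

rm -r "$dir"
```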

grep -h substring path/*.txt | tr ' ' '\n'
replaces every space in the matching lines with a newline, so each word ends up on its own line.
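One caveat worth checking: tr ' ' '\n' splits on single spaces only. The sketch below uses a hypothetical line with a double space and a tab, and compares it with a more tolerant variant, tr -s '[:space:]' '\n', which is my substitution rather than the command from this note:

```shell
#!/bin/sh
# A hypothetical matching line containing a double space and a tab.
line=$(printf 'one  two\tcatalog')

# Splitting on single spaces: the double space yields an empty line,
# and "two<TAB>catalog" stays together as one "word".
naive=$(printf '%s\n' "$line" | tr ' ' '\n')
printf '%s\n' "$naive"

# Alternative: [:space:] also covers tabs, and -s squeezes the
# repeated newlines produced by runs of whitespace into one.
words=$(printf '%s\n' "$line" | tr -s '[:space:]' '\n')
printf '%s\n' "$words"
# one
# two
# catalog
```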

grep -h substring path/*.txt | tr ' ' '\n' | grep substring
then drops the words that do not contain the substring, keeping only the ones that do.
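To isolate this last filtering step, here is a sketch that feeds two hypothetical matching lines (as grep -h would print them) straight into the tr and grep stages, again with "cat" as the substring:

```shell
#!/bin/sh
# Hypothetical lines standing in for the output of grep -h.
matches=$(printf 'the cat sat\nconcatenate these words\n')

# One word per line, then keep only the words containing "cat".
words=$(printf '%s\n' "$matches" | tr ' ' '\n' | grep cat)
printf '%s\n' "$words"
# cat
# concatenate
```

Because only spaces are used as separators, punctuation stays attached to a word: a line such as "the cat, sat" would yield "cat,".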

I also wanted a list of the unique words containing the substring, so I piped the previous command into sort and then uniq:
grep -h substring path/*.txt | tr ' ' '\n' | grep substring | sort | uniq
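A sketch of the de-duplicating version on hypothetical files. Sorting comes first because uniq only collapses adjacent repeats; sort -u does both in one step. The grep -o line at the end is an alternative I am adding for comparison, not the method used in this note:

```shell
#!/bin/sh
# Hypothetical sample files; "cat" stands in for the substring.
dir=$(mktemp -d)
printf 'the cat sat\nconcatenate the cat\n' > "$dir/a.txt"
printf 'a catalog of cats\n' > "$dir/b.txt"

# Word-per-line pipeline, then sort so duplicates are adjacent for uniq.
uniq_words=$(grep -h cat "$dir"/*.txt | tr ' ' '\n' | grep cat | sort | uniq)
printf '%s\n' "$uniq_words"
# cat
# catalog
# cats
# concatenate

# sort -u sorts and removes duplicates in a single command.
sorted_u=$(grep -h cat "$dir"/*.txt | tr ' ' '\n' | grep cat | sort -u)

# Alternative: -o prints only the matched text, and the pattern extends
# the match over the surrounding letters and digits, giving whole words.
alt=$(grep -oh '[[:alnum:]]*cat[[:alnum:]]*' "$dir"/*.txt | sort -u)

rm -r "$dir"
```

A side benefit of the grep -o form is that trailing punctuation is not included in the extracted words.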
