Session 3. Putting it all together: extracting useful information from files



1. Connecting commands: pipes


% command1 | command2 [pipe]
  
% command | more [read long output one screen at a time]
% command | wc [count lines, words and characters of the output]
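
As a minimal sketch of how a pipe connects two commands, the example below assumes a hypothetical three-line input produced with printf (it is not part of the course data). The output of the first command becomes the input of wc -l, which counts lines.

```shell
# Hypothetical sample input: three lines of text produced by printf.
# The pipe (|) feeds that output into wc -l, which counts the lines.
# tr -d ' ' removes the padding some wc implementations print.
line_count=$(printf 'geneA\ngeneB\ngeneC\n' | wc -l | tr -d ' ')
echo "lines: $line_count"
```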




2. The GREP command:


% grep pattern file [regular expression search]
  
% command | grep "pattern" [keep only lines that match the pattern]
% command | grep -v "pattern" [keep only lines that do not match the pattern]
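
The two filtering modes can be sketched as follows. The gene names and chromosomes are hypothetical values chosen for illustration, not entries from refGene.txt.

```shell
# Hypothetical gene list (name and chromosome), invented for this example.
genes=$(printf 'geneA chr21\ngeneB chr1\ngeneC chr21\n')

# Keep only the lines that contain "chr21".
on_chr21=$(printf '%s\n' "$genes" | grep "chr21")

# Keep only the lines that do NOT contain "chr21".
not_chr21=$(printf '%s\n' "$genes" | grep -v "chr21")

echo "$on_chr21"
echo "$not_chr21"
```

Note that grep matches the pattern anywhere in the line, so a short pattern may also select lines that contain it as part of a longer word.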




3. The SORT command:


% sort file [sort alphabetically]
  
% sort -n file [sort numerically]
% sort -r file [reverse sort]
% sort -k x file [sort by column x; the obsolete form "sort +x" sorts by column x+1 and is rejected by many current systems]
% sort file | uniq [remove repeated lines]
% sort file | uniq -c [count how many times each distinct line occurs]
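
A small sketch of these options, using hypothetical data. uniq only collapses adjacent repeated lines, which is why the input is sorted first. The -k option used here is the standard way to select a sort column.

```shell
# Hypothetical input: one chromosome name per line, with repeats.
chroms=$(printf 'chr2\nchr1\nchr2\nchr2\nchr1\n')

# sort places identical lines next to each other; uniq then collapses them.
unique=$(printf '%s\n' "$chroms" | sort | uniq)

# uniq -c prefixes each distinct line with its number of occurrences.
counts=$(printf '%s\n' "$chroms" | sort | uniq -c)

# sort -k 2n sorts by the second column, comparing values as numbers.
by_size=$(printf 'geneA 30\ngeneB 4\ngeneC 100\n' | sort -k 2n)

echo "$unique"
echo "$counts"
echo "$by_size"
```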




4. The JOIN command (sorted files):


% join file1 file2 [lines whose column 1 matches in both files]
  
% join -1 i -2 j file1 file2 [match column i of file1 against column j of file2]
% join -v 1 file1 file2 [lines from file1 with no match in file2]
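
The following sketch illustrates join on two small hypothetical files created in a temporary directory. Both files are already sorted on column 1, as join requires. The file contents are invented for the example.

```shell
# Create two hypothetical files, both sorted on their first column.
tmpdir=$(mktemp -d)
printf 'geneA chr21\ngeneB chr1\ngeneC chr21\n' > "$tmpdir/file1"
printf 'geneA kinase\ngeneC receptor\n' > "$tmpdir/file2"

# Lines whose first column appears in both files, merged into one line each.
matched=$(join "$tmpdir/file1" "$tmpdir/file2")

# Lines of file1 whose first column has no match in file2.
only_in_1=$(join -v 1 "$tmpdir/file1" "$tmpdir/file2")

echo "$matched"
echo "$only_in_1"

# Clean up the temporary files.
rm -r "$tmpdir"
```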


Practice 3. Putting it all together: extracting useful information from files



Type the following commands (tutorial):

  1. % cd

  2. % pwd

  3. % mkdir work3

  4. % cd work3

  5. % wc refGene.txt (number of genes: the first value printed, the line count)

  6. % grep "chr21" refGene.txt | wc (number of genes located in chr21)

  7. % grep "+" refGene.txt | wc (number of genes in positive strand)

  8. % grep "chr21" refGene.txt | sort -k 4n | more (chr21 genes sorted by position, column 4; obsolete form: sort +3n)

  9. % sort -k 8nr refGene.txt | more (genes sorted by number of exons, column 8, largest first; obsolete form: sort +7nr)
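
The filter, count and sort steps above can be tried on a small table first. This is a sketch only: the four-column layout (name, chromosome, strand, start position) and all values are assumptions for illustration and do not reproduce the real refGene.txt.

```shell
# Hypothetical miniature table: name, chromosome, strand, start position.
table=$(printf 'g1 chr21 + 500\ng2 chr1 - 100\ng3 chr21 - 20\ng4 chr21 + 3000\n')

# Count the chr21 rows: filter with grep, then count lines with wc -l.
n_chr21=$(printf '%s\n' "$table" | grep "chr21" | wc -l | tr -d ' ')

# Sort the chr21 rows numerically by the fourth column (start position).
sorted=$(printf '%s\n' "$table" | grep "chr21" | sort -k 4n)

echo "chr21 rows: $n_chr21"
echo "$sorted"
```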



Enrique Blanco © 2004 -- eblanco@imim.es