Filtering data from csv file


I'm trying to filter a massive CSV export containing phone records. The file has 97 columns, and the number of rows depends on how many calls have been made; currently there are 838,239 rows. Excel struggles to load this much data, so I've turned to Linux.

Out of the 97 columns, I'm only interested in columns 13, 28 and 53. I have managed to extract the data using -

cut -d, -f 1-12,14-27,29-52,54-97 --complement cdr_export.csv >> filtered_CDR.CSV
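As an aside, the same three columns can be selected directly instead of listing everything else and negating it with `--complement`; the two forms are equivalent (assuming GNU coreutils `cut`):

```shell
# Select fields 13, 28 and 53 directly; same result as the
# --complement form, just easier to read.
cut -d, -f13,28,53 cdr_export.csv > filtered_CDR.CSV
```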

I'm stuck on how to filter the rows.

Telephone       ic_hg       og_hg
111111111111    server03    slo.out
222222222222    HG_1        server02
333333333333    HG_1        server03
444444444444    Trunk       server02
555555555555    Trunk       server03
666666666666    server614   slo.out
777777777777    HG_1        server563
888888888888    server563   slo.out
999999999999    HG_2        server563

The only data I need is -

  • Any telephone number
  • ic_hg = HG_1 or HG_2 & og_hg = slo.out

Example -

222222222222 HG_1 slo.out

222222222222 HG_2 slo.out

Any other combination can be removed.

CodePudding user response:

You can easily do that with awk:

awk '($2 == "HG_1" || $2 == "HG_2") && $3 == "slo.out" { print $1 }' filtered_CDR.CSV

This prints the telephone number of every row whose ic_hg is HG_1 or HG_2 and whose og_hg is slo.out. (None of the sample rows above happen to satisfy both conditions at once, so they would all be filtered out.)
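Note that the file produced by `cut` is comma-separated, not whitespace-separated like the sample table above, so `awk` needs `-F,` to split the fields; a sketch under that assumption:

```shell
# Same filter, but splitting on commas as produced by cut -d,:
# field 1 = telephone, field 2 = ic_hg, field 3 = og_hg.
awk -F, '($2 == "HG_1" || $2 == "HG_2") && $3 == "slo.out" { print $1 }' filtered_CDR.CSV
```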

CodePudding user response:

Just pipe it through a grep.

cut -d, -f 1-12,14-27,29-52,54-97 --complement cdr_export.csv |
  grep -E ',HG_[12],slo[.]out$' >> filtered_CDR.CSV

or maybe an awk

$: awk -F, '$53 ~ /^slo\.out$/ && $28 ~ /^HG_[12]$/{ 
     print $13"\t"$28"\t"$53}' cdr_export.csv >> filtered_CDR.CSV

I used tabs in the awk output. YMMV.
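If the export's first line is a header row (as the sample table above suggests), it can be skipped with `NR > 1` so it is never treated as data; a sketch, assuming the same field numbers:

```shell
# Same filter, but skip the header line (NR > 1);
# output is tab-separated: telephone, ic_hg, og_hg.
awk -F, 'NR > 1 && $28 ~ /^HG_[12]$/ && $53 ~ /^slo\.out$/ {
    print $13 "\t" $28 "\t" $53
}' cdr_export.csv >> filtered_CDR.CSV
```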
