I'm trying to filter a massive CSV export containing phone records. The file has 97 columns, and the number of rows depends on how many calls have been made; currently there are 838,239 rows. Excel struggles to load this much data, so I've turned to Linux.
Out of the 97 columns, I'm only interested in columns 13, 28 and 53. I have managed to extract the data using:
cut -d, -f 1-12,14-27,29-52,54-97 --complement cdr_export.csv >> filtered_CDR.CSV
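As an aside, the same extraction can be written without --complement by naming the wanted columns directly. A self-contained sketch, using a generated 97-field demo row in place of the real cdr_export.csv:

```shell
# build one demo row with 97 numbered fields, then keep only columns 13, 28 and 53;
# on the real file this would be: cut -d, -f13,28,53 cdr_export.csv > filtered_CDR.CSV
seq 1 97 | paste -sd, - | cut -d, -f13,28,53
# → 13,28,53
```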
I'm stuck on how to filter the rows.
Telephone ic_hg og_hg
111111111111 server03 slo.out
222222222222 HG_1 server02
333333333333 HG_1 server03
444444444444 Trunk server02
555555555555 Trunk server03
666666666666 server614 slo.out
777777777777 HG_1 server563
888888888888 server563 slo.out
999999999999 HG_2 server563
The only data I need is:
- any telephone number
- rows where ic_hg is HG_1 or HG_2 and og_hg is slo.out
Example:
222222222222 HG_1 slo.out
222222222222 HG_2 slo.out
Any other combination can be removed.
CodePudding user response:
You can easily do that with awk:
awk -F, '($2 == "HG_1" || $2 == "HG_2") || $3 == "slo.out" { print $1 }' filtered_CDR.CSV
111111111111
222222222222
333333333333
666666666666
777777777777
888888888888
999999999999
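Note the condition above ORs the two column tests. If the requirement is taken literally (ic_hg is HG_1 or HG_2 AND og_hg is slo.out), the tests would be combined with &&. A sketch against inline sample data standing in for filtered_CDR.CSV:

```shell
# keep only rows where ic_hg is HG_1/HG_2 AND og_hg is slo.out
printf '%s\n' \
  '222222222222,HG_1,slo.out' \
  '333333333333,HG_1,server03' \
  '999999999999,HG_2,slo.out' |
awk -F, '($2 == "HG_1" || $2 == "HG_2") && $3 == "slo.out"'
# → prints only the first and third sample rows
```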
CodePudding user response:
Just pipe it through grep. Note that cut -d, emits comma-separated fields, so the pattern matches on commas:
cut -d, -f 1-12,14-27,29-52,54-97 --complement cdr_export.csv |
grep -E 'HG_[12],slo[.]out' >> filtered_CDR.CSV
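If stray substrings are a worry (say, an og_hg value like slo.out.bak, or an ic_hg that merely ends in HG_1), anchoring the pattern to whole comma-separated fields is safer. A sketch on inline sample data, assuming og_hg is the last column of the cut output:

```shell
# (^|,) forces HG_1/HG_2 to be a whole field; $ forces slo.out to be the last one
printf '%s\n' \
  '111111111111,HG_1,slo.out' \
  '222222222222,HG_1,slo.out.bak' \
  '333333333333,XHG_1,slo.out' |
grep -E '(^|,)HG_[12],slo\.out$'
# → prints only the first sample row
```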
Or maybe an awk:
$: awk -F, '$53 ~ /^slo\.out$/ && $28 ~ /^HG_[12]$/ {
       print $13 "\t" $28 "\t" $53 }' cdr_export.csv >> filtered_CDR.CSV
I used tabs in the awk output. YMMV.