I am working with a large CSV in a Linux shell that I narrowed down to 3 columns:
species name, latitude, and longitude.
awk -F "\t" '{print $10,","$22,",",$23}' occurance.csv > three_col.csv
The file ends up looking like this:
species | Lat | Long |
----------------------|---------|---------
Leucoraja erinacea | 41.0748 | 72.9461|
Brevoortia tyrannus | 39.0748 | 70.9461|
Paralichthys dentatus | | 73.2354|
Paralichthys dentatus | | |
Leucoraja erinacea | 41.0748 | |
Brevoortia tyrannus | | |
Brevoortia tyrannus | | |
Paralichthys dentatus | 39.0748 | 70.9461|
Brevoortia tyrannus | 39.0748 | 70.9461|
However, this is what I want it to look like. Notice that all species with no Lat or Long data have been removed:
species | Lat | Long |
----------------------|---------|---------
Leucoraja erinacea | 41.0748 | 72.9461|
Brevoortia tyrannus | 39.0748 | 70.9461|
Paralichthys dentatus | 39.0748 | 70.9461|
Brevoortia tyrannus | 39.0748 | 70.9461|
I've been trying to remove rows that are lacking either Lat or Long data, using a line like this:
awk -F "\t" BEGIN '{print $1,$2,$3}' END '{$2!=" " && $3!= " " }' three_col.csv > del_blanks.csv
but it results in this error, even with the small changes I make while trying to solve the problem:
awk: line 1: syntax error at or near end of line
How can I get rid of these rows with missing data? Is this something I need a "for" loop for?
CodePudding user response:
Since I don't know what your occurance.csv file looks like, this is a shot in the dark:
awk -F "\t" '$22 && $23 {print $10,","$22,",",$23}' occurance.csv > three_col.csv
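The expression $22 && $23 says that both field 22 and field 23 must be non-blank. It is a condition that filters out the lines that don't qualify, and it is roughly shorthand for $22 != "" && $23 != "". One difference: a field that is numerically zero (for example a latitude of 0) also counts as false in the shorthand, so use the explicit comparison if such values can occur.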
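A minimal sketch of the same filter with the explicit empty-string comparison, run on a made-up sample. The file names sample.tsv and filtered.csv, the three-column layout, and the "Equator fish" row are assumptions for illustration only; the real file would use fields 10, 22 and 23.

```shell
# Hypothetical 3-column, tab-separated sample: species, lat, long.
printf 'Leucoraja erinacea\t41.0748\t72.9461\n'  > sample.tsv
printf 'Paralichthys dentatus\t\t73.2354\n'     >> sample.tsv
printf 'Leucoraja erinacea\t41.0748\t\n'        >> sample.tsv
printf 'Brevoortia tyrannus\t\t\n'              >> sample.tsv
printf 'Equator fish\t0\t70.9461\n'             >> sample.tsv   # made-up row with lat 0

# Keep a row only when both coordinate fields are non-empty strings.
# Comparing against "" (instead of testing truthiness) keeps a coordinate of 0.
awk -F '\t' -v OFS=',' '$2 != "" && $3 != "" {print $1, $2, $3}' sample.tsv > filtered.csv

cat filtered.csv
```

Setting OFS to a comma and separating the print arguments with plain commas avoids the stray spaces that print $10,","$22 would put around each separator.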
CodePudding user response:
Perhaps something like this?
mawk '($!NF=$10","$22","$23)!~",,$"' FS='\t' OFS=','
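You already know that only fields 10, 22 and 23 need to be printed, so you can first overwrite $0 with just those three columns, joined by commas ($!NF is $0 for any non-empty record, because !NF is then 0). Afterwards a quick regex check is enough: two consecutive commas at the end of the record are the sign that $22 and $23 are both empty. A true pattern with no action prints the record, which saves the print statement and the pattern-action block. Note that ",,$" only rejects rows where both fields are empty; to also reject rows missing just one of them, a pattern such as ",(,|$)" should work, provided the species name never contains a comma. As written, the command reads standard input.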
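A rough sketch of the overwrite-then-match idea, adjusted so that a record is dropped when either coordinate is empty. It uses plain awk and $0 instead of mawk and $!NF, and it assumes a made-up three-column sample (sample2.tsv and kept.csv are hypothetical names) in which species names contain no commas.

```shell
# Hypothetical 3-column, tab-separated sample: species, lat, long.
printf 'Leucoraja erinacea\t41.0748\t72.9461\n'    > sample2.tsv
printf 'Paralichthys dentatus\t\t73.2354\n'       >> sample2.tsv
printf 'Leucoraja erinacea\t41.0748\t\n'          >> sample2.tsv
printf 'Brevoortia tyrannus\t\t\n'                >> sample2.tsv
printf 'Paralichthys dentatus\t39.0748\t70.9461\n' >> sample2.tsv

# Rebuild $0 as "species,lat,long", then print it unless the result
# contains ",," (empty lat) or ends in "," (empty long).
awk -F '\t' '($0 = $1 "," $2 "," $3) !~ /,(,|$)/' sample2.tsv > kept.csv

cat kept.csv
```

The explicit field comparison is probably easier to read and does not depend on the species column being comma-free; the regex form is mainly a shorter way to write the same check.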