How to remove rows from a CSV with no data using AWK

Time:04-16

I am working with a large CSV in a Linux shell that I narrowed down to 3 columns:
species name, latitude, and longitude.

awk -F "\t" '{print $10,","$22,",",$23}' occurance.csv > three_col.csv


The file ends up looking like this:

      species         | Lat     | Long   |
----------------------|---------|---------
Leucoraja erinacea    | 41.0748 | 72.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|
Paralichthys dentatus |         | 73.2354|
Paralichthys dentatus |         |        |
Leucoraja erinacea    | 41.0748 |        |
Brevoortia tyrannus   |         |        |
Brevoortia tyrannus   |         |        |
Paralichthys dentatus | 39.0748 | 70.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|

However, this is how I want it to look. Notice that all rows missing Lat or Long data have been removed:

      species         | Lat     | Long   |
----------------------|---------|---------
Leucoraja erinacea    | 41.0748 | 72.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|
Paralichthys dentatus | 39.0748 | 70.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|

I've been trying to remove rows that lack either Lat or Long data, using a line like this:

awk -F "\t" BEGIN '{print $1,$2,$3}' END '{$2!=" " && $3!= " " }' three_col.csv > del_blanks.csv

but it results in this error, even after the small changes I make while trying to solve the problem:

awk: line 1: syntax error at or near end of line

How can I get rid of these rows with missing data? Is this something I need a "for" loop for?

CodePudding user response:

Since I don't know what your occurance.csv file looks like, this is a shot in the dark:

awk -F "\t" '$22 && $23 {print $10,","$22,",",$23}' occurance.csv > three_col.csv

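The expression $22 && $23 says: both field 22 and field 23 must be non-blank. It is a condition that filters out the lines which don't qualify. It is roughly shorthand for $22 != "" && $23 != "", with one difference: a field that looks like the number 0 also counts as false, so a row with a latitude or longitude of exactly 0 would be dropped too.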
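
Here is a minimal sketch of the explicit empty-string comparison on a made-up sample. The temp file and the column positions 1-3 are assumptions standing in for fields 10, 22 and 23 of occurance.csv, which I haven't seen. Unlike a bare truth test on the fields, comparing against "" keeps a coordinate of 0. It also shows the usual program layout, BEGIN { ... } pattern { action }, all inside one pair of single quotes. In the command from the question, BEGIN sits outside the quotes, so awk takes the bare word BEGIN as the whole program, which is most likely where the syntax error comes from.

```shell
# Hypothetical tab-separated sample; columns 1-3 stand in for fields 10, 22, 23.
tmp=$(mktemp)
printf 'Leucoraja erinacea\t41.0748\t72.9461\n' >  "$tmp"
printf 'Paralichthys dentatus\t\t73.2354\n'     >> "$tmp"
printf 'Brevoortia tyrannus\t\t\n'              >> "$tmp"
printf 'Brevoortia tyrannus\t0\t70.9461\n'      >> "$tmp"

# BEGIN sets the separators once; the pattern keeps a row only when both
# coordinate fields are non-empty strings; the action prints the three fields.
filtered=$(awk 'BEGIN { FS = "\t"; OFS = "," }
                $2 != "" && $3 != "" { print $1, $2, $3 }' "$tmp")

printf '%s\n' "$filtered"
rm -f "$tmp"
```

No "for" loop is needed: awk already runs the pattern once per input line.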

CodePudding user response:

Perhaps something like this?

 mawk '($!NF=$10","$22","$23)!~",,$"' FS='\t' OFS=','

You already know that only fields 10/22/23 need to be printed, so you can first overwrite $0 with just those 3 columns, already joined with commas.

Afterwards, a quick regex check is enough: two consecutive commas at the end of the line are the sign that both $22 and $23 are empty. This saves the print statement and a separate pattern-action block. Note that this only drops rows where both fields are empty; a row missing just one of them is kept.
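
If rows missing either coordinate should go as well, the same idea can be stretched a little. The following is only a sketch under assumptions: plain awk instead of mawk, a made-up sample piped in with fields 1-3 standing in for 10/22/23, and species names that contain no commas. The regex matches an empty middle field (two commas in a row) or an empty last field (a comma at the end of the line).

```shell
# Hypothetical tab-separated sample; fields 1-3 stand in for fields 10, 22, 23.
sample=$(printf '%s\t%s\t%s\n' \
  'Leucoraja erinacea'    '41.0748' '72.9461' \
  'Paralichthys dentatus' ''        '73.2354' \
  'Leucoraja erinacea'    '41.0748' '' \
  'Brevoortia tyrannus'   ''        '')

# Rebuild $0 as "species,lat,long"; the assignment's value is that string.
# Keep the line only if it has no ",," and no trailing ",".
kept=$(printf '%s\n' "$sample" |
       awk -F '\t' '($0 = $1 "," $2 "," $3) !~ /,(,|$)/')

printf '%s\n' "$kept"
```

Because the pattern has no action block, awk falls back to printing the rewritten $0 for every line that passes.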
