AWK filter rows with valid numbers or empty-CodePudding

I try to use awk to filter some rows based on whether a given column contains an integer, floating point or empty string.

I have problems with the integers or the way to add them to the rule but avoiding adding weird expressions like 2.e.

INPUT:

220802,2249,3
220802,2250,37.5
220802,2251,2.e
220802,2252,28
220802,2253,
220802,2254,19.5
220802,2255,3.a
220802,2256,123.25
220802,2257,1.32e-8
220802,2258,N/A

EXPECTED OUTPUT:

220802,2249,3
220802,2250,37.5
220802,2252,28
220802,2253,
220802,2254,19.5
220802,2256,123.25
220802,2257,1.32e-8

I try this, but fails:

awk -F , '$3 ~ /(^$|[0-9] \.?[0-9] e?[- ]?[0-9]? )/' INPUT

220802,2250,37.5
220802,2252,28
220802,2253,
220802,2254,19.5
220802,2256,123.25
220802,2257,1.32e-8

Also try this:

awk -F , '$3 ~ /(^$|[0-9] |[0-9] \.?[0-9] e?[- ]?[0-9]? )/'

220802,2249,3
220802,2250,37.5
220802,2251,2.e
220802,2252,28
220802,2253,
220802,2254,19.5
220802,2255,3.a
220802,2256,123.25
220802,2257,1.32e-8

Any aproach to consider? Thanks in advance!

CodePudding user response：

for the shown samples :

mawk '$NF <=  $NF' FS=','

220802,2249,3
220802,2250,37.5
220802,2252,28
220802,2253,
220802,2254,19.5
220802,2256,123.25
220802,2257,1.32e-8

CodePudding user response：

You can easily check for empty.

You can check if something is a number by comparing it to itself plus zero.

So:

awk -F, '!$3 || (0 $3 == $3)' INPUT

The reason your regex didn't work is that you didn't anchor it. For example, [0-9] matches any substring. An alternative that may match what you intended is:

/^([0-9] (\.[0-9] )?(e?[- ]?[0-9] )?)?$/

CodePudding user response：

To use AWK,

awk -F , '$3 ~ /(^$|[0-9] \.?[0-9] e?[- ]?[0-9]? )|([0-9] $)/' numbers.csv

To use rquery (it's easy to understand the filters. https://github.com/fuyuncat/rquery/releases)

./rq -q "p d/,/ | s @raw | f @%=2 or isnull(@3)=1 or datatype(@3)='LONG' or datatype(@3)='DOUBLE'" numbers.csv
220802,2249,3
220802,2250,37.5
220802,2252,28
220802,2253,
220802,2254,19.5
220802,2256,123.25
220802,2257,1.32e-8