Regular expression to select records that meet numerical conditions

I need a regular expression that, given the sample dataset below, selects the student records whose date of birth falls in January of a year before 2000 and whose course grade is higher than 1.7. For example, running the command on the following sample should return the second record:

id, date of birth, grade, expulsion, serious misdemeanor
123,2005-01-01,5.36,1,1
582,1999-10-12,8.51,0,1
9274,2001-25-12,9.65,0,0

I've tried the following but I get no results after executing


grep -E "^*,1[0-9]{3}(-[0-9]{2}){2}*[10],(1\.[7-9][1-9])]"

Any idea of what's wrong?

CodePudding user response：

Using grep, you might write the full pattern as

grep -E '^[0-9]+,1[0-9]{3}-[0-9]{2}-0?1,(1\.(7[1-9][0-9]*|[89][0-9]*)|[2-9](\.[0-9]+)?|10(\.0+)?),[0-9]+,[0-9]+$' file

^ Start of string
[0-9]+, Match 1+ digits and , for the id
1[0-9]{3}- Match 1, 3 digits and - for the year
[0-9]{2}-0?1, Match 2 digits, - and either 01 or 1 for January
( Group for the alternatives
- 1\.(7[1-9][0-9]*|[89][0-9]*) Match 1.71-1.79 or 1.8 or 1.9, all followed by optional digits
- | Or
- [2-9](\.[0-9]+)? Match 2-9 optionally followed by . and 1+ digits
- | Or
- 10(\.0+)? Match 10 optionally followed by . and 1+ zeroes (assuming 10 is the highest grade)
), Close the group and match a comma
[0-9]+,[0-9]+ Match the last 2 column values, assuming 1+ digits each
$ End of string
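
As a quick check, the pattern (with `+` meaning "one or more") can be run against a few made-up records. This is only a sketch: the file name `students.csv` and the rows are hypothetical, not the asker's data, and it assumes the date column is laid out as year-day-month, which is what the `-0?1,` part of the pattern implies.

```shell
# Hypothetical sample data; dates are assumed to be YYYY-DD-MM.
cat > students.csv <<'EOF'
id,date of birth,grade,expulsion,serious misdemeanor
123,2005-01-01,5.36,1,1
582,1999-10-01,8.51,0,1
9274,1998-25-12,9.65,0,0
77,1997-03-01,1.70,0,0
EOF

pattern='^[0-9]+,1[0-9]{3}-[0-9]{2}-0?1,(1\.(7[1-9][0-9]*|[89][0-9]*)|[2-9](\.[0-9]+)?|10(\.0+)?),[0-9]+,[0-9]+$'

# Only rows born before 2000, in January, with a grade above 1.7 are kept.
matches=$(grep -E "$pattern" students.csv)
printf '%s\n' "$matches"
```

With these rows only `582,1999-10-01,8.51,0,1` is printed. One caveat: because the digit after `1.7` must be 1-9, a grade such as 1.705 is not matched by this pattern even though it is greater than 1.7.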


CodePudding user response：

In your regex, I can't find the January part, and the grade pattern doesn't look correct. Here's a simpler one:

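grep -E '1...-..-01,([2-9]\.|1\.(7[1-9]|[89]))'

Explanation:

1...-..-01,        year=1xxx, month=01
[2-9]\.|           grade 2-9 or
1\.(7[1-9]|[89])   grade 1.71-1.79, 1.8x or 1.9x

Both grade alternatives sit inside one group, so the date part applies to each of them.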

Assumes grade < 10 (can be changed easily).

awk is a bit more straightforward:

awk -F'[,-]' '$2<2000 && $4=="01" && $5 > 1.7'
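
The awk approach splits each line on both commas and hyphens, so the year, day and month end up in separate fields and can be compared as numbers. A minimal sketch, assuming year-day-month dates and using hypothetical rows (the variable names `rows` and `selected` are just for illustration):

```shell
# Hypothetical rows; dates assumed to be YYYY-DD-MM, so after splitting on
# "," and "-" the fields are: $1=id $2=year $3=day $4=month $5=grade.
rows='id,date of birth,grade,expulsion,serious misdemeanor
123,2005-01-01,5.36,1,1
582,1999-10-01,8.51,0,1
9274,1998-25-12,9.65,0,0
77,1997-03-01,1.705,0,0'

# NR > 1 skips the header line; "+ 0" forces a numeric comparison.
selected=$(printf '%s\n' "$rows" |
  awk -F'[,-]' 'NR > 1 && $2 + 0 < 2000 && $4 + 0 == 1 && $5 + 0 > 1.7')
printf '%s\n' "$selected"
```

Here both the 8.51 and the 1.705 rows are selected, since the grade is compared numerically rather than matched digit by digit. Comparing `$4 + 0 == 1` also accepts a month written as `1` instead of `01`.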