I need to write a regular expression that, given the following sample dataset, selects the student records whose date of birth falls in January and before the year 2000, and whose course grade is higher than 1.7. For example, running the command on the following sample should return the second record:
id, date of birth, grade, expulsion, serious misdemeanor
123,2005-01-01,5.36,1,1
582,1999-10-12,8.51,0,1
9274,2001-25-12,9.65,0,0
I've tried the following, but I get no results when I run it:
grep -E "^*,1[0-9]{3}(-[0-9]{2}){2}*[10],(1\.[7-9][1-9])]"
Any idea of what's wrong?
CodePudding user response:
Using grep, you might write the full pattern as
grep -E '^[0-9]+,1[0-9]{3}-[0-9]{2}-0?1,(1\.(7[1-9][0-9]*|[89][0-9]*)|[2-9](\.[0-9]+)?|10(\.0+)?),[0-9]+,[0-9]+$' file
^                               Start of string
[0-9]+,                         Match 1+ digits and , for the id
1[0-9]{3}-                      Match 1, 3 digits and - for the year
[0-9]{2}-0?1,                   Match 2 digits, - and either 01 or 1 for January, then ,
(                               Group for the alternatives
1\.(7[1-9][0-9]*|[89][0-9]*)    Match 1.71-1.79 or 1.8 or 1.9, all followed by optional digits
|                               Or
[2-9](\.[0-9]+)?                Match 2-9 optionally followed by . and 1+ digits
|                               Or
10(\.0+)?                       Match 10 optionally followed by . and zeroes (assuming 10 is the highest grade)
),                              Close the group and match a comma
[0-9]+,[0-9]+                   Match the last 2 column values, assuming 1+ digits
$                               End of string
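As a quick check, here is a minimal sketch that runs this pattern, written with [0-9]+ for the multi-digit columns, over a few rows. The last two rows are made up for illustration and are not part of the question's data; the sketch also assumes the dates are laid out as yyyy-dd-mm, which is what the pattern implies by looking for the month in the last date component.

```shell
# The first three rows come from the question; the last two are hypothetical.
rows='123,2005-01-01,5.36,1,1
582,1999-10-12,8.51,0,1
9274,2001-25-12,9.65,0,0
777,1998-15-01,3.20,0,0
778,1997-03-01,1.70,0,0'

pattern='^[0-9]+,1[0-9]{3}-[0-9]{2}-0?1,(1\.(7[1-9][0-9]*|[89][0-9]*)|[2-9](\.[0-9]+)?|10(\.0+)?),[0-9]+,[0-9]+$'

# Keep only the rows that satisfy the year, month and grade conditions.
matches=$(printf '%s\n' "$rows" | grep -E "$pattern")
printf '%s\n' "$matches"
```

Only the hypothetical row 777 should be printed: row 778 has the right date but a grade of exactly 1.70, which is not higher than 1.7.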
CodePudding user response:
In your regex, I can't find the January part, and the grade part doesn't look correct. Here's a simpler one:
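grep -E ',1...-..-01,([2-9]\.|1\.(7[1-9]|[89]))'
Explanation:
,1...-..-01,        year=1xxx, month=01
[2-9]\.             grade 2-9, or
1\.(7[1-9]|[89])    grade 1.71-1.79, 1.8x or 1.9x
The parentheses have to wrap both grade alternatives; otherwise the | splits the whole pattern and the second alternative matches without the date check.
Assumes grade < 10 (can be changed easily).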
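Because | has the lowest precedence in an extended regular expression, the position of the parentheses decides whether the date prefix applies to both grade alternatives. The sketch below compares an ungrouped and a grouped variant on two made-up rows (hypothetical data, dates assumed to be yyyy-dd-mm); the grade part is kept identical in both so that only the grouping differs.

```shell
# Hypothetical rows: the first is from 2005 and should be rejected.
rows='1,2005-01-01,1.75,0,0
2,1998-15-01,3.20,0,0'

# Ungrouped: the second alternative stands alone, so any row containing
# a grade like 1.75 matches regardless of its date.
loose=$(printf '%s\n' "$rows" | grep -cE '1...-..-01,([2-9]\.)|(1\.[7-9][1-9])')

# Grouped: the date prefix applies to both grade alternatives.
strict=$(printf '%s\n' "$rows" | grep -cE ',1...-..-01,([2-9]\.|1\.[7-9][1-9])')

printf 'ungrouped: %s, grouped: %s\n' "$loose" "$strict"
```

The ungrouped variant counts both rows, while the grouped one counts only the 1998 row.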
awk is a bit more straightforward:
awk -F'[,-]' '$2<2000 && $4=="01" && $5 > 1.7'
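To see how the fields line up, here is a small sketch on a few rows. The last two rows are hypothetical, and the field separator is quoted so the shell cannot treat [,-] as a glob. Splitting on both , and - gives $1 = id, $2 = year, $3 = day, $4 = month, $5 = grade, assuming the yyyy-dd-mm layout.

```shell
# The first two rows come from the question; the last two are hypothetical.
rows='123,2005-01-01,5.36,1,1
582,1999-10-12,8.51,0,1
777,1998-15-01,3.20,0,0
778,1997-03-01,1.70,0,0'

# Year and grade are compared numerically, the month as a string.
picked=$(printf '%s\n' "$rows" | awk -F'[,-]' '$2 < 2000 && $4 == "01" && $5 > 1.7')
printf '%s\n' "$picked"
```

Since awk compares the grade as a number, there is no need to enumerate digit patterns for "higher than 1.7"; row 778 with a grade of 1.70 is excluded and only row 777 is printed.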