Regular expression in bash to match multiple conditions

Time:04-07

I would like to implement a regular expression in bash that allows me to verify a series of characteristics on a dataset. A sample is attached below:

id, date of birth, grade, expulsion, serious misdemeanor
123,2005-01-01,5.36,1,1
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0
21,2006-14-05,0.53,4,1

The id must have only 3 digits, the date of birth must be before 2000, the grade must be at least 5.60 with the second decimal place other than 0, and there must be at least one expulsion or serious misdemeanor.

The result of executing the regular expression should be:

582,1999-05-12,8.51,0,1

I have tried to implement the following regular expression and it does not give me any result.

grep -E "^\d{0,3},[0-2][0-9][0-9][0-9].*,[1-5].[0-5][1-9],[1-9],[1-9]$"

Any idea?

Answer:

If it is mandatory to use grep, would you please try:

grep -E '^[0-9]{1,3},1[0-9]{3}(-[0-9]{2}){2},(5\.[6-9][1-9]|[6-9]\.[0-9][1-9]|[1-9][0-9]+\.[0-9][1-9]),([1-9][0-9]*,[0-9]+|[0-9]+,[1-9][0-9]*)$' input_file

Result:

582,1999-05-12,8.51,0,1
  • [0-9]{1,3} matches if the id has 1 to 3 digits. (I have interpreted "only 3 digits" as "at most 3 digits". If you mean exactly 3 digits, use [0-9]{3} instead.)
  • 1[0-9]{3}(-[0-9]{2}){2} matches if the birth year is before 2000 (that is, 1000 to 1999).
  • (5\.[6-9][1-9]|[6-9]\.[0-9][1-9]|[1-9][0-9]+\.[0-9][1-9]) matches if the grade is greater than 5.60 and its second decimal place is not 0.
  • ([1-9][0-9]*,[0-9]+|[0-9]+,[1-9][0-9]*) matches if expulsion, serious misdemeanor, or both are non-zero.
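A minimal sketch for trying the pattern from this answer without a separate input file: it wraps the grep in a shell function (`filter_rows` is a hypothetical name) and feeds it the sample rows from the question through a here-document. It uses `[0-9]` rather than the question's `\d`, because POSIX extended regular expressions, and therefore `grep -E`, do not define `\d` as a digit class. It also assumes the rows contain no spaces around the commas.

```shell
#!/bin/sh
# filter_rows (hypothetical helper): reads CSV lines on stdin and prints
# only the lines that satisfy all four conditions of the grep answer.
filter_rows() {
    grep -E '^[0-9]{1,3},1[0-9]{3}(-[0-9]{2}){2},(5\.[6-9][1-9]|[6-9]\.[0-9][1-9]|[1-9][0-9]+\.[0-9][1-9]),([1-9][0-9]*,[0-9]+|[0-9]+,[1-9][0-9]*)$'
}

# Sample rows from the question. The header line is rejected
# because "id" is not a number.
filter_rows <<'EOF'
id, date of birth, grade, expulsion, serious misdemeanor
123,2005-01-01,5.36,1,1
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0
21,2006-14-05,0.53,4,1
EOF
```

Running it prints only `582,1999-05-12,8.51,0,1`.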

Answer:

Regular expressions do not understand numeric values, and they are clumsy at combining several conditions with boolean logic. All they know is text. You'll be better served by an actual programming language like Awk or Perl.

Here's an example:

$ perl -l -n -a -F, -E'say if length($F[0])>3 || $F[2] < 5.60' foo.txt
123,2005-01-01,5.36,1,1
9274,2001-25-12,9.65,0,0
21,2006-14-05,0.53,4,1

This call to perl splits each line into fields on commas and prints the line if the first column is longer than 3 characters or the third column is less than 5.60. In other words, it prints the rows that fail those two checks.

This is just a starting point, but this is the direction to go.
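Following that direction, here is a minimal sketch that checks all four conditions from the question, written in POSIX awk instead of Perl. Unlike the one-liner above, it prints the rows that pass. Assumptions: `filter_rows_awk` is a hypothetical helper name, rows are comma-separated with no spaces, grades have exactly two decimals, and "only 3 digits" means exactly three digits.

```shell
#!/bin/sh
# filter_rows_awk (hypothetical helper): reads CSV lines on stdin and prints
# the rows that satisfy every condition, comparing numbers numerically.
filter_rows_awk() {
    awk -F, '
    {
        # id: exactly 3 digits
        id_ok    = ($1 ~ /^[0-9][0-9][0-9]$/)
        # birth year (first 4 characters of the date) before 2000
        year_ok  = ($2 ~ /^[0-9][0-9][0-9][0-9]-/ && substr($2, 1, 4) + 0 < 2000)
        # grade: two decimals, second one non-zero, and at least 5.60
        grade_ok = ($3 ~ /^[0-9]+\.[0-9][1-9]$/ && $3 + 0 >= 5.60)
        # at least one expulsion or serious misdemeanor
        flags_ok = ($4 + 0 > 0 || $5 + 0 > 0)
        if (id_ok && year_ok && grade_ok && flags_ok) print
    }'
}

# Sample rows from the question; the header fails the id check.
filter_rows_awk <<'EOF'
id, date of birth, grade, expulsion, serious misdemeanor
123,2005-01-01,5.36,1,1
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0
21,2006-14-05,0.53,4,1
EOF
```

Because the grade is compared as a number, adding a new threshold or condition means editing one line rather than rebuilding a regex alternation.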
