I have a comma-separated dataset. Here is an example:
id, date of birth, grade, explusion, serious misdemeanor, info
123,2005-01-01,5.36,1,1,
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0,pass
21,2006-14-05,0.53,4,1,repeat
I need to write a regular expression for sed that removes every record from the student dataset that has neither an expulsion nor a serious misdemeanor. So the only record the command should match (and delete) is the third one in the sample above. This is what I tried:
sed -i "/^*,*,*,0,0$/d" file.csv
Any idea of what's missing?
CodePudding user response:
You might want to use awk to check fields 4 and 5 and only return the lines where at least one of them is not 0:
awk -F, '$4 != 0 || $5 != 0' file.csv > output.csv
Or, to get the other lines:
awk -F, '$4 == 0 && $5 == 0' file.csv > output.csv
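As a quick check, here is a minimal sketch that runs both awk filters against the six-column sample from the question. The variable names data, kept and dropped are only for illustration. Because awk tests fields by position, the optional info column at the end should not affect the result.

```shell
# Sample rows copied from the question (the trailing "info" field is optional).
data='id, date of birth, grade, explusion, serious misdemeanor, info
123,2005-01-01,5.36,1,1,
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0,pass
21,2006-14-05,0.53,4,1,repeat'

# -F, splits each row on commas; $4 is the expulsion count, $5 the misdemeanor flag.
# Keep rows where at least one of the two fields is not 0.
kept=$(printf '%s\n' "$data" | awk -F, '$4 != 0 || $5 != 0')

# The complementary filter: rows where both fields are 0.
dropped=$(printf '%s\n' "$data" | awk -F, '$4 == 0 && $5 == 0')

printf 'kept:\n%s\n\ndropped:\n%s\n' "$kept" "$dropped"
```

Here kept holds the header plus records 123, 582 and 21, and dropped holds only record 9274.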
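You can also use
sed -i '/,0,0$/d' file.csv
With this, you will remove all lines ending with ,0,0. Note that this only works when the two flags are the last fields on the line, as in the demo below; a row with a trailing info value, such as 9274,2001-25-12,9.65,0,0,pass, does not end with ,0,0 and would be kept.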
See the online demo:
#!/bin/bash
s='id, date of birth, grade, explusion, serious misdemeanor
123,2005-01-01,5.36,1,1
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0
21,2006-14-05,0.53,4,1'
sed '/,0,0$/d' <<< "$s"
Output:
id, date of birth, grade, explusion, serious misdemeanor
123,2005-01-01,5.36,1,1
582,1999-05-12,8.51,0,1
21,2006-14-05,0.53,4,1
To see the other lines, use a reverse command like
sed -n '/,0,0$/p' file.csv
It prints only the lines that end with ,0,0 and leaves file.csv untouched.
CodePudding user response:
You seem to think * means "anything" but it means "repeat the previous regular expression zero or more times, as many as possible". Regular expressions are different from wildcards as used in many shells and search engines, where * often does mean "any string".
The regular expression .* means "any character at all, repeated as many times as possible" but in this case you clearly mean [^,]* which means "any character which isn't a comma, repeated as many times as possible."
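To make the difference between .* and [^,]* concrete, here is a small sketch on one row from the question. The variable names row, greedy and field are made up for the example, and the replacement text X is just a visible marker.

```shell
row='9274,2001-25-12,9.65,0,0,pass'

# .* is greedy and may cross commas, so ^.*, matches up to the LAST comma.
greedy=$(printf '%s\n' "$row" | sed 's/^.*,/X/')

# [^,]* cannot cross a comma, so ^[^,]*, matches only the first field.
field=$(printf '%s\n' "$row" | sed 's/^[^,]*,/X/')

printf '%s\n%s\n' "$greedy" "$field"
```

The first result is Xpass and the second is X2001-25-12,9.65,0,0,pass, which shows why [^,]* is the safer way to say "one field".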
However, sed will happily match on a substring, so just
sed -i '/,0,0$/d' file.csv
should work, or equivalently
grep -v ',0,0$' file.csv >temp && mv temp file.csv
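The sample in the question also has an optional info column after the two flags, so a row like 9274,2001-25-12,9.65,0,0,pass does not end in ,0,0. One possible way to handle that, sketched here with the [^,]* idea from above, is to count the fields from the start of the line instead of anchoring at the end. This assumes a sed that supports -E (GNU and BSD sed both do) and that fields never contain embedded commas; the variable names are illustrative.

```shell
# Sample rows copied from the question (six columns, "info" is optional).
data='id, date of birth, grade, explusion, serious misdemeanor, info
123,2005-01-01,5.36,1,1,
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0,pass
21,2006-14-05,0.53,4,1,repeat'

# ^([^,]*,){3}  skips exactly three comma-terminated fields (id, date, grade)
# 0,0           requires fields 4 and 5 to both be exactly 0
# (,|$)         accepts either a following info field or the end of the line
result=$(printf '%s\n' "$data" | sed -E '/^([^,]*,){3}0,0(,|$)/d')

printf '%s\n' "$result"
```

With the sample data this drops only record 9274 and keeps the header and the other three rows. To edit a file in place, the same expression could be used with sed -i -E on file.csv.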
CodePudding user response:
Using sed
$ sed 's/,/&#/3;/#0/d;s/,/&#/4;/#0/d;s/#//g' input_file
id, date of birth, grade, explusion, serious misdemeanor, info
123,2005-01-01,5.36,1,1,
21,2006-14-05,0.53,4,1,repeat
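Match the third and then the fourth occurrence of a comma and place a # marker after it on every line. If a marker has a 0 right beside it, the field is treated as "no expulsion" or "no serious misdemeanor" and the line is deleted; the last substitution strips the markers from the lines that remain. Note that a line is deleted as soon as either field starts with 0, which is why record 582 is also missing from the output above.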
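To see what the markers look like before they are stripped again, here is a small sketch that applies the two substitutions one at a time to single rows from the question. The variable names are invented for the illustration.

```shell
row='21,2006-14-05,0.53,4,1,repeat'

# s/,/&#/3 rewrites only the third comma; & stands for the matched comma itself.
step1=$(printf '%s\n' "$row" | sed 's/,/&#/3')

# Repeating the idea on the fourth comma marks the start of field 5 as well.
step2=$(printf '%s\n' "$step1" | sed 's/,/&#/4')

# A row whose fourth field is 0 ends up containing "#0", which /#0/d deletes.
zero=$(printf '%s\n' '582,1999-05-12,8.51,0,1' | sed 's/,/&#/3')

printf '%s\n%s\n%s\n' "$step1" "$step2" "$zero"
```

The three results are 21,2006-14-05,0.53,#4,1,repeat, then 21,2006-14-05,0.53,#4,#1,repeat, then 582,1999-05-12,8.51,#0,1. The approach relies on # never appearing in the data itself.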