I have a CSV file where I need to trim some text from one column, probably using a regex. Here is an example:
test.csv:
Number | Text | Group |
---|---|---|
1 | the id is G11 (sometimes) | A |
2 | the id is G12 | B |
3 | the id is G15 | C |
4 | the id is Z13 (always) | D |
I want to extract only the code (G11, G12, G15, Z13) from the Text column and replace the cell's value with it, so that I get:
Number | Text | Group |
---|---|---|
1 | G11 | A |
2 | G12 | B |
3 | G15 | C |
4 | Z13 | D |
I have tried various forms of grep, awk, and sed, but haven't gotten anything to work. I'm unfamiliar with bash; essentially, I want a new CSV that looks like the second table.
Here is one of my attempts, but I'm unsure how to operate only on the Text column:
sed 's/([A-Z]\d{2}).*/([A-Z]\d{2})' test.csv > test2.csv
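That attempt fails for a few reasons: sed has no \d digit shorthand (use [0-9]), without -E the parentheses and {2} are treated as literal characters, the s command is missing its closing delimiter, and the replacement would insert the literal text ([A-Z]\d{2}) rather than the captured code (\1). A minimal corrected sketch of the same idea follows, assuming the file is pipe-separated exactly as shown and that every code is one uppercase letter followed by two digits. It first writes the sample data into test.csv, so run it in a scratch directory:

```shell
# Recreate the sample input (assumption: pipe-separated, exactly as shown above).
cat > test.csv <<'EOF'
Number | Text | Group |
---|---|---|
1 | the id is G11 (sometimes) | A |
2 | the id is G12 | B |
3 | the id is G15 | C |
4 | the id is Z13 (always) | D |
EOF

# -E: extended regexes, so ( ) groups and {2} counts work without backslashes.
# [0-9] replaces \d, which sed does not support.
# 3,$ applies the substitution only to the data rows (skips header and separator).
# [|] is a literal pipe; the pattern spans the whole Text field between two
# pipes, captures the code, and writes back only "| CODE |".
sed -E '3,$s/[|][^|]*([A-Z][0-9]{2})[^|]*[|]/| \1 |/' test.csv > test2.csv
```

Because s replaces only the first match on each line, only the Text field (the first field containing a code) is rewritten.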
Using sed:
sed -E '1!s/([^a-z]*)[^A-Z]*([^ ]* )[^|]*/\1\2/' test.csv
Output:
Number | Text | Group |
---|---|---|
1 | G11 | A |
2 | G12 | B |
3 | G15 | C |
4 | Z13 | D |
1!
- Do not apply the substitution to line 1 (the header).
([^a-z]*)
- Capture zero or more characters that are not lowercase letters (on the first data row, "1 | "). This is returned by back reference \1.
[^A-Z]*
- Match everything up to the next uppercase letter ("the id is "). Because it is not inside parentheses, it is discarded.
([^ ]* )
- Capture everything up to and including the next space ("G11 "). This is returned by back reference \2.
[^|]*
- Match everything up to the next pipe symbol ("(sometimes) ") and discard it.
Anything outside the match (here the trailing "| A |") is left unchanged. The separator row ---|---|---| contains no space, so the pattern cannot match it and it also passes through unchanged.
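To operate on the Text column alone, as the question asks, awk can split each line on the pipe character and rewrite only field 2. This is a sketch under the same assumptions (pipe-separated file, codes shaped like G11: one uppercase letter and two digits); like the sed sketch, it first writes the sample into test.csv:

```shell
# Recreate the sample input (assumption: pipe-separated, exactly as shown above).
cat > test.csv <<'EOF'
Number | Text | Group |
---|---|---|
1 | the id is G11 (sometimes) | A |
2 | the id is G12 | B |
3 | the id is G15 | C |
4 | the id is Z13 (always) | D |
EOF

# FS/OFS "|": split on pipes and rejoin with pipes when a field is modified.
# NR > 2 skips the header and the ---|---|--- separator row.
# match() sets RSTART/RLENGTH to the first "uppercase letter + two digits" code
# in field 2; assigning $2 rebuilds the line, leaving other columns untouched.
awk 'BEGIN { FS = OFS = "|" }
NR > 2 && match($2, /[A-Z][0-9][0-9]/) {
    $2 = " " substr($2, RSTART, RLENGTH) " "
}
{ print }' test.csv > test2.csv
```

[0-9][0-9] is used instead of [0-9]{2} because some older awk implementations do not support interval expressions.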