A folder contains a README.txt and several DICOM files named emr_000x.sx (where x is a number). The README.txt has several lines; one of them contains the characters "xyz" along with the corresponding emr_000x.sx file name.
I would like to read the .txt, identify which line contains "xyz", and extract the emr_000x.sx from that line only. For reference, the line in the .txt is formatted like this:
A:emr_000x.sx, B:00001, C:number, D(characters)string_string_number_**xyz**_number_number
I think grep might help, but I'm not familiar enough with bash scripting to write this myself. Does anyone know how to solve this? Many thanks!
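Since grep came up, here is a minimal sketch, assuming every line of README.txt starts with the `A:` field and that fields are separated by commas as in the format above. `grep` keeps the matching line(s), and `cut` keeps the first comma-separated field and then drops the `A:` prefix. The helper name `find_xyz_file` and the sample lines are hypothetical.

```shell
# Hypothetical helper: print the emr_*.sx name from each line of the
# given README that contains "xyz".
find_xyz_file() {
    # grep -F treats xyz as a literal string and keeps only matching lines;
    # cut -d, -f1 keeps the first comma-separated field ("A:emr_000x.sx");
    # cut -d: -f2 keeps what follows the colon, i.e. drops "A:".
    grep -F 'xyz' "$1" | cut -d, -f1 | cut -d: -f2
}

# Demo on made-up sample data in a temporary directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/README.txt" <<'EOF'
A:emr_0001.sx, B:00001, C:12, D(abc)foo_bar_3_abc_4_5
A:emr_0002.sx, B:00001, C:7, D(def)foo_bar_3_xyz_4_5
A:emr_0003.sx, B:00001, C:9, D(ghi)foo_bar_3_qrs_4_5
EOF

find_xyz_file "$demo_dir/README.txt"   # prints emr_0002.sx
```

This matches "xyz" anywhere on the line; if the string could also appear in another field, one of the field-aware answers is safer.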
Answer:
You can use `awk` to match fields in your comma-separated lines:
awk -F, '$4 ~ /xyz/ {sub(/^A:/, "", $1); print $1}' README.txt
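A minimal sketch of the awk approach run end to end, assuming the `D(...)` part is always the fourth comma-separated field, so `$4` is the field to test. The helper name `find_xyz_file_awk` and the sample lines are hypothetical; the README path is passed as the last argument so awk reads that file.

```shell
# Hypothetical helper: for every line whose 4th comma-separated field
# contains "xyz", strip the leading "A:" from field 1 and print it.
find_xyz_file_awk() {
    awk -F, '$4 ~ /xyz/ { sub(/^A:/, "", $1); print $1 }' "$1"
}

# Demo on made-up sample data in a temporary directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/README.txt" <<'EOF'
A:emr_0001.sx, B:00001, C:12, D(abc)foo_bar_3_abc_4_5
A:emr_0002.sx, B:00001, C:7, D(def)foo_bar_3_xyz_4_5
A:emr_0003.sx, B:00001, C:9, D(ghi)foo_bar_3_qrs_4_5
EOF

find_xyz_file_awk "$demo_dir/README.txt"   # prints emr_0002.sx
```

Testing only `$4` rather than the whole line means a stray "xyz" in the A, B or C field would not produce a false match.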
Answer:
I like `sed` for this sort of thing.
sed -nE '/xyz/{ s/^.*A:([^,]+),.*/\1/; p; }' README.txt
This says, "On lines where you see `xyz`, replace the whole line with the non-commas between `A:` and a comma, then print the line."
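To try the sed version on a throwaway file, here is a minimal sketch; the helper name `find_xyz_file_sed` and the sample lines are made up, and the capture group uses `[^,]+` (one or more non-comma characters).

```shell
# Hypothetical helper wrapping the sed one-liner.
find_xyz_file_sed() {
    # -n: print nothing by default; -E: extended regexes.
    # On lines containing xyz, replace the whole line with the non-comma
    # text right after "A:", then print the result with p.
    sed -nE '/xyz/{ s/^.*A:([^,]+),.*/\1/; p; }' "$1"
}

# Demo on made-up sample data in a temporary directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/README.txt" <<'EOF'
A:emr_0001.sx, B:00001, C:12, D(abc)foo_bar_3_abc_4_5
A:emr_0002.sx, B:00001, C:7, D(def)foo_bar_3_xyz_4_5
A:emr_0003.sx, B:00001, C:9, D(ghi)foo_bar_3_qrs_4_5
EOF

find_xyz_file_sed "$demo_dir/README.txt"   # prints emr_0002.sx
```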
`-n` is **n**o printing unless I say so. (`p` means **p**rint.)
`-E` just means to use **E**xtended regexes.
`/xyz/{...}` means "on lines where you see `xyz`, do the stuff between the curlies."
`s/^.*A:([^,]+),.*/\1/` will **s**ubstitute the matched part (which should be the whole line) with just the part between the parens.
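Two quick checks of the flags described above, as a sketch on a made-up two-line input. Without `-n`, sed also prints every line it did not touch, and the edited line appears twice (once from `p`, once from the automatic print). Without `-E`, the same capture has to be written as a basic regex: escaped parentheses `\(` `\)`, and `[^,][^,]*` in place of `[^,]+`, because `+` is not a POSIX basic-regex operator.

```shell
# Made-up sample input: one non-matching line, one matching line.
sample='A:emr_0001.sx, B:00001, C:12, D(abc)foo_bar_3_abc_4_5
A:emr_0002.sx, B:00001, C:7, D(def)foo_bar_3_xyz_4_5'

# Without -n: prints the untouched first line, then emr_0002.sx twice.
printf '%s\n' "$sample" | sed -E '/xyz/{ s/^.*A:([^,]+),.*/\1/; p; }'

# Basic-regex equivalent of the -E version; prints only emr_0002.sx.
printf '%s\n' "$sample" | sed -n '/xyz/{ s/^.*A:\([^,][^,]*\),.*/\1/; p; }'
```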