I'm trying to extract a number from a tab-delimited file and store it in a variable. I'm approaching the problem with a regex that I was able to build thanks to some research online.
The file is structured as follows:
0 0 2500 5000
1 5000 7500 10000
2 10000 12500 15000
3 15000 17500 20000
4 20000 22500 25000
5 25000 27500 30000
I need to extract the number in the second column, given the number in the first one. I wrote the following regex and tested it online:
(?<=5\t).*?(?=\t)
I need the 25000 from the sixth line.
I started working with sed but, as you probably know, it doesn't support lookbehind and lookahead patterns, even with the -E option that enables extended regular expressions. I also tried awk and grep and failed for similar reasons.
Going further, I found that perl could be the right tool, but I'm not able to make it work properly. I'm trying the command
perl -pe '/(?<=5\t).*?(?=\t)/' | INFO.out
but I admit my knowledge is poor and I'm a bit lost.
The next step would be to read the "5" in the regex from a variable, so if you already know of problems that could arise, please let me know.
CodePudding user response:
One option is to use sed: match 5 at the start of the line and, after the tab, capture the digits in a group
sed -En 's/^5\t([[:digit:]]+)\t.*/\1/p' file > INFO.out
The file INFO.out contains:
25000
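Since the question also asks about reading the 5 from a variable and storing the result, here is a minimal sketch of the same sed idea with both. The names key, value, tab and the temporary sample file are placeholders of mine, not from the original post. It assumes the key contains only digits, so it is safe to interpolate into the pattern, and it uses a literal tab held in a shell variable so the pattern does not rely on sed interpreting \t.

```shell
# Hypothetical sample file reproducing the data from the question (tab-delimited).
tmp=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
  0 0 2500 5000 \
  1 5000 7500 10000 \
  2 10000 12500 15000 \
  3 15000 17500 20000 \
  4 20000 22500 25000 \
  5 25000 27500 30000 > "$tmp"

key=5                # value to look up in the first column (assumed digits only)
tab=$(printf '\t')   # literal tab character

# Double quotes let the shell expand $key and $tab; \1 reaches sed unchanged.
value=$(sed -En "s/^${key}${tab}([[:digit:]]+)${tab}.*/\1/p" "$tmp")
echo "$value"
rm -f "$tmp"
```

If the key could ever contain regex metacharacters, it would need escaping before being placed in the pattern.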
CodePudding user response:
Why do you need to use a regex? If all you are doing is finding lines that start with 5 and getting the second column, you could use sed and cut, e.g.:
<infile sed -n '/^5\t/p' | cut -f2
Output:
25000
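Along the same "no regex needed" line, a field-based lookup can also be sketched with awk, which is a different tool from the sed and cut pair above. This is only an illustration under my own assumptions: key, value and the temporary sample file are made-up names, and the key is passed in with awk's standard -v option rather than being pasted into the program text.

```shell
# Hypothetical sample file reproducing the data from the question (tab-delimited).
tmp=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
  0 0 2500 5000 \
  1 5000 7500 10000 \
  2 10000 12500 15000 \
  3 15000 17500 20000 \
  4 20000 22500 25000 \
  5 25000 27500 30000 > "$tmp"

key=5   # value to look up in the first column

# -F'\t' splits fields on tabs; -v copies the shell variable into awk.
# Print the second field of every line whose first field equals key.
value=$(awk -F'\t' -v key="$key" '$1 == key { print $2 }' "$tmp")
echo "$value"
rm -f "$tmp"
```

Because the comparison is on the whole first field, a key of 5 does not match lines starting with 50 or 55.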
CodePudding user response:
Using sed
$ var1=$(sed -n 's/^5[^0-9]*\([^ ]*\).*/\1/p' input_file)
$ echo "$var1"
25000
CodePudding user response:
No need for lookbehinds: split each line on whitespace and check whether the first field is 5.
In Perl there is a command-line option convenient for this, -a, with which each line gets split for us and we get the @F array with the fields
perl -lanE'say $F[1] if $F[0] == 5' data.txt
Note that this tests for 5 numerically (==).