How to make a regex work with the perl command and extract numbers from a file?

Time:11-19

I'm trying to extract a number from a tab-delimited file and store it in a variable. I'm approaching the problem with a regex that I was able to build after some research online.

The file is structured as follows:

0   0   2500    5000
1   5000    7500    10000
2   10000   12500   15000
3   15000   17500   20000
4   20000   22500   25000
5   25000   27500   30000

I need to extract the number in the second column given a number in the first one. I wrote this regex and tested it online:

(?<=5\t).*?(?=\t)

I need the 25000 from the sixth line.

I started working with sed but, as you already know, it doesn't support lookbehind and lookahead patterns, even with the -E option that enables extended regular expressions. I also tried awk and grep and failed for similar reasons.

Going further I found that perl could be the right command but I'm not able to make it work properly. I'm trying with the command

perl -pe '/(?<=5\t).*?(?=\t)/' | INFO.out

but I admit my poor knowledge and I'm a bit lost.

The next step would be to read the "5" in the regex from a variable, so if you already know of problems that could arise, please let me know.

CodePudding user response:

One option is to use sed: match 5 at the start of the line, then capture the digits that follow the tab in a group.

sed -En 's/^5\t([[:digit:]]+)\t.*/\1/p' file > INFO.out

The file INFO.out contains:

25000

CodePudding user response:

Why do you need to use a regex? If all you are doing is finding lines starting with a 5 and getting the second column you could use sed and cut, e.g.:

<infile sed -n '/^5\t/p' | cut -f2

Output:

25000
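The same "no lookarounds" idea can also be sketched with awk, which handles both the tab splitting and the lookup key coming from a shell variable (the follow-up raised in the question). This is only a sketch under assumptions: the temporary file stands in for the question's input, and the names data, key and second are hypothetical.

```shell
# Recreate the sample file from the question with real tab separators.
data=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
  0 0 2500 5000 \
  1 5000 7500 10000 \
  2 10000 12500 15000 \
  3 15000 17500 20000 \
  4 20000 22500 25000 \
  5 25000 27500 30000 > "$data"

key=5   # hypothetical variable holding the first-column value to look up

# -F'\t' splits on tabs; -v passes the shell variable into awk as "key".
# Print the second field of every line whose first field equals key.
second=$(awk -F'\t' -v key="$key" '$1 == key { print $2 }' "$data")

echo "$second"
rm -f "$data"
```

Passing the value with -v keeps it out of the awk program text, so no shell quoting tricks are needed inside the single-quoted program.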

CodePudding user response:

Using sed

$ var1=$(sed -n 's/^5[^0-9]*\([0-9]*\).*/\1/p' input_file)
$ echo "$var1"
25000

CodePudding user response:

No need for lookbehinds: split each line on whitespace and check whether the first field is 5.

Perl has a command-line option that is convenient for this, -a, which splits each line for us and puts the fields in the @F array:

perl -lanE'say $F[1] if $F[0] == 5' data.txt

Note that this tests for 5 numerically (==) rather than as a string (eq).
