Word extraction with regex string

Time:10-20

From this post, I am able to recognize the pattern object.* by using the regex m/(?<=object\.)\w*/. However, since I am unfamiliar with Linux, I cannot use the commands sed or perl properly to extract the desired tokens, so I need your help. My best guess is grep -E -n object file.txt | perl -nle 'm/(?<=object\.)\w*/; print $1'.

CodePudding user response:

You can use grep or sed:

grep -oP '(?<=object\.)\w+' file
sed -nE 's/.*object\.([[:alnum:]_]+).*/\1/p' file

See the online demo.

grep -oP lets you use a PCRE regex (the -P option) and print only the matched text, one match per output line (the -o option).

The sed command is more complex and extracts one match per line (the last one on the line, because the leading .* is greedy). It suppresses the default line output with -n and sets the regex flavor to POSIX ERE with -E. It then matches a line containing object. followed by one or more alphanumeric or underscore characters, captures those characters into \1, and replaces the whole line with the Group 1 value. The p flag prints only the lines where a substitution happened.
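
To make the "last match on a line" behaviour concrete, here is a small sketch. The sample text is made up for illustration and is not from the question; only the sed command itself comes from the answer.

```shell
# Hypothetical sample input (three lines, the first has two object.* tokens).
sample='x = object.foo + object.bar_2
no match here
call(object.baz)'

# The leading .* is greedy, so on the first line sed keeps only the last token.
sed_out=$(printf '%s\n' "$sample" | sed -nE 's/.*object\.([[:alnum:]_]+).*/\1/p')
printf '%s\n' "$sed_out"
# prints:
# bar_2
# baz
```

The line without object. produces no output, since -n suppresses it and the p flag only fires after a successful substitution.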

CodePudding user response:

$1 contains what the first capture ((...)) captured. But you don't have any captures.

Instead, you want $&, which contains the text matched by the pattern.

grep -E -n object file.txt | perl -nle'm/(?<=object\.)\w*/; print $&'

And rather than printing unconditionally, you can print only if a match is found, eliminating the need for grep.

perl -nle'print $& if /(?<=object\.)\w+/' file.txt

Finally, we don't need the relatively-slow lookaround.

perl -nle'print $1 if /object\.(\w+)/' file.txt

On some systems, grep can also do the job using -o and -P.

grep -oP '(?<=object\.)\w+' file.txt