Regex that extracts everything until finds "/", starting from the end

Time:08-13

I'm writing a bash script in which I use grep with a regular expression to extract an id that I will then use as a variable.

The goal is to extract all characters after the last /, but the characters ' and } should be ignored.

file.txt:

{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}

command:

cat file.txt | grep -oP "[/] ^"

The current command isn't working.

desired output:

B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=

CodePudding user response:

The regex you gave was: [/] ^

It has a few mistakes:

  • The ^ at the end suggests you expect the regex engine to search backwards from the end of the line. It can't; ^ only anchors a match to the start of the line.
  • [/] matches only a single slash character.

Your sample shows what appears to be a malformed JSON object containing a key-value pair, each enclosed in single-quotes. JSON requires double-quotes so perhaps it is not JSON.

If several assumptions are made, it is possible to extract the section of the input that you seem to want:

  • file contains a single line; and
  • key and value are strings surrounded by single-quote; and
  • either:
    • the value part is immediately followed by }; or
    • the name part cannot contain /

You are using the -P option of grep, so lookaround operators are available.

(?<=/)[^/]+(?=')
  • lookbehind declares match is preceded by /
  • one or more non-slash (the match)
  • lookahead declares match is followed by '
[^/]+(?='})
  • one or more non-slash (the match)
  • lookahead declares match is followed by ' then }

Note that a match begins as early in the line as possible and, because the quantifier is greedy, it is as long as possible.
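
As a quick check, both patterns can be run against the sample line from the question. This is a minimal sketch that assumes GNU grep built with PCRE support (needed for -P); the variable names line, id1 and id2 are only illustrative.

```shell
# Sample line from the question (variable names are illustrative).
line="{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}"

# Pattern 1: run of non-slash characters preceded by "/" and followed by "'".
id1=$(printf '%s\n' "$line" | grep -oP "(?<=/)[^/]+(?=')")

# Pattern 2: run of non-slash characters followed by "'}".
id2=$(printf '%s\n' "$line" | grep -oP "[^/]+(?='})")

printf '%s\n' "$id1" "$id2"
```

With the sample input, both commands should print the id after the last slash. If the assumptions listed above do not hold (for example, several lines or several quoted values per line), grep -o may print more than one match.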

CodePudding user response:

Using any awk:

$ awk -F"[/']" '{print $(NF-1)}' file.txt
B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=
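
To see why $(NF-1) is the right field: -F"[/']" makes every / and every ' a field separator, so the closing '} leaves } as the last field and the id as the one before it. The sketch below prints both fields for the sample line; the variable names are illustrative, and it assumes the line ends in '} with nothing after it.

```shell
# Sample line from the question (variable names are illustrative).
line="{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}"

# Second-to-last field: the text between the last "/" and the closing "'".
id=$(printf '%s\n' "$line" | awk -F"[/']" '{ print $(NF-1) }')

# Last field: whatever follows the closing "'", here just "}".
last=$(printf '%s\n' "$line" | awk -F"[/']" '{ print $NF }')

printf 'id=%s last=%s\n' "$id" "$last"
```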

CodePudding user response:

Basic parameter expansion.

$: x="$(<file.txt)"            # file contents in x
$: x="${x##*/}"                # strip through the last / to get rid of 'name' and the path
$: x="${x//[^[:alnum:]_=]}"    # strip anything that is not alphanumeric, _ or = to clean the end
$: echo "$x"
B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=
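
The same two-step idea can be wrapped in a small function. This sketch trims at the first single quote rather than filtering by character class, so it keeps every character of the id, whatever it contains. The function name last_segment is hypothetical, and the sketch assumes the value is closed by a single quote, as in the sample.

```shell
# Hypothetical helper: print the text after the last "/" and before the next "'".
last_segment() {
  local s=$1
  s=${s##*/}      # drop everything up to and including the last "/"
  s=${s%%\'*}     # drop the first "'" and everything after it
  printf '%s\n' "$s"
}

# Sample line from the question.
line="{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}"

id=$(last_segment "$line")
echo "$id"
```

Reading from the file instead would look like id=$(last_segment "$(<file.txt)"), assuming the file holds a single line.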