CLI method to replace white space from regex capture group

Time:09-27

I have some markdown files where the URLs have spaces in them. I want to replace the whitespace in the URL with a hyphen. I am not sure if this is even possible with sed.

For example:

[the name of the link](www.example.com/a badly named thing)

should become

[the name of the link](www.example.com/a-badly-named-thing)

I know that I can capture the bad url with the expression below, but how can I then do something with it?

s/[.*](/[/1/])/<do something to group 1>/g

CodePudding user response:

You can use

sed -e ':a' -e 's/\(\[[^][]*]([^()[:space:]]*\)[[:space:]]\{1,\}\([^()]*)\)/\1-\2/' -e 'ta' file > newfile

Details:

  • :a - sets a label
  • \(\[[^][]*]([^()[:space:]]*\)[[:space:]]\{1,\}\([^()]*)\) - a POSIX BRE pattern that matches:
    • \(\[[^][]*]([^()[:space:]]*\) - Group 1 (\1):
      • \[[^][]*] - a [, then zero or more chars other than ] and [ and then a ]
      • ( - a ( char
      • [^()[:space:]]* - zero or more chars other than (, ) and whitespace
    • [[:space:]]\{1,\} - one or more whitespace chars
    • \([^()]*)\) - Group 2 (\2):
      • [^()]* - zero or more chars other than ( and )
      • ) - a ) char
  • The \1-\2 replacement replaces the match with the Group 1 value, a hyphen, and the Group 2 value
  • ta means that if there was a successful substitution, sed jumps back to the :a label, so the substitution repeats until no whitespace is left inside the URL.
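
To watch the loop work, the command can be wrapped in a small shell function and fed the example line from the question. This is only a sketch: the function name fix_link_spaces is made up here, and it reads from standard input instead of a file.

```shell
# Hypothetical wrapper (the name is an assumption, not part of the answer):
# reads Markdown on stdin and writes the fixed text to stdout.
fix_link_spaces() {
  sed -e ':a' \
      -e 's/\(\[[^][]*]([^()[:space:]]*\)[[:space:]]\{1,\}\([^()]*)\)/\1-\2/' \
      -e 'ta'
}

# Each pass of the loop replaces one run of whitespace inside the URL.
printf '%s\n' '[the name of the link](www.example.com/a badly named thing)' | fix_link_spaces
```

This prints [the name of the link](www.example.com/a-badly-named-thing). Because the pattern requires a [...] part directly before the (, parenthesised prose elsewhere on the line is left alone, and a run of several whitespace characters collapses into a single hyphen.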

CodePudding user response:

Using GNU sed

$ sed -E ':a;s/\(([^)]*) /(\1-/;ta' input_file
[the name of the link](www.example.com/a-badly-named-thing)
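
One caveat with this shorter command: it matches any opening parenthesis, so ordinary parenthesised prose containing a space gets hyphenated too. The sketch below compares it with a variant anchored on the ]( that starts a Markdown link target. Both function names and the anchored pattern are assumptions added for illustration, and both still rely on GNU sed.

```shell
# The command from the answer above, wrapped in a function (name is hypothetical).
loose_fix() {
  sed -E ':a;s/\(([^)]*) /(\1-/;ta'
}

# Assumed variant: only touch whitespace that follows "](", i.e. a link target.
anchored_fix() {
  sed -E ':a;s/(\]\([^)[:space:]]*)[[:space:]]+/\1-/;ta'
}

line='see (this note) and [a b](x y)'
printf '%s\n' "$line" | loose_fix      # also rewrites "(this note)"
printf '%s\n' "$line" | anchored_fix   # only rewrites the link target
```

The first call prints see (this-note) and [a b](x-y), the second prints see (this note) and [a b](x-y). If the files contain no parentheses outside of links, the shorter command is enough.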

CodePudding user response:

Perl, but it's pretty gross:

perl -pe 's{ \[.*?\] \( \K [^)]+ }{ $url = $&; $url =~ tr/ /-/; $url }xeg'

That matches the link name in brackets, then the opening parenthesis of the URL part, then forgets about that stuff (that is what \K does), and then matches a sequence of non-close-parenthesis characters. The matched text is then replaced by the result of a small piece of code that changes spaces into hyphens.
