Perl regex with capture groups not working

Time:08-17

I have the following file:

/Users/x/-acite _1660475490N.html
/Users/x/-acite _1660464772N.html
/Users/x/-acite _1660464242N.html
/Users/x/-acite _1660463321N.html
/Users/x/-acite _1660421444N.html
/Users/x/-acite _1612414441N.html

/Users/x/fix _1660399672N.html
/Users/x/fix _1660398829N.html

/Users/x/water witching _1660460617N.html
/Users/x/water witching _1660388149N.html
/Users/x/water witching _1632222441N.html
/Users/x/water witching _1660003224N.html

I need to keep only the first line of each group:

/Users/x/-acite _1660475490N.html
/Users/x/fix _1660399672N.html
/Users/x/water witching _1660460617N.html

I tried the following Perl substitution:

find . -type f -exec perl -pi -w -e 's/(.*)(\R)(.*)(\R)/$1$2/' \{\} \;

or

find . -type f -exec perl -pi -w -e 's/(.*?)(\R)(.*?)(\R)/$1$2/g;' \{\} \;

Why are these not working?

CodePudding user response:

You could also read the file in paragraph mode (-00), then match and print the first line of each 'paragraph'.

C:\Old_Data\perlp>perl -00 -ne 'print /(.+\n)/' test01.txt
/Users/x/-acite _1660475490N.html
/Users/x/fix _1660399672N.html
/Users/x/water witching _1660460617N.html

CodePudding user response:

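Both commands fail for related reasons:

  • you are not slurping the whole file into a single string: with -p, Perl applies the substitution to one line at a time, so a pattern that needs two line breaks never matches;
  • the first command has no g flag, so it would replace only the first occurrence anyway;
  • you do not need so many groups: one is enough, since you want to keep only one part of the match.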

You need

find . -type f -exec perl -0777 -i -pe 's/^(.+)(?:\R.+)*\n/$1/gm' \{\} \;

Here,

  • -0777 slurps the whole file into a single string
  • ^ - start of a line (due to the m flag)
  • (.+) - captures a non-empty line in group 1
  • (?:\R.+)* - zero or more sequences of a line break followed by a non-empty line
  • \n - matches the newline that ends the last line of the group