perl regex fails for no apparent reason

Time:08-30

I'm trying to rename some tickets with the rename package, but I'm only getting return code 4.

I have already read through some articles, but I cannot find the problem in my expression.

The text pattern is:

companyName-mm.yyyy.pdf

And I would like the pattern to be:

companyName-yyyy.mm.pdf

PS: the companyName has 5 letters

I tried these commands without success:

rename 's/(\w{5})-(\d{2}).(\d{4}).*/' 's/$1-$3\.$2\.pdf/' *.pdf
rename 's/(\w{5})-(\d{2}).(\d{4}).*/$1-$3\.$2\.pdf/' *.pdf

Could someone tell me what I'm doing wrong?

CodePudding user response:

Hey Ruan!

With the details you specified, I think this regex could solve your problem:

rename 's/(?<=\w{5}-)(\d{2}).(\d{4})/$2.$1/' *.pdf

It uses a lookbehind so that the companyName part is not included in a capture group, while still checking that five word characters (\w{5}) and a hyphen come right before the date.

It also does not match the ".pdf" part, so the first section of the regex yields only 3 things:

  • 0 - mm.yyyy (we'll not be using)
  • 1 - mm
  • 2 - yyyy

With those groups captured (numbered from 0, where 0 is the whole match), the replacement swaps their order, so "mm(1).yyyy(2)" becomes "yyyy(2).mm(1)".

CodePudding user response:

rename -n 's/\w+-\K([0-9]{2})\.([0-9]{4})(?=\.pdf)/$2.$1/' *pdf

The \w+ matches all consecutive "word-characters," [a-zA-Z0-9_]. So it stops matching at -, as needed. If you want to restrict it to five characters, change it to \w{5}.

The \K following the word-and-hyphen makes it drop all matches up to that point so they don't need to be captured and restored in the replacement part.

Then the two-digit and four-digit numbers are captured, swapped in the replacement part.

Finally, the (?=...) is a positive lookahead, which asserts (only "looks" without consuming) that what follows the matched numbers is .pdf. If it's spooky or it doesn't work for some reason, then just capture that and put it back:

rename -n 's/\w+-\K([0-9]{2})\.([0-9]{4})\.(pdf)/$2.$1.$3/' *pdf

Here I don't capture . before pdf but type it back in as I find this clearer to look at.

Remove -n to actually rename once you inspect the printed output.

Note, the command goes by different names across systems (prename on CentOS etc).

See perlretut for Perl's own regex tutorial.


The first command in the question is simply broken, as s/.../ with no replacement part isn't a valid substitution operator. The second one works for me but is potentially dangerous: it matches anything after the two numbers -- and regardless of what it is it replaces it with .pdf!
