RTL lookup from extension until " or = is hit

Time:02-24

I have a web application that processes file uploads. When I extract the file names, the values can come in formats such as:

["attachment; filename=","New Microsoft Excel Worksheet.xlsx",""]
["attachment; filename=","/New Microsoft Excel Worksheet.xlsx/",""]
["attachment; filename="jimmy.xlsx""]
["attachment; filename=jimmy.xlsx]

I was initially only aware of example 2, so I did a lookup for all characters between the escape slashes:

(?<=\/)(.*?)(?=\/)

However, I need something that, for the examples above, would output either

New Microsoft Excel Worksheet.xlsx

OR

jimmy.xlsx

I.e. reading the string from right to left, starting past the file extension and ending at either " or =

CodePudding user response:

The pattern (?<=\/)(.*?)(?=\/) only matches between two mandatory forward slashes, which are present in only one of the example strings.

One option is to match any characters except the ones that are not allowed, and to end the match with a dot and word characters.

[^/\s"=\][][^/"=\][]*\.\w+
  • [^/\s"=\][] Start the match with a non-whitespace char, except for / " = [ ]
  • [^/"=\][]* The same as before, now allowing a whitespace char
  • \. Match a .
  • \w+ Match 1 or more word characters
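
To make the breakdown concrete, here is a minimal Python sketch that applies this pattern (with the `+` quantifier after `\w`) to the four strings from the question. The names `FILENAME_RE` and `extract_filename` are hypothetical, and `[` is additionally escaped inside the character class, which should be equivalent.

```python
import re

# Pattern from the answer; "\[" is escaped inside the class (assumed equivalent).
FILENAME_RE = re.compile(r'[^/\s"=\]\[][^/"=\]\[]*\.\w+')

# Sample inputs copied from the question.
samples = [
    '["attachment; filename=","New Microsoft Excel Worksheet.xlsx",""]',
    '["attachment; filename=","/New Microsoft Excel Worksheet.xlsx/",""]',
    '["attachment; filename="jimmy.xlsx""]',
    '["attachment; filename=jimmy.xlsx]',
]


def extract_filename(header):
    """Return the first file-name-like match, or None if nothing matches."""
    match = FILENAME_RE.search(header)
    return match.group(0) if match else None


results = [extract_filename(s) for s in samples]
for name in results:
    print(name)
```

The prefix `attachment; filename` is never returned because it contains no dot before the excluded `=` character, so the first successful match starts at the file name itself.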

Alternatively, use a capture group. This makes the match more specific by anchoring on filename=, and it means, for example, that the pattern no longer has to end with a dot and word characters.

\bfilename=[",/]*([^/\s"=\][][^/"=\][]*)
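
As a rough illustration of the capture-group variant, the Python sketch below reads group 1 instead of the whole match. The names `CAPTURE_RE` and `filename_from_header` are hypothetical, and `[` is escaped inside the character class as a precaution.

```python
import re

# Capture-group pattern from the answer; group 1 holds the file name.
CAPTURE_RE = re.compile(r'\bfilename=[",/]*([^/\s"=\]\[][^/"=\]\[]*)')


def filename_from_header(header):
    """Return the text captured after 'filename=', or None if absent."""
    match = CAPTURE_RE.search(header)
    return match.group(1) if match else None


# Sample inputs copied from the question.
samples = [
    '["attachment; filename=","New Microsoft Excel Worksheet.xlsx",""]',
    '["attachment; filename=","/New Microsoft Excel Worksheet.xlsx/",""]',
    '["attachment; filename="jimmy.xlsx""]',
    '["attachment; filename=jimmy.xlsx]',
]
captured = [filename_from_header(s) for s in samples]
```

Because this version does not require a dot, it would also accept a name without an extension, such as a hypothetical `filename="README"`.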

CodePudding user response:

Solved using the following regex thanks to:

https://www.reddit.com/user/CryptoOTC_creator/

[\w\s]+\.[\.\w\d]+

[ Character set. Match any character in the set.
  \w Word. Matches any word character (alphanumeric & underscore).
  \s Whitespace. Matches any whitespace character (spaces, tabs, line breaks).
]
+ Quantifier. Match 1 or more of the preceding token.
\. Escaped character. Matches a "." character (char code 46).
[ Character set. Match any character in the set.
  \. Escaped character. Matches a "." character (char code 46).
  \w Word. Matches any word character (alphanumeric & underscore).
  \d Digit. Matches any digit character (0-9).
]
+ Quantifier. Match 1 or more of the preceding token.
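
A small Python sketch of this second answer's pattern (with both `+` quantifiers) is shown below. The names `LOOSE_RE` and `loose_filename` are hypothetical. Since `\s` is allowed in the first character set, the raw match can begin with whitespace, so the sketch strips it. That stripping is an addition here, not part of the original answer.

```python
import re

# Pattern from the answer: word/whitespace chars, a dot, then dots/word chars.
LOOSE_RE = re.compile(r'[\w\s]+\.[\.\w\d]+')


def loose_filename(header):
    """Return the first match with surrounding whitespace removed, or None."""
    match = LOOSE_RE.search(header)
    return match.group(0).strip() if match else None


# Sample inputs copied from the question.
samples = [
    '["attachment; filename=","New Microsoft Excel Worksheet.xlsx",""]',
    '["attachment; filename=","/New Microsoft Excel Worksheet.xlsx/",""]',
    '["attachment; filename="jimmy.xlsx""]',
    '["attachment; filename=jimmy.xlsx]',
]
loose_results = [loose_filename(s) for s in samples]
```

One caveat worth keeping in mind: the first character set only allows word characters and whitespace, so a name containing other characters, for instance a hyphen as in a hypothetical `my-file.xlsx`, would be truncated to `file.xlsx`.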