How to write regex for finding the extensions of files

Time:10-08

I am trying to find the types of file extensions in a blob of text.

The file extensions are followed by either a blank space or a special character, for example ".txt 123" or ".xlsx-(123)". In these examples, the text I want to extract is txt and xlsx.

Note that extensions may contain either uppercase or lowercase letters (or both).

I have tried the following.

.[a-z]([-\s])

But it doesn't work. How can I fix it?

CodePudding user response:

\.([a-zA-Z]+)[^A-Za-z]
  • A dot is a metacharacter, so it needs to be escaped (with a backslash) to match a literal dot.
  • The parentheses create a capturing group.
  • The character class [a-zA-Z] means any lowercase or uppercase letter.
  • The + means one or more occurrences of the previous expression, hence [a-zA-Z]+ means one or more letters.
  • A ^ character as the first character in a character class means NOT. In other words, [^A-Za-z] means any character that is NOT a letter.

So the above expression looks for a literal dot, followed by one or more lowercase or uppercase letters, followed by a single character that is not a letter. The letters between the dot and that final character are captured in group 1.
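
As a quick check of the expression above, here is a minimal Python sketch. The question does not say which language or regex engine is in use, so Python's re module is an assumption, and the name find_extensions and the sample text are made up for illustration.

```python
import re

# Pattern from the answer: a literal dot, one or more letters (captured
# in group 1), then a single character that is not a letter.
EXT_PATTERN = re.compile(r"\.([a-zA-Z]+)[^A-Za-z]")


def find_extensions(text):
    """Return the captured extension from every match in text.

    Hypothetical helper, not part of the original answer.
    """
    # With exactly one capturing group, findall returns the group contents.
    return EXT_PATTERN.findall(text)


sample = "report.txt 123 and budget.xlsx-(123) and Photo.JPG, done"
print(find_extensions(sample))  # ['txt', 'xlsx', 'JPG']
```

Note that the pattern requires one more character after the extension, so an extension at the very end of the text (for example "file.txt" with nothing after it) is not matched. If that case matters, one possible variation is to replace the trailing [^A-Za-z] with the negative lookahead (?![A-Za-z]), which checks that no letter follows without consuming a character.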
