I am trying to find the types of file extensions in a blob of text.
The file extensions are followed by either a blank space or a special character, for example ".txt 123" or ".xlsx-(123)". In these examples, the text I want to extract is txt and xlsx.
Note that extensions may contain either uppercase or lowercase letters (or both).
I have tried the following.
.[a-z]([-\s])
But it doesn't work. How can I solve this problem?
CodePudding user response:
\.([a-zA-Z]+)[^A-Za-z]
- A dot is a metacharacter, so it needs to be escaped (with a backslash).
- The parentheses create a capturing group.
- The character class [a-zA-Z] means any lowercase or uppercase letter.
- [a-zA-Z]+ means one or more letters.
- A ^ character as the first character in a character class means NOT. In other words, [^A-Za-z] means any character that is NOT a letter.
So the above expression looks for a literal dot, followed by one or more lowercase or uppercase letters (captured in the group), and finally followed by a single character that is not a letter.
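Here is a minimal sketch of how the expression could be used, assuming Python's re module (the question does not say which language or regex engine is in use). The function name find_extensions and the sample text are made up for illustration.

```python
import re

# Literal dot, one or more letters (captured), then one non-letter character.
EXT_PATTERN = re.compile(r"\.([a-zA-Z]+)[^A-Za-z]")

def find_extensions(text):
    """Return the captured extensions in the order they appear in text."""
    # With exactly one capturing group, findall returns only the group contents.
    return EXT_PATTERN.findall(text)

sample = "report.txt 123 and Budget.XLSX-(123)"
print(find_extensions(sample))  # ['txt', 'XLSX']
```

Because the pattern requires a non-letter after the extension, an extension at the very end of the text (with nothing after it) would not be matched by this sketch.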