Extract Filenames from a list of files including Full-Path in Notepad++

Time:01-17

I have a list containing thousands of files in a text file like this:

C:\AAAA\BBB\CCC\file1.dat
D:\AAAA\FF FF F\CCC\file 2.dat
D:\ANN NN\BBB\CCC\The.Third.File.dat

and I want to keep just the filenames like this:

file1
file 2
The.Third.File

How can I do it? Maybe someone can do it with RegEx?

I can do it in Delphi (the language I master) like this:

var
  St: TStringList;
  i: Integer;
begin
  st := TStringList.Create;
  try
    st.LoadFromFile('F:\TheFile.txt');
    for i := 0 to st.Count - 1 do
      st[i] := ChangeFileExt(ExtractFileName(st[i]), '');
    st.SaveToFile('F:\TheFile.txt');
  finally
    st.Free;
  end;
end;

but I want to learn how to do it in Notepad++.

CodePudding user response:

To keep the filenames after the last \ you could use a pattern that matches the leading char A-Z followed by :\ and then optionally matches up to the last \, and replace the match with an empty string.

Find what

^[A-Z]:\\(?:.*\\)?

The pattern matches:

  • ^ Start of string
  • [A-Z]:\\ Match a single char A-Z and then :\
  • (?:.*\\)? Optionally match the rest of the line until the last occurrence of \
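The pattern can also be checked outside the editor. The following is a minimal sketch using Python's re module as a stand-in for the Notepad++ regex engine (an assumption; the two engines differ in some details), applied to the sample lines from the question:

```python
import re

# Sample lines taken from the question.
text = "\n".join([
    r"C:\AAAA\BBB\CCC\file1.dat",
    r"D:\AAAA\FF FF F\CCC\file 2.dat",
    r"D:\ANN NN\BBB\CCC\The.Third.File.dat",
])

# re.MULTILINE makes ^ match at the start of every line,
# which is how the pattern is applied line by line in the editor.
path_prefix = re.compile(r"^[A-Z]:\\(?:.*\\)?", re.MULTILINE)

# Replace the matched path prefix with nothing.
names = path_prefix.sub("", text).split("\n")
print(names)  # ['file1.dat', 'file 2.dat', 'The.Third.File.dat']
```

The extension is still present at this point, because the pattern only covers the drive and the directories.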



If you also want to remove the extension, meaning the characters after the last dot, you can match everything you want to remove and use a capture group for the part you want to keep.

In the replacement, use the value of capture group 1, written as $1

Assuming no \ or . or spaces in the extension:

^[A-Z]:\\(?:.*\\)?([^\r\n\\]+)\.[^\\.\s]+$

The pattern matches:

  • ^ Start of string
  • [A-Z]:\\ Match a single char A-Z and then :\
  • (?:.*\\)? Optionally match the rest of the line until the last occurrence of \
  • ( Capture group 1
    • [^\r\n\\]+ Match 1+ chars other than newlines or \
  • ) Close group 1
  • \.[^\\.\s]+ Match . and 1+ chars other than \ . or a whitespace char
  • $ End of string
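As a rough check of the capture-group version, here is a sketch in Python (again an assumption standing in for the editor). Python writes the group reference as \1 in the replacement, where Notepad++ uses $1:

```python
import re

# Sample lines taken from the question.
text = "\n".join([
    r"C:\AAAA\BBB\CCC\file1.dat",
    r"D:\AAAA\FF FF F\CCC\file 2.dat",
    r"D:\ANN NN\BBB\CCC\The.Third.File.dat",
])

# Group 1 captures the filename up to the last dot; the rest is discarded.
pattern = re.compile(r"^[A-Z]:\\(?:.*\\)?([^\r\n\\]+)\.[^\\.\s]+$", re.MULTILINE)

# Keep only what group 1 captured.
stems = pattern.sub(r"\1", text).split("\n")
print(stems)  # ['file1', 'file 2', 'The.Third.File']
```

Because group 1 is greedy, it backtracks only as far as the last dot, so the inner dots of The.Third.File are kept.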

See another regex101 demo

CodePudding user response:

Just use search and replace in Notepad++ with this as search field:

^.*\\

And an empty replace value.

Then choose the regular expression search mode and leave the option for . matching newlines unchecked.

Explained and testable here: https://regex101.com/r/8Fp6E6/1

  • ^ asserts position at start of a line.
  • . matches any character (except for line terminators).
  • * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy).
  • \\ matches the character \.

As you can see, the idea is to get rid of the path before the file name. We take advantage of the fact that .* is greedy by default: it matches as much as it can, so the match extends to the last \ on the line instead of stopping at the first one. The ungreedy pattern would be ^.*?\\, and in that case it would remove only the drive and the first backslash.
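The difference between the greedy and the ungreedy pattern can be seen on a single line. This is a small sketch in Python (used here only as a convenient way to try the patterns, not as the editor's engine):

```python
import re

# One sample line from the question.
line = r"D:\AAAA\FF FF F\CCC\file 2.dat"

# Greedy: .* runs to the end of the line, then backtracks to the last backslash.
greedy = re.sub(r"^.*\\", "", line)

# Ungreedy: .*? stops at the first backslash, so only the drive is removed.
lazy = re.sub(r"^.*?\\", "", line)

print(greedy)  # file 2.dat
print(lazy)    # AAAA\FF FF F\CCC\file 2.dat
```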

If you instead want to match only the filenames, that is, a run of characters that are not backslashes up to the end of the line, you could search for this:

[^\\]+$

Explained:

  • [^ ... ] matches any char which isn't in the given list (...).
  • So [^\\] matches any char which isn't a backslash.
  • + matches the previous token one or more times. It's similar to * but we don't want to match zero times, so this is why we use + instead of *.
  • $ matches the end of the line.
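Used as a search rather than a replace, this pattern picks the filename out of each line. A minimal sketch in Python, assuming the lines have already been split so that each string is one path:

```python
import re

# Sample lines taken from the question, one path per string.
lines = [
    r"C:\AAAA\BBB\CCC\file1.dat",
    r"D:\AAAA\FF FF F\CCC\file 2.dat",
    r"D:\ANN NN\BBB\CCC\The.Third.File.dat",
]

# The last run of non-backslash characters before the end of the line.
tail = re.compile(r"[^\\]+$")

found = [tail.search(line).group() for line in lines]
print(found)  # ['file1.dat', 'file 2.dat', 'The.Third.File.dat']
```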

Second question to also remove the file extension

To remove the path and the extension at the same time, you'll have to match and capture the part you want (the filename without the extension) with parentheses and use the captured content as the replacement. Search field:

^.*\\([^\\]+?)(?:\.\w+)?$

Replacement field: $1 (= capturing group number 1)

Explained:

  • ^.*\\ matches the path, as before.
  • ([^\\]+?) will capture the filename part in an ungreedy way.
  • (?:\.\w+)?$ will match the dot and the file extension if it exists:
    • \. matches the dot character.
    • \w+ matches one or more word chars; \w is equivalent to [a-zA-Z0-9_].
    • (?: ) is a non-capturing group. We use it because we want to say that the file extension is optional with a question mark after it: (?:\.\w+)?.
    • $ matches the end of the line.
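To see the optional extension group at work, here is a sketch in Python (an assumption standing in for the editor; the replacement is written \1 instead of $1). The last input line, a path without an extension, is a hypothetical example that is not in the question:

```python
import re

lines = [
    r"C:\AAAA\BBB\CCC\file1.dat",
    r"D:\AAAA\FF FF F\CCC\file 2.dat",
    r"D:\ANN NN\BBB\CCC\The.Third.File.dat",
    r"C:\AAAA\README",  # hypothetical line without an extension
]

# Path, then the lazily captured filename, then an optional ".ext" before the end.
pattern = re.compile(r"^.*\\([^\\]+?)(?:\.\w+)?$")

# Replace each whole line with the content of group 1.
result = [pattern.sub(r"\1", line) for line in lines]
print(result)  # ['file1', 'file 2', 'The.Third.File', 'README']
```

The lazy group keeps growing until the remainder is either a single extension or nothing at all, which is why the inner dots of The.Third.File stay in the captured name.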

Test it here: https://regex101.com/r/wvDjAJ/2
