Melting Python Pandas and Input Filename in column-CodePudding

I have about (100 files ) XLS files in a folder with different columns names and data types

File_1.xls:

Id  test  category
1    ab      4
2    cs      3
3    cs      1

FILE_2.xls:

index remove  stocks  category
1       dr      4         a
2       as      3         b
3       ae      1         v

File 3: ....

File 100.....

Thats my code:

import pandas as pd
import pathlib

data = []
for filename in pathlib.Path.cwd().iterdir():
    if filename.suffix.lower().startswith('.xls'):
        data.append(pd.read_excel(filename).melt())
df = pd.concat(data, ignore_index=True)

I would like to have the Dataframe in my output like this:

        filename variable value
    0   File_1     Id      1
    1   File_1     Id      2
    2   File_1     Id      3
    3   File_1     test    ab
    4   File_1     test    cs
    5   File_1     test    cs
    6   File_1  category    4
    7   File_1  category    3
    8   File_1  category    1
    9   File_1   index      1
    10  File_1   index      2
    11  File_1   index      3
    12  FILE_2   remove     dr
    13  FILE_2  remove      as
    14  FILE_2  remove      ae
    15  FILE_2  stocks      4
    16  FILE_2  stocks      3
    17  FILE_2  stocks      1
    18  FILE_2  category    a
    19  FILE_2  category    b
    20  FILE_2  category    v
  1000  FILE_100 ....       ..

How I can melt all mu columns and keep in the column "filename" the name of the source file?

CodePudding user response：

Don't read your data in the loop but rather collect the filenames, then use a dictionary comprehension to add the filenames as concatenation keys:

import pandas as pd
import pathlib

files = []
for filename in pathlib.Path.cwd().iterdir():
    if filename.suffix.lower().startswith('.xls'):
        files.append(filename)
        
df = (pd.concat({f: pd.read_excel(f).melt() for f in files})
        .rename_axis(['filename', None]).reset_index(0)
        .reset_index(drop=True)
     )

output:

   filename  variable value
0    File_1        Id     1
1    File_1        Id     2
2    File_1        Id     3
3    File_1      test    ab
4    File_1      test    cs
5    File_1      test    cs
6    File_1  category     4
7    File_1  category     3
8    File_1  category     1
9    File_2     index     1
10   File_2     index     2
11   File_2     index     3
12   File_2    remove    dr
13   File_2    remove    as
14   File_2    remove    ae
15   File_2    stocks     4
16   File_2    stocks     3
17   File_2    stocks     1
18   File_2  category     a
19   File_2  category     b
20   File_2  category     v