Home > OS >  Melting Python Pandas and Input Filename in column
Melting Python Pandas and Input Filename in column

Time:07-07

I have about (100 files ) XLS files in a folder with different columns names and data types

File_1.xls:

Id  test  category
1    ab      4
2    cs      3
3    cs      1

FILE_2.xls:

index remove  stocks  category
1       dr      4         a
2       as      3         b
3       ae      1         v

File 3: ....

File 100.....

Thats my code:

import pandas as pd
import pathlib

data = []
for filename in pathlib.Path.cwd().iterdir():
    if filename.suffix.lower().startswith('.xls'):
        data.append(pd.read_excel(filename).melt())
df = pd.concat(data, ignore_index=True)

I would like to have the Dataframe in my output like this:

        filename variable value
    0   File_1     Id      1
    1   File_1     Id      2
    2   File_1     Id      3
    3   File_1     test    ab
    4   File_1     test    cs
    5   File_1     test    cs
    6   File_1  category    4
    7   File_1  category    3
    8   File_1  category    1
    9   File_1   index      1
    10  File_1   index      2
    11  File_1   index      3
    12  FILE_2   remove     dr
    13  FILE_2  remove      as
    14  FILE_2  remove      ae
    15  FILE_2  stocks      4
    16  FILE_2  stocks      3
    17  FILE_2  stocks      1
    18  FILE_2  category    a
    19  FILE_2  category    b
    20  FILE_2  category    v
  1000  FILE_100 ....       ..

How I can melt all mu columns and keep in the column "filename" the name of the source file?

CodePudding user response:

Don't read your data in the loop but rather collect the filenames, then use a dictionary comprehension to add the filenames as concatenation keys:

import pandas as pd
import pathlib

files = []
for filename in pathlib.Path.cwd().iterdir():
    if filename.suffix.lower().startswith('.xls'):
        files.append(filename)
        
df = (pd.concat({f: pd.read_excel(f).melt() for f in files})
        .rename_axis(['filename', None]).reset_index(0)
        .reset_index(drop=True)
     )

output:

   filename  variable value
0    File_1        Id     1
1    File_1        Id     2
2    File_1        Id     3
3    File_1      test    ab
4    File_1      test    cs
5    File_1      test    cs
6    File_1  category     4
7    File_1  category     3
8    File_1  category     1
9    File_2     index     1
10   File_2     index     2
11   File_2     index     3
12   File_2    remove    dr
13   File_2    remove    as
14   File_2    remove    ae
15   File_2    stocks     4
16   File_2    stocks     3
17   File_2    stocks     1
18   File_2  category     a
19   File_2  category     b
20   File_2  category     v
  • Related