I have about (100 files ) XLS files in a folder with different columns names and data types
File_1.xls:
Id test category
1 ab 4
2 cs 3
3 cs 1
FILE_2.xls:
index remove stocks category
1 dr 4 a
2 as 3 b
3 ae 1 v
File 3: ....
File 100.....
Thats my code:
import pandas as pd
import pathlib
data = []
for filename in pathlib.Path.cwd().iterdir():
if filename.suffix.lower().startswith('.xls'):
data.append(pd.read_excel(filename).melt())
df = pd.concat(data, ignore_index=True)
I would like to have the Dataframe in my output like this:
filename variable value
0 File_1 Id 1
1 File_1 Id 2
2 File_1 Id 3
3 File_1 test ab
4 File_1 test cs
5 File_1 test cs
6 File_1 category 4
7 File_1 category 3
8 File_1 category 1
9 File_1 index 1
10 File_1 index 2
11 File_1 index 3
12 FILE_2 remove dr
13 FILE_2 remove as
14 FILE_2 remove ae
15 FILE_2 stocks 4
16 FILE_2 stocks 3
17 FILE_2 stocks 1
18 FILE_2 category a
19 FILE_2 category b
20 FILE_2 category v
1000 FILE_100 .... ..
How I can melt all mu columns and keep in the column "filename" the name of the source file?
CodePudding user response:
Don't read your data in the loop but rather collect the filenames, then use a dictionary comprehension to add the filenames as concatenation keys:
import pandas as pd
import pathlib
files = []
for filename in pathlib.Path.cwd().iterdir():
if filename.suffix.lower().startswith('.xls'):
files.append(filename)
df = (pd.concat({f: pd.read_excel(f).melt() for f in files})
.rename_axis(['filename', None]).reset_index(0)
.reset_index(drop=True)
)
output:
filename variable value
0 File_1 Id 1
1 File_1 Id 2
2 File_1 Id 3
3 File_1 test ab
4 File_1 test cs
5 File_1 test cs
6 File_1 category 4
7 File_1 category 3
8 File_1 category 1
9 File_2 index 1
10 File_2 index 2
11 File_2 index 3
12 File_2 remove dr
13 File_2 remove as
14 File_2 remove ae
15 File_2 stocks 4
16 File_2 stocks 3
17 File_2 stocks 1
18 File_2 category a
19 File_2 category b
20 File_2 category v