I have about (100 files ) XLS files in a folder with different columns names and data types.
File_1.xls:
Id test category
1 ab 4
2 cs 3
3 cs 1
.. ..
FILE_2.xls:
index remove stocks category
1 dr 4 a
2 as 3 b
3 ae 1 v
..
File 3: ....
File 100.....
This is my code:
import pandas as pd
import pathlib
files = []
for filename in pathlib.Path.cwd().iterdir():
if filename.suffix.lower().startswith('.xls'):
files.append(filename)
df = (pd.concat({f: pd.read_excel(f).melt() for f in files})
.rename_axis(['filename', None]).reset_index(0)
.reset_index(drop=True)
.iloc[:1]
)
Output
filename variable value
0 File_1 Id 1
1 File_1 Id 2
Expected Results
filename variable value
0 File_1 Id 1
1 File_1 Id 2
2 File_1 test ab
3 File_1 test cs
4 File_1 category 4
5 File_1 category 3
6 File_2 index 1
7 File_2 index 2
8 File_2 remove dr
9 File_2 remove as
10 File_2 stocks 4
11 File_2 stocks 3
12 File_2 category a
13 File_2 category b
...
I want to get for all the files the first 2 rows, Melt, and export them in a dataframe.
CodePudding user response:
try this
import pathlib
import pandas as pd
files = []
for filename in pathlib.Path.cwd().iterdir():
if filename.suffix.lower().startswith('.xls'):
files.append(filename)
# DataFrame for final result
empty_df = pd.DataFrame()
# iterating over each file
for f in files:
# read first 2 rows of DataFrame and melt
temp = pd.melt(pd.read_excel(f).head(2))
# add a new column that assigns the filename
temp['filename'] = f
# concat with the final result DataFrame
empty_df = pd.concat([empty_df, temp])
# if column order is relevant you may re-arrange them like so
empty_df = empty_df[['filename', 'variable', 'value']]
display(empty_df)