I have about (100 files ) XLSX files in a folder with different columns names and data types
File 1:
Id test category
1 ab 4
2 cs 3
3 cs 1
FILE 2:
index remove stocks category
1 dr 4 a
2 as 3 b
3 ae 1 v
File 3: ....
File 4.....
This is my try based on another example:
# current directory (including python script & all excel files)
mydir = (os.getcwd()).replace('\\','/') '/'
#Get all excel files include subdir
filelist=[]
for path, subdirs, files in os.walk(mydir):
for file in files:
if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')):
filelist.append(os.path.join(path, file))
number_of_files=len(filelist)
print(filelist)
# Read all excel files and save to dataframe (df[0] - df[x]),
# x is the number of excel files that have been read - 1
df=[]
for i in range(number_of_files):
try:
df.melt(pd.read_excel(r'' filelist[i]))
except:
print('Empty Excel File')
print(df)
RESULTS:
Empty Excel File
Empty Excel File
Empty Excel File
Empty Excel File
[]
How I could unpivot the data instead of "appending" the data in columns?
I want to unpivot all my files data to this dataframe format.
Dataframe:
Id 1
Id 2
Id 3
test ab
test cs
test cs
category 4
category 3
category 1
index 1
index 1
index 1
remove dr
remove as
remove ae
stocks 4
stocks 3
stocks 1
category a
category b
category v
CodePudding user response:
You could use:
import pandas as pd
import pathlib
data = []
for filename in pathlib.Path.cwd().iterdir():
if filename.suffix.lower().startswith('.xls'):
data.append(pd.read_excel(filename).melt())
df = pd.concat(data, ignore_index=True)
Output:
>>> df
variable value
0 Id 1
1 Id 2
2 Id 3
3 test ab
4 test cs
5 test cs
6 category 4
7 category 3
8 category 1
9 index 1
10 index 2
11 index 3
12 remove dr
13 remove as
14 remove ae
15 stocks 4
16 stocks 3
17 stocks 1
18 category a
19 category b
20 category v
CodePudding user response:
I have tested it with your example input:
one={"Id": [1,2,3], "test": ["ab","cs","cs"], "category": [4,3,1]}
two= {"index": [1,2,3], "remove": ["dr","as","ae"], "stocks": [4,3,1], "category": ["a", "b", "v"]}
df1 = pd.DataFrame(one)
df2 = pd.DataFrame(two)
final = pd.concat([df1.melt(),df2.melt()])
final:
variable value
0 Id 1
1 Id 2
2 Id 3
3 test ab
4 test cs
5 test cs
6 category 4
7 category 3
8 category 1
0 index 1
1 index 2
2 index 3
3 remove dr
4 remove as
5 remove ae
6 stocks 4
7 stocks 3
8 stocks 1
9 category a
10 category b
11 category v