Home > Software design >  Pandas read excel files in folder and Unpivot columns into Dataframe
Pandas read excel files in folder and Unpivot columns into Dataframe

Time:07-07

I have about (100 files ) XLSX files in a folder with different columns names and data types

File 1:

Id  test  category
1   ab      4
2   cs      3
3   cs      1

FILE 2:

index  remove  stocks  category
1      dr      4         a
2      as      3         b
3      ae      1         v

File 3: ....

File 4.....

This is my try based on another example:

    #  current directory (including python script & all excel files)
    mydir = (os.getcwd()).replace('\\','/')   '/'
    
    #Get all excel files include subdir
    filelist=[]
    for path, subdirs, files in os.walk(mydir):
        for file in files:
            if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')):
                filelist.append(os.path.join(path, file))
    number_of_files=len(filelist)
    print(filelist)

# Read all excel files and save to dataframe (df[0] - df[x]),
# x is the number of excel files that have been read - 1


df=[]
for i in range(number_of_files):
    try:
        df.melt(pd.read_excel(r'' filelist[i]))
    except:
        print('Empty Excel File')
print(df)

RESULTS:

Empty Excel File
Empty Excel File
Empty Excel File
Empty Excel File
[]

How I could unpivot the data instead of "appending" the data in columns?

I want to unpivot all my files data to this dataframe format.

Dataframe:

Id    1
Id    2
Id    3
test  ab
test  cs
test  cs
category 4
category 3
category 1
index    1
index    1
index    1
remove   dr
remove   as
remove   ae
stocks   4
stocks   3
stocks   1
category a
category b
category v

CodePudding user response:

You could use:

import pandas as pd
import pathlib

data = []
for filename in pathlib.Path.cwd().iterdir():
    if filename.suffix.lower().startswith('.xls'):
        data.append(pd.read_excel(filename).melt())
df = pd.concat(data, ignore_index=True)

Output:

>>> df
     variable value
0          Id     1
1          Id     2
2          Id     3
3        test    ab
4        test    cs
5        test    cs
6    category     4
7    category     3
8    category     1
9       index     1
10      index     2
11      index     3
12     remove    dr
13     remove    as
14     remove    ae
15     stocks     4
16     stocks     3
17     stocks     1
18   category     a
19   category     b
20   category     v

CodePudding user response:

I have tested it with your example input:

one={"Id": [1,2,3], "test": ["ab","cs","cs"],  "category": [4,3,1]}
two= {"index": [1,2,3],  "remove": ["dr","as","ae"],  "stocks": [4,3,1],  "category": ["a", "b", "v"]}
df1 = pd.DataFrame(one)
df2 = pd.DataFrame(two)
final = pd.concat([df1.melt(),df2.melt()])
final:
    variable value
0         Id     1
1         Id     2
2         Id     3
3       test    ab
4       test    cs
5       test    cs
6   category     4
7   category     3
8   category     1
0      index     1
1      index     2
2      index     3
3     remove    dr
4     remove    as
5     remove    ae
6     stocks     4
7     stocks     3
8     stocks     1
9   category     a
10  category     b
11  category     v
  • Related