Split Pandas dataframe into multiple dataframes based on empty column delimiter

Time:10-17

I'm reading the following Excel sheet into a dataframe.

[Image: screenshot of the Excel sheet]

I want to split it into three dataframes, one per product. The tables will always be delimited by a single blank column, but each table can have a different number of columns.

CodePudding user response:

Based on the article introduced in the comment, you can process it as follows.

import pandas as pd

#### Read excel file to dataframe
df = pd.read_excel('test.xlsx', index_col=None, header=None)

#### Find the columns that are entirely empty and collect them in a list
empcols = [col for col in df.columns if df[col].isnull().all()]
df.fillna('', inplace=True)

#### Split into consecutive columns of valid data
allcols = list(range(len(df.columns)))
start = 0
colslist = []
for sepcol in empcols:
    colslist.append(allcols[start:sepcol])
    start = sepcol + 1

colslist.append(allcols[start:])

#### Extract consecutive columns of valid data and store them in a dictionary
dfdic = {}
for cols in colslist:
    # Copy the slice so later changes do not trigger SettingWithCopyWarning
    wkdf = df.iloc[:, cols].copy()
    # Row 0 supplies the title, row 1 the column names
    title = ''.join(str(v) for v in wkdf.iloc[0])
    wkcols = wkdf.iloc[1].tolist()
    wkdf = wkdf.drop(wkdf.index[[0, 1]])
    wkdf.columns = wkcols
    dfdic[title] = wkdf.reset_index(drop=True)

#### Display each DataFrame stored in the dictionary
dfkeys = dfdic.keys()
for k in dfkeys:
    print(k)
    print(dfdic[k])
    print()
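
The same idea can also be sketched without an explicit start/stop loop. Below, a cumulative sum over the "column is entirely empty" mask numbers the groups of columns. This is only a sketch under assumptions: the function name split_on_blank_columns and the sample data are made up for illustration, and the layout is assumed to match the code above (title in row 0, column names in row 1). The frame is built in memory so the example runs without an Excel file.

```python
import numpy as np
import pandas as pd


def split_on_blank_columns(raw):
    """Split a header-less frame into sub-frames at all-NaN columns.

    Hypothetical helper; assumes row 0 holds each table's title and
    row 1 holds its column names.
    """
    # True where an entire column is empty
    blank = raw.isna().all(axis=0).to_numpy()
    # Every blank column bumps the counter, so columns between two
    # blank columns share the same group id
    group_ids = blank.cumsum()
    tables = {}
    for gid in np.unique(group_ids[~blank]):
        positions = np.flatnonzero((group_ids == gid) & ~blank)
        block = raw.iloc[:, positions]
        # First non-empty cell of row 0 is taken as the title
        title = str(block.iloc[0].dropna().iloc[0])
        # Drop rows that are empty for this table (shorter tables)
        body = block.iloc[2:].dropna(how="all").reset_index(drop=True)
        body.columns = block.iloc[1].tolist()
        tables[title] = body
    return tables


# Made-up sample: two tables separated by one blank column (index 2)
nan = np.nan
raw = pd.DataFrame([
    ["Product A", nan, nan, "Product B", nan, nan],
    ["id", "qty", nan, "id", "qty", "price"],
    [1, 10, nan, 7, 3, 2.5],
    [2, 20, nan, nan, nan, nan],
])

tables = split_on_blank_columns(raw)
for title, table in tables.items():
    print(title)
    print(table)
    print()
```

With a real workbook, raw would come from pd.read_excel('test.xlsx', header=None) as in the answer above. Unlike the loop version, this sketch also tolerates several consecutive blank columns, because empty groups are never created.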