I'm reading the following excel sheet into a dataframe.
I want to split it into three dataframes by product. The tables will always be delimited by a single blank column in between, but each table can have different number of columns.
CodePudding user response:
Based on the article introduced in the comment, you can process it as follows.
import pandas as pd
#### Read excel file to dataframe
df = pd.read_excel('test.xlsx', index_col=None, header=None)
#### Find empty column and listed
empcols = [col for col in df.columns if df[col].isnull().all()]
df.fillna('', inplace=True)
#### Split into consecutive columns of valid data
allcols = list(range(len(df.columns)))
start = 0
colslist = []
for sepcol in empcols:
colslist.append(allcols[start:sepcol])
start = sepcol 1
colslist.append(allcols[start:])
#### Extract consecutive columns of valid data and store them in a dictionary
dfdic = {}
for i in range(len(colslist)):
wkdf = df.iloc[:, colslist[i]]
title = ''.join(wkdf.iloc[0].tolist())
wkcols = wkdf.iloc[1].tolist()
wkdf.drop(wkdf.index[[0,1]], inplace=True)
wkdf.columns = wkcols
dfdic[title] = wkdf.reset_index(drop=True)
#### Display each DataFrame stored in the dictionary
dfkeys = dfdic.keys()
for k in dfkeys:
print(k)
print(dfdic[k])
print()