I have an excel table (sample.xlsx) which contains 3 sheets ('Sheet1','Sheet2','Sheet3'). Now I have read all the sheets and combine them into one dataframe.
import pandas as pd
data_df = pd.concat(pd.read_excel("sample.xlsx", header=None, index_col=None, sheet_name=None))
data_df looks like this:
0 1 2
Sheet1 0 val1 val2 val3
1 val11 val21 val31
Sheet2 0 val1 val2 val3
1 val11 val21 val31
Sheet3 0 val1 val2 val3
1 val11 val21 val31
Is there any way to create a new dataframe which has the same shape with data_df but each cell value is the cell position info?
I have tried to get multiple index:
multi_index = data_df.index.levels[:]
and I get:
[['Sheet1', 'Sheet2', 'Sheet3'], [0, 1]]
But I don't know how to use these data to create a new dataframe like this:
0 1 2
0 Sheet1 - A1 Sheet1 - B1 Sheet1 - C1
1 Sheet1 - A2 Sheet1 - B2 Sheet1 - C2
2 Sheet2 - A1 Sheet2 - B1 Sheet2 - C1
3 Sheet2 - A2 Sheet2 - B2 Sheet2 - C2
4 Sheet3 - A1 Sheet3 - B1 Sheet3 - C1
5 Sheet3 - A2 Sheet3 - B2 Sheet3 - C2
Thanks in advance!
CodePudding user response:
Look into the “openpyxl” package. I think you can do something like “for sheet in sheets” then read each sheet to your data frame like that. Hope this guides you to the right place.
CodePudding user response:
Since the values in your data_df
don't matter, you could build the cartesian product of index and columns and build a new dataframe of it.
mapping = dict(enumerate('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
mapping is needed to convert the column numbers 0,1,2,... to A,B,C,...
import itertools
import numpy as np
out = (
pd.DataFrame(
np.array([f"{x[0][0]} - {mapping[int(x[1])]}{x[0][1] 1}"
for x in itertools.product(df.index, df.columns)])
.reshape(len(df), -1)
)
)
print(out)
0 1 2
0 Sheet1 - A1 Sheet1 - B1 Sheet1 - C1
1 Sheet1 - A2 Sheet1 - B2 Sheet1 - C2
2 Sheet2 - A1 Sheet2 - B1 Sheet2 - C1
3 Sheet2 - A2 Sheet2 - B2 Sheet2 - C2
4 Sheet3 - A1 Sheet3 - B1 Sheet3 - C1
5 Sheet3 - A2 Sheet3 - B2 Sheet3 - C2