Home > database >  How to reate a dataframe based on excel sheet name and cell position?
How to reate a dataframe based on excel sheet name and cell position?

Time:11-21

I have an excel table (sample.xlsx) which contains 3 sheets ('Sheet1','Sheet2','Sheet3'). Now I have read all the sheets and combine them into one dataframe.

import pandas as pd
data_df = pd.concat(pd.read_excel("sample.xlsx", header=None, index_col=None, sheet_name=None))

data_df looks like this:

              0      1      2
Sheet1 0   val1   val2   val3
       1  val11  val21  val31
Sheet2 0   val1   val2   val3
       1  val11  val21  val31
Sheet3 0   val1   val2   val3
       1  val11  val21  val31

Is there any way to create a new dataframe which has the same shape with data_df but each cell value is the cell position info?

I have tried to get multiple index:

multi_index = data_df.index.levels[:]

and I get:

[['Sheet1', 'Sheet2', 'Sheet3'], [0, 1]]

But I don't know how to use these data to create a new dataframe like this:

             0            1            2
0  Sheet1 - A1  Sheet1 - B1  Sheet1 - C1
1  Sheet1 - A2  Sheet1 - B2  Sheet1 - C2
2  Sheet2 - A1  Sheet2 - B1  Sheet2 - C1
3  Sheet2 - A2  Sheet2 - B2  Sheet2 - C2
4  Sheet3 - A1  Sheet3 - B1  Sheet3 - C1
5  Sheet3 - A2  Sheet3 - B2  Sheet3 - C2

Thanks in advance!

CodePudding user response:

Look into the “openpyxl” package. I think you can do something like “for sheet in sheets” then read each sheet to your data frame like that. Hope this guides you to the right place.

CodePudding user response:

Since the values in your data_df don't matter, you could build the cartesian product of index and columns and build a new dataframe of it.

mapping = dict(enumerate('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))

mapping is needed to convert the column numbers 0,1,2,... to A,B,C,...

import itertools
import numpy as np

out = (
    pd.DataFrame(
        np.array([f"{x[0][0]} - {mapping[int(x[1])]}{x[0][1] 1}" 
                  for x in itertools.product(df.index, df.columns)])
        .reshape(len(df), -1)
    )
)

print(out)
             0            1            2
0  Sheet1 - A1  Sheet1 - B1  Sheet1 - C1
1  Sheet1 - A2  Sheet1 - B2  Sheet1 - C2
2  Sheet2 - A1  Sheet2 - B1  Sheet2 - C1
3  Sheet2 - A2  Sheet2 - B2  Sheet2 - C2
4  Sheet3 - A1  Sheet3 - B1  Sheet3 - C1
5  Sheet3 - A2  Sheet3 - B2  Sheet3 - C2
  • Related