Home > Software engineering >  How to read unmerged column in Pandas and transpose them
How to read unmerged column in Pandas and transpose them

Time:01-04

I have a excel with multiple sheets in the below format. I need to create a single dataframe by concatenating all the sheets, unmerging the cell and then transposing them into a column based on the sheet

Sheet 1: enter image description here

Sheet 2: enter image description here

Final Dataframe should look like below Result expected - I need the below format with an extra coulmn as below

enter image description here

Code So far: Reading File:

df = pd.concat(pd.read_excel('/Users/john/Desktop/Internal/Raw Files/Med/Dig/File_2017_2022.xlsx', sheet_name=None, skiprows=1))

Creating Column :

df_1 = pd.concat([df.assign(name=n) for n,df in dfs.items()])

CodePudding user response:

I created my own version of your excel, which looks like this

The code below is far from perfect but it should do fine as long as you do not have millions of sheets

# First, obtain all sheet names
full_df = pd.read_excel(r'C:\Users\.\Downloads\test.xlsx', 
sheet_name=None, skiprows=0)

# Store them into a list
sheet_names = list(full_df.keys())

# Create an empty Dataframe to store the contents from each sheet
final_df = pd.DataFrame()
for sheet in sheet_names:
    df = pd.read_excel(r'C:\Users\.\Downloads\test.xlsx', sheet_name=sheet, skiprows=0)
    # Get the brand name 
    brand = df.columns[1]

    # Remove the header columns and keep the numerical values only
    df.columns = df.iloc[0]
    df = df[1:]
    df = df.iloc[:, 1:]

    # Set the brand name into a new column
    df['Brand'] = brand

    # Append into the final dataframe
    final_df = pd.concat([final_df, df])

Your final_df should look like this once exported back to excel

EDIT: You might need to drop the dataframe's index upon saving it by using the df.reset_index(drop=True) function, to remove the first column shown in the image right above.

CodePudding user response:

Use:

dfs = pd.read_excel('Input.xlsx',sheet_name=None, header=[0,1], index_col=[0,1])

df_1 = (pd.concat([df.stack(0).assign(name=n) for n,df in dfs.items()])
          .rename_axis(index=['Date','WK','Brand'], columns=None)
          .reset_index())

print (df_1)
    Date  WK Brand        A1     A10       A11     A12      A13      A2  \
0      1  20   XYZ  0.879300  0.0261  0.231800  0.2412  0.27180  0.7796   
1      2  21   XYZ  0.807900  0.8426  0.386300  0.8667  0.84040  0.8457   
2      3  22   XYZ  0.872900  0.6538  0.681700  0.0752  0.68900  0.4728   
3      4  23   XYZ  0.954200  0.4246  0.642500  0.4989  0.69840  0.6284   
4      5  24   XYZ  0.110200  0.2384  0.817600  0.1307  0.70820  0.6780   
5      6  25   XYZ  0.162800  0.6447  0.004100  0.0528  0.33490  0.1759   
6      7  26   XYZ  0.594600  0.9878  0.711700  0.9426  0.79770  0.0709   
7      8  27   XYZ  0.191000  0.9067  0.800300  0.0115  0.77450  0.3929   
8      9  28   XYZ  0.495700  0.5614  0.612100  0.3440  0.83600  0.7441   
9     10  29   XYZ  0.732700  0.8441  0.190500  0.3554  0.77730  0.3197   
10    11  30   XYZ  0.182700  0.0233  0.312400  0.1103  0.14030  0.7587   
11     1  20   ABC  0.452776  0.9780  0.709827  0.2569  0.82034  0.0330   
12     2  21   ABC  0.639173  0.5810  0.224544  0.9578  0.31603  0.8100   
13     3  22   ABC  0.781147  0.5610  0.393751  0.2752  0.89457  0.1560   
14     4  23   ABC  0.267159  0.0140  0.510502  0.8702  0.02365  0.0590   
15     5  24   ABC  0.533062  0.9880  0.423679  0.8909  0.19700  0.5450   
16     6  25   ABC  0.212764  0.5850  0.295257  0.8627  0.02916  0.6440   
17     7  26   ABC  0.210353  0.0290  0.344722  0.5893  0.58645  0.2710   
18     8  27   ABC  0.400770  0.4960  0.657140  0.1964  0.24827  0.4680   
19     9  28   ABC  0.784346  0.6200  0.389639  0.6611  0.96997  0.7160   
20    10  29   ABC  0.563395  0.9440  0.111782  0.9427  0.97576  0.8460   

        A3      A4      A5       A6      A7        A8       A9    name  
0   0.9385  0.7753  0.7786  0.07560  0.6403  0.436500  0.02470  Sheet1  
1   0.8395  0.7836  0.6573  0.25450  0.9042  0.105200  0.86310  Sheet1  
2   0.2821  0.1440  0.2007  0.22530  0.6513  0.469500  0.95240  Sheet1  
3   0.6310  0.9538  0.0459  0.64120  0.3661  0.255400  0.32200  Sheet1  
4   0.2798  0.4732  0.7211  0.05150  0.3435  0.820600  0.65000  Sheet1  
5   0.4093  0.7066  0.0728  0.05840  0.3661  0.429500  0.42950  Sheet1  
6   0.3238  0.0201  0.3456  0.14760  0.9303  0.133900  0.22700  Sheet1  
7   0.6638  0.9480  0.0811  0.34470  0.0816  0.089800  0.17560  Sheet1  
8   0.8123  0.7700  0.3594  0.13080  0.7899  0.206200  0.44610  Sheet1  
9   0.7114  0.3807  0.8601  0.60490  0.6659  0.157800  0.56270  Sheet1  
10  0.3607  0.2414  0.6609  0.92080  0.9176  0.740000  0.13420  Sheet1  
11  0.0389  0.6040  0.6508  0.57646  0.6231  0.985285  0.88505  Sheet2  
12  0.7297  0.1390  0.2412  0.30669  0.3714  0.572789  0.73314  Sheet2  
13  0.7449  0.8730  0.0929  0.24519  0.6180  0.464157  0.09081  Sheet2  
14  0.9603  0.6090  0.9586  0.42426  0.5402  0.383903  0.56418  Sheet2  
15  0.5817  0.8250  0.6945  0.19948  0.0507  0.071899  0.70657  Sheet2  
16  0.9808  0.3340  0.8652  0.09141  0.5028  0.400358  0.19238  Sheet2  
17  0.0821  0.2770  0.0378  0.55174  0.1446  0.291331  0.11745  Sheet2  
18  0.9705  0.7380  0.3926  0.78333  0.8904  0.887290  0.84794  Sheet2  
19  0.3167  0.6270  0.7094  0.01363  0.7816  0.143793  0.49700  Sheet2  
20  0.4576  0.0220  0.5176  0.17527  0.2589  0.223100  0.54703  Sheet2  
  • Related