How to index the elements in a list for pandas? (Part 2)

Time:12-23

After expanding the lists into columns and applying a MultiIndex, I created a new DataFrame. Unfortunately, the columns do not come out as in the screenshot. Can you tell me what I should do, please?


import pandas as pd

process_steps_n = [1, 2, 3, 4]

# Processing (p): one list of five values per step
p = [[127, 178, 49, 298, 262],
     [380, 400, 48, 210, 134],
     [343, 484, 459, 137, 324],
     [441, 210, 213, 247, 109]]

# Cleaning (c): one list of five values per step
c = [[75, 397, 83, 211, 80],
     [211, 254, 88, 491, 82],
     [213, 0, 20, 250, 261],
     [260, 243, 157, 446, 318]]

df_rawdata = pd.DataFrame(list(zip(process_steps_n, p,c)),columns =['Steps','Processing','Cleaning'])
df1 = df_rawdata['Processing'].apply(pd.Series).rename(columns=lambda x: f'sp {x+1}')
df2 = df_rawdata['Cleaning'].apply(pd.Series).rename(columns=lambda x: f'sp {x+1}')
outp = pd.concat([df1], keys=['Processing'], axis=1)
outc = pd.concat([df2], keys=['Cleaning'], axis=1)

df_rawdata2 = pd.DataFrame(list(zip(process_steps_n, outp,outc)),columns =['Steps','Processing','Cleaning'])

Result from pandas
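A likely cause of the mismatch, sketched on a small hypothetical frame built the same way as `outp`: iterating a DataFrame, which is what `zip()` does with `outp` and `outc`, yields its column labels rather than its rows. The last `pd.DataFrame(...)` call therefore receives tuples such as `('Processing', 'sp 1')` instead of the numbers.

```python
import pandas as pd

# Hypothetical two-column frame with two-level columns, built like `outp`
inner = pd.DataFrame([[1, 2], [3, 4]]).rename(columns=lambda x: f'sp {x+1}')
outp = pd.concat([inner], keys=['Processing'], axis=1)

# zip() iterates the DataFrame, which yields column labels (tuples), not rows
pairs = list(zip([1, 2], outp))
print(pairs)
```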

cc: @Panda Kim

Answer:

Example

I'll answer using a minimal example.

import pandas as pd

df = pd.DataFrame([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], index=['step1', 'step2'], columns=['process', 'clean'])

df

        process clean
step1   [1, 2]  [3, 4]
step2   [5, 6]  [7, 8]

Code

out = (df.stack()
       .apply(pd.Series).rename(columns=lambda x: f'sp {x+1}')
       .unstack().swaplevel(0, 1, axis=1).sort_index(axis=1))

out

        clean           process
        sp 1    sp 2    sp 1    sp 2
step1   3       4       1       2
step2   7       8       5       6
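`sort_index(axis=1)` orders the top level alphabetically. If the original column order matters, here is a sketch of a variant that skips `stack`/`unstack`, expands each list column on its own, and lets `pd.concat(..., keys=...)` add the top level in `df`'s column order. The helper `expand` is a hypothetical name introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame([[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
                  index=['step1', 'step2'], columns=['process', 'clean'])

def expand(col):
    """Hypothetical helper: turn a Series of equal-length lists into 'sp 1', 'sp 2', ... columns."""
    return (pd.DataFrame(col.tolist(), index=col.index)
              .rename(columns=lambda x: f'sp {x+1}'))

# keys= becomes the top column level, in the same order as df.columns
out = pd.concat([expand(df[c]) for c in df.columns], axis=1, keys=list(df.columns))
print(out)
```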

Answer:

First, for performance, don't use .apply(pd.Series) because it is slow; instead, convert each list column to a list of lists and pass it to the DataFrame constructor:

df1 = pd.DataFrame(df_rawdata['Processing'].tolist()).rename(columns=lambda x: f'sp {x+1}')
df2 = pd.DataFrame(df_rawdata['Cleaning'].tolist()).rename(columns=lambda x: f'sp {x+1}')
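One detail of the constructor approach: `pd.DataFrame(list_of_lists)` always starts a fresh `RangeIndex`. That is fine for `df_rawdata`, which already has a default index, but if the source frame uses other row labels you can pass them through with `index=`. A minimal sketch on a hypothetical Series:

```python
import pandas as pd

# Hypothetical list column with a non-default index
s = pd.Series([[1, 2, 3], [4, 5, 6]], index=['a', 'b'], name='Processing')

# index=s.index keeps the original row labels instead of 0, 1, ...
wide = pd.DataFrame(s.tolist(), index=s.index).rename(columns=lambda x: f'sp {x+1}')
print(wide)
```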

Then join both DataFrames with concat, using the keys parameter to build the MultiIndex columns. Steps has no second level to pair with, so it is set as the index instead:

df = pd.concat([df1, df2], axis=1, keys=['Processing','Cleaning']).set_index(df_rawdata['Steps'])
print (df)
      Processing                     Cleaning                    
            sp 1 sp 2 sp 3 sp 4 sp 5     sp 1 sp 2 sp 3 sp 4 sp 5
Steps                                                            
1            127  178   49  298  262       75  397   83  211   80
2            380  400   48  210  134      211  254   88  491   82
3            343  484  459  137  324      213    0   20  250  261
4            441  210  213  247  109      260  243  157  446  318
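Once the columns carry a MultiIndex, a whole group can be selected by its top-level key and a single column by the full tuple. The sketch below rebuilds the frame from the question's data with the steps from this answer; the names `processing` and `cleaning_sp2` are illustrative only.

```python
import pandas as pd

process_steps_n = [1, 2, 3, 4]
p = [[127, 178, 49, 298, 262], [380, 400, 48, 210, 134],
     [343, 484, 459, 137, 324], [441, 210, 213, 247, 109]]
c = [[75, 397, 83, 211, 80], [211, 254, 88, 491, 82],
     [213, 0, 20, 250, 261], [260, 243, 157, 446, 318]]
df_rawdata = pd.DataFrame(list(zip(process_steps_n, p, c)),
                          columns=['Steps', 'Processing', 'Cleaning'])

df1 = pd.DataFrame(df_rawdata['Processing'].tolist()).rename(columns=lambda x: f'sp {x+1}')
df2 = pd.DataFrame(df_rawdata['Cleaning'].tolist()).rename(columns=lambda x: f'sp {x+1}')
df = pd.concat([df1, df2], axis=1, keys=['Processing', 'Cleaning']).set_index(df_rawdata['Steps'])

# Top-level key -> sub-DataFrame with columns 'sp 1' ... 'sp 5'
processing = df['Processing']
# Full (top, second) tuple -> one column as a Series, indexed by Steps
cleaning_sp2 = df[('Cleaning', 'sp 2')]
print(processing)
print(cleaning_sp2)
```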

If you need Steps as a column in the MultiIndex as well, use:

df = pd.concat([df_rawdata['Steps'], df1, df2], axis=1, keys=['', 'Processing','Cleaning'])
print (df)
        Processing                     Cleaning                    
  Steps       sp 1 sp 2 sp 3 sp 4 sp 5     sp 1 sp 2 sp 3 sp 4 sp 5
0     1        127  178   49  298  262       75  397   83  211   80
1     2        380  400   48  210  134      211  254   88  491   82
2     3        343  484  459  137  324      213    0   20  250  261
3     4        441  210  213  247  109      260  243  157  446  318
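In this second layout the step numbers sit under an empty top-level key, so they are addressed with the tuple `('', 'Steps')`, for example to filter rows. A sketch on a reduced subset of the question's data (two steps, two values each), chosen only to keep the example short:

```python
import pandas as pd

# Reduced subset of the question's data, for illustration only
df_rawdata = pd.DataFrame({'Steps': [1, 2],
                           'Processing': [[127, 178], [380, 400]],
                           'Cleaning': [[75, 397], [211, 254]]})

df1 = pd.DataFrame(df_rawdata['Processing'].tolist()).rename(columns=lambda x: f'sp {x+1}')
df2 = pd.DataFrame(df_rawdata['Cleaning'].tolist()).rename(columns=lambda x: f'sp {x+1}')
df = pd.concat([df_rawdata['Steps'], df1, df2], axis=1, keys=['', 'Processing', 'Cleaning'])

# The Steps column lives under the empty top-level key ''
steps = df[('', 'Steps')]
# Boolean filtering works as usual with the tuple-addressed column
late = df[steps > 1]
print(late)
```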