After expanding the lists into columns and applying a MultiIndex, I created a new DataFrame. Unfortunately, the result for each column does not appear as in the screenshot. Can you tell me what I should do, please?
process_steps_n = [1, 2, 3, 4]
Processing =
0 [127, 178, 49, 298, 262]
1 [380, 400, 48, 210, 134]
2 [343, 484, 459, 137, 324]
3 [441, 210, 213, 247, 109]
Cleaning =
0 [75, 397, 83, 211, 80]
1 [211, 254, 88, 491, 82]
2 [213, 0, 20, 250, 261]
3 [260, 243, 157, 446, 318]
df_rawdata = pd.DataFrame(list(zip(process_steps_n, p,c)),columns =['Steps','Processing','Cleaning'])
df1 = df_rawdata['Processing'].apply(pd.Series).rename(columns=lambda x: f'sp {x + 1}')
df2 = df_rawdata['Cleaning'].apply(pd.Series).rename(columns=lambda x: f'sp {x + 1}')
outp = pd.concat([df1], keys=['Processing'], axis=1)
outc = pd.concat([df2], keys=['Cleaning'], axis=1)
df_rawdata2 = pd.DataFrame(list(zip(process_steps_n, outp,outc)),columns =['Steps','Processing','Cleaning'])
cc: @Panda Kim
CodePudding user response:
Example
I'll answer using a minimal example.
df = pd.DataFrame([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], index=['step1', 'step2'], columns=['process', 'clean'])
df
      process   clean
step1  [1, 2]  [3, 4]
step2  [5, 6]  [7, 8]
Code
out = (df.stack()
       .apply(pd.Series).rename(columns=lambda x: f'sp {x + 1}')
.unstack().swaplevel(0, 1, axis=1).sort_index(axis=1))
out
      clean      process
       sp 1 sp 2    sp 1 sp 2
step1     3    4       1    2
step2     7    8       5    6
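Note that sort_index(axis=1) orders the top-level labels alphabetically, so clean ends up before process. If the original column order matters, one option (a sketch, not part of the answer above; the name ordered is only illustrative) is to reindex the top level with the original column labels:

```python
import pandas as pd

df = pd.DataFrame(
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
    index=["step1", "step2"],
    columns=["process", "clean"],
)

# Same reshape as in the answer: one row per (step, column), lists expanded
# into 'sp N' columns, then the column name moved back up into the header.
out = (
    df.stack()
    .apply(pd.Series)
    .rename(columns=lambda x: f"sp {x + 1}")
    .unstack()
    .swaplevel(0, 1, axis=1)
    .sort_index(axis=1)
)

# Restore the original top-level order (process, clean); the 'sp N'
# order inside each group is left untouched.
ordered = out.reindex(columns=df.columns, level=0)
print(ordered)
```

The reindex call only touches level 0 of the column MultiIndex, so it does not need to know how many sp columns there are.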
CodePudding user response:
First, for performance, don't use .apply(pd.Series): it is slow. Instead, convert the list column to a list of lists with .tolist() and pass it to the DataFrame constructor:
df1 = pd.DataFrame(df_rawdata['Processing'].tolist()).rename(columns=lambda x: f'sp {x + 1}')
df2 = pd.DataFrame(df_rawdata['Cleaning'].tolist()).rename(columns=lambda x: f'sp {x + 1}')
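One caveat: .tolist() drops the index, so the new frame gets a default 0..n-1 index. That is fine for df_rawdata here, but if the source frame has a custom index it may be worth passing it along. A minimal sketch, where expand_lists is a hypothetical helper name and the sample values are shortened from the question:

```python
import pandas as pd


def expand_lists(series):
    """Expand a Series of equal-length lists into 'sp 1', 'sp 2', ... columns."""
    # index=series.index keeps the original row labels aligned with the data.
    wide = pd.DataFrame(series.tolist(), index=series.index)
    return wide.rename(columns=lambda x: f"sp {x + 1}")


# Non-default index chosen on purpose to show that it survives.
processing = pd.Series([[127, 178, 49], [380, 400, 48]], index=[10, 20])
df1 = expand_lists(processing)
print(df1)
```

The same helper could be applied to both the Processing and Cleaning columns before concatenating them.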
Then join both DataFrames with concat, using the keys parameter to build the MultiIndex columns. The Steps column has no second level, so it is set as the index instead:
df = pd.concat([df1, df2], axis=1, keys=['Processing','Cleaning']).set_index(df_rawdata['Steps'])
print (df)
      Processing                     Cleaning
            sp 1 sp 2 sp 3 sp 4 sp 5     sp 1 sp 2 sp 3 sp 4 sp 5
Steps
1            127  178   49  298  262       75  397   83  211   80
2            380  400   48  210  134      211  254   88  491   82
3            343  484  459  137  324      213    0   20  250  261
4            441  210  213  247  109      260  243  157  446  318
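With Steps as the index and two-level columns, the usual pandas selection tools apply. A short sketch on a cut-down version of the data (two sp columns and two steps only, to keep it readable):

```python
import pandas as pd

df1 = pd.DataFrame([[127, 178], [380, 400]], columns=["sp 1", "sp 2"])
df2 = pd.DataFrame([[75, 397], [211, 254]], columns=["sp 1", "sp 2"])
steps = pd.Series([1, 2], name="Steps")

df = pd.concat([df1, df2], axis=1, keys=["Processing", "Cleaning"]).set_index(steps)

# All 'sp' columns of one group: returns a frame with single-level columns.
processing = df["Processing"]

# One cell: row label from Steps, column given as a (group, sp) tuple.
one_cell = df.loc[2, ("Cleaning", "sp 2")]

# The same 'sp' column across both groups, selected on the second level.
first_sp = df.xs("sp 1", axis=1, level=1)

print(processing)
print(one_cell)
print(first_sp)
```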
If you need Steps as a column under the MultiIndex as well, use:
df = pd.concat([df_rawdata['Steps'], df1, df2], axis=1, keys=['', 'Processing','Cleaning'])
print (df)
        Processing                     Cleaning
  Steps       sp 1 sp 2 sp 3 sp 4 sp 5     sp 1 sp 2 sp 3 sp 4 sp 5
0     1        127  178   49  298  262       75  397   83  211   80
1     2        380  400   48  210  134      211  254   88  491   82
2     3        343  484  459  137  324      213    0   20  250  261
3     4        441  210  213  247  109      260  243  157  446  318
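As for why the attempt in the question did not give the expected frame: iterating over a DataFrame yields its column labels, not its rows or values, so zip(process_steps_n, outp, outc) pairs each step with a column label. A small sketch with a cut-down outp (two sp columns only) shows this:

```python
import pandas as pd

df1 = pd.DataFrame([[127, 178], [380, 400]], columns=["sp 1", "sp 2"])
outp = pd.concat([df1], keys=["Processing"], axis=1)

# Iterating a DataFrame walks over the column labels; with MultiIndex
# columns each label is a (level 0, level 1) tuple.
labels = list(outp)

# So zip() pairs the steps with labels, and the numbers never appear.
rows = list(zip([1, 2], outp))

print(labels)
print(rows)
```

That is why the frames have to be combined with concat (as above) rather than being fed back through zip and the DataFrame constructor.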