I have a dataframe with column names ranging from 5 to 100 each one having both pred_ and real_ before their respective numeric part. For example, the first column is real_5, followed by pred_5, real_6, pred_6 and so forth until pred_100 and last column name as real_100. All the columns have the same number of rows (15000).
I want to create a new dataframe which has only two new columns, the first one being the vertical stack of all the real columns placed on top of each other and the second column to be vertical stack of all the pred columns placed on top of each other. I assume the number of rows in the new dataframe would be (100-5)*15000.
Is there any easy way to do this manual operation in Python?
CodePudding user response:
You can split the columns to MultiIndex and stack
:
out = (df
.set_axis(df.columns.str.split('_', expand=True), axis=1)
.stack()
)
example output:
pred real
0 5 1 0
6 3 2
7 5 4
1 5 7 6
6 9 8
7 11 10
2 5 13 12
6 15 14
7 17 16
used input:
real_5 pred_5 real_6 pred_6 real_7 pred_7
0 0 1 2 3 4 5
1 6 7 8 9 10 11
2 12 13 14 15 16 17
CodePudding user response:
Given the dataframe below :
real_5 pred_5 real_6 pred_6 real_7 pred_7
0 val1 val2 val3 val4 val5 val6
1 val7 val8 val9 val10 val11 val12
2 val13 val14 val15 val16 val17 val18
You can use pandas.melt
and pandas.concat
to stack columns (with the same prefix) one below the other.
real_df = df.filter(like='real').melt()
pred_df = df.filter(like='pred').melt()
result = (
pd.concat([real_df, pred_df], axis=1, ignore_index=True)
.drop(columns=[0,2])
.rename(columns={1: 'real', 3: 'prev'})
)
# Output :
print(result)
real prev
0 val1 val2
1 val7 val8
2 val13 val14
3 val3 val4
4 val9 val10
5 val15 val16
6 val5 val6
7 val11 val12
8 val17 val18