Home > Back-end >  How to vertically stack every other columns of a dataframe into a new column
How to vertically stack every other columns of a dataframe into a new column

Time:10-02

I have a dataframe with column names ranging from 5 to 100 each one having both pred_ and real_ before their respective numeric part. For example, the first column is real_5, followed by pred_5, real_6, pred_6 and so forth until pred_100 and last column name as real_100. All the columns have the same number of rows (15000).

I want to create a new dataframe which has only two new columns, the first one being the vertical stack of all the real columns placed on top of each other and the second column to be vertical stack of all the pred columns placed on top of each other. I assume the number of rows in the new dataframe would be (100-5)*15000.

Is there any easy way to do this manual operation in Python?

CodePudding user response:

You can split the columns to MultiIndex and stack:

out = (df
 .set_axis(df.columns.str.split('_', expand=True), axis=1)
 .stack()
)

example output:

     pred  real
0 5     1     0
  6     3     2
  7     5     4
1 5     7     6
  6     9     8
  7    11    10
2 5    13    12
  6    15    14
  7    17    16

used input:

   real_5  pred_5  real_6  pred_6  real_7  pred_7
0       0       1       2       3       4       5
1       6       7       8       9      10      11
2      12      13      14      15      16      17

CodePudding user response:

Given the dataframe below :

  real_5 pred_5 real_6 pred_6 real_7 pred_7
0   val1   val2   val3   val4   val5   val6
1   val7   val8   val9  val10  val11  val12
2  val13  val14  val15  val16  val17  val18

You can use pandas.melt and pandas.concat to stack columns (with the same prefix) one below the other.

real_df = df.filter(like='real').melt()
pred_df = df.filter(like='pred').melt()

result = (
            pd.concat([real_df, pred_df], axis=1, ignore_index=True)
              .drop(columns=[0,2])
              .rename(columns={1: 'real', 3: 'prev'})
         )

# Output :

print(result)

    real   prev
0   val1   val2
1   val7   val8
2  val13  val14
3   val3   val4
4   val9  val10
5  val15  val16
6   val5   val6
7  val11  val12
8  val17  val18
  • Related