Home > Enterprise >  Create new columns from existing column and sort the values from smallest to largest
Create new columns from existing column and sort the values from smallest to largest

Time:09-23

I want to locate all values greater than 0 within columns n_1 to n_3 inclusive, and populate them into columns new_1 to new_3 inclusive in order of smallest to largest, such that column new_1 has the smallest value and new_3 has the largest value. If any columns are not populated because there are not enough values to do so, then populate them with 0

EVENT_ID    n_1      n_2    n_3
            
143419013   0.00    7.80    12.83
143419017   1.72    20.16   16.08
143419021   3.03    12.00   17.14
143419025   2.63    0.00    2.51
143419028   2.38    22.00   2.96
143419030   0.00    40.00   0.00

Expected Output:

EVENT_ID    n_1      n_2    n_3    new_1  new_2   new_3             
143419013   0.00    7.80    12.83  7.80    12.83  0.00
143419017   1.72    20.16   16.08  1.72   16.08   20.16 
143419021   3.03    12.00   17.14  3.03   12.00   17.14
143419025   2.63    0.00    2.51   2.51   13.78   0.00
143419028   2.38    22.00   2.96   2.38   2.96    22.00 
143419030   3.92    40.00   11.23  40.00  0.00    0.00      

I tried using the apply function to create this new column but I got an error down the line.

df[['new_1','new_2','new_3']] = pivot_df.apply(lambda a,b,c: a.n_1, b.n_2, c.n_3 axis=1)

CodePudding user response:

Let's subset the DataFrame remove values that do not meet the condition with where, then use np.sort to sort across rows and fillna to replace any missing values with 0:

cols = ['n_1', 'n_2', 'n_3']
df[[f'new_{i}' for i in range(1, len(cols)   1)]] = pd.DataFrame(
    np.sort(df[cols].where(df[cols] > 0), axis=1)
).fillna(0)

df:

    EVENT_ID   n_1    n_2    n_3  new_1  new_2  new_3
0  143419013  0.00   7.80  12.83   7.80  12.83   0.00
1  143419017  1.72  20.16  16.08   1.72  16.08  20.16
2  143419021  3.03  12.00  17.14   3.03  12.00  17.14
3  143419025  2.63   0.00   2.51   2.51   2.63   0.00
4  143419028  2.38  22.00   2.96   2.38   2.96  22.00
5  143419030  0.00  40.00   0.00  40.00   0.00   0.00

Setup used:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'EVENT_ID': [143419013, 143419017, 143419021, 143419025, 143419028,
                 143419030],
    'n_1': [0.0, 1.72, 3.03, 2.63, 2.38, 0.0],
    'n_2': [7.8, 20.16, 12.0, 0.0, 22.0, 40.0],
    'n_3': [12.83, 16.08, 17.14, 2.51, 2.96, 0.0]
})

df:

    EVENT_ID   n_1    n_2    n_3
0  143419013  0.00   7.80  12.83
1  143419017  1.72  20.16  16.08
2  143419021  3.03  12.00  17.14
3  143419025  2.63   0.00   2.51
4  143419028  2.38  22.00   2.96
5  143419030  0.00  40.00   0.00
  • Related