How to create new dataframe columns from existing columns and sort the values from smallest to largest

Time:09-23

I want to locate all values greater than 0 within columns n_1 to n_3 inclusive and populate them into columns new_1 to new_3 inclusive in ascending order, so that new_1 holds the smallest value and new_3 the largest. If there are not enough values to fill every column, populate the remaining ones with 0.

I have asked a similar question before, but the answer does not align with the expected output.

EVENT_ID    n_1      n_2    n_3
            
143419013   0.00    7.80    12.83
143419017   1.72    20.16   16.08
143419021   3.03    12.00   17.14
143419025   2.63    0.00    2.51
143419028   2.38    22.00   2.96
143419030   0.00    40.00   0.00

Expected Output:

EVENT_ID    n_1      n_2    n_3    new_1  new_2   new_3             
143419013   0.00    7.80    12.83  7.80    12.83  0.00
143419017   1.72    20.16   16.08  1.72   16.08   20.16 
143419021   3.03    12.00   17.14  3.03   12.00   17.14
143419025   2.63    0.00    2.51   2.51   2.63    0.00
143419028   2.38    22.00   2.96   2.38   2.96    22.00
143419030   0.00    40.00   0.00   40.00  0.00    0.00

This was the solution to the previous question. It creates new columns and sorts the values from 0 upwards, whereas I want the new columns to hold the values from 1 onwards; any column towards the end that is not filled due to insufficient values can be filled with 0.

df[['new_1', 'new_2', 'new_3']] = np.sort(
    df[['n_1', 'n_2', 'n_3']].fillna(0),
    axis=1
)
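To see why that near-solution misplaces the zeros: np.sort is ascending, so 0.0 sorts ahead of every positive value and lands in new_1 rather than at the end. A minimal sketch using the first sample row:

```python
import numpy as np

# np.sort is ascending, so 0.0 sorts ahead of every positive value
row = np.array([0.00, 7.80, 12.83])
sorted_row = np.sort(row)
print(sorted_row[0])  # 0.0 -- the zero would populate new_1, not new_3
```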

CodePudding user response:

The trick here is to replace 0.0 with np.inf, sort the values in ascending order, then roll back the replaced values:

import numpy as np
import pandas as pd

new_df = df.filter(like='n_') \
           .replace(0., np.inf) \
           .apply(sorted, axis=1, result_type='expand') \
           .replace(np.inf, 0.0)

new_df.columns = ['new_1', 'new_2', 'new_3']

out = pd.concat([df, new_df], axis=1)
>>> out
    EVENT_ID   n_1    n_2    n_3  new_1  new_2  new_3
0  143419013  0.00   7.80  12.83   7.80  12.83   0.00
1  143419017  1.72  20.16  16.08   1.72  16.08  20.16
2  143419021  3.03  12.00  17.14   3.03  12.00  17.14
3  143419025  2.63   0.00   2.51   2.51   2.63   0.00
4  143419028  2.38  22.00   2.96   2.38   2.96  22.00
5  143419030  0.00  40.00   0.00  40.00   0.00   0.00
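As a side note, the row-wise apply with sorted can be replaced by a single np.sort call, which avoids the per-row Python loop. A vectorized sketch of the same inf trick (rebuilding the sample frame so the snippet is self-contained):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "EVENT_ID": [143419013, 143419017, 143419021, 143419025, 143419028, 143419030],
    "n_1": [0.00, 1.72, 3.03, 2.63, 2.38, 0.00],
    "n_2": [7.80, 20.16, 12.00, 0.00, 22.00, 40.00],
    "n_3": [12.83, 16.08, 17.14, 2.51, 2.96, 0.00],
})

# Same trick, vectorized: 0.0 -> inf, sort row-wise, roll inf back to 0.0
vals = np.sort(df[["n_1", "n_2", "n_3"]].replace(0.0, np.inf), axis=1)
df[["new_1", "new_2", "new_3"]] = np.where(np.isinf(vals), 0.0, vals)
print(df)
```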

CodePudding user response:

Zeroes at the end

Took an approach similar to the original near-solution:

import pandas as pd
import numpy as np
data = [(143419013,0.00,7.80,12.83),
(143419017,1.72,20.16,16.08),
(143419021,3.03,12.00,17.14),
(143419025,2.63,0.00,2.51),
(143419028,2.38,22.00,2.96),
(143419030,0.00,40.00,0.00)]
df = pd.DataFrame(data, columns=["EVENT_ID", "n_1", "n_2", "n_3"])
df[['new_1', 'new_2', 'new_3']] = np.sort(
    df[['n_1', 'n_2', 'n_3']].replace(0.0, np.nan), axis=1)
df.fillna(0.0, inplace=True)  # fill with the float 0.0, not the string "0.0"
print(df)

    EVENT_ID   n_1    n_2    n_3  new_1  new_2  new_3
0  143419013  0.00   7.80  12.83   7.80  12.83   0.00
1  143419017  1.72  20.16  16.08   1.72  16.08  20.16
2  143419021  3.03  12.00  17.14   3.03  12.00  17.14
3  143419025  2.63   0.00   2.51   2.51   2.63   0.00
4  143419028  2.38  22.00   2.96   2.38   2.96  22.00
5  143419030  0.00  40.00   0.00  40.00   0.00   0.00

Steps of this approach:

  1. Create a copy of the n_1 - n_3 original columns, with 0.0 replaced with NaN ("not a number")
  2. Sort these (NaN ends up at the end)
  3. Add these as the new fields to the dataframe (including the NaN values)
  4. Replace any NaN values in the dataframe with 0.0 (using fillna)

(This approach even works in the highly unlikely case that some of your source values are infinity!)
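To make that edge case concrete: the replace-with-inf approach above would silently wipe a genuine infinity to 0.0 on the rollback, while the NaN approach leaves it alone. A hypothetical one-row sketch:

```python
import numpy as np
import pandas as pd

# Hypothetical row containing a genuine infinity alongside a zero
row = pd.DataFrame({"n_1": [np.inf], "n_2": [5.0], "n_3": [0.0]})

# Replace-with-inf approach: after the rollback, the genuine inf is
# indistinguishable from a replaced zero and gets wiped to 0.0
inf_vals = np.sort(row.replace(0.0, np.inf), axis=1)
inf_vals = np.where(np.isinf(inf_vals), 0.0, inf_vals)

# Replace-with-NaN approach: only the zero is masked; the inf survives
nan_vals = np.sort(row.replace(0.0, np.nan), axis=1)
nan_vals = np.where(np.isnan(nan_vals), 0.0, nan_vals)

print(inf_vals.tolist())  # [[5.0, 0.0, 0.0]] -- the source infinity was lost
print(nan_vals.tolist())  # [[5.0, inf, 0.0]] -- the source infinity survives
```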


Anything < 1 becomes 0.0

The above solution doesn't exactly fulfil the question, as it's supposed to sort items from 1 onwards, not just items which are non-zero.

This slightly clunkier version keeps only values of 1 or greater. I'm not sure which version is actually required.

data = [(143419013,0.00,7.80,12.83),
(143419017,1.72,20.16,16.08),
(143419021,3.03,12.00,17.14),
(143419025,2.63,0.00,2.51),
(143419028,2.38,22.00,2.96),
(143419030,0.00,40.00,0.00),
# Extra example row, with a negative value, and a value between 0 and 1
(143419030,-1.00,0.40,27.00)]
df = pd.DataFrame(data, columns=["EVENT_ID", "n_1", "n_2", "n_3"])
df[['new_1', 'new_2', 'new_3']] = np.sort(
    df[['n_1', 'n_2', 'n_3']].applymap(lambda v: v if v >= 1 else np.nan), axis=1)
df.fillna(0.0, inplace=True)  # again, fill with the float 0.0
print(df)

    EVENT_ID   n_1    n_2    n_3  new_1  new_2  new_3
0  143419013  0.00   7.80  12.83   7.80  12.83   0.00
1  143419017  1.72  20.16  16.08   1.72  16.08  20.16
2  143419021  3.03  12.00  17.14   3.03  12.00  17.14
3  143419025  2.63   0.00   2.51   2.51   2.63   0.00
4  143419028  2.38  22.00   2.96   2.38   2.96  22.00
5  143419030  0.00  40.00   0.00  40.00   0.00   0.00
6  143419030 -1.00   0.40  27.00  27.00   0.00   0.00