I want to locate all values greater than 0 within columns n_1 to n_3 inclusive, and populate them into columns new_1 to new_3 inclusive in order of smallest to largest, such that column new_1 has the smallest value and new_3 has the largest value. If any columns are not populated because there are not enough values to do so, then populate them with 0
I have asked a similar question before but the answer does not align with expected output
EVENT_ID n_1 n_2 n_3
143419013 0.00 7.80 12.83
143419017 1.72 20.16 16.08
143419021 3.03 12.00 17.14
143419025 2.63 0.00 2.51
143419028 2.38 22.00 2.96
143419030 0.00 40.00 0.00
Expected Output:
EVENT_ID n_1 n_2 n_3 new_1 new_2 new_3
143419013 0.00 7.80 12.83 7.80 12.83 0.00
143419017 1.72 20.16 16.08 1.72 16.08 20.16
143419021 3.03 12.00 17.14 3.03 12.00 17.14
143419025 2.63 0.00 2.51 2.51 13.78 0.00
143419028 2.38 22.00 2.96 2.38 2.96 22.00
143419030 3.92 40.00 11.23 40.00 0.00 0.00
This was the solution to the previous question. The below solution create a new column and sort values from 0 to largest, rather I want the new column to have values from 1 onwards. column towards the end that was not filled due to insufficient value can be filled with 0
df[['new_1', 'new_2', 'new_3']] = np.sort(
df[['n_1', 'n_2', 'n_3']].fillna(0),
axis=1
)
CodePudding user response:
The trick here is to replace 0.0 by np.inf and sort your values in ascending order and rollback the replaced values:
new_df = df.filter(like='n_') \
.replace(0., np.inf) \
.apply(lambda x: sorted(x), axis=1, result_type='expand') \
.replace(np.inf, 0.0)
new_df.columns = ['new_1', 'new_2', 'new_3']
out = pd.concat([df, new_df], axis=1)
>>> out
EVENT_ID n_1 n_2 n_3 new_1 new_2 new_3
0 143419013 0.00 7.80 12.83 7.80 12.83 0.00
1 143419017 1.72 20.16 16.08 1.72 16.08 20.16
2 143419021 3.03 12.00 17.14 3.03 12.00 17.14
3 143419025 2.63 0.00 2.51 2.51 2.63 0.00
4 143419028 2.38 22.00 2.96 2.38 2.96 22.00
5 143419030 0.00 40.00 0.00 40.00 0.00 0.00
CodePudding user response:
Zeroes at the end
Took an approach similar to the original near-solution:
import pandas as pd
import numpy as np
data = [(143419013,0.00,7.80,12.83),
(143419017,1.72,20.16,16.08),
(143419021,3.03,12.00,17.14),
(143419025,2.63,0.00,2.51),
(143419028,2.38,22.00,2.96),
(143419030,0.00,40.00,0.00)]
df = pd.DataFrame(data, columns=["EVENT_ID", "n_1", "n_2", "n_3"])
df[['new_1', 'new_2', 'new_3']] = np.sort(df[['n_1', 'n_2', 'n_3']] \
.replace(0.0, np.nan), axis=1)
df.fillna("0.0", inplace=True)
print(df)
EVENT_ID n_1 n_2 n_3 new_1 new_2 new_3
0 143419013 0.00 7.80 12.83 7.80 12.83 0.0
1 143419017 1.72 20.16 16.08 1.72 16.08 20.16
2 143419021 3.03 12.00 17.14 3.03 12.0 17.14
3 143419025 2.63 0.00 2.51 2.51 2.63 0.0
4 143419028 2.38 22.00 2.96 2.38 2.96 22.0
5 143419030 0.00 40.00 0.00 40.00 0.0 0.0
Steps of this approach:
- Create a copy of the n_1 - n_3 original columns, with 0.0 replaced with NaN ("not a number")
- Sort these (NaN ends up at the end)
- Add these as the new fields to the dataframe (including the NaN values)
- Replace any NaN values in the dataframe with 0.0 (using
fillna
)
(Works in the highly-unlikely case that any of your source values are infinity!)
Anything < 1 becomes 0.0
The above solution doesn't quite exactly fulfil the question, as it's supposed to sort items from 1 onwards, not items which are non-zero.
This slightly more clunky version produces only values from 1 or greater. Not sure which version is the one which is actually required.
data = [(143419013,0.00,7.80,12.83),
(143419017,1.72,20.16,16.08),
(143419021,3.03,12.00,17.14),
(143419025,2.63,0.00,2.51),
(143419028,2.38,22.00,2.96),
(143419030,0.00,40.00,0.00),
# Extra example row, with a negative value, and a value between 0 and 1
(143419030,-1.00,0.40,27.00)]
df = pd.DataFrame(data, columns=["EVENT_ID", "n_1", "n_2", "n_3"])
df[['new_1', 'new_2', 'new_3']] = np.sort(df[['n_1', 'n_2', 'n_3']] \
.applymap(lambda v : v if v >= 1 else np.nan), axis=1)
df.fillna("0.0", inplace=True)
print(df)
EVENT_ID n_1 n_2 n_3 new_1 new_2 new_3
0 143419013 0.00 7.80 12.83 7.80 12.83 0.0
1 143419017 1.72 20.16 16.08 1.72 16.08 20.16
2 143419021 3.03 12.00 17.14 3.03 12.0 17.14
3 143419025 2.63 0.00 2.51 2.51 2.63 0.0
4 143419028 2.38 22.00 2.96 2.38 2.96 22.0
5 143419030 0.00 40.00 0.00 40.00 0.0 0.0
6 143419030 -1.00 0.40 27.00 27.00 0.0 0.0