I have the following DataFrame; the last column is the desired output. Thanks!
group date value desired_first_nonzero
1 jan2019 0 2
1 jan2019 2 2
1 feb2019 3 2
1 mar2019 4 2
1 mar2019 5 2
2 feb2019 0 4
2 feb2019 0 4
2 mar2019 0 4
2 mar2019 4 4
2 apr2019 5 4
I want to group by "group" and find the first non-zero value in each group.
CodePudding user response:
You can use GroupBy.transform with a custom function that uses idxmax to find the index of the first non-zero value (idxmax returns the index of the first True value here):
df['desired_first_nonzero'] = (
    df.groupby('group')['value']
      .transform(lambda s: s[s.ne(0).idxmax()])
)
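To see why idxmax works here, it may help to look at a single group in isolation. The following is a minimal sketch, assuming the default integer index from the example; the variable names are made up for illustration:

```python
import pandas as pd

# Values of group 2 from the example, keeping their original row labels 5..9.
s = pd.Series([0, 0, 0, 4, 5], index=[5, 6, 7, 8, 9])

mask = s.ne(0)               # False, False, False, True, True
first_label = mask.idxmax()  # label of the first True (True is the maximum of a boolean Series)
first_value = s.loc[first_label]

print(first_label)  # 8
print(first_value)  # 4
```

GroupBy.transform then broadcasts that single value to every row of the group.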
Alternatively, using an intermediate Series:
s = df.set_index('group')['value']
df['desired_first_nonzero'] = df['group'].map(s[s.ne(0)].groupby(level=0).first())
Output:
group date value desired_first_nonzero
0 1 jan2019 0 2
1 1 jan2019 2 2
2 1 feb2019 3 2
3 1 mar2019 4 2
4 1 mar2019 5 2
5 2 feb2019 0 4
6 2 feb2019 0 4
7 2 mar2019 0 4
8 2 mar2019 4 4
9 2 apr2019 5 4
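One difference between the two approaches shows up when a group contains only zeros. That case is not part of the question, so the data below is hypothetical; this is just a sketch of how each variant would likely behave:

```python
import pandas as pd

# Hypothetical data: group 3 contains only zeros.
df = pd.DataFrame({
    'group': [1, 1, 2, 2, 3, 3],
    'value': [0, 2, 0, 4, 0, 0],
})

# transform + idxmax: for an all-zero group the mask is all False,
# so idxmax falls back to the first row and the result is 0.
via_transform = (
    df.groupby('group')['value']
      .transform(lambda s: s[s.ne(0).idxmax()])
)

# filter + first + map: group 3 disappears after filtering,
# so map leaves NaN for its rows (and the column becomes float).
s = df.set_index('group')['value']
via_map = df['group'].map(s[s.ne(0)].groupby(level=0).first())

print(via_transform.tolist())
print(via_map.tolist())
```

Which behaviour is preferable depends on whether 0 or a missing value is the better marker for "no non-zero value in this group".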
CodePudding user response:
This should do the job:
import pandas as pd

# the given example
d = {'group': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2], 'value': [0, 2, 3, 4, 5, 0, 0, 0, 4, 5]}
df = pd.DataFrame(data=d)
# keep only the non-zero rows, then take the first remaining row of each group
first_non_zero = df[df['value'] != 0].groupby('group').head(1)
print(first_non_zero)
Output:
group value
1 1 2
8 2 4
You can then distribute these values to every row of the corresponding group as needed.
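One possible way to do that last step is to turn the per-group result into a group-to-value lookup and map it onto the "group" column. This is a sketch rather than the only option, and the name lookup is just illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'value': [0, 2, 3, 4, 5, 0, 0, 0, 4, 5],
})

# First non-zero row of each group (same filter + head(1) as above).
first_non_zero = df[df['value'] != 0].groupby('group').head(1)

# Build a group -> value lookup and map it onto every row.
lookup = first_non_zero.set_index('group')['value']
df['desired_first_nonzero'] = df['group'].map(lookup)
print(df)
```

Because head(1) returns exactly one row per group, the lookup has a unique index, which is what Series.map needs here.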