I need to use the transform function to sum column 'bar' within each group of column 'foo'. I use the following code:
df.groupby(['foo'])['bar'].transform(np.sum)
However, when all the values in 'bar' for a group are NaN, my desired output is NaN, but the above code returns zero instead. How can I fix this? I know the sum function accepts min_count=1, but I am not sure how to use that in the above context.
CodePudding user response:
The sum method has a min_count argument that controls the required number of non-NaN values to sum. If there are fewer than min_count non-NaN values, the result is NaN.
# at least one non nan value must be there in order to sum
df.groupby(['foo'])['bar'].transform('sum', min_count=1)
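To see the difference, here is a small sketch on a made-up frame (the data and the variable names default_sum and nan_aware_sum are only for illustration, not from the question). Group 'b' contains only NaN values, so it is the case that matters:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: group 'b' has only NaN values in 'bar'
df = pd.DataFrame({
    "foo": ["a", "a", "b", "b", "c"],
    "bar": [1.0, np.nan, np.nan, np.nan, 3.0],
})

# Default behaviour: NaNs are skipped, so an all-NaN group sums to 0
default_sum = df.groupby("foo")["bar"].transform("sum")

# With min_count=1, a group needs at least one non-NaN value,
# otherwise its sum is NaN
nan_aware_sum = df.groupby("foo")["bar"].transform("sum", min_count=1)

print(pd.DataFrame({"default": default_sum, "min_count=1": nan_aware_sum}))
```

Passing the method name as the string 'sum' instead of np.sum lets transform forward the extra keyword argument to the groupby sum method.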