I need to fill the missing values in each product group using the mode (the most frequently occurring value).
Product | Another |
---|---|
A | 3 |
A | 3 |
A | NaN |
A | NaN |
B | 4 |
B | 4 |
B | NaN |
B | NaN |
c | 5 |
C | 5 |
The expected output is:
Product | Another |
---|---|
A | 3 |
A | 3 |
A | 3 |
A | 3 |
B | 4 |
B | 4 |
B | 4 |
B | 4 |
c | 5 |
C | 5 |
For product A the missing values should become 3, and for product B they should become 4.
CodePudding user response:
You can use a custom groupby.apply. Note that there can be several modes; we take the first available one here:
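# group_keys=False keeps the original row index so the result aligns with df
# (needed on pandas >= 2.0, where group keys are otherwise prepended to the index).
# downcast='infer' restores the integer dtype; it is deprecated in recent pandas
# releases, so drop it (the column then stays float) if it raises a warning.
df['Another'] = (df.groupby('Product', group_keys=False)['Another']
                   .apply(lambda g: g.fillna(g.mode()[0], downcast='infer'))
                )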
output:
Product Another
0 A 3
1 A 3
2 A 3
3 A 3
4 B 4
5 B 4
6 B 4
7 B 4
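One edge case worth guarding against: if a group contains only missing values, g.mode() is empty and g.mode()[0] raises an error. The following is a minimal sketch of one way to handle that, not the only one. The helper name fill_with_mode and the extra all-missing product "C" are assumptions added for illustration; they are not part of the original data.

```python
import pandas as pd

# Sample data mirroring the question. Product "C" is a hypothetical
# all-missing group, added only to demonstrate the edge case.
df = pd.DataFrame({
    "Product": ["A", "A", "A", "A", "B", "B", "B", "B", "C"],
    "Another": [3, 3, None, None, 4, 4, None, None, None],
})


def fill_with_mode(values: pd.Series) -> pd.Series:
    """Fill NaN with the group's first mode; leave all-NaN groups unchanged."""
    modes = values.mode()  # mode() ignores NaN by default
    if modes.empty:
        return values
    return values.fillna(modes.iloc[0])


# transform() returns a result aligned with the original index,
# so it can be assigned straight back to the column.
df["Another"] = df.groupby("Product")["Another"].transform(fill_with_mode)
print(df)
```

Here the column stays float because product "C" still holds NaN; cast it afterwards only if every group could be filled.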
Alternative
If you expect a single valid value per group, use groupby.transform('first') instead, which broadcasts each group's first non-null value to every row of that group:
df['Another'] = df.groupby('Product')['Another'].transform('first')
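Note that 'first' and the mode only agree when each group really holds a single distinct value, and that assigning the transform result directly overwrites every row in the group, not just the missing ones. The sketch below uses made-up values (5, 3, 3, NaN) to show the difference, and wraps the result in fillna so existing values are kept; the variable names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical group whose first value (5) differs from its mode (3)
df = pd.DataFrame({
    "Product": ["A", "A", "A", "A"],
    "Another": [5, 3, 3, None],
})

grouped = df.groupby("Product")["Another"]
first_value = grouped.transform("first")                    # first non-null value per group
mode_value = grouped.transform(lambda g: g.mode().iloc[0])  # first mode per group

# fillna only touches the gaps; existing values are left as they are
filled_first = df["Another"].fillna(first_value)
filled_mode = df["Another"].fillna(mode_value)

print(filled_first.tolist())
print(filled_mode.tolist())
```

With this data the missing entry becomes 5 under 'first' and 3 under the mode, so pick whichever matches the intent.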
CodePudding user response:
Solution
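import pandas as pd

# Avoid naming the variable "dict", which would shadow the built-in type
data = {"Product": ["A", "A", "A", "A", "B", "B", "B", "B"],
        "Another": [3, 3, None, None, 4, 4, None, None]}
df = pd.DataFrame(data)

# Compute the mode per product rather than over the whole column, so the
# result does not depend on the order in which the global modes are returned
for product, group in df.groupby("Product"):
    mode = group["Another"].mode()
    if not mode.empty:
        # Only overwrite the missing values of this product
        mask = (df["Product"] == product) & df["Another"].isna()
        df.loc[mask, "Another"] = mode[0]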