Here is my DataFrame.
df = pd.DataFrame ( {'CNN': ['iphone 11 63 GB TMO','iphone 11 128 GB ATT','iphone 11 other carrier','iphone 12 256 GB TMO','iphone 12 64 GB TMO','iphone 12 other carrier'],
'Family Name':['iphone 11', 'iphone 11', 'iphone 11', 'iphone 12', 'iphone 12', 'iphone 12'],
'Storage': [63, 128,np.nan, 256,64, np.nan]})
Output:
CNN Family Name Storage
0 iphone 11 63 GB TMO iphone 11 63.0
1 iphone 11 128 GB ATT iphone 11 128.0
2 iphone 11 other carrier iphone 11 NaN
3 iphone 12 256 GB TMO iphone 12 256.0
4 iphone 12 64 GB TMO iphone 12 64.0
5 iphone 12 other carrier iphone 12 NaN
What I am trying to achieve is find NAs. Criteria is minimum of storage from group(Family Name). I have tried to group by and fillna(min()) but it doesnt seems to be working.
#Tried
df["Storage"] = df.groupby("Family Name").apply(lambda x: x.fillna(x.min()))
Here is final output expected.
Expected Output:
CNN Family Name Storage
0 iphone 11 63 GB TMO iphone 11 63.0
1 iphone 11 128 GB ATT iphone 11 128.0
2 iphone 11 other carrier iphone 11 63.0
3 iphone 12 256 GB TMO iphone 12 256.0
4 iphone 12 64 GB TMO iphone 12 64.0
5 iphone 12 other carrier iphone 12 64.0
CodePudding user response:
Use groupby.transform
with fillna
:
df['Storage'] = df['Storage'].fillna(df.groupby('Family Name')['Storage'].transform('min'))
Or, potentially more efficient, groupby.min
and map
with boolean indexing depending on the size of the DataFrame, number of NaNs (few of them), and number of groups:
s = df.groupby('Family Name')['Storage'].min()
m = df['Storage'].isna()
df.loc[m, 'Storage'] = df.loc[m, 'Family Name'].map(s)
Output:
CNN Family Name Storage
0 iphone 11 63 GB TMO iphone 11 63.0
1 iphone 11 128 GB ATT iphone 11 128.0
2 iphone 11 other carrier iphone 11 63.0
3 iphone 12 256 GB TMO iphone 12 256.0
4 iphone 12 64 GB TMO iphone 12 64.0
5 iphone 12 other carrier iphone 12 64.0
CodePudding user response:
Use GroupBy.transform
for Series with same size like original column with Series.fillna
:
df["Storage"] = df["Storage"].fillna(df.groupby("Family Name")["Storage"].transform('min'))
Your solution should be changed:
df["Storage"] = df.groupby("Family Name")["Storage"].transform(lambda x: x.fillna(x.min()))
CodePudding user response:
Another possible solution:
(df.groupby('Family Name', group_keys=True, as_index=False)
.apply(lambda g: g.sort_values('Storage', ascending=False).ffill().sort_index())
.reset_index(drop=True))
Output:
CNN Family Name Storage
0 iphone 11 63 GB TMO iphone 11 63.0
1 iphone 11 128 GB ATT iphone 11 128.0
2 iphone 11 other carrier iphone 11 63.0
3 iphone 12 256 GB TMO iphone 12 256.0
4 iphone 12 64 GB TMO iphone 12 64.0
5 iphone 12 other carrier iphone 12 64.0