I have a set of data which i would like to get the maximum values based on a column with similar data first, and the outcome is the data of another column of the maximum. If the column data is empty, then i would like use another column with similar data to get the maximum values, else, it will just return to the original data value
Data set is as below
df = pd.DataFrame({'ID': ['M2','M7','M9','M4','M3','M6','M99','M8','M10','M10'],
'Color' : ['pink','blue','blue','yellow','blue','pink','red','pink','red','blue'],
'Number': [23255,1755,7897,6666,1311,78946,44893,148791,18783,487913],
'Brand':["Brand9","Brand9","Brand9","Brand7","Brand7","","Brand1","Brand1","",""]})
In[1]:
ID Color Number Brand
0 M2 pink 23255 Brand9
1 M7 blue 1755 Brand9
2 M9 blue 7897 Brand9
3 M4 yellow 6666 Brand7
4 M3 blue 1311 Brand7
5 M6 pink 78946
6 M99 red 44893 Brand1
7 M8 pink 148791 Brand1
8 M10 red 18783
9 M10 blue 487913
First, i would like to find the Max Number based on same Brand, then index the color, and also the ID, those empty data will return empty first
i used df['Max'] = df.groupby('Brand')['Number'].transform('max')
to get the max number but i stuck to get the color and the ID
expected outcome as below:
ID Color Number Brand Max Max Color Max ID
0 M2 pink 23255 Brand9 23255 pink M2
1 M7 blue 1755 Brand9 23255 pink M2
2 M9 blue 7897 Brand9 23255 pink M2
3 M4 yellow 6666 Brand7 6666 yellow M4
4 M3 blue 1311 Brand7 6666 yellow M4
5 M6 pink 78946
6 M99 red 44893 Brand1 148791 pink M8
7 M8 pink 148791 Brand1 148791 pink M8
8 M10 red 18783
9 M10 blue 487913
***************************************
Max : Max Number based on brand
Max Color : Max Number based on brand index color
Max ID : Max Number based on ID
Then, for those empty, i will find the maximum based on the same ID, if there is no same ID or brand, it will return to the same data
the final expected result is as below:
ID Color Number Brand Max Max Color Max ID Final ID Final Color
0 M2 pink 23255 Brand9 23255 pink M2 M2 pink
1 M7 blue 1755 Brand9 23255 pink M2 M2 pink
2 M9 blue 7897 Brand9 23255 pink M2 M2 pink
3 M4 yellow 6666 Brand7 6666 yellow M4 M4 yellow
4 M3 blue 1311 Brand7 6666 yellow M4 M4 yellow
5 M6 pink 78946 M6 pink ---- return to same data
6 M99 red 44893 Brand1 148791 pink M8 M8 pink
7 M8 pink 148791 Brand1 148791 pink M8 M8 pink
8 M10 red 18783 M10 blue ---- max based on ID
9 M10 blue 487913 M10 blue ---- max based on ID
CodePudding user response:
Try:
mask = df["Brand"].eq("")
x = df.loc[df.groupby("Brand")["Number"].transform("idxmax")].reset_index(
drop=True
)
y = df.loc[df[mask].groupby("ID")["Number"].transform("idxmax")]
z = x.copy()[["Number", "Color", "ID"]]
z[mask] = ""
x.loc[mask] = y.values
print(
pd.concat(
[df, z.add_prefix("Max "), x[["ID", "Color"]].add_prefix("Final ")],
axis=1,
)
)
Prints:
ID Color Number Brand Max Number Max Color Max ID Final ID Final Color
0 M2 pink 23255 Brand9 23255 pink M2 M2 pink
1 M7 blue 1755 Brand9 23255 pink M2 M2 pink
2 M9 blue 7897 Brand9 23255 pink M2 M2 pink
3 M4 yellow 6666 Brand7 6666 yellow M4 M4 yellow
4 M3 blue 1311 Brand7 6666 yellow M4 M4 yellow
5 M6 pink 78946 M6 pink
6 M99 red 44893 Brand1 148791 pink M8 M8 pink
7 M8 pink 148791 Brand1 148791 pink M8 M8 pink
8 M10 red 18783 M10 blue
9 M10 blue 487913 M10 blue