I have a Pandas dataframe which look like this.
Customer ID Customer Name Price_Val ID_value
5015 AXN 17.12 2
5015 AXN 2.08 4
5015 AXN 3.45 3
7315 BXN 2.22 0
7315 BXN 8.46 2
3283 CXN 88 0
3283 CXN 0.46 2
3283 CXN 6.46 4
Every Unique Customer should have same ID_value. But from the dataframe you can see we have different ID_Value for unique customer ID.
Lets's take Customer ID 5015 as example.
Customer ID Customer Name Price_Val ID_value
5015 AXN 17.12 2
5015 AXN 2.08 4
5015 AXN 3.45 3
For 5015 you can see we have three different ID_value. we need same ID_value for all the three records. To determine the ID_value we need to find the Maximum ID_value from the three records and assign that as ID_value for all the records of 5015.
For 5015 Customer ID Maximum ID_value will be 4 so we need to change other ID_values of 5015 to 4.
After conversion This how it should look like for 5015 customer id.
Customer ID Customer Name Price_Val ID_value
5015 AXN 17.12 4
5015 AXN 2.08 4
5015 AXN 3.45 4
For 7315 Max ID_value is 2 so we need to assign that value to all the ID_value of 7315
7315 BXN 2.22 2
7315 BXN 8.46 2
I need to do this for every unique customer id. I have 800 different unique customer id.What's the most efficient way to do this?
CodePudding user response:
Use transform
:
df['ID_value'] = df.groupby('Customer ID')['ID_value'].transform('max')
print(df)
# Output:
Customer ID Customer Name Price_Val ID_value
0 5015 AXN 17.12 4
1 5015 AXN 2.08 4
2 5015 AXN 3.45 4
3 7315 BXN 2.22 2
4 7315 BXN 8.46 2
5 3283 CXN 88.00 4
6 3283 CXN 0.46 4
7 3283 CXN 6.46 4
CodePudding user response:
For this specific case you should use the transform function from pandas, as it will preserve the original dataframe and will do it in one line of code.
df['ID_value'] = df.groupby('Customer ID')['ID_value'].transform('max')