I have a dataframe 'raw' that looks like this -
It has many rows with duplicate values in each column. I want to make a new dataframe 'new_df' which has unique customer_code corresponding and market_code. The new_df should look like this -
CodePudding user response:
It sounds like you simply want to create a DataFrame with unique customer_code
which also shows market_code
. Here's a way to do it:
df = df[['customer_code','market_code']].drop_duplicates('customer_code')
Output:
customer_code market_code
0 Cus001 Mark001
1 Cus003 Mark003
3 Cus004 Mark003
4 Cus005 Mark004
The part reading df[['customer_code','market_code']]
gives us a DataFrame containing only the two columns of interest, and the drop_duplicates('customer_code')
part eliminates all but the first occurrence of duplicate values in the customer_code
column (though you could instead keep the last occurrence of each duplicate by calling it using the keep='last'
argument).