I have a dataframe containing data on 3 car dealerships and the sales they've made. The two columns of interest look like this:
dealer_id manufacturer
0 34 Audi
1 34 Audi
2 34 BMW
3 55 Audi
4 55 Ford
5 55 BMW
6 55 Ford
7 12 Mercedes
8 12 Porsche
9 12 Mercedes
10 12 Audi
In short, I want to reduce the dataframe to one row per manufacturer for each dealer, so that I can see how many distinct manufacturers had cars sold by each dealer. I'm not fussed about which row is kept; it can be the first row of each type. I would want the output to look like this before I reset the index:
dealer_id manufacturer
0 34 Audi
2 34 BMW
3 55 Audi
4 55 Ford
5 55 BMW
7 12 Mercedes
8 12 Porsche
10 12 Audi
CodePudding user response:
Try .drop_duplicates():
df = df.drop_duplicates()
print(df)
Prints:
dealer_id manufacturer
0 34 Audi
2 34 BMW
3 55 Audi
4 55 Ford
5 55 BMW
7 12 Mercedes
8 12 Porsche
10 12 Audi
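Or, if the dataframe has other columns whose values differ between rows, restrict the comparison to the two columns of interest: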
df = df.drop_duplicates(["dealer_id", "manufacturer"])
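For reference, here is a minimal, self-contained sketch of the approach, rebuilt from the sample data in the question. The price column is hypothetical and is only there to stand in for the other columns of the real dataframe. The groupby/nunique line is one possible way to get the distinct-manufacturer count per dealer directly, assuming that count is the end goal.

```python
import pandas as pd

# Sample data from the question. "price" is a hypothetical extra column,
# added only to show why passing a subset of columns can matter.
df = pd.DataFrame({
    "dealer_id": [34, 34, 34, 55, 55, 55, 55, 12, 12, 12, 12],
    "manufacturer": ["Audi", "Audi", "BMW", "Audi", "Ford", "BMW", "Ford",
                     "Mercedes", "Porsche", "Mercedes", "Audi"],
    "price": list(range(11)),  # every row differs in this column
})

# Without a subset, all columns are compared, so no row counts as a duplicate here.
all_columns = df.drop_duplicates()

# With a subset, only dealer_id and manufacturer are compared;
# the first row of each pair is kept and the original index is preserved.
deduped = df.drop_duplicates(subset=["dealer_id", "manufacturer"])
print(deduped[["dealer_id", "manufacturer"]])

# Number of distinct manufacturers sold by each dealer.
counts = df.groupby("dealer_id")["manufacturer"].nunique()
print(counts)
```

Calling deduped.reset_index(drop=True) afterwards renumbers the rows from 0 if the original index is no longer needed.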