Python get the number of distinct values in a column grouped by another column

Time:09-21

I have a dataframe containing data on 3 car dealerships and the sales they've made. The two columns of interest look like this:

     dealer_id   manufacturer
0    34          Audi
1    34          Audi
2    34          BMW
3    55          Audi
4    55          Ford
5    55          BMW
6    55          Ford
7    12          Mercedes
8    12          Porsche
9    12          Mercedes
10   12          Audi

In short, I want to reduce the dataframe to one row per manufacturer for each dealer, so that I can see how many distinct manufacturers each dealer sold cars from. I don't mind which row is kept; it can be the first row of each type, but I want the output to look like this before I reset the index:

    dealer_id    manufacturer
0    34           Audi
2    34           BMW
3    55           Audi
4    55           Ford
5    55           BMW
7    12           Mercedes
8    12           Porsche
10   12           Audi

CodePudding user response:

Try .drop_duplicates():

df = df.drop_duplicates()
print(df)

Prints:

    dealer_id manufacturer
0          34         Audi
2          34          BMW
3          55         Audi
4          55         Ford
5          55          BMW
7          12     Mercedes
8          12      Porsche
10         12         Audi

Or, if the dataframe has other columns besides these two, restrict the comparison to them:

df = df.drop_duplicates(["dealer_id", "manufacturer"])
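
If what you ultimately need is the count from the title (distinct manufacturers per dealer), you may not need to drop rows at all. Here is a minimal sketch, assuming the dataframe is rebuilt from the sample rows in the question; the names deduped and distinct_counts are only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "dealer_id": [34, 34, 34, 55, 55, 55, 55, 12, 12, 12, 12],
    "manufacturer": ["Audi", "Audi", "BMW", "Audi", "Ford", "BMW", "Ford",
                     "Mercedes", "Porsche", "Mercedes", "Audi"],
})

# Keep the first row of each (dealer_id, manufacturer) pair;
# the original index labels are preserved until reset_index() is called.
deduped = df.drop_duplicates(subset=["dealer_id", "manufacturer"])

# Count distinct manufacturers per dealer directly, without dropping rows.
distinct_counts = df.groupby("dealer_id")["manufacturer"].nunique()

print(deduped)
print(distinct_counts)
```

The groupby version returns a Series indexed by dealer_id, so you can look up a single dealer with distinct_counts.loc[55].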