creating a df of unique items and the max value of another column for each unique item-CodePudding

I have a DataFrame that looks something like this:

Place| University | Field | Number of students
| New York | NYU | Fashion | Female | 1000
| New York | NYU | Fashion | Male | 500 |
| New York | NYU | CS | Female | 300 |
| New York | NYU | CS | Male |3050 |
| New York | NYU | Bio | Female | 350 |
| New York | NYU | Bio | Male | 400 |
| Texas | UT | Fashion | Female | 600 |
| Texas | UT | Fashion | Male | 100 |
| Texas | UT | CS | Female | 500 |
| Texas | UT | CS | Male |5050 |
| Texas | UT | Bio | Female | 650 |
| Texas | UT | Bio | Male | 700 |

Using only Pandas, I need to find all the unique fields of study then for each one, find the state with the most students:

| Field | Place | No. of Students|
| Fashion | New York | 1500 |
| CS | Texas | 5550 |
| Bio | Texas | 1350 |

I know we have to groupby the field but i'm unable to get the sum of students and then the one with the max number of students (col 2 and 3). I've tried agg and count but I might be using them in the wrong way. Please help guide me on this. TIA!

CodePudding user response：

You can do this

>>> df = df.groupby(['Place','Field'])['Number of students'].sum().reset_index()
      Place    Field  Number of students
0  New York       CS                 500
1  New York  Fashion                1500
2     Texas      Bio                 200
3     Texas       CS                 700
>>> df.sort_values('Number of students',ascending=False)
      Place    Field  Number of students
1  New York  Fashion                1500
3     Texas       CS                 700
0  New York       CS                 500
2     Texas      Bio                 200

CodePudding user response：

df.groupby(['State', 'Place', 'Field']).sum('Nos').groupby('Field').max()