Home > Mobile >  Finding the most frequent strings and their counts for each group using pandas
Finding the most frequent strings and their counts for each group using pandas

Time:07-20

I'm trying to find the name of the person who submitted the most applications in any given year over a series of years.

Each application is its own row in the dataframe. It comes with the year it was submitted, and the applicant's name.

I tried using groupby to organize the data by year and name, then a variety of methods such as value_counts(), count(), max(), etc...

This is the closest I've gotten:

df3.groupby(['app_year_start'])['name'].value_counts().sort_values(ascending=False)

It produces the following output:

app_year_start        name               total_apps
2015                  John Smith         622
2013                  John Smith         614
2014                  Jane Doe           611
2016                  Jon Snow           549

My desired output:

app_year_start        name                  total_apps
2015                  top_applicant         max_num
2014                  top_applicant         max_num
2013                  top_applicant         max_num
2012                  top_applicant         max_num

Some lines of dummy data:

app_year_start        name
2012                  John Smith
2012                  John Smith
2012                  John Smith
2012                  Jane Doe
2013                  Jane Doe
2012                  John Snow
2015                  John Snow
2014                  John Smith
2015                  John Snow
2012                  John Snow
2012                  John Smith
2012                  John Smith
2012                  John Smith
2012                  John Smith
2012                  Jane Doe
2013                  Jane Doe
2012                  John Snow
2015                  John Snow
2014                  John Smith
2015                  John Snow
2012                  John Snow
2012                  John Smith

I've consulted the follow SO posts:

result

CodePudding user response:

With value_counts and a groupby:

dfc = (df.value_counts().reset_index().groupby('app_year_start').max()
          .sort_index(ascending=False).reset_index() 
          .rename(columns={0:'total_apps'})
      )

print(dfc)

Result

   app_year_start        name  total_apps
0            2015   John Snow           4
1            2014  John Smith           2
2            2013    Jane Doe           2
3            2012   John Snow           8
  • Related