Home > OS >  In Pandas, how can I create a pivot table that returns corresponding column value, for a max in anot
In Pandas, how can I create a pivot table that returns corresponding column value, for a max in anot

Time:02-20

sorry if this sounds like a silly question, I'm very new to python and pandas and I've been having trouble with a pivot table. Any help would be appreciated.

I have a file which looks like this:

Idx State Sex  Year    Name  Count
0    GA   F  1910    Mary    841
1    GA   F  1910   Annie    553
2    GA   F  1910  Mattie    320
3    GA   F  1910    Ruby    279
4    GA   F  1910  Willie    275
5    GA   F  1910  Louise    231

I wanted to create a pivot table, which returns a column of sequential years [one row for each], along with a column for the Name which has the highest count, and a Count column showing that maximum eg

Year    Name  Count
1910    Mary    1841
1911    Paul    553
1912   June     9620

I've tried:

import pandas as pd
bn = pd.read_csv("baby_names.csv")
bn['Count'].astype(int)
by_yr = pd.pivot_table(data=bn, values="Count", index=['Year','Name'], aggfunc='max' )
print(by_yr.head(10))

as well as:

import pandas as pd
bn = pd.read_csv("baby_names.csv")
bn['Count'].astype(int)
by_yr = pd.pivot_table(data=bn, values=['Name',"Count"], index='Year', aggfunc='max' )
print(by_yr.head(10))

but neither of them work. I'm sure I'm missing something stupid but any help is appreciated. Thanks

CodePudding user response:

You can use groupby('Year') and nlargest to find, for each group/year, the row of the maximum Count.

step1 = df.groupby('Year').apply(lambda df: df.nlargest(1, 'Count'))

Then takes care of the formatting with

step1.droplevel(1, axis=0)[['Name', 'Count']].reset_index()

CodePudding user response:

try this:

I expanded your dataframe sample so we could be able to better verify the results:

df = pd.DataFrame({0:[0,'GA','F','1910','Mary',841],
                   1:[1,'GA','F','1910','Annie',553],
                   2:[2,'GA','F','1910','Mattie',320],
                   3:[3,'GA','F','1910','Ruby',279],
                   4:[4,'GA','F','1910','Willie',275],
                   5:[5,'GA','F','1910','Louise',231],
                   6:[6,'GA','M','1911','Paul',234],
                   7:[7,'GA','F','1911','Mattie',745],
                   8:[8,'GA','M','1911','Paul',654],
                   9:[9,'GA','M','1911','Paul',105],
                  10:[10,'GA','F','1911','Louise',76],
                  11:[11,'GA','M','1911','Ringo',894],
                  12:[12,'GA','F','1912','Louise',436],
                  13:[13,'GA','M','1912','Ringo',354],
                  14:[14,'GA','F','1912','Louise',654],
                  15:[15,'GA','F','1912','Ruby',325],
                  16:[16,'GA','M','1912','Ringo',846],
                  17:[17,'GA','F','1912','Louise',100],
                  18:[17,'GA','F','1912','Louise',98]}, 
               index=['Idx','State','Sex','Year','Name','Count']).T

Output: df>>>

  Idx   State   Sex   Year    Name  Count
0   0      GA     F   1910    Mary    841
1   1      GA     F   1910   Annie    553
2   2      GA     F   1910  Mattie    320
3   3      GA     F   1910    Ruby    279
4   4      GA     F   1910  Willie    275
5   5      GA     F   1910  Louise    231
6   6      GA     M   1911    Paul    234
7   7      GA     F   1911  Mattie    745
8   8      GA     M   1911    Paul    654
9   9      GA     M   1911    Paul    105
10  10     GA     F   1911  Louise     76
11  11     GA     M   1911   Ringo    894
12  12     GA     F   1912  Louise    436
13  13     GA     M   1912   Ringo    354
14  14     GA     F   1912  Louise    654
15  15     GA     F   1912    Ruby    325
16  16     GA     M   1912   Ringo    846
17  17     GA     F   1912  Louise    100
18  17     GA     F   1912  Louise     98

First be sure that the 'Count' row is numeric by using this:

df['Count'] = pd.to_numeric(df['Count'])

Then just do this:

df[['Year','Name','Count']].loc[df.groupby('Year')['Count'].idxmax()]

Output: df>>

    Year     Name   Count
0   1910     Mary     841
11  1911    Ringo     894
16  1912    Ringo     846
  • Related