What does this line of code mean? .agg({'account':'count'}) | Aggregation & Pand

Time:06-14

I am currently learning Python through the DataCamp Data Scientist Course and I came across a line of code that doesn't make sense to me.

In this lesson, you will explore this further by finding out what the most common business owner title is (e.g., secretary, CEO, or vice president).


Below are the datasets that you will need:

licenses dataset:

     account ward  aid                   business                     address    zip
0     307071    3  743       REGGIE'S BAR & GRILL             2105 S STATE ST  60616
1         10   10  829                 HONEYBEERS         13200 S HOUSTON AVE  60633
2      10002   14  775                CELINA DELI           5089 S ARCHER AVE  60632
3      10005   12  NaN  KRAFT FOODS NORTH AMERICA              2005 W 43RD ST  60609
4      10044   44  638  NEYBOUR'S TAVERN & GRILLE        3651 N SOUTHPORT AVE  60613

biz_owners dataset:

      account first_name  last_name           title
0          10      PEARL    SHERMAN       PRESIDENT
1          10      PEARL    SHERMAN       SECRETARY
2       10002     WALTER     MROZEK         PARTNER
3       10002     CELINA     BYRDAK         PARTNER
4       10005      IRENE  ROSENFELD       PRESIDENT

Start of the assignment

# Merge the licenses and biz_owners table on account
licenses_owners = licenses.merge(biz_owners, on='account')

LINE OF CODE THAT I DON'T UNDERSTAND:

# Group the results by title then count the number of accounts
counted_df = licenses_owners.groupby('title').agg({'account':'count'})

What does the .agg({'account':'count'}) part actually mean?

In other words, how does this format give you this output?


WHAT I DO UNDERSTAND:

I understand the .groupby() operation and that .agg() is used as an operator over multiple values, but I have never seen aggregation formatted like this before.


Output

counted_df:

                    account
title                      
ASST. SECRETARY         111
BENEFICIARY               4
CEO                     110
DIRECTOR                146
EXECUTIVE DIRECTOR       10
GENERAL PARTNER          21
INDIVIDUAL              268
LIMITED PARTNER          26
MANAGER                 134
MANAGING MEMBER         878

Thanks in advance!

CodePudding user response:

According to the documentation, .agg() can accept a dictionary as follows:

Parameters
  func : function, str, list or dict
    Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:

  • [...]
  • dict of axis labels -> functions, function names or list of such.

If using a dict, the key-value pairs specify the axis labels (column names) and the function name which should be applied to the specified column.

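The dict {'account':'count'} provided in your code snippet therefore applies the count function to the column account on the grouped dataframe (grouped by title). count returns the number of non-null account values in each group, and since account is the merge key and never null here, that is the number of rows with each title.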
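As a rough illustration of the dict form, here is a minimal sketch on a toy frame built from the five biz_owners rows shown in the question. It skips the merge step and the variable names are only for illustration.

```python
import pandas as pd

# Toy data copied from the biz_owners sample rows in the question
biz_owners = pd.DataFrame({
    "account": [10, 10, 10002, 10002, 10005],
    "title": ["PRESIDENT", "SECRETARY", "PARTNER", "PARTNER", "PRESIDENT"],
})

# Dict key   = the column to aggregate ("account")
# Dict value = the name of the aggregation to apply to it ("count")
counted_df = biz_owners.groupby("title").agg({"account": "count"})

# One row per title, one column named "account":
# PARTNER -> 2, PRESIDENT -> 2, SECRETARY -> 1
print(counted_df)
```

The result keeps the column name account, which is why the output in the question has account as its header even though the numbers are row counts per title.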

CodePudding user response:

It's an aggregate function. It gives you the count of accounts for each value of the title column. It's similar to

licenses_owners.groupby('title')["account"].count()
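The two spellings are similar but not identical in shape. The sketch below, on a toy frame taken from the sample rows in the question (the name df is just for illustration), suggests that the dict form returns a one-column DataFrame while the bracket form returns a Series:

```python
import pandas as pd

# Toy data copied from the biz_owners sample rows in the question
df = pd.DataFrame({
    "account": [10, 10, 10002, 10002, 10005],
    "title": ["PRESIDENT", "SECRETARY", "PARTNER", "PARTNER", "PRESIDENT"],
})

# Dict form: the result is a DataFrame with a single "account" column
as_frame = df.groupby("title").agg({"account": "count"})

# Bracket form: the result is a Series named "account"
as_series = df.groupby("title")["account"].count()

print(type(as_frame).__name__, type(as_series).__name__)

# The numbers agree; converting the Series to a frame makes the two equal
print(as_frame.equals(as_series.to_frame()))
```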

.agg() lets you apply different functions to different columns in a single call.
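A small sketch of that idea, assuming a merged frame that also carries the ward column. The rows are the sample accounts from the question paired with their wards from the licenses sample, and nunique is chosen here only as an example of a second aggregation:

```python
import pandas as pd

# Toy stand-in for the merged table: owner rows plus the ward of each account
df = pd.DataFrame({
    "title": ["PRESIDENT", "SECRETARY", "PARTNER", "PARTNER", "PRESIDENT"],
    "account": [10, 10, 10002, 10002, 10005],
    "ward": [10, 10, 14, 14, 12],
})

# Each dict entry pairs one column with the aggregation to run on it
summary = df.groupby("title").agg({
    "account": "count",   # number of non-null account values per title
    "ward": "nunique",    # number of distinct wards per title
})

# PRESIDENT has 2 accounts spread over 2 distinct wards;
# PARTNER has 2 accounts in 1 ward
print(summary)
```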

CEO 110 means there are 110 rows with CEO as the title.
