I am currently learning Python through the DataCamp Data Scientist Course and I came across a line of code that doesn't make sense to me.
In this lesson, you will explore this further by finding out what the most common business owner title is (e.g., secretary, CEO, or vice president).
Below are the datasets that you will need:
licenses dataset:
   account  ward  aid                   business               address    zip
0   307071     3  743       REGGIE'S BAR & GRILL       2105 S STATE ST  60616
1       10    10  829                 HONEYBEERS   13200 S HOUSTON AVE  60633
2    10002    14  775                CELINA DELI     5089 S ARCHER AVE  60632
3    10005    12  NaN  KRAFT FOODS NORTH AMERICA        2005 W 43RD ST  60609
4    10044    44  638  NEYBOUR'S TAVERN & GRILLE  3651 N SOUTHPORT AVE  60613
biz_owners dataset:
   account  first_name  last_name      title
0       10       PEARL    SHERMAN  PRESIDENT
1       10       PEARL    SHERMAN  SECRETARY
2    10002      WALTER     MROZEK    PARTNER
3    10002      CELINA     BYRDAK    PARTNER
4    10005       IRENE  ROSENFELD  PRESIDENT
Start of the assignment
# Merge the licenses and biz_owners table on account
licenses_owners = licenses.merge(biz_owners, on='account')
LINE OF CODE THAT I DON'T UNDERSTAND:
# Group the results by title then count the number of accounts
counted_df = licenses_owners.groupby('title').agg({'account':'count'})
What does the .agg({'account':'count'}) part actually mean?
In other words, how does this format give you this output?
WHAT I DO UNDERSTAND:
I understand the .groupby() operation and that .agg() is used as an operator over multiple values, but I have never seen aggregation formatted like this before.
Output
counted_df:
                    account
title
ASST. SECRETARY         111
BENEFICIARY               4
CEO                     110
DIRECTOR                146
EXECUTIVE DIRECTOR       10
GENERAL PARTNER          21
INDIVIDUAL              268
LIMITED PARTNER          26
MANAGER                 134
MANAGING MEMBER         878
Thanks in advance!
CodePudding user response:
According to the documentation, .agg() can accept a dictionary as follows:
Parameters
func : function, str, list or dict
Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.
Accepted combinations are:
- [...]
- dict of axis labels -> functions, function names or list of such.
If using a dict, the key-value pairs specify the axis labels (column names) and the function name which should be applied to the specified column.
The dict {'account':'count'} provided in your code snippet therefore applies the count function to the column account on the grouped dataframe (grouped by title). It counts the non-null account values within each title group, which here is the number of rows for each title.
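To make the dict form concrete, here is a minimal sketch on a toy frame. The rows are assumed from the five sample rows shown in the question (not the full course dataset), so the counts differ from the real output.

```python
import pandas as pd

# Toy stand-in for the merged table; values are taken from the sample rows
# in the question and are only illustrative.
licenses_owners = pd.DataFrame({
    "account": [10, 10, 10002, 10002, 10005],
    "title": ["PRESIDENT", "SECRETARY", "PARTNER", "PARTNER", "PRESIDENT"],
    "ward": [10, 10, 14, 14, 12],
})

# Dict key = column to aggregate, dict value = name of the function to apply.
# "count" returns the number of non-null values of that column in each group.
counted_df = licenses_owners.groupby("title").agg({"account": "count"})
print(counted_df)
```

The result is a DataFrame indexed by title with a single column named account, which is the same shape as the output in the question.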
CodePudding user response:
It's an aggregation. It gives you the count of accounts for each value in the title column. It's similar to
licenses_owners.groupby('title')["account"].count()
Passing a dict to .agg() lets you apply different functions to different columns.
CEO 110 means there are 110 rows with CEO as the title.
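As a rough illustration of both points, the sketch below compares the two spellings and then uses a dict with two columns. The data is a toy frame assumed from the sample rows in the question, and taking the max of ward is just an arbitrary second aggregation chosen for the example.

```python
import pandas as pd

# Toy stand-in for the merged table (illustrative values only).
licenses_owners = pd.DataFrame({
    "account": [10, 10, 10002, 10002, 10005],
    "title": ["PRESIDENT", "SECRETARY", "PARTNER", "PARTNER", "PRESIDENT"],
    "ward": [10, 10, 14, 14, 12],
})

grouped = licenses_owners.groupby("title")

# Dict form: returns a DataFrame with one column, "account".
as_frame = grouped.agg({"account": "count"})

# Column selection + count(): returns a Series with the same values.
as_series = grouped["account"].count()

# The numbers match; only the container type differs.
same_counts = as_frame["account"].equals(as_series)
print(same_counts)

# A dict can name a different function for each column.
summary = grouped.agg({"account": "count", "ward": "max"})
print(summary)
```

The main practical difference is the return type: the dict form keeps a DataFrame, which is convenient if you want to add more aggregated columns later.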