I'd like to use Pandas to pivot a table into multiple columns, and get the count of their values.
In this example table:
LOCATION | ADDRESS | PARKING TYPE |
---|---|---|
AAA0001 | 123 MAIN | LARGE LOT |
AAA0001 | 123 MAIN | SMALL LOT |
AAA0002 | 456 TOWN | LARGE LOT |
AAA0003 | 789 AVE | MEDIUM LOT |
AAA0003 | 789 AVE | MEDIUM LOT |
How do I pivot out this table to show total counts of each string within "Parking Type"? Maybe my mistake is calling this a "pivot?"
Desired output:
LOCATION | ADDRESS | SMALL LOT | MEDIUM LOT | LARGE LOT |
---|---|---|---|---|
AAA0001 | 123 MAIN | 1 | 0 | 1 |
AAA0002 | 456 TOWN | 0 | 0 | 1 |
AAA0003 | 789 AVE | 0 | 2 | 0 |
Currently, I have a pivot going, but it is only counting the values of the first column, and leaving everything else as 0s. Any guidance would be amazing.
Current Code:
pivot = pd.pivot_table(df, index=["LOCATION"], columns=['PARKING TYPE'], aggfunc=len)
pivot = pivot.reset_index()
pivot.columns = pivot.columns.to_series().apply(lambda x: "".join(x))
CodePudding user response:
You could use pd.crosstab
:
out = (pd.crosstab(index=[df['LOCATION'], df['ADDRESS']], columns=df['PARKING TYPE'])
.reset_index()
.rename_axis(columns=[None]))
or you could use pivot_table
(but you have to pass "ADDRESS" into the index as well):
out = (pd.pivot_table(df, index=['LOCATION','ADDRESS'], columns=['PARKING TYPE'], values='ADDRESS', aggfunc=len, fill_value=0)
.reset_index()
.rename_axis(columns=[None]))
Output:
LOCATION ADDRESS LARGE LOT MEDIUM LOT SMALL LOT
0 AAA0001 123 MAIN 1 0 1
1 AAA0002 456 TOWN 1 0 0
2 AAA0003 789 AVE 0 2 0
CodePudding user response:
You can use get_dummies()
and then a grouped sum
to get a row per your groups:
>>> pd.get_dummies(df, columns=['PARKING TYPE']).groupby(['LOCATION','ADDRESS'],as_index=False).sum()
LOCATION ADDRESS PARKING TYPE_LARGE LOT PARKING TYPE_MEDIUM LOT PARKING TYPE_SMALL LOT
0 AAA0001 123 MAIN 1 0 1
1 AAA0002 456 TOWN 1 0 0
2 AAA0003 789 AVE 0 2 0