I would like to ask a question concerning pivot tables in Pandas.
I have been trying to make a pivot table for this kind of table:
sector | score |
---|---|
US | null |
US | null |
US | 1 |
EU | null |
EU | 2 |
EU | 2 |
EU | 4 |
UK | null |
UK | null |
UK | null |
UK | 4 |
UK | 4 |
Eventually, I would like this table to be a pivot table that would look like this:
null | 1 | 2 | 4 | |
---|---|---|---|---|
US | 2 | 1 | 0 | 0 |
EU | 1 | 0 | 2 | 1 |
UK | 3 | 0 | 0 | 2 |
In order to do so, I have been trying to do as follows:
import pandas as pd
import numpy as np
df = sql("SELECT sector, score FROM database")
df_piv = pd.pivot_table(
df,
index = 'sector',
columns = 'score',
values = 'score',
aggfunc = 'count'
)
However, by doing so, I keep getting the following error:
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
Could you please help me?
Thanks :)
CodePudding user response:
I prefer using groupby
, you can use
>>> df.groupby('sector')['score'].value_counts(dropna=False).unstack(fill_value=0)
score NaN 1.0 2.0 4.0
sector
EU 1 0 2 1
UK 3 0 0 2
US 2 1 0 0
Sidenote:
If you did not want to count the missing values,
pd.pivot_table(df, index='sector', columns='score', aggfunc=len, fill_value=0)
would do the trick. Using len
as an actual function because for some reason pivot_table
's aggfunc
parameter does not seem to accept string shortcuts like groupby.agg
/transform
.