Here's the thing,
I need to pivot a table that is in the long format to its wide format.
Example:
d = {'col1': ['a','b', 'c', 'b', 'c', 'c'], 'col2': ['joe', 'mary', 'mary', 'joseph', 'allan', 'mary'],
'col3':[2020, 2022, 2019, 2020, 2022, 2019], 'values':[123,235,3456,213,423,52]}
df = pd.DataFrame(data=d)
print(df)
I would need something like
df = df.pivot(index=["col1"], columns=["col3", "col2"], values="values")
For generating this structure with values if exists and null if doesnt:
col1;2019_joe;2019_joe;2019_joe;2019_mary;2020_mary;2021_mary
a
b
c
d
e
f
But the command pivot() doesnt accept a list on column keyword argument.
CodePudding user response:
Problem is that you have two same c mary 2019
, in this case, you can try pivot_table
with aggfunc
argument
out = df.pivot_table(index=["col1"], columns=["col3", "col2"], values="values", aggfunc='mean')
out.columns = out.columns.map(lambda index: f'{index[0]}_{index[1]}')
print(out)
2019_mary 2020_joe 2020_joseph 2022_allan 2022_mary
col1
a NaN 123.0 NaN NaN NaN
b NaN NaN 213.0 NaN 235.0
c 1754.0 NaN NaN 423.0 NaN
CodePudding user response:
Does this get you the results you are expecting?
df['col4'] = df['col3'].astype(str) '_' df['col2'].astype(str)
df = df.drop(columns = ['col2', 'col3'])
pd.pivot_table(df, index = 'col1', values = 'values', columns = 'col4').reset_index()
CodePudding user response:
I don't get what is the logic to be applied on col1
. In your constructor you have 'col1': ['a','b', 'c', 'b', 'c', 'c']
but then in your expected output you have a, b, c, d, e and f
. Where are d, e and f
coming from? Assuming that you don't care about col1
, what about df.pivot(columns=['col3','col2'])['values']
?
col3 2020 2022 2019 2020 2022
col2 joe mary mary joseph allan
0 123.0 NaN NaN NaN NaN
1 NaN 235.0 NaN NaN NaN
2 NaN NaN 3456.0 NaN NaN
3 NaN NaN NaN 213.0 NaN
4 NaN NaN NaN NaN 423.0
5 NaN NaN 52.0 NaN NaN