I have a dataframe like this, which has about 40 unique in the key
column and some of them are integer. I do not want anything aggregated if possible
:
hash ft timestamp id key value
1ab 1 2022-01-02 12:21:11 1 aaa 121289
1ab 1 2022-01-02 12:21:11 1 bbb 13ADF
1ab 1 2022-01-02 12:21:11 2 aaa 13ADH
1ab 1 2022-01-02 12:21:11 2 bbb 13ADH
2ab 2 2022-01-02 12:21:11 3 aaa 121382
2ab 2 2022-01-02 12:21:11 3 bbb 121381
2ab 2 2022-01-02 12:21:11 3 ccc 121389
I am trying to pivot the data only on 2 on columns key
and value
while keeping the remaining same columns and index. Example:
When I run below code, the column names take on grouped values with columnname being for the following columns id, ft, value
. One of the actual column names, with parentheses : ('id', '1', '121289')
and I am forced to select an index, which I dont want do.
Code:
df.pivot_table(index='hash',columns=['ft','key'])
I am not sure what I am doing wrong, that I cant use the value
column for values
. I get an empty dataframe:
df.pivot_table(index='hash',columns=['ft','key'], values='value')
CodePudding user response:
A possible solution, using pandas.DataFrame.pivot
:
(df.pivot(index=['hash', 'ft', 'id', 'timestamp'],
columns=['key'], values='value')
.reset_index().rename_axis(None, axis=1))
Output:
hash ft id timestamp aaa bbb ccc
0 1ab 1 1 2022-01-02 12:21:11 121289 13ADF NaN
1 1ab 1 2 2022-01-02 12:21:11 13ADH 13ADH NaN
2 2ab 2 3 2022-01-02 12:21:11 121382 121381 121389
Data:
df = pd.DataFrame.from_dict(
{'hash': {0: '1ab',
1: '1ab',
2: '1ab',
3: '1ab',
4: '2ab',
5: '2ab',
6: '2ab'},
'ft': {0: 1, 1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2},
'timestamp': {0: '2022-01-02 12:21:11',
1: '2022-01-02 12:21:11',
2: '2022-01-02 12:21:11',
3: '2022-01-02 12:21:11',
4: '2022-01-02 12:21:11',
5: '2022-01-02 12:21:11',
6: '2022-01-02 12:21:11'},
'id': {0: 1, 1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 3},
'key': {0: 'aaa', 1: 'bbb', 2: 'aaa', 3: 'bbb', 4: 'aaa', 5: 'bbb', 6: 'ccc'},
'value': {0: '121289',
1: '13ADF',
2: '13ADH',
3: '13ADH',
4: '121382',
5: '121381',
6: '121389'}}
)
EDIT
To overcome an error reported below, in a comment, by the OP, the OP himself suggests the following solution:
(pd.pivot_table(df,index=['hash','ft', 'id'] ,
columns = ['key'] ,
values = "value", aggfunc='sum')
.reset_index().rename_axis(None, axis=1))