Here's an example of my dataframe.
import pandas as pd
d = {'ids': [100, 200, 100, 200, 200, 100, 300, 300], 'col': [1, 2, 3, 4, 5, 6, 7, 8], 'col2': [6, 5, 4, 3, 2, 1, 10, 15]}
df = pd.DataFrame(data=d)
df
ids col col2
0 100 1 6
1 200 2 5
2 100 3 4
3 200 4 3
4 200 5 2
5 100 6 1
6 300 7 10
7 300 8 15
I want to calculate a value within each ids group, as in the example below.
groups = {key: df.loc[value] for key, value in df.groupby("ids").groups.items()}
for key, group in groups.items():
    group['previous_col'] = group['col'].shift()
    group['new_col'] = group['col2'] * group['previous_col']
    print(group)
This prints:
ids col col2 previous_col new_col
0 100 1 6 NaN NaN
2 100 3 4 1.0 4.0
5 100 6 1 3.0 3.0
ids col col2 previous_col new_col
1 200 2 5 NaN NaN
3 200 4 3 2.0 6.0
4 200 5 2 4.0 8.0
ids col col2 previous_col new_col
6 300 7 10 NaN NaN
7 300 8 15 7.0 105.0
If I print group.to_dict('records') inside the same loop instead, I get one list per group:
print(group.to_dict('records'))
[{'ids': 100, 'col': 1, 'col2': 6, 'previous_col': nan, 'new_col': nan}, {'ids': 100, 'col': 3, 'col2': 4, 'previous_col': 1.0, 'new_col': 4.0}, {'ids': 100, 'col': 6, 'col2': 1, 'previous_col': 3.0, 'new_col': 3.0}]
[{'ids': 200, 'col': 2, 'col2': 5, 'previous_col': nan, 'new_col': nan}, {'ids': 200, 'col': 4, 'col2': 3, 'previous_col': 2.0, 'new_col': 6.0}, {'ids': 200, 'col': 5, 'col2': 2, 'previous_col': 4.0, 'new_col': 8.0}]
[{'ids': 300, 'col': 7, 'col2': 10, 'previous_col': nan, 'new_col': nan}, {'ids': 300, 'col': 8, 'col2': 15, 'previous_col': 7.0, 'new_col': 105.0}]
As you can see, calling to_dict('records') on each group gives a separate list of dicts for every group. What I want is a single list containing all of the dicts, like this:
[{'ids': 100, 'col': 1, 'col2': 6, 'previous_col': nan, 'new_col': nan}, {'ids': 100, 'col': 3, 'col2': 4, 'previous_col': 1.0, 'new_col': 4.0}, {'ids': 100, 'col': 6, 'col2': 1, 'previous_col': 3.0, 'new_col': 3.0},
{'ids': 200, 'col': 2, 'col2': 5, 'previous_col': nan, 'new_col': nan}, {'ids': 200, 'col': 4, 'col2': 3, 'previous_col': 2.0, 'new_col': 6.0}, {'ids': 200, 'col': 5, 'col2': 2, 'previous_col': 4.0, 'new_col': 8.0},
{'ids': 300, 'col': 7, 'col2': 10, 'previous_col': nan, 'new_col': nan}, {'ids': 300, 'col': 8, 'col2': 15, 'previous_col': 7.0, 'new_col': 105.0}]
How can I get a single list like this?
Answer:
I would just do something like this:
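def func(group):
    # previous value of 'col' within this ids group (NaN for the group's first row)
    prev = group['col'].shift()
    return group['col2'] * prev

# group_keys=False keeps the original row index, so the result aligns with df
df['new_col'] = df.groupby('ids', group_keys=False)[['col', 'col2']].apply(func)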
output = df.to_dict('records')
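If you would rather keep the per-group loop from the question, a minimal sketch (not part of the original answer) is to collect every group's records into one list with list.extend instead of printing each group's list separately. It rebuilds the example frame so it runs on its own; the name records is just for illustration.

```python
import pandas as pd

d = {'ids': [100, 200, 100, 200, 200, 100, 300, 300],
     'col': [1, 2, 3, 4, 5, 6, 7, 8],
     'col2': [6, 5, 4, 3, 2, 1, 10, 15]}
df = pd.DataFrame(data=d)

records = []  # one flat list shared by all groups
for key, group in df.groupby('ids'):
    group = group.copy()  # explicit copy so adding columns doesn't warn about chained assignment
    group['previous_col'] = group['col'].shift()
    group['new_col'] = group['col2'] * group['previous_col']
    # extend (not append) adds each group's dicts individually to the single list
    records.extend(group.to_dict('records'))

print(records)
```

Because groupby iterates over the keys in sorted order by default, the dicts come out grouped as 100, 200, 300, matching the layout of the desired output.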
That works for any per-group function you plug in as func. In this specific case, you can simply write:
df['new_col'] = df.groupby('ids')['col'].shift() * df['col2']
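If you also want the previous_col column that appears in the question's desired output, a sketch along the same lines keeps the shifted column instead of multiplying inline (column names are taken from the question; the frame is rebuilt so the snippet is self-contained):

```python
import pandas as pd

d = {'ids': [100, 200, 100, 200, 200, 100, 300, 300],
     'col': [1, 2, 3, 4, 5, 6, 7, 8],
     'col2': [6, 5, 4, 3, 2, 1, 10, 15]}
df = pd.DataFrame(data=d)

# shift 'col' within each ids group; the result keeps df's index, so it aligns row by row
df['previous_col'] = df.groupby('ids')['col'].shift()
df['new_col'] = df['col2'] * df['previous_col']

output = df.to_dict('records')  # one flat list of dicts, in df's row order
print(output)
```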
Output:
[{'ids': 100, 'col': 1, 'col2': 6, 'new_col': nan},
{'ids': 200, 'col': 2, 'col2': 5, 'new_col': nan},
{'ids': 100, 'col': 3, 'col2': 4, 'new_col': 4.0},
{'ids': 200, 'col': 4, 'col2': 3, 'new_col': 6.0},
{'ids': 200, 'col': 5, 'col2': 2, 'new_col': 8.0},
{'ids': 100, 'col': 6, 'col2': 1, 'new_col': 3.0},
{'ids': 300, 'col': 7, 'col2': 10, 'new_col': nan},
{'ids': 300, 'col': 8, 'col2': 15, 'new_col': 105.0}]
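The records above come out in the original row order, whereas the desired output in the question is grouped by ids. Assuming the order within each group should be preserved, one option (a sketch, not from the original answer) is a stable sort on ids before converting:

```python
import pandas as pd

d = {'ids': [100, 200, 100, 200, 200, 100, 300, 300],
     'col': [1, 2, 3, 4, 5, 6, 7, 8],
     'col2': [6, 5, 4, 3, 2, 1, 10, 15]}
df = pd.DataFrame(data=d)
df['new_col'] = df.groupby('ids')['col'].shift() * df['col2']

# mergesort is stable, so rows keep their original relative order inside each ids group
output = df.sort_values('ids', kind='mergesort').to_dict('records')
print(output)
```

The sort only reorders rows for the conversion; df itself is left unchanged.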