I am trying to add a synthetic data column to the existing the movies dataset. This new column is the gross revenue of an actor's second most recent movie.
For example:
Movie | Actor | Revenue | New Column* |
---|---|---|---|
A | Nic Cage | $7 | $5 |
B | Nic Cage | $6 | $4 |
C | Nic Cage | $5 | - |
D | Nic Cage | $4 | - |
E | Al Pacino | $3 | $1 |
F | Al Pacino | $2 | - |
G | Al Pacino | $1 | - |
What is the most efficient way to code this in python?
CodePudding user response:
Assuming the movies are sorted, use groupby.shift
df['New Column'] = df.groupby('Actor')['Revenue'].shift(-2, fill_value='-')
Output:
Movie Actor Revenue New Column
0 A Nic Cage $7 $5
1 B Nic Cage $6 $4
2 C Nic Cage $5 -
3 D Nic Cage $4 -
4 E Al Pacino $3 $1
5 F Al Pacino $2 -
6 G Al Pacino $1 -