For the next dataframe, I want to modify two columns for the last row of each cluster.
country | ID | counter | direction |
---|---|---|---|
ENG | 0 | 1 | OUT |
ENG | 0 | 0 | IN |
ESP | 0 | 6 | OUT |
ENG | 1 | 5 | IN |
ENG | 1 | 5 | IN |
FRA | 2 | 4 | OUT |
ENG | 3 | 4 | OUT |
I want to obtain the following result:
country | ID | counter | direction | exit_to |
---|---|---|---|---|
ENG | 0 | 1 | OUT | NaN |
ENG | 0 | 0 | IN | NaN |
ESP | 0 | 7 | OUT | E |
ENG | 1 | 5 | IN | NaN |
ENG | 1 | 5 | IN | NaN |
FRA | 2 | 4 | OUT | W |
ENG | 3 | 4 | OUT | E |
For the last row of each group, I increment the value of the counter column by 1 and create a new column exit_to, with value E if direction = OUT and country ESP. If the direction of the last row is OUT the column exit_to, takes the value of W if country = FRA and E if country = ENG and the counter in both cases stays the same.
For the behaviour of the exit_to column, I have implemented the following code.
def exit_to(country, direction):
zone = np.NaN
if direction == 'OUT' and country == 'ESP':
# 1 counter
zone = 'E'
elif direction == 'OUT' and country == 'FRA':
zone = 'W'
elif direction == 'OUT':
zone = 'E'
return zone
I was using the following function:
last_rows = df.drop_duplicates('ID', keep='last')
last_rows['exit_to'] = last_rows.apply(lambda a: exit_to(a['country'],a['direction']), axis=1)
The problem is that I want the change to affect the whole dataframe and I don't want to get only a dataframe with the last rows of each group. I hope you can help me.
CodePudding user response:
There is no need to use apply
, here we can use a vectorized solution with boolean indexing:
m = ~df.duplicated('ID', keep='last')
df.loc[m, 'counter'] = 1
df.loc[m, 'exit_to'] = np.where(df.loc[m, 'direction'] == 'OUT', 'E', 'W')
Result
print(df)
ID counter direction exit_to
0 0 1 OUT NaN
1 0 0 IN NaN
2 0 7 OUT E
3 1 5 IN NaN
4 1 6 IN W
5 2 5 OUT E