I have the following pandas dataframe:
issue_stat timestamp state
0 11:00:00 hi
1 12:40:00 lo
9 13:00:00 av
3 15:00:00 hi
8 18:00:00 hi
4 20:00:00 lo
I want to map the state
of the line above timestamp=18:00:00
to jazz
. I MUST use the timestamp=18:00:00
in my code. How would I do this?
I know how to map the state
of timestamp=18:00:00
:
dataframe.loc[dataframe['timestamp'] == '18:00:00', 'state'] = whatever
But I am having difficult pointing to the line above it. Again I emphasise, I MUST reference the timestamp = 18:00:00
in my code.
So the output looks like this:
issue_stat timestamp state
0 11:00:00 hi
1 12:40:00 lo
9 13:00:00 av
3 15:00:00 jazz
8 18:00:00 hi
4 20:00:00 lo
CodePudding user response:
The shift()
method moves the series in either direction. So to set the state of the cell where the following timestamp is 18:00:
df.loc[df["timestamp"].shift(-1) == '18:00:00', 'state'] = 'jazz'
Produces:
issue_stat timestamp state
0 0 11:00:00 hi
1 1 12:40:00 lo
2 9 13:00:00 av
3 3 15:00:00 jazz
4 8 18:00:00 hi
5 4 20:00:00 lo
CodePudding user response:
You can use the shift function to get the previous row of the dataframe, then you can use boolean indexing to select the rows where timestamp equals 18:00:00 and update the state column for those rows.
Here is an example of how you can do this:
df['prev_state'] = df['state'].shift(1)
df.loc[df['timestamp'] == '18:00:00', 'prev_state'] = 'jazz'
This will add a new column called prev_state to the dataframe, which contains the value of the state column for the previous row. Then, it will update the prev_state column for the rows where timestamp equals 18:00:00 to jazz.
If you want to update the state column instead of adding a new column, you can simply use the following code:
df.loc[df['timestamp'] == '18:00:00', 'state'] = 'jazz'
This will update the state column for the rows where timestamp equals 18:00:00 to jazz.
CodePudding user response:
You can use shift()
:
test_df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], \
'B' : ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], \
'state' : ['NY', 'NY', 'NY', 'NY', 'NY', 'FL', 'FL', 'FL', 'FL', 'FL']})
A B state
0 1 A NY
1 2 B NY
2 3 C NY
3 4 D NY
4 5 E NY
5 6 F FL
6 7 G FL
7 8 H FL
8 9 I FL
9 10 J FL
I reference value C
but value gets changed for value B
- 1 line above that.
test_df.loc[test_df.index[test_df['B'] == 'C'] - 1, 'state'] = 'new'
test_df
A B state
0 1 A NY
1 2 B new
2 3 C NY
3 4 D NY
4 5 E NY
5 6 F FL
6 7 G FL
7 8 H FL
8 9 I FL
9 10 J FL
But in your case you would chose the column timestamp
instead of B
and a value of your choosing.