How can I reference the line above a particular line in a pandas dataframe?-CodePudding

I have the following pandas dataframe:

issue_stat     timestamp   state
    0          11:00:00     hi
    1          12:40:00     lo
    9          13:00:00     av
    3          15:00:00     hi 
    8          18:00:00     hi
    4          20:00:00     lo

I want to map the state of the line above timestamp=18:00:00 to jazz. I MUST use the timestamp=18:00:00 in my code. How would I do this?

I know how to map the state of timestamp=18:00:00:

dataframe.loc[dataframe['timestamp'] == '18:00:00', 'state'] = whatever

But I am having difficult pointing to the line above it. Again I emphasise, I MUST reference the timestamp = 18:00:00 in my code.

So the output looks like this:

issue_stat     timestamp   state
     0          11:00:00     hi
     1          12:40:00     lo
     9          13:00:00     av
     3          15:00:00    jazz 
     8          18:00:00     hi
     4          20:00:00     lo

CodePudding user response：

The shift() method moves the series in either direction. So to set the state of the cell where the following timestamp is 18:00:

df.loc[df["timestamp"].shift(-1) == '18:00:00', 'state'] = 'jazz'

Produces:

   issue_stat           timestamp    state
0           0            11:00:00       hi
1           1            12:40:00       lo
2           9            13:00:00       av
3           3            15:00:00     jazz
4           8            18:00:00       hi
5           4            20:00:00       lo

CodePudding user response：

You can use the shift function to get the previous row of the dataframe, then you can use boolean indexing to select the rows where timestamp equals 18:00:00 and update the state column for those rows.

Here is an example of how you can do this:

df['prev_state'] = df['state'].shift(1)
df.loc[df['timestamp'] == '18:00:00', 'prev_state'] = 'jazz'

This will add a new column called prev_state to the dataframe, which contains the value of the state column for the previous row. Then, it will update the prev_state column for the rows where timestamp equals 18:00:00 to jazz.

If you want to update the state column instead of adding a new column, you can simply use the following code:

df.loc[df['timestamp'] == '18:00:00', 'state'] = 'jazz'

This will update the state column for the rows where timestamp equals 18:00:00 to jazz.

CodePudding user response：

You can use shift():

test_df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], \
    'B' : ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], \
    'state' : ['NY', 'NY', 'NY', 'NY', 'NY', 'FL', 'FL', 'FL', 'FL', 'FL']})

    A   B   state
0   1   A   NY
1   2   B   NY
2   3   C   NY
3   4   D   NY
4   5   E   NY
5   6   F   FL
6   7   G   FL
7   8   H   FL
8   9   I   FL
9   10  J   FL

I reference value C but value gets changed for value B - 1 line above that.

test_df.loc[test_df.index[test_df['B'] == 'C'] - 1, 'state'] = 'new'
test_df

    A   B   state
0   1   A   NY
1   2   B   new
2   3   C   NY
3   4   D   NY
4   5   E   NY
5   6   F   FL
6   7   G   FL
7   8   H   FL
8   9   I   FL
9   10  J   FL

But in your case you would chose the column timestamp instead of B and a value of your choosing.