Home > Software engineering >  Select rows by column value and include previous row by another column value
Select rows by column value and include previous row by another column value

Time:01-01

Here's an example of DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame([
    [0, "file_0", 0],
    [0, "file_1", 0],
    [1, "file_2", 0],
    [2, "file_3", 0],
    [2, "file_4", 100],
    [2, "file_5", 0],
    [2, "file_6", 50],
    [2, "file_7", 0]
], columns=["case", "filename", "num"])

I wanna select num==0 rows and their previous rows with same case value.

For example, "file_5" and "file_7" meet the first condition. Then, we check their previous rows: "file_6" and "file_4". Because they have the same case number, we will also choose them no matter the value of num.

Finally, we should get

case    filename    num
0   file_0  0
0   file_1  0
1   file_2  0
2   file_3  0
2   file_4  100
2   file_5  0
2   file_6  50
2   file_7  0

I have got that I can select the previous row by

df[(df['num']==0).shift(-1).fillna(False)]

However, this doesn't consider the case value. One solution that came to my mind is group by case first and then filter data. I have no idea how to code it ...

CodePudding user response:

How about merging df ?

    df = pd.DataFrame([
    [0, "file_0", 0],
    [0, "file_1", 0],
    [1, "file_2", 0],
    [2, "file_3", 0],
    [2, "file_4", 100],
    [2, "file_5", 0],
    [2, "file_6", 50],
    [2, "file_7", 0]
], columns=["case", "filename", "num"])
df = df.merge(df, left_on='filename', right_on='filename', how='inner')
df[(df['case_x'] == df['case_y']) & df['num_x'] == 0]
Out[219]: 
   case_x filename  num_x  case_y  num_y
0       0   file_0      0       0      0
1       0   file_1      0       0      0
2       1   file_2      0       1      0
3       2   file_3      0       2      0
4       2   file_4    100       2    100
5       2   file_5      0       2      0
6       2   file_6     50       2     50
7       2   file_7      0       2      0

then you can rename columns back

df[['case_x', 'filename',  'num_x']].rename({'case_x':'case','num_x':'num'},axis=1)
Out[223]: 
   case filename  num
0     0   file_0    0
1     0   file_1    0
2     1   file_2    0
3     2   file_3    0
4     2   file_4  100
5     2   file_5    0
6     2   file_6   50
7     2   file_7    0

CodePudding user response:

Do you mean:

df.join(df.groupby('case').shift(-1)
                .loc[df['num']==0]
                .dropna(how='all').add_suffix('_next'), 
        how='inner')

Output:

   case filename  num filename_next  num_next
0     0   file_0    0        file_1       0.0
3     2   file_3    0        file_4     100.0
5     2   file_5    0        file_6      50.0
  • Related