Here's an example of DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame([
[0, "file_0", 0],
[0, "file_1", 0],
[1, "file_2", 0],
[2, "file_3", 0],
[2, "file_4", 100],
[2, "file_5", 0],
[2, "file_6", 50],
[2, "file_7", 0]
], columns=["case", "filename", "num"])
I wanna select num==0
rows and their previous rows with same case
value.
For example, "file_5" and "file_7" meet the first condition. Then, we check their previous rows: "file_6" and "file_4". Because they have the same case
number, we will also choose them no matter the value of num
.
Finally, we should get
case filename num
0 file_0 0
0 file_1 0
1 file_2 0
2 file_3 0
2 file_4 100
2 file_5 0
2 file_6 50
2 file_7 0
I have got that I can select the previous row by
df[(df['num']==0).shift(-1).fillna(False)]
However, this doesn't consider the case
value. One solution that came to my mind is group by case
first and then filter data. I have no idea how to code it ...
CodePudding user response:
How about merging df ?
df = pd.DataFrame([
[0, "file_0", 0],
[0, "file_1", 0],
[1, "file_2", 0],
[2, "file_3", 0],
[2, "file_4", 100],
[2, "file_5", 0],
[2, "file_6", 50],
[2, "file_7", 0]
], columns=["case", "filename", "num"])
df = df.merge(df, left_on='filename', right_on='filename', how='inner')
df[(df['case_x'] == df['case_y']) & df['num_x'] == 0]
Out[219]:
case_x filename num_x case_y num_y
0 0 file_0 0 0 0
1 0 file_1 0 0 0
2 1 file_2 0 1 0
3 2 file_3 0 2 0
4 2 file_4 100 2 100
5 2 file_5 0 2 0
6 2 file_6 50 2 50
7 2 file_7 0 2 0
then you can rename columns back
df[['case_x', 'filename', 'num_x']].rename({'case_x':'case','num_x':'num'},axis=1)
Out[223]:
case filename num
0 0 file_0 0
1 0 file_1 0
2 1 file_2 0
3 2 file_3 0
4 2 file_4 100
5 2 file_5 0
6 2 file_6 50
7 2 file_7 0
CodePudding user response:
Do you mean:
df.join(df.groupby('case').shift(-1)
.loc[df['num']==0]
.dropna(how='all').add_suffix('_next'),
how='inner')
Output:
case filename num filename_next num_next
0 0 file_0 0 file_1 0.0
3 2 file_3 0 file_4 100.0
5 2 file_5 0 file_6 50.0