I have two pandas dataframes:
df1
col1 col2 col3
c 3 0
a 3 0
a 4 0
c 1 0
v 3 1
d 3 1
e 2 0
f 2 0
df2
col1 col2 col3
a 3 0
a 4 0
c 1 0
v 3 0
d 3 0
e 2 0
a 1 0
e 8 0
e 9 0
f 2 1
i 1 0
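To make the example reproducible, the two frames can be built as below. This is a sketch that assumes the default integer index (0, 1, 2, ...), which matches the index labels shown in the answer's output.

```python
import pandas as pd

# df1 and df2 from the question; default RangeIndex (0, 1, 2, ...) assumed
df1 = pd.DataFrame({
    'col1': ['c', 'a', 'a', 'c', 'v', 'd', 'e', 'f'],
    'col2': [3, 3, 4, 1, 3, 3, 2, 2],
    'col3': [0, 0, 0, 0, 1, 1, 0, 0],
})
df2 = pd.DataFrame({
    'col1': ['a', 'a', 'c', 'v', 'd', 'e', 'a', 'e', 'e', 'f', 'i'],
    'col2': [3, 4, 1, 3, 3, 2, 1, 8, 9, 2, 1],
    'col3': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
})
```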
I need to create two sub-dataframes, df1_sub and df2_sub, so that:
- df1_sub and df2_sub have the same number of rows, equal to 5.
- the number of rows before the first row with col3==1 is equal to 3, and the number of rows after that row is equal to 1 (thus, 5 rows in total). Let's assume that this rule is always implementable.
The expected output:
df1_sub
col1 col2 col3
a 3 0
a 4 0
c 1 0
v 3 1
d 3 1
df2_sub
col1 col2 col3
a 1 0
e 8 0
e 9 0
f 2 1
i 1 0
Answer:
You could use shift and cummax to mark the starting row and every row after it, and head to limit the total length:
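def sub(df, before=3, after=1):
    # True from `before` rows above the first col3 == 1 onwards
    mask = df['col3'].eq(1).shift(-before, fill_value=True).cummax()
    # keep `before` rows, the matching row and `after` rows
    return df[mask].head(before + after + 1)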
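To see what each step of the mask does, here is the same chain applied to df1's col3 on its own (a minimal sketch; the variable names are illustrative). shift(-before) moves every flag up by `before` rows, so the first True in the shifted series sits 3 rows above the first col3 == 1 (index 1 versus index 4 for df1). cummax then keeps every row from there on True, and head trims the tail. fill_value=True also marks the last `before` rows, which only affects the result when no earlier True exists.

```python
import pandas as pd

# col3 of df1 from the question (index 0..7)
col3 = pd.Series([0, 0, 0, 0, 1, 1, 0, 0])
before = 3

is_one = col3.eq(1)                               # True where col3 == 1
shifted = is_one.shift(-before, fill_value=True)  # row i now holds is_one[i + before]
mask = shifted.cummax()                           # True from the first True onwards

print(pd.DataFrame({'is_one': is_one, 'shifted': shifted, 'mask': mask}))
```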
Example:
sub(df1)
col1 col2 col3
1 a 3 0
2 a 4 0
3 c 1 0
4 v 3 1
5 d 3 1
sub(df2)
col1 col2 col3
6 a 1 0
7 e 8 0
8 e 9 0
9 f 2 1
10 i 1 0
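A different technique from the answer above, offered only as a sketch: locate the first col3 == 1 row by position and slice around it with iloc. sub_iloc is a hypothetical name, and the function assumes, as the question does, that enough rows exist on both sides of that row; it raises an error otherwise instead of returning a shifted window.

```python
import pandas as pd


def sub_iloc(df, before=3, after=1):
    # Hypothetical positional variant of sub(): find the first row where col3 == 1
    is_one = df['col3'].eq(1).to_numpy()
    if not is_one.any():
        raise ValueError("no row with col3 == 1")
    pos = int(is_one.argmax())  # argmax returns the position of the first True
    start = pos - before
    stop = pos + after + 1
    if start < 0 or stop > len(df):
        raise ValueError("not enough rows around the first col3 == 1")
    # positional slice: `before` rows, the matching row, `after` rows
    return df.iloc[start:stop]
```

On the question's data, sub_iloc(df1) and sub_iloc(df2) should select the same rows as sub(df1) and sub(df2) (index 1 to 5 and 6 to 10).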