I have a dataframe (small sample) like this:
import pandas as pd
data = [['A', False, 2], ['A', True, 8], ['A', False, 25], ['A', False, 30], ['B', False, 4], ['B', False, 8], ['B', True, 2], ['B', False, 3]]
df = pd.DataFrame(data = data, columns = ['group', 'indicator', 'val'])
   group  indicator  val
0      A      False    2
1      A       True    8
2      A      False   25
3      A      False   30
4      B      False    4
5      B      False    8
6      B       True    2
7      B      False    3
I would like to select the n rows above and below the row with indicator == True for each group. For example, with n = 1, group A should return the rows with index 0, 1, 2 and group B the rows with index 5, 6, 7. I tried the following code:
# subset each group to list
dfs = [x for _, x in df.groupby('group')]
for i in dfs:
    # select dataframe
    df_sub = dfs[1]
    # get index of row with indicator True
    idx = df_sub.index[df_sub['indicator'] == True]
    # select n rows above and below row with True
    df_sub = df_sub.iloc[idx - 1: idx + 1]
    # combine each dataframe again
    df_merged = pd.concat(df_sub)
print(df_merged)
But I get the following error:
TypeError: cannot do positional indexing on Int64Index with these indexers [Int64Index([5], dtype='int64')] of type Int64Index
This is the desired output:
data = [['A', False, 2], ['A', True, 8], ['A', False, 25], ['B', False, 8], ['B', True, 2], ['B', False, 3]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'indicator', 'val'])
   group  indicator  val
0      A      False    2
1      A       True    8
2      A      False   25
3      B      False    8
4      B       True    2
5      B      False    3
I don't understand why this error happens and how to solve it. Does anyone know how to fix this issue?
Answer:
You can use groupby.rolling with a centered window of 2*n+1 rows, so that each True flags itself and the n rows before and after it within its group, then perform boolean indexing:
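n = 1
# a centered window of 2*n+1 rows, computed separately per group, covers each
# row plus n neighbours on either side; max() is 1 if any row in the window is True
mask = (df.groupby('group')['indicator']
          .rolling(n*2+1, center=True, min_periods=1)
          .max().droplevel(0)   # drop the 'group' level so the mask aligns with df's index
          .astype(bool)
       )
out = df.loc[mask]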
output:
   group  indicator  val
0      A      False    2
1      A       True    8
2      A      False   25
5      B      False    8
6      B       True    2
7      B      False    3
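Regarding the TypeError in the question: `df_sub.index[df_sub['indicator'] == True]` returns an `Index` of row labels, not an integer, so for group B `idx - 1` is `Int64Index([5])`, and `iloc` slices only accept integer positions. Even converted to integers, those labels come from the full frame: label 6 is position 2 inside group B. The loop also has smaller issues: it always reads `dfs[1]` instead of `i`, the `iloc` stop is exclusive (so `idx + 1` stops one row short), and `pd.concat` expects a list of frames rather than a single one. Below is a minimal sketch of a corrected loop that works with positions within each group; it rebuilds the sample data from the question so it runs on its own, and the names `pieces` and `positions` are only illustrative.

```python
import pandas as pd

data = [['A', False, 2], ['A', True, 8], ['A', False, 25], ['A', False, 30],
        ['B', False, 4], ['B', False, 8], ['B', True, 2], ['B', False, 3]]
df = pd.DataFrame(data=data, columns=['group', 'indicator', 'val'])
n = 1

pieces = []
for _, df_sub in df.groupby('group'):
    # 0-based positions *within this group* of the rows where indicator is True
    positions = df_sub['indicator'].to_numpy().nonzero()[0]
    for pos in positions:
        # clamp the start at 0 (a negative start would count from the end);
        # the iloc stop is exclusive, hence the + 1
        start = max(pos - n, 0)
        pieces.append(df_sub.iloc[start:pos + n + 1])

# pd.concat needs a list of frames; drop rows picked twice by overlapping windows
df_merged = pd.concat(pieces)
df_merged = df_merged[~df_merged.index.duplicated()]
print(df_merged.reset_index(drop=True))
```

Overlapping windows (two True rows fewer than 2*n rows apart) would select some rows twice, which is why duplicated labels are dropped after concatenating. `reset_index(drop=True)` renumbers the rows 0 to 5 as in `df_desired`; the same call can be appended to `out` in the rolling approach if the original labels are not needed.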