TypeError: cannot do positional indexing on Int64Index with these indexers [Int64Index([5], dtype='int64')] of type Int64Index

Time:09-23

I have a dataframe (small sample) like this:

import pandas as pd

data = [['A', False, 2], ['A', True, 8], ['A', False, 25], ['A', False, 30], ['B', False, 4], ['B', False, 8], ['B', True, 2], ['B', False, 3]]
df = pd.DataFrame(data = data, columns = ['group', 'indicator', 'val'])

  group  indicator  val
0     A      False    2
1     A       True    8
2     A      False   25
3     A      False   30
4     B      False    4
5     B      False    8
6     B       True    2
7     B      False    3

I would like to select the n rows above and below the row with indicator == True in each group. For example, with n = 1, group A should return the rows with index 0, 1 and 2, and group B the rows with index 5, 6 and 7. I tried the following code:

# subset each group to list
dfs = [x for _, x in df.groupby('group')] 

for i in dfs:
    # select dataframe
    df_sub = dfs[1]
    # get index of row with indicator True
    idx = df_sub.index[df_sub['indicator'] == True]
    # select n rows above and below row with True
    df_sub = df_sub.iloc[idx - 1: idx + 1]
    # combine each dataframe again
    df_merged = pd.concat(df_sub)
    
    print(df_merged)

But I get the following error:

TypeError: cannot do positional indexing on Int64Index with these indexers [Int64Index([5], dtype='int64')] of type Int64Index

This is the desired output:

data = [['A', False, 2], ['A', True, 8], ['A', False, 25], ['B', False, 8], ['B', True, 2], ['B', False, 3]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'indicator', 'val'])

  group  indicator  val
0     A      False    2
1     A       True    8
2     A      False   25
3     B      False    8
4     B       True    2
5     B      False    3

I don't understand why this error happens and how to solve it. Does anyone know how to fix this issue?
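Why the error happens: df_sub.index[df_sub['indicator'] == True] returns an Index object, not an integer. Because the loop always uses dfs[1] (group B), idx is Int64Index([6]) and idx - 1 is the Int64Index([5]) shown in the message; .iloc cannot use an Index as a slice bound. Even with a scalar, these values are labels of the original frame, while .iloc expects 0-based positions inside df_sub. Two further bugs: df_sub = dfs[1] ignores the loop variable, and pd.concat expects a list of frames, so it should be called once after the loop. The sketch below keeps the loop-based approach but works with positions. It assumes NumPy is available; the names positions and parts are illustrative, and rows are de-duplicated on the index in case two windows overlap.

```python
import numpy as np
import pandas as pd

data = [['A', False, 2], ['A', True, 8], ['A', False, 25], ['A', False, 30],
        ['B', False, 4], ['B', False, 8], ['B', True, 2], ['B', False, 3]]
df = pd.DataFrame(data=data, columns=['group', 'indicator', 'val'])

n = 1
parts = []  # one slice per True row, concatenated once after the loop

for _, df_sub in df.groupby('group'):
    # 0-based positions of the True rows within this group (not index labels)
    positions = np.flatnonzero(df_sub['indicator'].to_numpy())
    for pos in positions:
        start = max(pos - n, 0)                       # don't run past the top of the group
        parts.append(df_sub.iloc[start:pos + n + 1])  # iloc slice end is exclusive

df_merged = pd.concat(parts)
# drop rows that would appear twice if two windows overlap
df_merged = df_merged[~df_merged.index.duplicated()]

print(df_merged.reset_index(drop=True))
```

reset_index(drop=True) renumbers the rows 0 to 5 to match the desired output; leave it out to keep the original labels.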

Answer:

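You can use groupby.rolling with a centered window of 2*n+1 rows: the max of the boolean column over each window is True whenever a True row lies within n rows above or below it in the same group, and min_periods=1 keeps the shorter windows at the group edges. Then use the resulting mask for boolean indexing: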

n = 1

mask = (df.groupby('group')['indicator']
          .rolling(n*2+1, center=True, min_periods=1)
          .max().droplevel(0)
          .astype(bool)
       )

out = df.loc[mask]

output:

  group  indicator  val
0     A      False    2
1     A       True    8
2     A      False   25
5     B      False    8
6     B       True    2
7     B      False    3
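As an alternative technique (not the method above), the same mask can be sketched with groupby shift: a row is kept if it is True itself or if a True lies k rows above or below it in the same group, for k = 1..n. This assumes fill_value=False, so positions shifted in from outside a group count as False.

```python
import pandas as pd

data = [['A', False, 2], ['A', True, 8], ['A', False, 25], ['A', False, 30],
        ['B', False, 4], ['B', False, 8], ['B', True, 2], ['B', False, 3]]
df = pd.DataFrame(data=data, columns=['group', 'indicator', 'val'])

n = 1
g = df.groupby('group')['indicator']

mask = df['indicator'].copy()
for k in range(1, n + 1):
    # shift(k): row i sees the value from k rows above; shift(-k): from k rows below.
    # Shifts are done per group, so values never leak from one group into another.
    mask |= g.shift(k, fill_value=False) | g.shift(-k, fill_value=False)

out = df.loc[mask]
print(out)
```

For small n this is cheap; for larger n the single rolling call above avoids building 2n shifted copies of the column.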