Adding rows after groupby condition is met

Time:04-12

I am trying to find runs of 20 or more consecutive negative values in a column of a dataframe. Once the rows are grouped into chunks of 20 or more, I want to append the 30 rows of the original dataframe that follow each chunk.

This is my attempt (got help from a posted question here):

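import pandas as pd

n = df['Slope'].lt(0)  # True where Slope is negative
mask = n.ne(n.shift()).cumsum()[n]  # run id for each negative row
dfL = [g for i, g in df.groupby(mask) if (len(g[g['Slope'] < 0]) >= 20)]  # keep runs of 20 or more rows
df_cn = pd.concat(dfL)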

I get the chunks of consecutive negative values, but I do not know how to add the 30 rows that follow each chunk.

CodePudding user response:

Please try to provide a minimal reproducible example next time, along with a small sample of the desired output.

I created a random df of my own and your code worked well:

n = df['Slope'].lt(0)
mask = n.ne(n.shift()).cumsum()[n]
dfL = [g for i, g  in df.groupby(mask) if (len(g[g['Slope'] < 0]) >= 20)]
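
To see what the mask is doing, here is a small illustration on made-up data. The toy `Slope` values and the names `run_id` and `run_lengths` are assumptions for this sketch, not part of the original question.

```python
import pandas as pd

# Hypothetical toy data: two negative runs (3 rows and 2 rows) separated by non-negative values
df = pd.DataFrame({'Slope': [1.0, -1.0, -2.0, -3.0, 2.0, -1.0, -1.0, 4.0]})

n = df['Slope'].lt(0)               # True where Slope is negative
run_id = n.ne(n.shift()).cumsum()   # increases by 1 every time the negative/non-negative flag changes
mask = run_id[n]                    # keep run ids only for the negative rows

# Number of rows in each negative run, keyed by run id
run_lengths = mask.value_counts().sort_index()
print(run_lengths.to_dict())
```

Each negative run ends up with its own id, so `df.groupby(mask)` yields one group per run, and the length filter then keeps only the long ones.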

From here I created the code:

if not dfL:  # dfL is a list of dataframes, one per chunk; check that at least one chunk was found
    print('no chunks in the df found')
for x in range(len(dfL)):
    df_cn = dfL[x]  # select the chunk from dfL
    print('Chunk: df_cn_' + str(x) + ' created')  # feedback for testing
    idx = df_cn.index  # index labels of the chunk; max(idx) is its last row
    print('Chunk from ' + str(min(idx)) + ' to ' + str(max(idx)) + ' total ' + str(len(df_cn)) + ' indexes in the chunk')  # feedback with size of chunk
    # rows following the chunk in the original df (assumes a default integer index);
    # .loc slices include both endpoints, so +1 to +31 covers 31 labels (use +30 for exactly 30 rows)
    df_rest = df.loc[max(idx) + 1:max(idx) + 31]
    df_cn_ext = pd.concat([df_cn, df_rest])  # concatenate the chunk and the following rows along the row axis
    exec(f'df_cn_ext_{x}=df_cn_ext[:]')  # create a separate dataframe per chunk via a numeric suffix
    print('Dataframe df_cn_ext_' + str(x) + ' created from index ' + str(min(idx)) + ' to ' + str(max(idx) + 31))

Please note:

1- I've separated each chunk + 30 rows into new dfs with suffixes (df_cn_ext_suffix)

2- If the last value of a chunk is near the end of df, it won't add 30 rows, but the maximum number of rows available.
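
As an alternative to creating suffixed variables, the same idea can be sketched as a function that returns a list of dataframes. This is only a sketch under assumptions: the function name `chunks_with_tail`, its parameters, and the toy data are hypothetical, and it assumes the index labels of `df` are unique. It slices by position with `iloc`, so it does not depend on a default integer index, and it takes fewer rows when fewer are left.

```python
import pandas as pd

def chunks_with_tail(df, col='Slope', min_len=20, tail=30):
    """Return one dataframe per negative run of at least `min_len` rows,
    each followed by up to `tail` rows of the original dataframe."""
    n = df[col].lt(0)
    run_id = n.ne(n.shift()).cumsum()[n]  # run id for each negative row
    out = []
    for _, g in df.groupby(run_id):
        if len(g) < min_len:
            continue
        # position of the chunk's last row (assumes unique index labels)
        last_pos = df.index.get_loc(g.index[-1])
        # iloc excludes the stop position, so this takes at most `tail` rows
        rest = df.iloc[last_pos + 1:last_pos + 1 + tail]
        out.append(pd.concat([g, rest]))
    return out

# Hypothetical toy data, with small thresholds so the behaviour is easy to check
demo = pd.DataFrame({'Slope': [1, -1, -2, -3, 2, 5, 6, -1, -1, -1, -1, 7]})
result = chunks_with_tail(demo, min_len=3, tail=2)
for i, part in enumerate(result):
    print(i, list(part.index))
```

Keeping the results in a list (or a dict keyed by chunk number) avoids `exec` and makes it easy to loop over the extended chunks afterwards.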

Here is some output of my code:

Chunk: df_cn_0 created
Chunk from 3 to 39 total 37 indexes in the chunk
Dataframe df_cn_ext_0 created from index 3 to 70
Chunk: df_cn_1 created
Chunk from 41 to 66 total 26 indexes in the chunk
Dataframe df_cn_ext_1 created from index 41 to 97