How to get rows between a value until the value appears again-CodePudding

I need to separate the rows between when Level is 1 until it is 1 again. These groups of rows will be separate dfs and grouped by their part name at level 1. But I will take care of that later. For now, I can't seem to figure out how to get all the rows between the systems as the amount of levels in a system are inconsistent.

My current work in progress is getting all the indexes where level is 1, then retrieving the rows in between. Is is very time consuming as some dfs will have about 50 systems.

df = pd.DataFrame({'Name':["A","A","A","ABC","ABC","AAB","AAB","AAB"] ,'Level': [1,1,2,1,3,1,4,2],'Part':["Upper System","Upper 2 System","Upper 2 stock","Upper System","Middle","Limits System", "Deck", "Ceiling"]})


    Name    Level   Part
0   A          1    Upper System
1   A          1    Upper 2 System
2   A          2    Upper 2 stock
3   ABC        1    Upper System
4   ABC        3    Middle
5   AAB        1    Limits System
6   AAB        4    Deck
7   AAB        2    Ceiling

I need the dfs to look like something like these

    Name    Level   Part
0   A          1    Upper

    Name    Level   Part
0   A          1    Upper 2 System
1   A          2    Upper 2 stock

    Name    Level   Part
0   ABC        1    Upper System
1   ABC        3    Middle

    Name    Level   Part
0   AAB        1    Limits System
1   AAB        4    Deck
2   AAB        2    Ceiling

CodePudding user response：

df = pd.DataFrame({'Name':["A","A","A","ABC","ABC","AAB","AAB","AAB"] ,'Level': [1,1,2,1,3,1,4,2],'Part':["Upper System","Upper 2 System","Upper 2 stock","Upper System","Middle","Limits System", "Deck", "Ceiling"]})

from itertools import pairwise
import numpy as np

iloc_split = np.nonzero((df.Level.iloc[1:] == 1).values)[0]   1
iloc_split = [None]   list(iloc_split)   [None]  # fix the boundaries

# do the actual split
dfs = [df.iloc[start:stop].reset_index(drop=True) for start, stop in pairwise(iloc_split)]

for r in dfs:
    print(r)

You have to be careful about the first and last block, depending on exactly what you want to achieve.

CodePudding user response：

EDIT: added advice from Ruggero Turra's comment

import pandas as pd

df = pd.DataFrame({'Name':["A","A","A","ABC","ABC","AAB","AAB","AAB"] ,'Level': [1,1,2,1,3,1,4,2],'Part':["Upper System","Upper 2 System","Upper 2 stock","Upper System","Middle","Limits System", "Deck", "Ceiling"]})

split=[v.reset_index() for k, v in df.groupby((df['Level']==1).cumsum())]

for d in split.values():
    print(d,"\n")

>>>    index Name  Level          Part
>>> 0      0    A      1  Upper System 

>>>    index Name  Level            Part
>>> 0      1    A      1  Upper 2 System
>>> 1      2    A      2   Upper 2 stock

>>>    index Name  Level          Part
>>> 0      3  ABC      1  Upper System
>>> 1      4  ABC      3        Middle 

>>>    index Name  Level           Part
>>> 0      5  AAB      1  Limits System
>>> 1      6  AAB      4           Deck
>>> 2      7  AAB      2        Ceiling

How does it works: 1 finds the rows where Level is 1, equivalent to:

df["=1"]=1*(df["Level"]==1)

Then calculates the cummulative sum, equivalent to:

df["Cummmulative sum"]=df["=1"].cumsum()

Now, the column "Cummulative sum" has an unique identifier for each group:

Name  Level            Part  =1  Cummmulative sum
0    A      1    Upper System   1                 1
1    A      1  Upper 2 System   1                 2
2    A      2   Upper 2 stock   0                 2
3  ABC      1    Upper System   1                 3
4  ABC      3          Middle   0                 3
5  AAB      1   Limits System   1                 4
6  AAB      4            Deck   0                 4
7  AAB      2         Ceiling   0                 4

CodePudding user response：

df_abc = df.groupby("NAME").get_group("ABC").reset_index()
df_aab = df.groupby("NAME").get_group("AAB").reset_index()
df_a = df.groupby("NAME").get_group("A").reset_index()

# Slicing the 'A' group
df_a0 = df_a.loc[0, :]
df_a12 = df_a.loc[ 1:2, :].reset_index()

CodePudding user response：

Assign a group to the rows you want to group. As shown below, note you should use itertuples for faster dataframe loop, and update the group column once you have the final group_values list. Drop the group column if you don't need it after.

import pandas as pd
df = pd.DataFrame({'Name':["A","A","A","ABC","ABC","AAB","AAB","AAB"] ,'Level': [1,1,2,1,3,1,4,2],'Part':["Upper System","Upper 2 System","Upper 2 stock","Upper System","Middle","Limits System", "Deck", "Ceiling"]})

group = 0
group_values = []
for df_tuple in df.itertuples(index=None, name=None):
    level = df_tuple[1]
    if level == 1:
        group  = 1
    group_values.append(group)

df['group'] = group_values

for _, df_group in df.groupby('group'):
    print(df_group.reset_index(drop=True))

Easy and straight forward, instead of using some complex groupby's and/or slicing