I need to separate the rows between when Level is 1 until it is 1 again. These groups of rows will be separate dfs and grouped by their part name at level 1. But I will take care of that later. For now, I can't seem to figure out how to get all the rows between the systems as the amount of levels in a system are inconsistent.
My current work in progress is getting all the indexes where level is 1, then retrieving the rows in between. Is is very time consuming as some dfs will have about 50 systems.
df = pd.DataFrame({'Name':["A","A","A","ABC","ABC","AAB","AAB","AAB"] ,'Level': [1,1,2,1,3,1,4,2],'Part':["Upper System","Upper 2 System","Upper 2 stock","Upper System","Middle","Limits System", "Deck", "Ceiling"]})
Name Level Part
0 A 1 Upper System
1 A 1 Upper 2 System
2 A 2 Upper 2 stock
3 ABC 1 Upper System
4 ABC 3 Middle
5 AAB 1 Limits System
6 AAB 4 Deck
7 AAB 2 Ceiling
I need the dfs to look like something like these
Name Level Part
0 A 1 Upper
Name Level Part
0 A 1 Upper 2 System
1 A 2 Upper 2 stock
Name Level Part
0 ABC 1 Upper System
1 ABC 3 Middle
Name Level Part
0 AAB 1 Limits System
1 AAB 4 Deck
2 AAB 2 Ceiling
CodePudding user response:
df = pd.DataFrame({'Name':["A","A","A","ABC","ABC","AAB","AAB","AAB"] ,'Level': [1,1,2,1,3,1,4,2],'Part':["Upper System","Upper 2 System","Upper 2 stock","Upper System","Middle","Limits System", "Deck", "Ceiling"]})
from itertools import pairwise
import numpy as np
iloc_split = np.nonzero((df.Level.iloc[1:] == 1).values)[0] 1
iloc_split = [None] list(iloc_split) [None] # fix the boundaries
# do the actual split
dfs = [df.iloc[start:stop].reset_index(drop=True) for start, stop in pairwise(iloc_split)]
for r in dfs:
print(r)
You have to be careful about the first and last block, depending on exactly what you want to achieve.
CodePudding user response:
EDIT: added advice from Ruggero Turra's comment
import pandas as pd
df = pd.DataFrame({'Name':["A","A","A","ABC","ABC","AAB","AAB","AAB"] ,'Level': [1,1,2,1,3,1,4,2],'Part':["Upper System","Upper 2 System","Upper 2 stock","Upper System","Middle","Limits System", "Deck", "Ceiling"]})
split=[v.reset_index() for k, v in df.groupby((df['Level']==1).cumsum())]
for d in split.values():
print(d,"\n")
>>> index Name Level Part
>>> 0 0 A 1 Upper System
>>> index Name Level Part
>>> 0 1 A 1 Upper 2 System
>>> 1 2 A 2 Upper 2 stock
>>> index Name Level Part
>>> 0 3 ABC 1 Upper System
>>> 1 4 ABC 3 Middle
>>> index Name Level Part
>>> 0 5 AAB 1 Limits System
>>> 1 6 AAB 4 Deck
>>> 2 7 AAB 2 Ceiling
How does it works: 1 finds the rows where Level is 1, equivalent to:
df["=1"]=1*(df["Level"]==1)
Then calculates the cummulative sum, equivalent to:
df["Cummmulative sum"]=df["=1"].cumsum()
Now, the column "Cummulative sum" has an unique identifier for each group:
Name Level Part =1 Cummmulative sum
0 A 1 Upper System 1 1
1 A 1 Upper 2 System 1 2
2 A 2 Upper 2 stock 0 2
3 ABC 1 Upper System 1 3
4 ABC 3 Middle 0 3
5 AAB 1 Limits System 1 4
6 AAB 4 Deck 0 4
7 AAB 2 Ceiling 0 4
CodePudding user response:
df_abc = df.groupby("NAME").get_group("ABC").reset_index()
df_aab = df.groupby("NAME").get_group("AAB").reset_index()
df_a = df.groupby("NAME").get_group("A").reset_index()
# Slicing the 'A' group
df_a0 = df_a.loc[0, :]
df_a12 = df_a.loc[ 1:2, :].reset_index()
CodePudding user response:
Assign a group to the rows you want to group. As shown below, note you should use itertuples
for faster dataframe loop, and update the group
column once you have the final group_values
list. Drop the group
column if you don't need it after.
import pandas as pd
df = pd.DataFrame({'Name':["A","A","A","ABC","ABC","AAB","AAB","AAB"] ,'Level': [1,1,2,1,3,1,4,2],'Part':["Upper System","Upper 2 System","Upper 2 stock","Upper System","Middle","Limits System", "Deck", "Ceiling"]})
group = 0
group_values = []
for df_tuple in df.itertuples(index=None, name=None):
level = df_tuple[1]
if level == 1:
group = 1
group_values.append(group)
df['group'] = group_values
for _, df_group in df.groupby('group'):
print(df_group.reset_index(drop=True))
Easy and straight forward, instead of using some complex groupby's
and/or slicing