I have the following question:
My dataset contains rows that output different values. Thereby numeric as well as categorical values occur. I want to split the dataset as soon as the value "status 1" of the specific ID "B" changes. The split dataset should contain all rows until the status changes again. Each time the status changes, a new dataset should be generated. Other ID's could have the same value but the index should be
Input Data
data = {'ID': ['B', 'A', 'C', 'D', 'B','A','D'],
'value': ["status 1",125, "status1", 450, "status 2", 20 , 30],
'timestamp': [1632733508, 1632733508, 1632733511, 1632733512, 1632733513, 1632733515, 1632733518]
}
df1 = pd.DataFrame(data)
df1
ID value timestamp
0 B status 1 1632733508
1 A 125 1632733508
2 C status 1 1632733511
3 D 450 1632733512
4 B status 2 1632733513
5 A 20 1632733515
6 D 30 1632733518
Desired Output
ID value timestamp
0 B status 1 1632733508
1 A 125 1632733508
2 C status 1 1632733511
3 D 450 1632733512
ID value timestamp
1 B status 2 1632733513
2 A 20 1632733515
3 D 30 1632733518
CodePudding user response:
Create mask by comapre ID
by B
with cumulative sum for groups and then loop in groupby
:
for i, g in df1.groupby(df1['ID'].eq('B').cumsum()):
print (g)
ID value timestamp
0 B status 1 1632733508
1 A 125 1632733508
2 C 300 1632733511
3 D 450 1632733512
ID value timestamp
4 B status 2 1632733513
5 A 20 1632733515
6 D 30 1632733518
For list of DataFrames use:
dfs = [g for i,g in df1.groupby(df1['ID'].eq('B').cumsum())]
print (dfs)
[ ID value timestamp
0 B status 1 1632733508
1 A 125 1632733508
2 C 300 1632733511
3 D 450 1632733512, ID value timestamp
4 B status 2 1632733513
5 A 20 1632733515
6 D 30 1632733518]