Home > database >  How can a pandas dataframe be framed based on a row with a specific cell value until the next value
How can a pandas dataframe be framed based on a row with a specific cell value until the next value

Time:11-11

I have the following question:

My dataset contains rows that output different values. Thereby numeric as well as categorical values occur. I want to split the dataset as soon as the value "status 1" of the specific ID "B" changes. The split dataset should contain all rows until the status changes again. Each time the status changes, a new dataset should be generated. Other ID's could have the same value but the index should be

Input Data

data = {'ID': ['B', 'A', 'C', 'D', 'B','A','D'],
        'value': ["status 1",125, "status1", 450, "status 2", 20 , 30],
        'timestamp': [1632733508, 1632733508, 1632733511, 1632733512, 1632733513, 1632733515, 1632733518]
        }
df1 = pd.DataFrame(data)
df1

    ID  value       timestamp
0   B   status 1    1632733508
1   A   125         1632733508
2   C   status 1    1632733511
3   D   450         1632733512
4   B   status 2    1632733513
5   A   20          1632733515
6   D   30          1632733518

Desired Output

    ID  value       timestamp
0   B   status 1    1632733508
1   A   125         1632733508
2   C   status 1    1632733511
3   D   450         1632733512
    ID  value       timestamp
1   B   status 2    1632733513
2   A   20          1632733515
3   D   30          1632733518

CodePudding user response:

Create mask by comapre ID by B with cumulative sum for groups and then loop in groupby:

for i, g in df1.groupby(df1['ID'].eq('B').cumsum()):

   print (g)
  ID     value   timestamp
0  B  status 1  1632733508
1  A       125  1632733508
2  C       300  1632733511
3  D       450  1632733512

  ID     value   timestamp
4  B  status 2  1632733513
5  A        20  1632733515
6  D        30  1632733518

For list of DataFrames use:

dfs = [g for i,g in df1.groupby(df1['ID'].eq('B').cumsum())]

print (dfs)
[  ID     value   timestamp
0  B  status 1  1632733508
1  A       125  1632733508
2  C       300  1632733511
3  D       450  1632733512,   ID     value   timestamp
4  B  status 2  1632733513
5  A        20  1632733515
6  D        30  1632733518]
  • Related