Pandas delete rows and update index

Time:08-06

I've got data with two index columns (Id, timestemp) and one data column (velocity). The data could look like this:

Id timestemp velocity
0   0           0
0   1           0
0   2           0
1   1           1
1   2           1
1   3           0
1   4           1
1   5           1
2   4           1
2   5           1
2   6           1 

My goal is to remove all rows where velocity is zero. Additionally, all timesteps within an id should be consecutive. If there is a jump in the timesteps, the rows after the jump should be split off under a new id.

So after removing the rows with velocity = 0, the data look like this:

Id timestemp velocity
1   1           1
1   2           1
1   4           1
1   5           1
2   4           1
2   5           1
2   6           1 

Now notice the jump from row (1, 2) to row (1, 4). Therefore rows (1, 4) and (1, 5) should get a new unique index (for instance: max index + 1). The data then looks like this:

Id timestemp velocity
1   1           1
1   2           1
2   4           1
2   5           1
2   6           1 
3   4           1
3   5           1

I've got some code for removing the rows:

df = df.drop(df[df['velocity'] == 0].index)

But I don't know how to assign rows a new unique index if there is a jump in the second index.
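
For reference, a minimal sketch of the row-removal step on the sample data above. It assumes Id, timestemp and velocity are ordinary columns on a default integer index (the question may instead hold the first two in a MultiIndex); the names `dropped` and `masked` are illustrative only. It shows that dropping by index label and filtering with a boolean mask give the same rows here.

```python
import pandas as pd

# Sample data from the question, held as ordinary columns (an assumption).
df = pd.DataFrame({
    "Id":        [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
    "timestemp": [0, 1, 2, 1, 2, 3, 4, 5, 4, 5, 6],
    "velocity":  [0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1],
})

# Variant 1: look up the index labels of the zero-velocity rows and drop them.
dropped = df.drop(df[df["velocity"] == 0].index)

# Variant 2: keep the rows selected by a boolean mask.
masked = df[df["velocity"] != 0]
```

The mask form avoids the intermediate index lookup and is usually the simpler choice when the condition is a plain column comparison.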

Answer:

You can use groupby with custom groups:

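# mask of the rows where velocity is 0
m = df['velocity'].eq(0)
# g1 increments wherever timestemp does not follow the previous row of the same Id by exactly 1
g1 = df.groupby('Id')['timestemp'].diff().ne(1).cumsum()
# g2 increments at every velocity=0 row, so kept rows separated by a removed row get different values
g2 = m.cumsum()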

out = df[~m].assign(new_id=lambda d: d.groupby([g1, g2]).ngroup())

output:

    Id  timestemp  velocity  new_id
3    1          1         1       0
4    1          2         1       0
6    1          4         1       1
7    1          5         1       1
8    2          4         1       2
9    2          5         1       2
10   2          6         1       2
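
The new_id values above start at 0, whereas the question asks for the first run of each Id to keep its Id and for later runs to get max index + 1. The following is a minimal sketch of one way to produce that numbering. It assumes Id and timestemp are ordinary columns and that rows are sorted by timestemp within each Id; the helper column `run` and the renumbering rule are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

# Sample data from the question, held as ordinary columns (an assumption).
df = pd.DataFrame({
    "Id":        [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
    "timestemp": [0, 1, 2, 1, 2, 3, 4, 5, 4, 5, 6],
    "velocity":  [0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1],
})

# Keep only the rows with non-zero velocity.
out = df[df["velocity"].ne(0)].copy()

# A new run starts wherever the timestep does not follow the previous
# kept row of the same Id by exactly 1 (this includes each Id's first row).
out["run"] = out.groupby("Id")["timestemp"].diff().ne(1).cumsum()

# Hypothetical renumbering rule: the first run of each Id keeps its Id,
# every later run gets a fresh id counted up from the largest original Id.
first_run = out.groupby("Id")["run"].transform("min")
is_extra = out["run"].ne(first_run)
fresh = out.loc[is_extra, "run"].rank(method="dense").astype(int) + df["Id"].max()
out.loc[is_extra, "Id"] = fresh

out = out.drop(columns="run").sort_values(["Id", "timestemp"])
```

On the sample data this leaves Id 1 with timesteps 1 and 2, Id 2 with timesteps 4 to 6, and moves timesteps 4 and 5 of the former Id 1 to the new Id 3.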