Get row when certain condition is changed in dataframe

Time:11-01

I am looking for a way to get the rows where a certain condition changes.

Here is an example of my dataframe.

  ts     fdw     time_stamp
0  n  [0, 0]  1635211605896
1  n  [0, 0]  1635211606896
2  l  [0, 0]  1635211607896
3  l  [0, 0]  1635211608896
4  l  [0, 0]  1635211609896
5  l  [0, 0]  1635211609896
6  n  [0, 0]  1635211609896

In the dataframe above, I want to extract the rows on either side of each change in the ts column, for example from n to l or from l to n.

Here is my expected output.

  ts     fdw     time_stamp
1  n  [0, 0]  1635211606896
2  l  [0, 0]  1635211607896
5  l  [0, 0]  1635211609896
6  n  [0, 0]  1635211609896

CodePudding user response:

import pandas
import pdrle

# Data
df = pandas.DataFrame({"ts": ["n", "n", "l", "l", "l", "l", "n"]})
df["val"] = [1, 2, 3, 4, 5, 6, 7]

# Run-length encode ts, then label every row with the id of the run it belongs to
rle = pdrle.encode(df.ts)
grp = rle.index.repeat(rle.runs)

# Get the first and last row of each run (a single row if the run has length 1)
ans = (
    df.groupby(grp)
    .apply(lambda x: x.iloc[[-1], :] if len(x) == 1 else x.iloc[[0, -1], :])
    .droplevel(0)
)

# If the first or last run has more than one row, drop its outer row:
# no change happens before the first row or after the last row
if rle.runs.iloc[0] > 1:
    ans.drop(ans.head(1).index, inplace=True)
if rle.runs.iloc[-1] > 1:
    ans.drop(ans.tail(1).index, inplace=True)

ans
#       ts  val
# 1     n   2
# 2     l   3
# 5     l   6
# 6     n   7
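
For comparison, here is a minimal sketch of the same idea using only pandas, without pdrle. It is an alternative, not the method used in the answer above. It compares each value of ts with its neighbours via shift and keeps the rows that sit on either side of a change. The helper name rows_at_changes is made up for illustration, and the sketch assumes a non-empty dataframe with no missing values in ts.

```python
import pandas as pd


def rows_at_changes(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Return the rows just before and just after each change in df[col].

    Hypothetical helper; assumes df is non-empty and df[col] has no NaN.
    """
    s = df[col]
    # True where a row differs from the previous row (first row of a new run)
    starts = s.ne(s.shift())
    # True where a row differs from the next row (last row of a run)
    ends = s.ne(s.shift(-1))
    # The first row has no predecessor and the last row has no successor,
    # so the comparison against the shifted-in NaN flags them; clear both
    starts.iloc[0] = False
    ends.iloc[-1] = False
    return df[starts | ends]


df = pd.DataFrame(
    {
        "ts": ["n", "n", "l", "l", "l", "l", "n"],
        "val": [1, 2, 3, 4, 5, 6, 7],
    }
)

ans = rows_at_changes(df, "ts")
print(ans)
```

On the sample data this selects the rows with index 1, 2, 5 and 6, matching the expected output in the question. A run of length 1 in the middle of the data is both the start and the end of its run, and the boolean mask selects it only once.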