I am looking for a way to extract the rows where the value in a certain column changes.
Here is an example of my dataframe.
  ts     fdw     time_stamp
0  n  [0, 0]  1635211605896
1  n  [0, 0]  1635211606896
2  l  [0, 0]  1635211607896
3  l  [0, 0]  1635211608896
4  l  [0, 0]  1635211609896
5  l  [0, 0]  1635211609896
6  n  [0, 0]  1635211609896
In the dataframe above, I want to extract the rows on either side of each change in column ts, such as from n to l or from l to n.
Here is my expected output.
  ts     fdw     time_stamp
1  n  [0, 0]  1635211606896
2  l  [0, 0]  1635211607896
5  l  [0, 0]  1635211609896
6  n  [0, 0]  1635211609896
CodePudding user response:
import pandas
import pdrle
# Data
df = pandas.DataFrame({"ts": ["n", "n", "l", "l", "l", "l", "n"]})
df["val"] = [1, 2, 3, 4, 5, 6, 7]
# Run-length encode ts, then label each row with the id of the run it belongs to
rle = pdrle.encode(df.ts)
grp = rle.index.repeat(rle.runs)
# Get the first and last row of each run (a run of length 1 contributes its single row once)
ans = (
    df.groupby(grp)
    .apply(lambda x: x.iloc[[-1], :] if len(x) == 1 else x.iloc[[0, -1], :])
    .droplevel(0)
)
# If the first (or last) run has more than one row, its first (or last) row
# is not next to a change, so drop it
if rle.runs.iloc[0] > 1:
    ans.drop(ans.head(1).index, inplace=True)
if rle.runs.iloc[-1] > 1:
    ans.drop(ans.tail(1).index, inplace=True)
ans
# ts val
# 1 n 2
# 2 l 3
# 5 l 6
# 6 n 7
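If you would rather not depend on pdrle, the same rows can probably be selected with plain pandas by comparing each value with its neighbours. This is only a sketch of a different technique (shift-and-compare, not run-length encoding); it reuses the toy ts/val data from above, the variable names are my own, and it assumes ts contains no NaN values, since NaN never compares equal to itself.

```python
import pandas as pd

# Same toy data as in the answer above
df = pd.DataFrame({
    "ts": ["n", "n", "l", "l", "l", "l", "n"],
    "val": [1, 2, 3, 4, 5, 6, 7],
})

ts = df["ts"]

# True where a row differs from the row before it
differs_from_prev = ts.ne(ts.shift())
# The first row has no predecessor, so it cannot be a change
differs_from_prev.iloc[:1] = False

# True where a row differs from the row after it
# (the last row has no successor, hence fill_value=False)
differs_from_next = differs_from_prev.shift(-1, fill_value=False)

# Keep the rows on either side of each change
result = df[differs_from_prev | differs_from_next]
print(result)
```

On this data it keeps the rows with index 1, 2, 5 and 6, matching the expected output in the question. A run of length 1 in the middle is kept once, because the row is both the row after one change and the row before the next.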