Filtering pandas series based on patterns

Time:01-05

Happy new year!

I have a quick question: let's say I have a pandas Series of events like this

myEvents = pd.Series(['up', 'down', 'None', 'None', 'up', 'down', 'down', 'up', 'up'])

I would like to keep only the 'valid' events: a valid event is an 'up' followed by a 'down', or a 'down' followed by an 'up', even if there are one or more 'None' values between the two. Also, any 'up' following an 'up' should be ignored (and likewise for consecutive 'down' events).

So here, for example, I would like to end up with something like

myEvents_filtered = pd.Series(['up', 'down', 'up', 'down', 'up'])

What would be the most elegant solution here? Can I do it entirely with pandas, or should I use a state machine or something like that, given that I will have other similar use cases (like getting all the 'None' values following a 'down')?

Thanks !

CodePudding user response:

Algorithm:

  1. Remove all "None" rows from the series
  2. Add a column holding the previous row's value
  3. Keep only the rows where the current value differs from the previous row's value

Code:

# 1) Remove all "None" rows
import pandas as pd

my_events = pd.Series(['up', 'down', 'None', 'None', 'up', 'down', 'down', 'up', 'up'])

df = pd.DataFrame(
    {"my_events": my_events}
)

df = df[df["my_events"] != "None"]

# 2) Get the previous row's value
df["my_events_previous"] = df["my_events"].shift(1)

# 3) Keep rows where the current value differs from the previous one
final_result_df = df[
    df["my_events_previous"] != df["my_events"]
]

print(final_result_df["my_events"])

The key insight is to use .shift(), which can be applied to other use cases.
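For instance, the other use case mentioned in the question, collecting the 'None' entries that follow a 'down', can be handled with the same idea plus a forward fill. A sketch (variable names here are illustrative, not from the original post):

```python
import pandas as pd

my_events = pd.Series(['up', 'down', 'None', 'None', 'up', 'down', 'down', 'up', 'up'])

# Treat the 'None' strings as missing values, then forward-fill so that
# each position carries the most recent real event.
real = my_events.mask(my_events.eq('None'))
last_event = real.ffill()

# Keep the 'None' rows whose most recent preceding event was 'down'.
nones_after_down = my_events[my_events.eq('None') & last_event.eq('down')]
print(nones_after_down)
```

Because the original index is preserved, the result also tells you *where* those 'None' runs occur in the series.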

CodePudding user response:

This should get you your desired output. It basically drops the values of "None" and then drops consecutive duplicates.

import pandas as pd
myEvents = pd.Series(['up', 'down', 'None', 'None', 'up', 'down', 'down', 'up', 'up'])
myEvents_filtered = myEvents.mask(myEvents.eq('None')).dropna()
myEvents_filtered = myEvents_filtered.loc[myEvents_filtered.shift() != myEvents_filtered]
print(myEvents_filtered)
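Note that the surviving rows keep their original index labels (0, 1, 4, 5, 7 for the sample series); chaining `reset_index(drop=True)` gives a clean sequential index if that matters downstream. A quick check of the same logic:

```python
import pandas as pd

myEvents = pd.Series(['up', 'down', 'None', 'None', 'up', 'down', 'down', 'up', 'up'])

# Same two steps as above: mask out the 'None' placeholders and drop them,
# then drop consecutive duplicates by comparing each row with its predecessor.
filtered = myEvents.mask(myEvents.eq('None')).dropna()
filtered = filtered.loc[filtered.shift() != filtered]

# reset_index(drop=True) discards the original labels in favour of 0..n-1.
print(filtered.reset_index(drop=True).tolist())  # ['up', 'down', 'up', 'down', 'up']
```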