I want to filter my dataframe values based on the occurrence of a 1 in my events column. When a 1 occurs, the 1 and everything after it should be removed.
This works when I have a single list:
event = [5, 5, 5, 5, 1, 5]
index = event.index(1)
event[:index]
outputs:
[5, 5, 5, 5]
Now I want to do this for my whole dataframe, which looks like this:
| session_id | events |
|00000000000 | [4,5,5,3,2,1,5] |
|00000000001 | [4,5,5,1,2,1,5,5,5] |
|00000000002 | [4,5,1,3,2,1,5,5,5,1] |
import pandas as pd
df = pd.DataFrame([['00000000000 ', [4, 5, 5, 3, 2, 1, 5]],
                   ['00000000001', [4, 5, 5, 1, 2, 1, 5, 5, 5]],
                   ['00000000002 ', [4, 5, 1, 3, 2, 1, 5, 5, 5, 1]]],
                  columns=['session_id', 'events'])
But I cannot seem to get it right. Is someone able to help me? My latest attempt was this:
for i, row in df.iterrows():
    target_id = row['events'].index(1)
    df['events_short'] = row['events'][:target_id]
But it gives me the following error:
ValueError: Length of values (4) does not match length of index (10)
CodePudding user response:
Fix
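The assignment df['events_short'] = ... sets the whole column at once, so pandas expects one value per row, which is why the lengths do not match. That isn't what you want: you need to set only one cell per iteration.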
df['events_short'] = ""
for i, row in df.iterrows():
    df.at[i, 'events_short'] = row['events'][:row['events'].index(1)]
Or overwrite the existing column:
for i, row in df.iterrows():
    df.at[i, 'events'] = row['events'][:row['events'].index(1)]
Improve
You can also use Series.apply to apply your logic to each cell of the column, without an explicit loop:
df['events'] = df['events'].apply(lambda x: x[:x.index(1)])
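One caveat: list.index raises a ValueError when the value is not in the list, so the lambda above fails on any row whose events contain no 1. A minimal sketch of a more defensive variant follows. The helper name truncate_before and the fourth sample row (a session without a 1) are hypothetical, added only for illustration, and keeping the full list in that case is an assumption about the desired behaviour.

```python
import pandas as pd


def truncate_before(events, marker=1):
    """Return the items before the first `marker`.

    Hypothetical helper: if `marker` is absent, a copy of the whole
    list is returned (an assumption about the desired behaviour).
    """
    try:
        return events[:events.index(marker)]
    except ValueError:
        # list.index raises ValueError when `marker` is not in the list
        return list(events)


df = pd.DataFrame({
    'session_id': ['00000000000', '00000000001',
                   '00000000002', '00000000003'],
    'events': [[4, 5, 5, 3, 2, 1, 5],
               [4, 5, 5, 1, 2, 1, 5, 5, 5],
               [4, 5, 1, 3, 2, 1, 5, 5, 5, 1],
               [4, 5, 5]],  # illustrative row with no 1 in it
})

# Apply the helper to every cell of the 'events' column
df['events_short'] = df['events'].apply(truncate_before)
print(df[['session_id', 'events_short']])
```

If rows without a 1 should instead become empty lists, return [] in the except branch.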