Iterating over lists in pandas dataframe to remove everything after certain value in list

Time:11-12

I want to filter the lists in the events column of my dataframe based on the occurrence of the value 1. When a 1 occurs, the 1 and everything after it should be removed.

This works when I have a single list:

event = [5, 5, 5, 5, 1, 5]
index = event.index(1)
event[:index]

outputs:

[5, 5, 5, 5]

Now I want to do this for my whole dataframe, which looks like this:

| session_id  | events                         |
| 00000000000 | [4, 5, 5, 3, 2, 1, 5]          |
| 00000000001 | [4, 5, 5, 1, 2, 1, 5, 5, 5]    |
| 00000000002 | [4, 5, 1, 3, 2, 1, 5, 5, 5, 1] |

import pandas as pd

df = pd.DataFrame([['00000000000', [4, 5, 5, 3, 2, 1, 5]],
                   ['00000000001', [4, 5, 5, 1, 2, 1, 5, 5, 5]],
                   ['00000000002', [4, 5, 1, 3, 2, 1, 5, 5, 5, 1]]],
                  columns=['session_id', 'events'])

But I cannot seem to get it right. Is someone able to help me? My latest attempt was this:

for i, row in df.iterrows():
    target_id = row['events'].index(1)
    df['events_short'] = row['events'][:target_id]

But it gives me the following error:

ValueError: Length of values (4) does not match length of index (10)

Answer:

Fix

Assigning to df['events_short'] sets the entire column at once, which isn't what you want here. Inside the loop you need to set a single cell, for example with df.at:

df['events_short'] = ""
for i, row in df.iterrows():
    df.at[i, 'events_short'] = row['events'][:row['events'].index(1)]

Or overwrite the original column:

for i, row in df.iterrows():
    df.at[i, 'events'] = row['events'][:row['events'].index(1)]
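To illustrate why the original loop failed, here is a minimal sketch using the three-row example frame from the question (so the lengths in the error differ from the ones in the reported message). It shows that whole-column assignment expects one value per row, and that a list of per-row results satisfies that. The names raised and shortened are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'session_id': ['00000000000', '00000000001', '00000000002'],
                   'events': [[4, 5, 5, 3, 2, 1, 5],
                              [4, 5, 5, 1, 2, 1, 5, 5, 5],
                              [4, 5, 1, 3, 2, 1, 5, 5, 5, 1]]})

# Assigning a flat list to df[...] tries to fill the whole column,
# so its length must equal the number of rows (3 here).
try:
    df['events_short'] = [4, 5, 5, 3, 2]  # 5 values for 3 rows
    raised = False
except ValueError:
    raised = True

# A list holding one shortened list per row has the right length,
# so the same column assignment succeeds.
shortened = [events[:events.index(1)] for events in df['events']]
df['events_short'] = shortened
```

After this runs, raised is True and each row of events_short holds the events before the first 1.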

Improve

You can also use Series.apply to apply your logic to each cell of the column:

df['events'] = df['events'].apply(lambda x: x[:x.index(1)])
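One caveat: list.index raises a ValueError when the value is not in the list, so the lambda fails on any row without a 1. A possible sketch that keeps such rows unchanged is below. The helper name truncate_before and the small example frame are hypothetical, and leaving the list untouched when no 1 is present is an assumption about the desired behaviour.

```python
import pandas as pd

def truncate_before(events, stop=1):
    """Return the items before the first `stop`, or the whole list if `stop` is absent."""
    try:
        return events[:events.index(stop)]
    except ValueError:
        # No `stop` in this list: keep it as it is (assumed behaviour).
        return events

# Hypothetical data: the second row has no 1, the third starts with 1.
df = pd.DataFrame({'events': [[4, 5, 5, 3, 2, 1, 5], [4, 5, 5], [1, 2]]})
df['events'] = df['events'].apply(truncate_before)
```

A named function is used instead of a lambda only so the try/except fits; the call to apply is otherwise the same.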