how to get rid of strings in each list of each row in pandas

Time:07-28

Say I have a string column in pandas in which each row is made of a list of strings

Class  Student
One    [Adam, Kanye, Alice Stocks, Joseph Matthew]
Two    [Justin Bieber, Selena Gomez]

I want to get rid of every name in each class whose length is 8 characters or more.

So the resulting table would be:

Class  Student
One    Adam, Kanye

Most of the data would be gone because only Adam and Kanye satisfy the condition len(StudentName)<8.

I tried writing a filter myself, but the code seems to run at the character level instead of the word level. Can someone point out where I went wrong?

This is the code: [[y for y in x if not len(y)>=8] for x in df['Student']]

CodePudding user response:

Check the code below. It looks like each cell is a single string and you never say what to split it on, so iterating over it goes character by character.

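import pandas as pd

df = pd.DataFrame({'Class': ['One', 'Two'],
                   'Student': ['[Adam, Kanye, Alice Stocks, Joseph Matthew]', '[Justin Bieber, Selena Gomez]']})
# Strip the brackets, split on ', ' so names carry no leading space, then keep names shorter than 8 characters
df['Filtered_Student'] = (df['Student'].str.replace(r"\[|\]", '', regex=True).str.split(', ')
                          .apply(lambda x: ', '.join([i for i in x if len(i) < 8])))
# Drop the classes where no name survived the filter
df[df['Filtered_Student'] != '']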

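To see why the comprehension from the question works on characters, here is a minimal sketch. It assumes, as this answer does, that each cell is one string that merely looks like a list; the variable names `cell`, `chars`, `names` and `kept` are only for illustration.

```python
import pandas as pd

# Assumption: each cell holds one string that merely looks like a list.
df = pd.DataFrame({
    'Class': ['One', 'Two'],
    'Student': ['[Adam, Kanye, Alice Stocks, Joseph Matthew]',
                '[Justin Bieber, Selena Gomez]'],
})

cell = df['Student'][0]

# Iterating over a str yields single characters, so len(y) is always 1
# and the original comprehension keeps every character.
chars = [y for y in cell if not len(y) >= 8]
print(chars[:5])   # ['[', 'A', 'd', 'a', 'm']

# Splitting first yields whole names, so the length test works as intended.
names = cell.strip('[]').split(', ')
kept = [y for y in names if not len(y) >= 8]
print(kept)        # ['Adam', 'Kanye']
```

If the cells are already real Python lists, the original comprehension should behave as expected without any splitting.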

CodePudding user response:

IIUC, this can be done in a one-liner with np.where:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        'Class': ['One', 'Two'], 
        'Student': [['Adam', 'Kanye', 'Alice Stocks', 'Joseph Matthew'], ['Justin Bieber', 'Selena Gomez']]
    }
)

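# Explode once, keep the rows whose name is shorter than 8 characters, then regroup per class
exploded = df.explode('Student')
exploded.iloc[np.where(exploded.Student.str.len() < 8)[0]].groupby('Class').agg(list).reset_index()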

Output:

  Class        Student
0   One  [Adam, Kanye]

CodePudding user response:

# If they're not actually lists, but strings:
if isinstance(df.Student[0], str):
    df.Student = df.Student.str[1:-1].str.split(', ')

# Apply your filtering logic:
df.Student = df.Student.apply(lambda s: [x for x in s if len(x)<8])

Output:

  Class        Student
0   One  [Adam, Kanye]
1   Two             []
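The output above still contains class Two with an empty list, while the table in the question keeps only class One. A minimal sketch that also drops the empty rows could look like this; `filter_short_names` and its parameters are hypothetical names chosen for illustration, and the string parsing assumes names are separated by a comma and a single space.

```python
import pandas as pd

def filter_short_names(df, column='Student', max_len=8):
    """Hypothetical helper: keep names shorter than max_len, drop empty rows."""
    out = df.copy()
    # Parse string cells such as '[Adam, Kanye]' into real lists; leave lists alone.
    out[column] = out[column].apply(
        lambda cell: cell.strip('[]').split(', ') if isinstance(cell, str) else cell
    )
    # Keep only non-empty names that are shorter than max_len characters.
    out[column] = out[column].apply(
        lambda names: [n for n in names if n and len(n) < max_len]
    )
    # Keep only the rows where at least one name survived.
    return out[out[column].map(len) > 0].reset_index(drop=True)

df = pd.DataFrame({
    'Class': ['One', 'Two'],
    'Student': ['[Adam, Kanye, Alice Stocks, Joseph Matthew]',
                '[Justin Bieber, Selena Gomez]'],
})

result = filter_short_names(df)
print(result)
```

Working on a copy leaves the original frame untouched, which makes it easier to compare the data before and after filtering.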