Say I have a string column in pandas in which each row holds a list of strings:
Class | Student |
---|---|
One | [Adam, Kanye, Alice Stocks, Joseph Matthew] |
Two | [Justin Bieber, Selena Gomez] |
I want to get rid of every name in each class whose length is 8 characters or more.
So the resulting table would be:
Class | Student |
---|---|
One | Adam, Kanye |
Most of the data would be gone because only Adam and Kanye satisfy the condition len(StudentName) < 8.
I tried coming up with an .apply filter myself, but the code seems to run on each character instead of each name. Can someone point out where I went wrong?
This is the code:
[[y for y in x if not len(y)>=8] for x in df['Student']]
CodePudding user response:
Check the code below. It seems you are not defining what to split on, so the string is being iterated at the character level.
import pandas as pd
df = pd.DataFrame({'Class':['One','Two'],'Student':['[Adam, Kanye, Alice Stocks, Joseph Matthew]', '[Justin Bieber, Selena Gomez]'],
})
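# regex=True is required on pandas >= 2.0, where str.replace defaults to literal matching.
# Splitting on ', ' avoids counting the leading space as part of each name.
df['Filtered_Student'] = df['Student'].str.replace(r"\[|\]", '', regex=True).str.split(', ').apply(lambda x: ', '.join([i for i in x if len(i) < 8]))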
df[df['Filtered_Student'] != '']
CodePudding user response:
IIUC, this can be done in a one-liner with np.where:
import pandas as pd
import numpy as np
df = pd.DataFrame(
{
'Class': ['One', 'Two'],
'Student': [['Adam', 'Kanye', 'Alice Stocks', 'Joseph Matthew'], ['Justin Bieber', 'Selena Gomez']]
}
)
# Strict < 8 matches the condition stated in the question, len(StudentName) < 8
df.explode('Student').iloc[np.where(df.explode('Student').Student.str.len() < 8)].groupby('Class').agg(list).reset_index()
Output:
Class Student
0 One [Adam, Kanye]
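The same explode-and-filter idea can also be sketched with a plain boolean mask instead of np.where, so the exploded frame is built only once. This is an illustrative variant rather than the code above; the names `exploded`, `short` and `result` are made up for the example, and it assumes the strict `< 8` cutoff from the question and a pandas version that supports `ignore_index` in `explode` (1.1 or later).

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Class': ['One', 'Two'],
        'Student': [['Adam', 'Kanye', 'Alice Stocks', 'Joseph Matthew'],
                    ['Justin Bieber', 'Selena Gomez']],
    }
)

# One row per (Class, Student) pair; computed once and reused
exploded = df.explode('Student', ignore_index=True)

# Boolean mask: keep only names shorter than 8 characters
short = exploded[exploded['Student'].str.len() < 8]

# Collapse back to one list per class; classes with no short names drop out
result = short.groupby('Class', sort=False)['Student'].agg(list).reset_index()
print(result)
```

Because the filtering happens before the groupby, class Two disappears entirely, which matches the table asked for in the question.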
CodePudding user response:
# If they're not actually lists, but strings:
if isinstance(df.Student[0], str):
    df.Student = df.Student.str[1:-1].str.split(', ')
# Apply your filtering logic:
df.Student = df.Student.apply(lambda s: [x for x in s if len(x)<8])
Output:
Class Student
0 One [Adam, Kanye]
1 Two []
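To see why the original comprehension worked on characters, and to handle both cases in one place, here is a minimal sketch in plain Python. The helper name `filter_short_names` and its `max_len` parameter are hypothetical, and the parsing assumes the bracketed, comma-separated string form shown in the question.

```python
def filter_short_names(cell, max_len=8):
    """Return the names in `cell` that are shorter than `max_len` characters.

    `cell` may be a real list of strings or a string such as
    "[Adam, Kanye]" (assumed format, taken from the question).
    """
    if isinstance(cell, str):
        # Iterating a str yields single characters, so parse it into names first
        inner = cell.strip().strip('[]')
        cell = [name.strip() for name in inner.split(',') if name.strip()]
    return [name for name in cell if len(name) < max_len]


as_string = '[Adam, Kanye, Alice Stocks, Joseph Matthew]'
as_list = ['Justin Bieber', 'Selena Gomez']

# What the comprehension from the question effectively does on a string cell:
chars = [y for y in as_string if not len(y) >= 8]

print(chars[:5])                      # single characters, not names
print(filter_short_names(as_string))  # names parsed from the string form
print(filter_short_names(as_list))    # real lists are filtered directly
```

With a DataFrame, the helper could be applied per row, for example `df['Student'].apply(filter_short_names)`.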