I would like to drop the rows where a certain column holds a list of length X. What is the most Pythonic or efficient way to do this without looping?
Code example:
import pandas as pd

data = {'column_1': ['1', '2', '3'],
        'column_2': [['A', 'B'], ['A', 'B', 'C'], ['A']],
        'column_3': ['a', 'b', 'c']}
df = pd.DataFrame.from_dict(data)
Drop the rows where the length of the list is 3. In this case, the second row (index 1) should be deleted, since its list has length 3.
CodePudding user response:
Use Series.str.len to build a boolean mask, then filter with boolean indexing:
new_df = df[df["column_2"].str.len().ne(3)]
  column_1 column_2 column_3
0        1   [A, B]        a
2        3      [A]        c
Or, if you want to remove rows where the list length is greater than or equal to 3:
new_df = df[df["column_2"].str.len().le(2)]
# For reference, the boolean mask used in the first example:
print(df["column_2"].str.len().ne(3))
#0     True
#1    False
#2     True
#Name: column_2, dtype: bool
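The filter above can be wrapped in a small reusable function. This is only a minimal sketch: the name drop_rows_with_list_length is hypothetical (it is not a pandas function), and it assumes the column holds lists.

```python
import pandas as pd


def drop_rows_with_list_length(df, column, length):
    """Return a copy of df without rows whose list in `column` has `length` items.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Series.str.len also works on lists and returns the number of elements.
    mask = df[column].str.len().ne(length)
    return df[mask].copy()


data = {
    "column_1": ["1", "2", "3"],
    "column_2": [["A", "B"], ["A", "B", "C"], ["A"]],
    "column_3": ["a", "b", "c"],
}
df = pd.DataFrame(data)

new_df = drop_rows_with_list_length(df, "column_2", 3)
print(new_df.index.tolist())  # [0, 2]
```

The original df is left untouched; assign the result back to df if you want to replace it.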
CodePudding user response:
Use Series.apply with len to compute each list's length, then keep the rows whose length is at most 2:
res = df[df["column_2"].apply(len).le(2)]
print(res)
Output:
  column_1 column_2 column_3
0        1   [A, B]        a
2        3      [A]        c
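One practical difference between Series.str.len and Series.apply(len) shows up when the column has missing values. The sketch below uses made-up data (a None in the second row, which is not in the question) to illustrate it: str.len returns NaN for the missing entry, while apply(len) raises a TypeError.

```python
import pandas as pd

# Hypothetical data: the second row has no list at all.
s = pd.Series([["A", "B"], None, ["A", "B", "C"]])

# Series.str.len yields NaN for the missing entry instead of raising.
lengths = s.str.len()
print(lengths.isna().tolist())  # [False, True, False]

# Series.apply(len) calls len(None), which raises TypeError.
try:
    s.apply(len)
    apply_failed = False
except TypeError:
    apply_failed = True
print(apply_failed)  # True

# NaN is "not equal" to 3, so .ne(3) keeps the missing row,
# while .le(2) is False for NaN and therefore drops it.
keep_ne = lengths.ne(3)
keep_le = lengths.le(2)
print(keep_ne.tolist())  # [True, True, False]
print(keep_le.tolist())  # [True, False, False]
```

If the column may contain missing values, pick the comparison according to whether those rows should be kept or dropped.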