How to remove empty list in pandas?

Time:06-01

I am reading data from a CSV file and I have a dataframe like this:

product_title   variatons_color          
T-shirt          ['yellow','ornage'] 
T-shirt          []
T-shirt          ['blue','green']

My expected dataframe should look like this:

product_title   variatons_color          
T-shirt          ['yellow','ornage'] 
T-shirt         
T-shirt          ['blue','green']

I want to remove the empty list. How can I do that in pandas?

Update 1: I tried the solutions from Scott Boston, Ynjxsjmh, and BENY. All of them fill None into every row, but I need None only where the list is empty. Running type(df.loc[0,'variations_color']) returns str.

CodePudding user response:

You can try

df['variatons_color'] = df['variatons_color'].apply(lambda lst: lst if len(lst) else '')
print(df)

  product_title   variatons_color
0       T-shirt  [yellow, ornage]
1       T-shirt
2       T-shirt     [blue, green]

CodePudding user response:

Assign with a boolean check:

df.loc[~df['variatons_color'].astype(bool),'variatons_color'] = ''

Update (for when the column holds the string '[]' rather than an empty list):

df.loc[df['variatons_color'].eq('[]'),'variatons_color'] = ''
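
The two variants above cover real lists and the exact string '[]' separately. As a minimal sketch, assuming you are not sure which of the two your column holds, a small helper can test for both. The name is_empty_list is hypothetical and introduced only for illustration; the sample frame mimics the string column from the question.

```python
import pandas as pd

def is_empty_list(value):
    """Return True for an empty list or for the string '[]' (spaces ignored)."""
    if isinstance(value, list):
        return len(value) == 0
    if isinstance(value, str):
        return value.replace(' ', '') == '[]'
    return False

# Assumed sample data: list-like strings, as they arrive from a CSV file
df = pd.DataFrame({
    'product_title': ['T-shirt'] * 3,
    'variatons_color': ["['yellow','ornage']", '[]', "['blue','green']"],
})

# Blank out only the rows flagged by the helper
mask = df['variatons_color'].apply(is_empty_list)
df.loc[mask, 'variatons_color'] = ''
print(df)
```

Only the flagged row is overwritten, so non-empty values keep whatever type they already had.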

CodePudding user response:

Just apply len:

df.loc[df['variations_color'].apply(len) == 0, 'variations_color'] = ''

or, to fill with NaN instead (this needs import numpy as np):

df.loc[df['variations_color'].apply(len) == 0, 'variations_color'] = np.nan

Output:

  product_title  variations_color
0       T-shirt  [yellow, orange]
1       T-shirt               NaN
2       T-shirt     [blue, green]

Given df:

df = pd.DataFrame({'product_title':['T-shirt']*3,
                   'variations_color':[['yellow', 'orange'],[],['blue', 'green']]})

However, if your dataframe is structured like this:

df = pd.DataFrame({'product_title':['T-shirt']*3,
                   'variations_color':['[yellow, orange]','[]','[blue, green]']})

Then, you can use the following:

df.loc[df['variations_color'] == '[]', 'variations_color'] = np.nan

Output:

  product_title  variations_color
0       T-shirt  [yellow, orange]
1       T-shirt               NaN
2       T-shirt     [blue, green]

Note the difference: in the first example,

type(df.loc[0,'variations_color']) returns list

and in the second it returns str. The string representations of the two dataframes are identical, so you can't tell them apart just by printing. In Python, it is always important to know the datatype of the object you're working with.
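
To check which case applies, one option is to inspect the cell types directly rather than rely on the printed frame. The following is a small sketch that reuses the two frames defined above; the variable names df_lists, df_strings, list_types, and string_types are made up for illustration.

```python
import pandas as pd

# The two frames from this answer: one holds real lists, the other holds strings
df_lists = pd.DataFrame({'product_title': ['T-shirt'] * 3,
                         'variations_color': [['yellow', 'orange'], [], ['blue', 'green']]})
df_strings = pd.DataFrame({'product_title': ['T-shirt'] * 3,
                           'variations_color': ['[yellow, orange]', '[]', '[blue, green]']})

# Collect the distinct Python type names found in each column
list_types = sorted({type(v).__name__ for v in df_lists['variations_color']})
string_types = sorted({type(v).__name__ for v in df_strings['variations_color']})

print(list_types)    # ['list']
print(string_types)  # ['str']
```

If the result is ['str'], the len-based checks will not find the empty rows, because len('[]') is 2.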

CodePudding user response:

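import ast
import pandas as pd

df = pd.DataFrame({'product_title':['T-shirt']*3,
                   'variations_color':[['yellow', 'orange'],[],['blue', 'green']]})
# str(x) lets the same check handle real lists and quoted-list strings such as "['a', 'b']";
# ast.literal_eval parses the literal without executing arbitrary code, unlike eval
df['variations_color'] = df['variations_color'].apply(lambda x: x if any(ast.literal_eval(str(x))) else None)
df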

CodePudding user response:

Parse the strings into real lists first, then mask the empty ones:

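import ast
import pandas as pd
from io import StringIO

data = '''
product_title   variatons_color          
T-shirt          ['yellow','ornage'] 
T-shirt          []
T-shirt          ['blue','green']
'''

# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(StringIO(data), sep=r'\s+')
# literal_eval turns each string into a list without executing arbitrary code
df.variatons_color = df.variatons_color.apply(ast.literal_eval)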
df
'''
  product_title   variatons_color
0       T-shirt  [yellow, ornage]
1       T-shirt                []
2       T-shirt     [blue, green]
'''



type(df.iat[0, 1])
# list


df.mask(df.applymap(len) == 0, None)
'''
  product_title   variatons_color
0       T-shirt  [yellow, ornage]
1       T-shirt              None
2       T-shirt     [blue, green]
'''

Done!
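
Since the data in the question comes from a CSV file, the parsing step can also happen while reading. This is a rough sketch, assuming a comma-separated file in which the list cells are quoted; the csv_text sample and the variable name empty are invented for illustration, and your real file layout may differ. It uses the converters argument of read_csv together with ast.literal_eval.

```python
import ast
from io import StringIO

import pandas as pd

# Assumed file layout: comma-separated, with list cells quoted where they contain commas
csv_text = '''product_title,variatons_color
T-shirt,"['yellow','ornage']"
T-shirt,[]
T-shirt,"['blue','green']"
'''

# Parse the column into real Python lists while the file is being read
df = pd.read_csv(StringIO(csv_text),
                 converters={'variatons_color': ast.literal_eval})

# With real lists, a length check finds only the empty ones
empty = df['variatons_color'].map(len) == 0
df.loc[empty, 'variatons_color'] = None
print(df)
```

After this, only the row that held [] is missing, and the other rows hold real lists.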
