Home > front end >  How to delete a certain value in a cell in columns of csv using pandas
How to delete a certain value in a cell in columns of csv using pandas

Time:03-29

I need help with deleting "None" along with extra comma in language columns that have one or more language

Here is the existing csv:

f = pd.DataFrame({'Movie': ['name1','name2','name3','name4'],
                  'Year': ['1905', '1905','1906','1907'],
                  'Id': ['tt0283985', 'tt0283986','tt0284043','tt3402904'],
                  'language':['Mandarin,None','None,Cantonese','Mandarin,None,Cantonese','None,Cantonese']})

Where f now looks like:

   Movie  Year         Id   language
0  name1  1905  tt0283985  Mandarin,None
1  name2  1905  tt0283986  None,Cantonese
2  name3  1906  tt0284043  Mandarin,None,Cantonese
3  name4  1907  tt3402904  None,Cantonese

And the result should be like this:

   Movie  Year         Id             language
0  name1  1905  tt0283985            Mandarian
1  name2  1905  tt0283986            Cantonese
2  name3  1906  tt0284043            Mandarin,Cantonese
3  name4  1907  tt3402904            Cantonese

There are also other columns that have only 'None' values in language column, so I can't just use the replace function in excel, and there's also a problem of extra "," after doing that. So I may need help with a new way using pandas or something. Thanks in advance!

CodePudding user response:

You could just remove all the None values as follows:

df['language'] = df['language'].str.replace('None,', '')

and then wherever the language column is empty, you could insert a 'None' value using regex.

df['language'] = df['language'].replace(r'^\s*$', 'None', regex=True)

CodePudding user response:

You can use replace method to remove 'None' and ','

for i in range(len(f)):
f.loc[i,"language"].replace('None','')
f.loc[i,"language"].replace(',','')
print(f)

CodePudding user response:

You could achieve it this way,

f["language"] = f.apply(
    lambda x: ",".join(filter(lambda y: y != "None", x.language.split(","))), axis=1
)

Or much better

f["language"] = f.apply(
    lambda x: ",".join([y for y in x.language.split(",") if y != "None"]), axis=1
)
  • Related