I need help with deleting "None" along with extra comma in language
columns that have one or more language
Here is the existing csv:
f = pd.DataFrame({'Movie': ['name1','name2','name3','name4'],
'Year': ['1905', '1905','1906','1907'],
'Id': ['tt0283985', 'tt0283986','tt0284043','tt3402904'],
'language':['Mandarin,None','None,Cantonese','Mandarin,None,Cantonese','None,Cantonese']})
Where f
now looks like:
Movie Year Id language
0 name1 1905 tt0283985 Mandarin,None
1 name2 1905 tt0283986 None,Cantonese
2 name3 1906 tt0284043 Mandarin,None,Cantonese
3 name4 1907 tt3402904 None,Cantonese
And the result should be like this:
Movie Year Id language
0 name1 1905 tt0283985 Mandarian
1 name2 1905 tt0283986 Cantonese
2 name3 1906 tt0284043 Mandarin,Cantonese
3 name4 1907 tt3402904 Cantonese
There are also other columns that have only 'None' values in language column, so I can't just use the replace function in excel, and there's also a problem of extra "," after doing that. So I may need help with a new way using pandas or something. Thanks in advance!
CodePudding user response:
You could just remove all the None values as follows:
df['language'] = df['language'].str.replace('None,', '')
and then wherever the language column is empty, you could insert a 'None' value using regex.
df['language'] = df['language'].replace(r'^\s*$', 'None', regex=True)
CodePudding user response:
You can use replace method to remove 'None' and ','
for i in range(len(f)):
f.loc[i,"language"].replace('None','')
f.loc[i,"language"].replace(',','')
print(f)
CodePudding user response:
You could achieve it this way,
f["language"] = f.apply(
lambda x: ",".join(filter(lambda y: y != "None", x.language.split(","))), axis=1
)
Or much better
f["language"] = f.apply(
lambda x: ",".join([y for y in x.language.split(",") if y != "None"]), axis=1
)