I have a dataset that I read into Python. In the column 'title', a few of the rows have an extra prefix, 'new', which I want to remove. I looked for suitable code but couldn't find any, and my own attempt raised an error. Could anyone please help? Thanks in advance. Here is my attempt:
if indeed['title'] == indeed.loc[indeed['title'].str.startswith('new')].copy():
    indeed['title'].str[3:]
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-34-486b16b22bea> in <module>
----> 1 if indeed['title'] == indeed.loc[indeed['title'].str.startswith('new')].copy():
2 indeed['title'].str[3:]
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
1325 def __nonzero__(self):
1326 raise ValueError(
-> 1327 f"The truth value of a {type(self).__name__} is ambiguous. "
1328 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
1329 )
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
CodePudding user response:
If I understand correctly, you want to check whether the first 3 characters of each title are 'new'; if they are, return the title without those 3 characters, otherwise return the title as is. np.where makes that choice row by row:
import pandas as pd
import numpy as np
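# Sample data standing in for the 'title' column from the question
mydata = pd.DataFrame(['software developer','newgis developer','javascript developer','newhr partner'])
mydata.columns=['title']
# Where the first 3 characters are 'new', take the rest of the string; otherwise keep the title unchanged
mydata['newcol']=np.where(mydata['title'].str[:3]=='new', mydata['title'].str[3:], mydata['title'])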
print(mydata)
                  title                newcol
0    software developer    software developer
1      newgis developer         gis developer
2  javascript developer  javascript developer
3         newhr partner            hr partner
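As for the error in the question: `if` needs a single True/False, but comparing a column against a DataFrame produces an element-wise result, so pandas raises the "truth value ... is ambiguous" ValueError. A pandas-only alternative is sketched below, assuming the same sample data as above; the names `mask`, `stripped` and `replaced` are just illustrative. Note that either approach also changes titles that legitimately begin with "new" (for example "news editor"), so check your data first.

```python
import pandas as pd

# Sample data assumed for illustration (same titles as in the answer above)
mydata = pd.DataFrame({'title': ['software developer', 'newgis developer',
                                 'javascript developer', 'newhr partner']})

# Boolean mask: True for rows whose title starts with 'new'
mask = mydata['title'].str.startswith('new')

# Option 1: keep titles where the mask is False, otherwise drop the first 3 characters
stripped = mydata['title'].where(~mask, mydata['title'].str[3:])

# Option 2: remove 'new' only when it appears at the very start of the string
replaced = mydata['title'].str.replace(r'^new', '', regex=True)

print(stripped.tolist())
print(replaced.tolist())
```

To overwrite the column instead of adding a new one, assign the result back, e.g. `mydata['title'] = replaced`.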