I have a large DataFrame with 104,959 rows and 298 columns.
In one string column, EVENT_DTL, there are multiple substrings that I need to replace.
I've tried:
df.EVENT_DTL.replace(['SPOUSE_2','SPOUSE_nan','PARENT_2','PARENT_nan','GRANDPARENT_2','GRANDPARENT_nan','CHILD_2',
    'CHILD_nan','RELATIVE_2','RELATIVE_nan','LOVER_2','LOVER_nan','FRIEND_2','FRIEND_nan',
    '세부 대인관계문제 기타 상세_nan','세부 대인관계문제 기타 상세_','대인관계문제_1',
    '애인 관련_2','애인 관련_nan','직장 내_2','직장 내_nan','소외 문제_2','소외 문제_nan',
    '수면제_2','수면제_nan','진통제_2','진통제_nan','병원에서 처방 받은 약물_2',
    '병원에서 처방 받은 약물_nan','기타약물_nan','농약_2','농약_nan','살충제_2','살충제_nan',
    '제초제_2','제초제_nan','쥐약_2','쥐약_nan','화학약품_nan','목매달기_2','목매달기_nan',
    '가스 질식_2','가스 질식_nan','물에 뛰어들기_2','물에 뛰어들기_nan','뛰어내림_2',
    '뛰어내림_nan','칼, 송곳으로 찌르기_2','칼, 송곳으로 찌르기_nan','세부 동거자 기타 상세_nan'],"")
(I'm trying to delete all of the substrings above), but it causes a memory error.
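A side note on the attempt above, separate from the memory error: with the default `regex=False`, `Series.replace` given a list matches whole cell values, not substrings, so a cell like "x SPOUSE_2 y" would be left untouched. Passing `regex=True` makes pandas treat each listed string as a pattern and substitute it inside cells. A minimal sketch on a toy Series (the values are hypothetical, not taken from the real data):

```python
import pandas as pd

# Hypothetical sample values standing in for EVENT_DTL
s = pd.Series(["SPOUSE_2", "x SPOUSE_2 y", "FRIEND_nan|LOVER_2"])

# Default (regex=False): only cells exactly equal to a listed value change
whole = s.replace(["SPOUSE_2", "LOVER_2"], "")

# regex=True: each listed string is used as a pattern and removed inside cells
partial = s.replace(["SPOUSE_2", "LOVER_2"], "", regex=True)

print(whole.tolist())    # ['', 'x SPOUSE_2 y', 'FRIEND_nan|LOVER_2']
print(partial.tolist())  # ['', 'x  y', 'FRIEND_nan|']
```

With `regex=True`, any regex metacharacters in the listed strings would be interpreted as regex syntax, so strings containing characters such as `.` or `(` would need escaping first.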
I've found a method to replace multiple substrings in a single string, but I haven't found a way to replace substrings across a DataFrame column.
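If you already have a function that removes several substrings from one string, one way to apply it to a whole column is `Series.map`, which calls it once per cell. A minimal sketch; `remove_substrings` is a hypothetical stand-in for that single-string helper, and the sample rows are made up:

```python
import pandas as pd

# Hypothetical stand-in for an existing single-string helper
def remove_substrings(text, substrings):
    """Delete every occurrence of each substring from one string."""
    for sub in substrings:
        text = text.replace(sub, "")
    return text

# Hypothetical sample data; the real frame has far more rows and columns
df = pd.DataFrame({"EVENT_DTL": ["SPOUSE_2 PARENT_nan", "CHILD_2 ok", None]})
to_remove = ["SPOUSE_2", "PARENT_nan", "CHILD_2"]

# Series.map applies the helper cell by cell; na_action="ignore" leaves
# missing values as they are instead of passing them to the helper
df["EVENT_DTL"] = df["EVENT_DTL"].map(
    lambda t: remove_substrings(t, to_remove), na_action="ignore"
)
print(df["EVENT_DTL"].tolist())
```

This runs plain Python for every cell, so on ~105k rows it is likely to be slower than the vectorized `.str` methods, but it reuses a helper you already trust.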
Answer:
Found the answer in the question "Replace multiple substrings in a Pandas series with a value".
The trick is to avoid building a dictionary and to use a regex instead.
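The linked answer's exact code isn't reproduced here; below is a minimal sketch of the regex idea, assuming the goal is to delete the listed strings literally. The helper name `delete_substrings` and the sample rows are hypothetical. It joins all strings into one alternation pattern so the column is scanned in a single `str.replace` call. Sorting longest first matters for this list: it contains both '세부 대인관계문제 기타 상세_' and '세부 대인관계문제 기타 상세_nan', and in an alternation the first matching branch wins, so the shorter one would otherwise leave "nan" behind.

```python
import re
import pandas as pd

def delete_substrings(series, substrings):
    """Remove every literal occurrence of any listed substring in one pass."""
    # Longest first, so a string that is a prefix of a longer one
    # cannot match before the longer one gets a chance
    parts = sorted(substrings, key=len, reverse=True)
    # re.escape makes any regex metacharacters (e.g. '.', '(') match literally
    pattern = "|".join(re.escape(p) for p in parts)
    return series.str.replace(pattern, "", regex=True)

# Hypothetical sample rows
s = pd.Series(["세부 대인관계문제 기타 상세_nan A", "SPOUSE_2 B", "C"])
to_remove = ["세부 대인관계문제 기타 상세_", "세부 대인관계문제 기타 상세_nan", "SPOUSE_2"]

print(delete_substrings(s, to_remove).tolist())  # [' A', ' B', 'C']
```

For the real data you would assign the result back, e.g. `df["EVENT_DTL"] = delete_substrings(df["EVENT_DTL"], to_remove)` with the full list from the question.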
Answer:
You could iterate through the list of strings you want to replace, as shown below. Other ideas here.
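to_replace = ['SPOUSE_2', 'SPOUSE_nan']  # for example; extend with the rest of the list
for str_rep in to_replace:
    # .str.replace removes substrings (Series.replace would only match whole cells);
    # regex=False matches literally, and the result must be assigned back
    df['EVENT_DTL'] = df['EVENT_DTL'].str.replace(str_rep, '', regex=False)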
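A self-contained version of the loop approach, on hypothetical sample rows. It assigns each result back to the column (otherwise nothing changes) and processes the strings longest first, because the question's list contains '세부 대인관계문제 기타 상세_' as a prefix of '세부 대인관계문제 기타 상세_nan'; removing the shorter one first would leave a stray "nan".

```python
import pandas as pd

# Hypothetical sample rows; the real data has ~105k rows
df = pd.DataFrame({"EVENT_DTL": ["세부 대인관계문제 기타 상세_nan X", "LOVER_2 FRIEND_nan Y"]})
to_replace = ["세부 대인관계문제 기타 상세_", "세부 대인관계문제 기타 상세_nan",
              "LOVER_2", "FRIEND_nan"]

# Longest strings first, so a shorter prefix cannot strip part of a longer
# string and leave a fragment such as "nan" behind
for str_rep in sorted(to_replace, key=len, reverse=True):
    # Each pass rewrites the whole column; regex=False matches the text literally
    df["EVENT_DTL"] = df["EVENT_DTL"].str.replace(str_rep, "", regex=False)

print(df["EVENT_DTL"].tolist())  # [' X', '  Y']
```

The trade-off versus a single combined regex is one full pass over the column per string, which for a list of about 50 strings means about 50 passes.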