How to replace multiple strings in pandas dataframe without memory issue?

Time:11-11

I have a large DataFrame of shape (104959, 298), i.e. 104,959 rows and 298 columns.

In one string column (EVENT_DTL) I have multiple substrings that I need to replace.

I've tried:

df.EVENT_DTL.replace(['SPOUSE_2','SPOUSE_nan','PARENT_2','PARENT_nan','GRANDPARENT_2','GRANDPARENT_nan','CHILD_2',
        'CHILD_nan','RELATIVE_2','RELATIVE_nan','LOVER_2','LOVER_nan','FRIEND_2','FRIEND_nan',
        '세부 대인관계문제 기타 상세_nan','세부 대인관계문제 기타 상세_','대인관계문제_1',
        '애인 관련_2','애인 관련_nan','직장 내_2','직장 내_nan','소외 문제_2','소외 문제_nan',
        '수면제_2','수면제_nan','진통제_2','진통제_nan','병원에서 처방 받은 약물_2',
        '병원에서 처방 받은 약물_nan','기타약물_nan','농약_2','농약_nan','살충제_2','살충제_nan',
        '제초제_2','제초제_nan','쥐약_2','쥐약_nan','화학약품_nan','목매달기_2','목매달기_nan',
        '가스 질식_2','가스 질식_nan','물에 뛰어들기_2','물에 뛰어들기_nan','뛰어내림_2',
        '뛰어내림_nan','칼, 송곳으로 찌르기_2','칼, 송곳으로 찌르기_nan','세부 동거자 기타 상세_nan'],"")

(I'm trying to delete all of the substrings above)

but it causes a memory error.

I've found a method to replace multiple substrings in a single string, but I haven't found a way to replace substrings across a DataFrame column.

CodePudding user response:

Found the answer:

Replace multiple substrings in a Pandas series with a value

The trick is to avoid building a dictionary and to use a regex instead.
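A minimal sketch of the regex idea, assuming the goal is to delete every listed substring from EVENT_DTL. The sample rows and the shortened `to_replace` list are made up for illustration. The substrings are escaped and joined into one alternation pattern, so the column is scanned once instead of once per substring:

```python
import re
import pandas as pd

# Made-up sample rows standing in for the real EVENT_DTL column.
df = pd.DataFrame({"EVENT_DTL": ["SPOUSE_1 PARENT_2", "CHILD_nan FRIEND_1", None]})

# Shortened, illustrative version of the list from the question.
to_replace = ["SPOUSE_2", "PARENT_2", "CHILD_nan", "FRIEND_nan"]

# Escape each substring so it is matched literally, and put longer
# substrings first so a shorter prefix cannot pre-empt a longer match.
pattern = "|".join(re.escape(s) for s in sorted(to_replace, key=len, reverse=True))

# One pass over the column; missing values stay missing.
df["EVENT_DTL"] = df["EVENT_DTL"].str.replace(pattern, "", regex=True)
```

Sorting by length matters for entries such as '세부 대인관계문제 기타 상세_' and '세부 대인관계문제 기타 상세_nan', where one substring is a prefix of the other.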

CodePudding user response:

You could iterate through the list of strings you want to replace, as shown below. Other ideas here

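to_replace = ['SPOUSE_2', 'SPOUSE_nan']  # ... and so on, for example
for str_rep in to_replace:
    # .str.replace removes substrings; it is not in-place, so assign the result back
    df['EVENT_DTL'] = df['EVENT_DTL'].str.replace(str_rep, '', regex=False)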
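One thing to watch for: with plain strings, `Series.replace` only matches cells whose whole value equals the string, while `Series.str.replace` (or `Series.replace` with `regex=True`) works on substrings. Neither modifies the column in place. A small sketch on made-up data showing the difference:

```python
import pandas as pd

# Made-up values: one cell equals the target, one only contains it.
s = pd.Series(["SPOUSE_2", "SPOUSE_2 PARENT_1"])

# Whole-value match: only the first cell changes.
whole = s.replace("SPOUSE_2", "")

# Substring match via the .str accessor, treating the target literally.
sub = s.str.replace("SPOUSE_2", "", regex=False)

# Substring match via Series.replace with regex=True.
sub_regex = s.replace("SPOUSE_2", "", regex=True)

# The original Series is unchanged by all three calls.
```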