Memory efficient alternative to str.replace()

Time:05-25

I have a CSV file with 200k rows and about 40 columns. One column contains the special character '|', which I want to replace with '_'. However, when I call str.replace and then .append, I hit an out-of-memory (OOM) error on a machine with 16 GB of RAM. There must be a more efficient way.

My code:

import os
import pandas as pd
import numpy as np

archive_loc = ('pathname')
data = pd.read_csv(os.path.join(archive_loc,'sample.csv'))

category = data['category'].values
category = category.tolist()

for string in category:
    new_string = string.replace("|", "_")
    category.append(new_string)

CodePudding user response:

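Your loop appends to the same list it is iterating over, so the list grows on every pass and the loop never finishes. That, not str.replace itself, is what exhausts memory. Don't convert to a list and loop; do the replacement directly in the dataframe. Passing regex=False makes '|' a literal character instead of the regex alternation operator.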

data['category'] = data['category'].str.replace('|', '_', regex=False)
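To see the one-liner in isolation, here is a minimal sketch on a small in-memory frame. The sample values are hypothetical stand-ins for the 'category' column of sample.csv, which isn't shown in the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the 'category' column of sample.csv.
data = pd.DataFrame({"category": ["a|b", "c|d|e", "plain"]})

# Vectorized literal replacement: regex=False treats '|' as a plain character
# rather than as the regex alternation operator.
data["category"] = data["category"].str.replace("|", "_", regex=False)

print(data["category"].tolist())  # ['a_b', 'c_d_e', 'plain']
print(len(data))                  # row count is unchanged: 3
```

The column is rewritten in place of the old one and the number of rows stays the same, so memory use is roughly one extra copy of that column rather than an ever-growing list.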