I have a CSV file with 200k rows and about 40 columns. One column contains the special character '|', which I want to replace with '_'. However, when I use str.replace and then .append, I hit an OOM error on a machine with 16 GB of RAM. There must be a more efficient way.
My code:
import os
import pandas as pd
import numpy as np
archive_loc = ('pathname')
data = pd.read_csv(os.path.join(archive_loc,'sample.csv'))
category = data['category'].values
category = category.tolist()
for string in category:
    new_string = string.replace("|", "_")
    category.append(new_string)
CodePudding user response:
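Don't convert to a list and loop; do the replacement directly in the DataFrame. The loop appends to the same list it is iterating over, so it never reaches the end and the list keeps growing until memory runs out.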
data['category'] = data['category'].str.replace('|', '_', regex=False)
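A minimal sketch of the same replacement on a small in-memory frame, in case it helps to check the behaviour before running it on the full file. The sample values here are made up for illustration; only the column name category comes from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for sample.csv (values are assumptions)
data = pd.DataFrame({"category": ["a|b", "c|d|e", "plain", None]})

# Vectorized replacement over the whole column, no Python-level loop.
# regex=False makes '|' a literal character instead of a regex operator.
data["category"] = data["category"].str.replace("|", "_", regex=False)

# Pull the column back out as a list only to inspect the result
result = data["category"].tolist()
print(result)
```

With regex=False the '|' is matched as a literal character rather than as the regex alternation operator. The missing value in the last row stays missing instead of raising, which the original loop would not handle since a missing value has no .replace method.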