Replacing a string with NaN or 0

Time:11-26

I have a data file that I'm cleaning, and the source uses '--' to indicate missing data. I ultimately need this field to be either an integer or a float, but I am not sure how to get rid of the '--' strings.

I specified the column types in a type_dict dictionary before importing the CSV file. Six of my eight variables came in correctly as integers or floats. Of course, the two that are still objects are the ones I need to fix.
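A minimal sketch of handling the '--' marker at import time, assuming the types are passed to pd.read_csv through its dtype argument. The sample data, the column names id and score, and this particular type_dict are hypothetical stand-ins. The na_values argument tells the parser to treat '--' as missing (NaN), so a float dtype can be applied to that column without a parse error.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample standing in for the real file; 'score' uses '--' for missing
raw = StringIO("id,score\n1,3.5\n2,--\n3,7\n")

# Hypothetical type mapping, analogous to the question's type_dict
type_dict = {"id": "int64", "score": "float64"}

# na_values marks '--' as missing while parsing, so 'score' can be read as float64
df = pd.read_csv(raw, dtype=type_dict, na_values=["--"])
print(df.dtypes)
```

Without na_values, asking for float64 on a column that contains '--' makes read_csv raise an error, which is why the marker has to be declared as missing first.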

I've tried df = df.var.str.replace('--', ''), and I've tried df.var.fillna(df.var.mode().values[0], inplace=True) (and I wonder if I need to just change the values '0' to '--').
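A minimal sketch of why those two attempts may not behave as expected, assuming a hypothetical column literally named var. fillna only fills genuinely missing values (NaN/None), and '--' is an ordinary string, so it is left alone. df = df.var.str.replace(...) rebinds df to a single column instead of updating it. Also, DataFrame.var is pandas' built-in variance method, so df.var returns that method rather than a column named var; bracket access df["var"] avoids the clash. pd.to_numeric with errors="coerce" is swapped in here as one way to turn '--' into NaN.

```python
import pandas as pd

# Hypothetical frame; 'var' stands in for one of the two object columns
df = pd.DataFrame({"var": ["3", "--", "7"]})

# fillna only fills real missing values; '--' is a normal string, so nothing changes
unchanged = df["var"].fillna("0")

# Assign back to the column (not to df) and coerce unparseable text like '--' to NaN
df["var"] = pd.to_numeric(df["var"], errors="coerce")
print(df["var"].dtype)
```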

My presumption is that if I can empty those cells in some fashion, I can define the variable as an int/float.
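Emptying the cells does make a numeric column possible, with one catch that this sketch illustrates, assuming the '--' cells have already become NaN. NaN is itself a float value, so float64 works directly, but plain int64 has no missing-value marker, and the cast fails. pandas' nullable integer dtype "Int64" (capital I) stores missing entries as <NA> instead. The series below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column after the '--' cells have been emptied (NaN)
s = pd.Series([3.0, np.nan, 7.0])

# float64 already works: NaN is a float, so empty cells are allowed
print(s.dtype)

# Plain int64 cannot represent a missing value, so this cast raises ValueError
try:
    s.astype(int)
except ValueError as err:
    print("int64 cast failed:", err)

# The nullable integer dtype keeps whole numbers and marks the gap as <NA>
s_int = s.astype("Int64")
print(s_int)
```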

I'm sure I'm missing something really simple. I've walked away and come back, but I'm just not figuring it out.

Answer:

Try something like this, which cleans the input before it goes into pandas:

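from io import StringIO
import pandas as pd

# Read the raw text and swap every '--' for '0' before pandas parses it.
# Caveat: this touches every '--' in the file, and missing values become
# indistinguishable from real zeros.
with open('data.txt', 'r') as file:
    data = StringIO(file.read().replace('--', '0'))

# pandas can now infer numeric types for the affected columns
df = pd.read_csv(data)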


Answer:

OK, we figured out two options to make this work:

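Solution 1 (with import numpy as np): df = df.replace(r'^--$', np.nan, regex=True)

Solution 2 (a simplified version of #1, since a plain, non-regex replace only matches cells that are exactly '--'): df = df.replace('--', np.nan)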
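A small check that the two solutions agree, on a hypothetical frame. The anchored regex ^--$ in solution 1 only matches cells consisting of exactly '--', and the plain replace in solution 2 only matches whole cells equal to '--', so a longer string that merely contains '--' (here 'a--b') is left untouched by both.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: '--' marks missing data; 'a--b' merely contains '--'
df = pd.DataFrame({"x": ["1.5", "--", "2"], "note": ["ok", "--", "a--b"]})

# Solution 1: regex anchored at both ends, so only whole-cell '--' matches
sol1 = df.replace(r"^--$", np.nan, regex=True)

# Solution 2: plain replace, which also only matches cells equal to '--'
sol2 = df.replace("--", np.nan)

print(sol1.equals(sol2))
```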

Both gave the expected empty cells when I exported the DataFrame to CSV and opened it in a spreadsheet. Then, when I reimported that intermediate file, I had floats instead of strings, as expected.
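The export and re-import step works because read_csv re-infers types, but the conversion can also be done in memory. A minimal sketch, assuming a hypothetical column x: right after either replace the column still holds strings plus NaN, and astype(float) parses the numeric strings while keeping NaN as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical column as it looks after solution 1 or 2: strings plus NaN
df = pd.DataFrame({"x": ["1.5", "--", "2"]}).replace("--", np.nan)

# replace swaps values but does not re-infer types, so convert explicitly
df["x"] = df["x"].astype(float)
print(df["x"].dtype)
```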
