Home > Software engineering >  python pandas: attempting to replace value in row updates ALL rows
python pandas: attempting to replace value in row updates ALL rows

Time:07-11

I have a simple CSV file named input.csv as follows:

name,money
Dan,200
Jimmy,xd
Alice,15
Deborah,30

I want to write a python script that sanitizes the data in the money column: every value that has non-numerical characters needs to be replaced with 0

This is my attempt so far:

import pandas as pd

df = pd.read_csv(
       "./input.csv",
       sep = ","
)

# this line is the problem: it doesn't update on a row by row basis, it updates all rows
df['money'] = df['money'].replace(to_replace=r'[^0‐9]', value=0, regex=True)

df.to_csv("./output.csv", index = False)

The problem is that when the script runs, because the invalud money value xd exists on one of the rows, it will change ALL money values to 0 for ALL rows.

I want it to ONLY change the money value for the second data row (Jimmy) which has the invalid value.

this is what it gives at the end:

name,money
Dan,0
Jimmy,0
Alice,0
Deborah,0

but what I need it to give is this:

name,money
Dan,200
Jimmy,0
Alice,15
Deborah,30

What is the problem?

CodePudding user response:

You can use:

df['money'] = pd.to_numeric(df['money'], errors='coerce').fillna(0).astype(int)

The above assumes all valid values are integers. You can leave off the .astype(int) if you want float values.

Another option would be to use a converter function in the read_csv method. Again, this assumes integers. You can use float(x) in place of int(x) if you expect float money values:

def convert_to_int(x):
    try:
        return int(x)
    except ValueError:
        return 0

df = pd.read_csv(
    'input.csv', 
    converters={'money': convert_to_int}
)

CodePudding user response:

Some list comprehension could work for this (given the "money" column has no decimals):

df.money = [x if type(x) == int else 0 for x in df.money]

If you are dealing with decimals, then something like:

df.money = [x if (type(x) == int) or (type(x) == float) else 0 for x in df.money]

... will work. Just know that pandas will convert the entire "money" column to float (decimals).

  • Related