Home > Mobile >  How to remove quotes from Numeric data in Python
How to remove quotes from Numeric data in Python

Time:04-19

I have one numeric feature in a data frame but in excel some of the values contain quotes which need to be removed.

Below table is what my data appears to be in Excel file now I want to remove quotes from last 3 rows using python.

Col1 Col2
123 A
456 B
789 C
"123" D
"456" E
"789" F

I have used following code in Python:

df["Col1"] = df['Col1'].replace('"', ' ').astype(int)

But above code gives me error message: invalid literal for int() with base 10: '"123"'.

I have also tried strip() function but still it is not working.

If I do not convert the data type and use below code

df["Col1"] = df['Col1'].replace('"', ' ')

Then the code is getting executed without any error however while saving the file into CSV it is still showing quotes.

CodePudding user response:

One way is to use converter function while reading Excel file. Something along those lines (assuming that data provided is in Excel file in columns 'A' and 'B'):

import pandas as pd

def conversion(value):
    if type(value) == int:
        return value
    else:
        return value.strip('"')

df = pd.read_excel('remove_quotes_excel.xlsx', header=None,
                   converters={0: conversion})

# df
     0  1
0  123  A
1  456  B
2  789  C
3  123  D
4  456  E
5  789  F

Both columns are object type, but now (if needed) it is straightforward to convert to int:

df[0] = df[0].astype(int)

CodePudding user response:

You can do it by using this code. (Regex is if you get a warning)

df.Col1.replace('\"', '', regex = True, inplace = True)

CodePudding user response:

First convert the Col1 into series

df_Series = df['Col1']

Apply replace function on series

df_Series = df_Series.replace('"','').astype(int)

then append the Series into df data frame.

CodePudding user response:

Convert the values in the 'Col1' column of the dataframe 'df' to numeric values. If any of the values in the column cannot be converted to a numeric value, it will be set to NaN.

 #df['Col1'] = df['Col1'].astype(int)
 df['Col1'] = pd.to_numeric(df['Col1'], errors='coerce')
  • Related