Mixed Format Numbers

Time:04-15

At work we use an Oracle SQL database. Occasionally (rarely, but it happens) the database is fed data in the wrong format, like this:

Sales Price
s1 10.00
s2 10,00
s3 10

All rows have the same price, but in different formats. How can I standardize the Price column to a single format using Python?

Here is the code I use:

import pandas as pd
import cx_Oracle
import numpy as np

# placeholder: replace the string with the actual Oracle client directory
cx_Oracle.init_oracle_client(lib_dir="path to oracle client")

def connect(user, password, host):
    connection = cx_Oracle.connect(user=user, password = password, dsn = host)
    cursor = connection.cursor()
    return cursor

def sql(query,cursor):
    cursor.execute(query)
    result = cursor.fetchall()
    cols = [i[0] for i in cursor.description]
    df = pd.DataFrame(result, columns=[cols])
    return df

query = """
querie
"""

df = sql(query,cursor)
df.columns = df.columns.get_level_values(0)

CodePudding user response:

Looking at your code, the problem appears to be that some prices arrive as strings that use a comma as the decimal separator, which Python does not parse as a number.

So you can replace the comma with a period in the cursor.fetchall() result and then construct the DataFrame from the cleaned rows.

import pandas as pd
import cx_Oracle
import numpy as np

# placeholder: replace the string with the actual Oracle client directory
cx_Oracle.init_oracle_client(lib_dir="path to oracle client")

def connect(user, password, host):
    connection = cx_Oracle.connect(user=user, password = password, dsn = host)
    cursor = connection.cursor()
    return cursor

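def sql(query, cursor):
    cursor.execute(query)
    result = cursor.fetchall()
    # swap the decimal comma in string values; numbers and NULLs stay untouched
    # (note: this touches every string column, not only Price)
    new_result = [[v.replace(',', '.') if isinstance(v, str) else v for v in r] for r in result]
    cols = [i[0] for i in cursor.description]
    # build the frame from the cleaned rows, not from the raw result
    df = pd.DataFrame(new_result, columns=cols)
    return df

query = """
querie
"""

# cursor is the object returned by connect(user, password, host)
df = sql(query, cursor)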

If the Price column is still recognized as a string, you can convert it using:

df['Price'] = df['Price'].astype(float)
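
One caveat: astype(float) raises a ValueError as soon as a single value cannot be parsed. A minimal sketch of a more forgiving variant, assuming a small hand-made sample (the 's4' row and its 'n/a' value are hypothetical, added only to show the failure case), uses pd.to_numeric with errors="coerce" so that bad values become NaN and can be reviewed afterwards:

```python
import pandas as pd

# Hypothetical sample mirroring the question's data, plus one unparseable value
df = pd.DataFrame({"Sales": ["s1", "s2", "s3", "s4"],
                   "Price": ["10.00", "10,00", 10, "n/a"]})

# Normalise everything to strings, then swap the decimal comma for a period
cleaned = df["Price"].astype(str).str.replace(",", ".", regex=False)

# Parse to numbers; values that cannot be parsed become NaN instead of raising
df["Price"] = pd.to_numeric(cleaned, errors="coerce")

# Rows that could not be parsed, kept aside for manual review
bad_rows = df[df["Price"].isna()]
```

Whether silently turning bad values into NaN is acceptable depends on the data; raising may be preferable if every row must be valid.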

Hope it helps!

CodePudding user response:

# import
import pandas as pd

# test values
df = pd.DataFrame({'Sales': ['s1', 's2', 's3', 's4'], 'Price': ['10.00', '10,00', 10, 9]})

# convert all to string/object type for consistency
# can comment this out if all values are already string/object type
df['Price'] = df['Price'].astype(str)
# replace comma with period (literal match, not a regex)
df['Price'] = df['Price'].str.replace(',', '.', regex=False)

# get index of values that do not have decimal places (period)
index = df[~df['Price'].str.contains('.', regex=False)].index

# pad decimal to values that do not have decimal places
df.loc[index, 'Price'] = df.loc[index, 'Price'] + '.00'

As a last step, you can optionally convert the values back to float or Decimal if needed.
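
For the Decimal route, a minimal sketch could look like the following. The helper name to_decimal is hypothetical, and it assumes each value contains at most one separator (so no thousands separators such as "1,000.50"):

```python
from decimal import Decimal

import pandas as pd

# Sample values in the three formats from the question
df = pd.DataFrame({"Sales": ["s1", "s2", "s3"],
                   "Price": ["10.00", "10,00", 10]})


def to_decimal(value):
    """Parse a price written with either decimal separator into a two-place Decimal."""
    text = str(value).replace(",", ".")
    # quantize pads (or rounds) to exactly two decimal places
    return Decimal(text).quantize(Decimal("0.01"))


# The column ends up with object dtype holding Decimal instances
df["Price"] = df["Price"].map(to_decimal)
```

Decimal keeps the two-place formatting exactly, which a float column does not; the trade-off is that pandas arithmetic on an object column is slower than on floats.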

CodePudding user response:

The easiest approach would be to inject an output type handler. The following fetches NUMBER columns as strings and replaces periods with commas, but you can adjust it as needed. A pure cx_Oracle example:

import cx_Oracle

def type_handler(cursor, name, default_type, size, precision, scale):
    # fetch NUMBER columns as strings and swap the decimal separator
    if default_type == cx_Oracle.DB_TYPE_NUMBER:
        return cursor.var(cx_Oracle.DB_TYPE_VARCHAR, arraysize=cursor.arraysize,
                outconverter=lambda v: v.replace('.', ','))

# conn is an open cx_Oracle connection
conn.outputtypehandler = type_handler
cursor = conn.cursor()
cursor.execute("select 2.5 from dual")
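
The handler above targets NUMBER columns. If the mixed formats sit in a text column instead, which the question's data suggests, the same mechanism could be pointed at that column by name. This is only a sketch: "PRICE" is a hypothetical column name, to_float and price_handler are illustrative helpers, and conn is assumed to be an open cx_Oracle connection.

```python
def to_float(text):
    """Parse '10.00', '10,00' or '10' into the float 10.0."""
    return float(text.replace(",", "."))


def price_handler(cursor, name, default_type, size, precision, scale):
    # "PRICE" is a hypothetical column name; adjust it to the real one.
    if name == "PRICE":
        # Fetch the column as a string and pass the fetched value through to_float.
        return cursor.var(str, arraysize=cursor.arraysize, outconverter=to_float)
    # Returning None keeps the driver's default handling for all other columns.
    return None


# Wiring, assuming `conn` is an open cx_Oracle connection:
# conn.outputtypehandler = price_handler
```

With this in place the conversion happens at fetch time, so the DataFrame built from cursor.fetchall() would already contain floats in that column.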