Pandas universal data structure conversion

Hello friends, I have run into a problem. My input is a price column read from a CSV or XLSX file. The data contains str, int and float values, and some numbers use commas instead of periods as the decimal separator. At the output, I need a float. When I try to replace the commas, my data is overwritten with zeros. What should I do?

import pandas as pd

df = pd.DataFrame({'price': ['21', 22.0, '23', 24.0, 25,
                             '26,0', 27, '28', 29, 30, 31, 32, 33,
                             34, 35.0, 36.0, 37]})

def data_structure_change(df):    
    """1.Change commas to points
    2.All empty values are assigned zero.
    3.Change data type to float
    4.Change data type to integer"""
    
    try:
        df['price'] = df['price'].str.replace(',', '.',)
    except AttributeError:
        pass
    finally:
        df.price = df.price.fillna(0)
        df.price = df.price.astype('float')
    return df
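To see where the zeros come from, here is a minimal reproduction on the first seven values of the sample. It is only a sketch that uses the same calls as the function above (Series.str.replace, fillna and astype), with regex=False added so the comma is matched literally, and it prints the intermediate values.

```python
import pandas as pd

# First seven values of the sample: an object column mixing str, int and float.
s = pd.Series(['21', 22.0, '23', 24.0, 25, '26,0', 27])

# .str methods only act on string elements; ints and floats in an object
# column come back as NaN instead of raising AttributeError, so the
# except branch in the function above never runs.
replaced = s.str.replace(',', '.', regex=False)
print(replaced.tolist())  # NaN for 22.0, 24.0, 25 and 27

# fillna(0) then overwrites every one of those NaNs with 0.
result = replaced.fillna(0).astype(float)
print(result.tolist())
# [21.0, 0.0, 23.0, 0.0, 0.0, 26.0, 0.0]
```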

Answer:

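The zeros come from .str.replace: on a column that mixes strings and numbers, it returns NaN for every element that is not a string, and fillna(0) then turns those NaNs into zeros. Instead, I pass a function, conv_type(), to the apply method; it checks the type of each value and converts it to float. Note that apply calls the function once per element, so this is a Python-level loop rather than a vectorised operation. It is simple and fine for columns of moderate size.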
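If the column is large, a column-wise alternative avoids calling a Python function per value. This is a different technique from the apply approach, given as a sketch: the helper name to_float_vectorised is hypothetical, and it relies on astype(str), Series.str.replace and pd.to_numeric(errors='coerce'). How much faster it is depends on the data (the .str methods still loop internally on object columns), so it is worth timing on your own files.

```python
import pandas as pd


def to_float_vectorised(s: pd.Series) -> pd.Series:
    """Hypothetical helper: convert a mixed str/int/float column to float."""
    # astype(str) turns every element into a string (22.0 -> '22.0',
    # 25 -> '25'), so .str.replace no longer yields NaN for the numbers.
    cleaned = s.astype(str).str.replace(',', '.', regex=False)
    # to_numeric parses the whole column; errors='coerce' turns anything
    # unparseable (e.g. 'abc' or an empty cell) into NaN.
    numbers = pd.to_numeric(cleaned, errors='coerce')
    # Missing or unparseable values become 0.0, matching fillna(0) in the question.
    return numbers.fillna(0).astype(float)


df = pd.DataFrame({'price': ['21', 22.0, '23', 24.0, 25,
                             '26,0', 27, '28', 29, 30, 31, 32, 33,
                             34, 35.0, 36.0, 37]})
df['price'] = to_float_vectorised(df['price'])
print(df['price'].tolist())
```

Unlike conv_type, this version never raises on bad data: anything unparseable silently becomes 0.0, so you may prefer the default errors='raise' while checking a new file.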

Here is the apply-based version; it works on the sample data:

import pandas as pd
df = pd.DataFrame({'price': ['21', 22.0, '23', 24.0, 25,
                             '26,0', 27, '28', 29, 30, 31, 32, 33,
                             34, 35.0, 36.0, 37]})

def data_structure_change(df: pd.DataFrame) -> pd.DataFrame:

    # convert every value of the column with conv_type, one element at a time
    df['price'] = df['price'].apply(conv_type)
    return df


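def conv_type(x):
    """Convert one price value to float.

    1. Missing values (None/NaN) become 0.0.
    2. Strings: replace the decimal comma with a point, then parse.
    3. ints and floats (including NumPy scalars) are cast with float().
    """
    if pd.isna(x):
        return 0.0
    if isinstance(x, str):
        x = x.strip().replace(',', '.')
        # an empty cell read as '' is treated like a missing value
        return float(x) if x else 0.0
    # float() raises for anything non-numeric, so bad data is not silently hidden
    return float(x)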

x = data_structure_change(df)
print(x)

result: