Hello friends, I've run into a problem. As input, I get a price column from a csv or xlsx file. The data contains a mix of str, int and float values, and some of the strings use commas instead of periods as the decimal separator. As output, I need floats. When I try to replace the commas, my numeric values are overwritten with zeros. What should I do?
df = pd.DataFrame({'price': ['21', 22.0, '23', 24.0, 25,
                             '26,0', 27, '28', 29, 30, 31, 32, 33,
                             34, 35.0, 36.0, 37],})

def data_structure_change(df):
    """1.Change commas to points
    2.All empty values are assigned zero.
    3.Change data type to float
    4.Change data type to integer"""
    try:
        df['price'] = df['price'].str.replace(',', '.',)
    except AttributeError:
        pass
    finally:
        df.price = df.price.fillna(0)
        df.price = df.price.astype('float')
    return df
CodePudding user response:
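For context, the zeros most likely come from how the .str accessor treats a mixed-type column: no AttributeError is raised, non-string elements simply come back as NaN, and fillna(0) then turns them into 0. A minimal sketch reproducing this on a shortened column (the names s, replaced and filled are illustrative, not from the question):

```python
import pandas as pd

# Shortened, illustrative version of the price column from the question
s = pd.Series(['21', 22.0, '26,0', 27])

# .str methods only act on string elements; every other element becomes NaN
replaced = s.str.replace(',', '.', regex=False)
print(replaced.tolist())  # ['21', nan, '26.0', nan]

# fillna(0) then overwrites those NaN values, which were the real numbers
filled = replaced.fillna(0).astype('float')
print(filled.tolist())  # [21.0, 0.0, 26.0, 0.0]
```

So the except branch never runs, and the values that were already numeric are the ones that get lost.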
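I have added a function, conv_type(), that checks the type of each value and converts it to float, and applied it to the column with the apply method. Note that apply calls the function once per element, so it is not truly vectorised, but it handles the mixed types correctly.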
This would work:
import pandas as pd

df = pd.DataFrame({'price': ['21', 22.0, '23', 24.0, 25,
                             '26,0', 27, '28', 29, 30, 31, 32, 33,
                             34, 35.0, 36.0, 37],})

def data_structure_change(df: pd.DataFrame) -> pd.DataFrame:
    # apply the type conversion to every value in the column
    df['price'] = df['price'].apply(conv_type)
    return df

def conv_type(x):
    """Convert a single value to float.
    1.Strings: change commas to points, then convert to float
    2.Integers: convert to float
    3.Floats (and anything else): returned unchanged"""
    if isinstance(x, str):
        return float(x.replace(',', '.'))
    if isinstance(x, int):
        return float(x)
    return x

x = data_structure_change(df)
print(x)
result:
    price
0    21.0
1    22.0
2    23.0
3    24.0
4    25.0
5    26.0
6    27.0
7    28.0
8    29.0
9    30.0
10   31.0
11   32.0
12   33.0
13   34.0
14   35.0
15   36.0
16   37.0
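If the per-element apply becomes slow on a large column, one alternative (not part of the answer above) is to cast everything to string first, so that the .str accessor touches every row, and then convert with pd.to_numeric. A minimal sketch, assuming missing values should become 0 as the docstring in the question states; the helper name to_float_column is hypothetical:

```python
import pandas as pd

def to_float_column(s: pd.Series) -> pd.Series:
    """Hypothetical helper: convert a mixed str/int/float column to float."""
    # Cast every element to str so that .str.replace applies to all rows
    as_text = s.astype(str).str.replace(',', '.', regex=False)
    # errors='coerce' turns text that cannot be parsed as a number into NaN
    numeric = pd.to_numeric(as_text, errors='coerce')
    # Assumption: missing or unparseable values should become 0
    return numeric.fillna(0).astype(float)

df = pd.DataFrame({'price': ['21', 22.0, '26,0', 27, None]})
df['price'] = to_float_column(df['price'])
print(df['price'].tolist())
```

This sketch only handles a comma used as the decimal separator; values that also contain thousands separators (for example '1.234,5') would need extra handling.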