Conversion of data types of multiple columns - Issue

Time:01-28

I have been trying to write the following code, but it does not let me convert the columns Amount currency and Amount to float. Below is my code:

import pandas as pd
from datetime import datetime
df= pd.read_csv(r'C:\Users\Desktop\CustomerData.csv')

parsed = pd.to_datetime(df["Date"], errors="coerce").fillna(pd.to_datetime(df["Date"],format="%Y-%d-%m",errors="coerce"))

ordinal = pd.to_numeric(df["Date"], errors="coerce").apply(lambda x: pd.Timestamp("1899-12-30") + pd.Timedelta(x, unit="D"))
df["Date"] = parsed.fillna(ordinal)

newdf = df.loc[(df.Type == "Sales Invoice")]

df2 = newdf.groupby(['Date','Customer','Type'])[["Amount currency", "Amount"]].apply(lambda x: x.astype(float).sum())

But I am getting the error below:

could not convert string to float: '10,084.80'

I have checked the column contents in the CSV file, and the number columns do not appear to contain any special characters.

Below is the general info of my data set:

Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Date             80 non-null     datetime64[ns]
 1   Customer         80 non-null     object        
 2   Amount currency  80 non-null     object        
 3   Type             80 non-null     object        
 4   Amount           80 non-null     object        
dtypes: datetime64[ns](1), object(4)

The columns Amount currency and Amount cannot be converted to float with the code above.

These are the steps I followed:

1. Importing the data set

2. Cleaning the date column and arranging the data

3. Converting the date column to YYYY-MM format

4. Creating a group by after filtering Type to "Sales Invoice"

5. Summing each group. Because the amounts are stored as strings, the group-by sum returns the values joined together instead of their numeric total.

To get a numeric total, the values need to be converted to float first.

Can anyone help?

Thanks

CodePudding user response:

You can use the following code:

df['Amount currency'] = df['Amount currency'].str.replace(r'[^0-9\.]', '', regex=True)
df['Amount currency'] = pd.to_numeric(df['Amount currency'])

The first line deletes every character that is not a digit or a dot, so '10,084.80' becomes '10084.80'. The second line then converts that string to a float. Repeat the same two lines for the Amount column.
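As a rough sketch of how this could fit into the grouping step from the question, the snippet below cleans both amount columns in a loop and then sums them per group. The sample rows are made up for illustration and only stand in for CustomerData.csv; the names amount_cols, sales and totals are likewise hypothetical.

```python
import pandas as pd

# Hypothetical sample rows standing in for CustomerData.csv
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-05", "2021-01-05", "2021-01-06"]),
    "Customer": ["A", "A", "B"],
    "Type": ["Sales Invoice", "Sales Invoice", "Sales Invoice"],
    "Amount currency": ["10,084.80", "15.20", "1,000.00"],
    "Amount": ["10,084.80", "15.20", "1,000.00"],
})

amount_cols = ["Amount currency", "Amount"]
for col in amount_cols:
    # Drop everything that is not a digit or a dot (e.g. thousands separators)
    cleaned = df[col].str.replace(r"[^0-9.]", "", regex=True)
    # Convert the cleaned strings to floats
    df[col] = pd.to_numeric(cleaned)

# Filter to sales invoices, then sum the numeric columns per group
sales = df.loc[df["Type"] == "Sales Invoice"]
totals = sales.groupby(["Date", "Customer", "Type"])[amount_cols].sum()
print(totals)
```

Once the columns are numeric, a plain .sum() on the group-by is enough, so the astype(float) inside apply is no longer needed.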

Beware: if you have negative numbers (e.g. -10,084.80), the first line will also delete the '-', so you will end up with a positive float. If you need to keep the sign, change the regex to r'[^0-9\.\-]', which keeps the minus and gives you -10084.80.
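A minimal sketch of the sign-preserving variant, wrapped in a small helper so it can be reused for several columns. The function name to_float and the sample values are assumptions made for illustration; errors="coerce" is an optional choice here that turns anything still unparseable into NaN instead of raising.

```python
import pandas as pd

def to_float(series: pd.Series) -> pd.Series:
    """Keep only digits, dots and minus signs, then convert to float."""
    cleaned = series.astype(str).str.replace(r"[^0-9.\-]", "", regex=True)
    # Values that are still not valid numbers become NaN rather than raising
    return pd.to_numeric(cleaned, errors="coerce")

# Hypothetical sample values, including a negative amount and a non-number
amounts = pd.Series(["-10,084.80", "10,084.80", "n/a"])
result = to_float(amounts)
print(result)
```

If the file can be re-read, passing thousands="," to pd.read_csv is another option that parses such values as numbers at load time, sign included.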
