I'm using a dataset that contains a column "Streams" (dtype: object), and I just need to replace "," with "." so that I can later use pandas.to_numeric() to convert the strings to float64. Is there a way to replace only those characters and keep the numbers?
Example: 48,633,449 to 48.633.449
Code:
import pandas as pd
import numpy as np
dados = pd.read_csv("spotify_dataset.csv")
dados.dropna()
dados['Streams'].replace(",", ".")
dados['Streams'] = pd.to_numeric(dados['Streams'])
dados.head()
I got this error:
ValueError: Unable to parse string "48,633,449" at position 0
CodePudding user response:
You are throwing away the result of replace, since you are not assigning it to anything. Unless you explicitly pass inplace=True, pandas methods do not modify the object they are called on (Series, DataFrame); they return a new one.
You can pass the result of replace directly as the argument to to_numeric:
import pandas as pd
import numpy as np
dados = pd.read_csv("spotify_dataset.csv")
dados = dados.dropna()
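# .str.replace works on substrings; strip the thousands separators before converting
dados['Streams'] = pd.to_numeric(dados['Streams'].str.replace(",", "", regex=False))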
dados.head()
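To illustrate the point about return values, here is a minimal sketch on a made-up frame (the values are invented for illustration and are not read from spotify_dataset.csv). It shows that dropna returns a new DataFrame and leaves the original untouched unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real CSV.
df = pd.DataFrame({"Streams": ["48,633,449", np.nan, "47,248,719"]})

df.dropna()                      # result is discarded; df is unchanged
rows_without_assignment = len(df)

df = df.dropna()                 # assigning the result keeps the change
rows_with_assignment = len(df)

print(rows_without_assignment, rows_with_assignment)
```

The same applies to replace: calling it without assigning the result has no effect on the column.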
CodePudding user response:
You should be using .str.replace instead of just .replace. Series.replace matches whole cell values, whereas .str.replace substitutes within each string.
dados['Streams'] = pd.to_numeric(dados['Streams'].str.replace(",", ""))
Also, I don't think you actually want to replace the commas with decimal points. That would result in the same error, since a number with multiple decimal points is invalid.
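The difference between the two methods can be sketched as follows, again with invented sample values rather than the real dataset. regex=False is passed explicitly so that the comma is treated as a literal character regardless of the pandas version.

```python
import pandas as pd

# Hypothetical sample values in the same format as the Streams column.
streams = pd.Series(["48,633,449", "47,248,719"])

# Series.replace compares against each whole value; no cell is exactly ",",
# so the strings come back unchanged.
whole_value = streams.replace(",", ".")

# Series.str.replace works inside each string; removing the thousands
# separators leaves text that to_numeric can parse.
cleaned = streams.str.replace(",", "", regex=False)
numbers = pd.to_numeric(cleaned)

# Swapping "," for "." instead produces strings like "48.633.449",
# which to_numeric rejects.
dotted = streams.str.replace(",", ".", regex=False)
try:
    pd.to_numeric(dotted)
    dotted_parses = True
except ValueError:
    dotted_parses = False

print(numbers.tolist(), dotted_parses)
```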
CodePudding user response:
import pandas as pd
import numpy as np
dados = pd.read_csv("spotify_dataset.csv")
dados = dados.dropna()
# Use .str.replace to drop the thousands separators inside each string
dados['Streams'] = dados['Streams'].str.replace(",", "", regex=False)
dados['Streams'] = pd.to_numeric(dados['Streams'])
dados.head()
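As a possible alternative, read_csv accepts a thousands argument, so the separators can be handled while loading. The sketch below is only an illustration: the two-row CSV text and the Song column are made up, and it assumes the Streams values are quoted in the file, as they would have to be for a comma-delimited CSV.

```python
import io

import pandas as pd

# Hypothetical two-row CSV mimicking the format of the Streams column.
csv_text = 'Song,Streams\nA,"48,633,449"\nB,"47,248,719"\n'

# thousands="," tells the parser to treat commas inside numbers as separators.
dados = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(dados["Streams"].dtype)
print(dados["Streams"].tolist())
```

With this approach no separate replace or to_numeric step is needed for that column.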