ValueError: could not convert string to float: '"152.7"'

Time:12-06

I have a dataframe which I created from a dictionary like so: pd.DataFrame.from_dict(dict1, dtype=str)

However, the data types of all columns show up as "object".

I want to convert some of the columns to int and/or float, but I am unable to do it even after trying several ways.

I have tried the following:

df['duration'].astype(int) 
df['duration'].astype(str).astype(int) 
df['duration'].replace('"','').astype(int) 

ValueError: invalid literal for int() with base 10: '"467900"'

df['cpu'].astype(float)
df['cpu'].astype(str).astype(float)
df['cpu'].replace('"','').astype(float)

ValueError: could not convert string to float: '"152.7"'

This is my dataframe :

    duration    realtime    cpu
0   "268641"    "46871" "152.7"
1   "208642"    "2709"  "107.1"
2   "208817"    "2163"  "108.2"
3   "238558"    "9307"  "141.1"
4   "208881"    "2729"  "106.7"

Please let me know how I can make this work.

Thanks in advance!

CodePudding user response:

df = df.replace(regex=r'[^\d.]', value='')  # Remove everything except digits and the decimal point

# Now convert the columns as needed
df['realtime'] = df['realtime'].astype(int)
df['cpu'] = df['cpu'].astype(float)
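
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the first three rows of the question's table by hand and applies the same replace-then-convert steps. The names df and cleaned are just illustrative, and the frame is built directly rather than via from_dict, so treat it as an approximation of the original setup.

```python
import pandas as pd

# First three rows from the question; each value contains literal double quotes.
df = pd.DataFrame({
    'duration': ['"268641"', '"208642"', '"208817"'],
    'realtime': ['"46871"', '"2709"', '"2163"'],
    'cpu': ['"152.7"', '"107.1"', '"108.2"'],
})

# Drop every character that is not a digit or a decimal point.
cleaned = df.replace(regex=r'[^\d.]', value='')

# Convert each column to its numeric type.
cleaned = cleaned.astype({'duration': int, 'realtime': int, 'cpu': float})

print(cleaned.dtypes)
```

The raw string (r'...') avoids Python's invalid escape sequence warning for \d.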

CodePudding user response:

In addition to @wwnde's answer, you could also perform this operation in one statement:

df.replace(regex=r'[^\d.]', value='').astype({
    'duration': int,
    'realtime': int,
    'cpu': float
})

Output:

   duration  realtime    cpu
0    268641     46871  152.7
1    208642      2709  107.1
2    208817      2163  108.2
3    238558      9307  141.1
4    208881      2729  106.7
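
One caveat with the digit-only pattern: it also removes minus signs and exponent letters, so it is only safe when the data is known to be plain non-negative numbers, as in the question. The sketch below uses made-up values (not from the question) to show the difference, and strips only the quote characters before handing the strings to pd.to_numeric as one possible alternative.

```python
import pandas as pd

# Hypothetical values for illustration; only the first one appears in the question.
raw = pd.Series(['"152.7"', '"-3.5"', '"1e3"'])

# Keeping only digits and dots silently alters the second and third values.
digits_only = raw.replace(regex=r'[^\d.]', value='').astype(float)

# Stripping just the surrounding quotes keeps the sign and the exponent.
quotes_only = pd.to_numeric(raw.str.strip('"'))

print(digits_only.tolist())
print(quotes_only.tolist())
```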

CodePudding user response:

The problem I can see here is that each value contains literal " characters: the string is actually '"268641"' rather than '268641'. A quick and dirty fix would be:

df['duration'] = list(map(lambda x: x.replace('"', ''), df['duration']))

df['realtime'] = list(map(lambda x: x.replace('"', ''), df['realtime']))

df['cpu'] = list(map(lambda x: x.replace('"', ''), df['cpu']))

Then you can try:

df['cpu'].astype(float)

CodePudding user response:

One other way to make your code work (because you were almost correct).

When you want to use a string function on a column, you need to add .str before it.

Like this:

df['duration'].str.replace('"','').astype(float) 
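
To see why the original attempt left the quotes in place, here is a small sketch comparing the two calls. Without regex=True, Series.replace matches whole cell values, while .str.replace works on substrings inside each string. The second value (a lone quote) is made up purely to show a case where Series.replace does match.

```python
import pandas as pd

# First value is from the question's error message; the lone quote is hypothetical.
s = pd.Series(['"467900"', '"'])

# Series.replace compares entire values, so only the lone quote is replaced.
whole = s.replace('"', '')

# .str.replace removes the character wherever it occurs inside each string.
sub = s.str.replace('"', '')

print(whole.tolist())
print(sub.tolist())
```

Note that neither call modifies the column in place, so the result needs to be assigned back, e.g. df['duration'] = df['duration'].str.replace('"', '').astype(int).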