I have a dataframe which I created from a dictionary like so:
pd.DataFrame.from_dict(dict1, dtype=str)
However, the dtype of every column shows up as "object".
I want to convert some of the columns to int and/or float, but nothing I have tried works.
I have tried the following:
df['duration'].astype(int)
df['duration'].astype(str).astype(int)
df['duration'].replace('"','').astype(int)
ValueError: invalid literal for int() with base 10: '"467900"'
df['cpu'].astype(float)
df['cpu'].astype(str).astype(float)
df['cpu'].replace('"','').astype(float)
ValueError: could not convert string to float: '"152.7"'
This is my dataframe:
   duration realtime      cpu
0  "268641"  "46871"  "152.7"
1  "208642"   "2709"  "107.1"
2  "208817"   "2163"  "108.2"
3  "238558"   "9307"  "141.1"
4  "208881"   "2729"  "106.7"
Please let me know how I can make this work.
Thanks in advance!
CodePudding user response:
# Remove every character that is not a digit or a decimal point
df = df.replace(regex=r'[^\d.]', value='')
# Then convert the columns as needed
df['realtime'] = df['realtime'].astype(int)
df['cpu'] = df['cpu'].astype(float)
CodePudding user response:
In addition to @wwnde's answer, you could also perform this operation in a single statement:
df.replace(regex=r'[^\d.]', value='').astype({
    'duration': int,
    'realtime': int,
    'cpu': float
})
Output:
   duration  realtime    cpu
0    268641     46871  152.7
1    208642      2709  107.1
2    208817      2163  108.2
3    238558      9307  141.1
4    208881      2729  106.7
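For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the dictionary looks like the one below (rebuilt from the dataframe printed in the question; the name `converted` is only illustrative).

```python
import pandas as pd

# Sample data copied from the question; every value carries literal double quotes.
dict1 = {
    "duration": ['"268641"', '"208642"', '"208817"', '"238558"', '"208881"'],
    "realtime": ['"46871"', '"2709"', '"2163"', '"9307"', '"2729"'],
    "cpu": ['"152.7"', '"107.1"', '"108.2"', '"141.1"', '"106.7"'],
}
df = pd.DataFrame.from_dict(dict1, dtype=str)

# Drop every character that is neither a digit nor a decimal point,
# then cast each column to its target type in one astype call.
converted = df.replace(regex=r"[^\d.]", value="").astype(
    {"duration": int, "realtime": int, "cpu": float}
)

print(converted.dtypes)
```

Note that this pattern also strips minus signs and exponent markers, so it is only safe when the values are plain non-negative numbers, as they are here.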
CodePudding user response:
The problem I can see is that your strings contain a literal " character: the actual value is '"268641"', not '268641'. A quick and dirty fix would be:
df['duration'] = list(map(lambda x: x.replace('"', ''), df['duration']))
df['realtime'] = list(map(lambda x: x.replace('"', ''), df['realtime']))
df['cpu'] = list(map(lambda x: x.replace('"', ''), df['cpu']))
Then you can convert, for example:
df['cpu'] = df['cpu'].astype(float)
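The per-column cleanup above can also be written as a loop. The sketch below uses a hypothetical two-row frame in the same shape as the question's data, and it swaps in `pd.to_numeric` (not part of the original answer) so that pandas picks an integer or float dtype per column.

```python
import pandas as pd

# Hypothetical two-row frame in the same shape as the question's data.
df = pd.DataFrame({
    "duration": ['"268641"', '"208642"'],
    "realtime": ['"46871"', '"2709"'],
    "cpu": ['"152.7"', '"107.1"'],
})

for col in ["duration", "realtime", "cpu"]:
    # Remove the literal quote characters from each value.
    cleaned = df[col].map(lambda x: x.replace('"', ""))
    # pd.to_numeric chooses an integer or float dtype based on the values.
    df[col] = pd.to_numeric(cleaned)

print(df.dtypes)
```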
CodePudding user response:
Here is another way to make your code work, since you were almost there.
When you want to use a string function on a column, you need to add .str before it. Without it, replace() only matches entire cell values, so a lone '"' never matches anything.
Like this:
df['duration'].str.replace('"','').astype(float)
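To see the difference between the two calls, here is a small sketch on a hypothetical two-value Series (the names `s`, `unchanged` and `stripped` are only for illustration).

```python
import pandas as pd

s = pd.Series(['"467900"', '"208642"'])

# Series.replace compares whole values by default. No value is exactly '"',
# so nothing is replaced and the quotes stay in place.
unchanged = s.replace('"', "")

# Series.str.replace works on substrings inside each value.
# regex=False treats the pattern as a literal character.
stripped = s.str.replace('"', "", regex=False)

print(unchanged.tolist())
print(stripped.astype(int).tolist())
```

This is why the attempts in the question still raised ValueError: the quotes were never removed before astype ran.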