Converting the data type of values in a dataframe column

I have implemented ANN regression on a dataset. The actual values and the predictions are stored in a dataframe, and I want to calculate the bias for each observation. However, the predictions are collected as strings, as shown below. For reference, df is the dataframe I am working with after adding the results (i.e., the predicted column).

import pandas as pd
actual=[[11.4],[32.46],[66.37]]
df = pd.DataFrame(actual,columns=['actual'])
#some code for ann
#following are predictions
predicted=['[11.14]','[33.6]','[66.7]']
df['predicted']=predicted
print(df.info())
x= df['predicted'].values.flatten()
print(x)
print(type(x))
print(type(x[1]))
#bias calculation
#bias= df['actual']-df['predicted']
#print(bias)

The following is the output of the above code:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   actual     3 non-null      float64
 1   predicted  3 non-null      object 
dtypes: float64(1), object(1)
memory usage: 176.0  bytes
None
   actual predicted
0   11.40   [11.14]
1   32.46    [33.6]
2   66.37    [66.7]
<class 'numpy.ndarray'>
<class 'str'>

Is there any way to calculate the bias, assuming I only have the final dataframe df (i.e., after adding the ANN results)?

CodePudding user response:

If your ANN returns its results as a list of strings, e.g. predicted=['[11.14]','[33.6]','[66.7]'], you can use ast.literal_eval to parse each string into an actual list, which turns the list of strings into a list of lists.

import pandas as pd
import ast
actual=[[11.4],[32.46],[66.37]]
df = pd.DataFrame(actual,columns=['actual'])
#some code for ann
#following are predictions
predicted=['[11.14]','[33.6]','[66.7]']
df['predicted']=predicted
df['predicted']=df['predicted'].apply(ast.literal_eval) # parse each string such as '[11.14]' into a real list [11.14]

df['predicted']=df['predicted'].apply(lambda x:x[0]) # take the 0th element (the value) from each list
# this ensures the predicted column holds plain numbers rather than lists,
# matching the actual column, which also holds plain numbers

bias=df['actual']-df['predicted']
print(bias)


### OUTPUT
'''
0    0.26
1   -1.14
2   -0.33
dtype: float64
'''
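
As a possible alternative, here is a minimal sketch that avoids ast altogether. It assumes every entry in the predicted column is a string holding exactly one number wrapped in square brackets, as in the question; anything else (several values per entry, nested lists) would need the ast approach above. The brackets are stripped with the pandas string accessor and the remaining text is cast to float.

```python
import pandas as pd

# Same data as in the question; the predictions arrive as strings like '[11.14]'.
df = pd.DataFrame({
    "actual": [11.4, 32.46, 66.37],
    "predicted": ["[11.14]", "[33.6]", "[66.7]"],
})

# Assumption: each string contains a single number inside one pair of brackets.
# str.strip("[]") removes leading/trailing '[' and ']' characters,
# and astype(float) converts the remaining text to float64.
df["predicted"] = df["predicted"].str.strip("[]").astype(float)

# Both columns are now numeric, so the subtraction works element-wise.
bias = df["actual"] - df["predicted"]
print(bias.round(2))
```

Because this works on the whole column at once rather than calling a Python function per row, it may be more convenient on larger dataframes, but it is less general than parsing each entry with ast.literal_eval.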