I have implemented ANN regression on a dataset. The actual values and the predictions are stored in a dataframe, and I want to calculate the bias for each observation. However, the predictions are collected in the form shown below. For reference, df (after adding the results, i.e. the column predicted) is the dataframe I have been working with.
import pandas as pd
actual=[[11.4],[32.46],[66.37]]
df = pd.DataFrame(actual,columns=['actual'])
#some code for ann
#following are predictions
predicted=['[11.14]','[33.6]','[66.7]']
df['predicted']=predicted
print(df.info())
x= df['predicted'].values.flatten()
print(x)
print(type(x))
print(type(x[1]))
#bias calculation
#bias= df['actual']-df['predicted']
#print(bias)
The following is the output of the above code.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 actual 3 non-null float64
1 predicted 3 non-null object
dtypes: float64(1), object(1)
memory usage: 176.0 bytes
None
actual predicted
0 11.40 [11.14]
1 32.46 [33.6]
2 66.37 [66.7]
<class 'numpy.ndarray'>
<class 'str'>
Is there any way I can calculate the bias, assuming I have only the final dataframe df (i.e., after adding the ANN results)?
CodePudding user response:
If your ANN is giving you its results as a list of strings, e.g. predicted=['[11.14]','[33.6]','[66.7]'],
then you will need to use ast.literal_eval
to convert each string into an actual Python list.
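As a quick illustration of that conversion on a single value, here is a minimal sketch. The variable names raw, parsed and value are only for illustration and do not appear in the original code.

```python
import ast

raw = '[11.14]'                  # one prediction, stored as a string
parsed = ast.literal_eval(raw)   # safely parses the literal into a real list
value = parsed[0]                # the single float inside that list

print(type(raw), type(parsed), type(value))
```

ast.literal_eval only accepts Python literals (numbers, strings, lists, and so on), so it is a safer choice than eval for strings like these. The full code for the dataframe follows.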
import pandas as pd
import ast
actual=[[11.4],[32.46],[66.37]]
df = pd.DataFrame(actual,columns=['actual'])
#some code for ann
#following are predictions
predicted=['[11.14]','[33.6]','[66.7]']
df['predicted']=predicted
df['predicted']=df['predicted'].apply(ast.literal_eval) # parse each string, e.g. '[11.14]', into a real list, [11.14]
df['predicted']=df['predicted'].apply(lambda x:x[0]) # take the 0th element (the value) from each list
# the statement above ensures that the predicted column holds plain floats
# rather than one-element lists, matching the actual column, which also holds plain floats
bias=df['actual']-df['predicted']
print(bias)
### OUTPUT
'''
0 0.26
1 -1.14
2 -0.33
dtype: float64
'''
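If every prediction is known to be a single number wrapped in square brackets, as in the question, a possible alternative is to strip the brackets with pandas string methods and cast to float. This is only a sketch under that assumption; the name predicted_num is hypothetical, and strings with several values or nested lists would still need ast.literal_eval.

```python
import pandas as pd

df = pd.DataFrame({'actual': [11.4, 32.46, 66.37],
                   'predicted': ['[11.14]', '[33.6]', '[66.7]']})

# remove the leading '[' and trailing ']' from each string, then convert to float
predicted_num = df['predicted'].str.strip('[]').astype(float)

# bias per observation, same definition as above: actual minus predicted
bias = df['actual'] - predicted_num
print(bias.round(2))
```

Rounding is applied only for display, since floating-point subtraction can leave tiny residues in the last digits.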