I am trying to compare two floating-point values in a loop, and for some reason the result ends up as 1/0 instead of True/False.
def new_row(item1, item2):
    new_row = {
        'lister': item1,
        'metric': item2
    }
    return new_row
final_df = pd.DataFrame()
lister = ['a', 'b', 'c']
position = [1.1, 2.3, 4.5]
evaluation_metric = [0, 0.5, 0.2]
for b1 in lister:
    print(abs(position) > evaluation_metric)
    metric = (abs(position) > evaluation_metric)
    nr = new_row(lister, metric)
    final_df = final_df.append(nr, ignore_index=True)
For some reason, when I print the result I get True, but when I append it to final_df I get 1.0. Any thoughts on how to get True in final_df instead of 1.0?
CodePudding user response:
You created a DataFrame without columns, so pandas had to guess a dtype when a row was appended. In a similar experiment, it chose float64:
>>> import pandas as pd
>>> df = pd.DataFrame()
>>> final = df.append({"lister":"a", "metric":False}, ignore_index=True)
>>> final
  lister  metric
0      a     0.0
>>> final.dtypes
lister     object
metric    float64
dtype: object
You could fix the dtype after you've done the appends:
>>> final["metric"] = final["metric"].astype(bool)
>>> final
  lister  metric
0      a   False
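For reference, here is a minimal sketch of that repair on a standalone frame. The frame is made up to mimic what the appends produce (it is not the asker's data), and the sketch assumes the column holds no NaN, because astype(bool) turns NaN into True.

```python
import pandas as pd

# Hypothetical frame mimicking the float64 column left behind by the appends
final = pd.DataFrame({"lister": ["a", "b", "c"], "metric": [1.0, 0.0, 1.0]})

# Convert the 1.0/0.0 floats back to booleans.
# Assumption: no missing values, since NaN would become True here.
final["metric"] = final["metric"].astype(bool)

print(final.dtypes)
```

If the column could contain missing values, it is probably worth checking final["metric"].isna().any() before converting.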
But you likely shouldn't be appending in the first place. pandas lets you perform operations on entire columns. Create the columns from your lists first, then do the comparison in a single step:
import pandas as pd
lister = ['a', 'b', 'c']
position = [1.1, 2.3, 4.5]
evaluation_metric = [0, 0.5, 0.2]
df = pd.DataFrame({"lister": lister, "position": position,
                   "evaluation_metric": evaluation_metric})
df["metric"] = df["position"] > df["evaluation_metric"]
print(df)
Output:
  lister  position  evaluation_metric  metric
0      a       1.1                0.0    True
1      b       2.3                0.5    True
2      c       4.5                0.2    True
If you don't need those other columns any more, you can drop them:
df.drop(["position", "evaluation_metric"], axis=1, inplace=True)
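Note that the comparison above leaves out the abs() from the question. If the absolute value matters, Series.abs() keeps the whole thing vectorized. A small sketch follows; the negative and near-zero positions are illustrative values I made up so that abs() actually changes the result, not the asker's data.

```python
import pandas as pd

df = pd.DataFrame({
    "lister": ["a", "b", "c"],
    "position": [1.1, -2.3, 0.1],        # made-up values for illustration
    "evaluation_metric": [0, 0.5, 0.2],
})

# Element-wise |position| > evaluation_metric; the result column has dtype bool
df["metric"] = df["position"].abs() > df["evaluation_metric"]

print(df)
print(df["metric"].dtype)
```

Because the comparison produces a boolean Series directly, there is no float64 detour to clean up afterwards.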
CodePudding user response:
Although the code you posted won't run (I'd advise fixing that so your question isn't closed), the issue is that appending rows containing bool values to an empty DataFrame converts them to float64 (according to this answer).
For example:
for l, p, e in zip(lister, position, evaluation_metric):
    metric = (abs(p) > e)
    nr = new_row(l, metric)
    final_df = final_df.append(nr, ignore_index=True)
>>> final_df.dtypes
lister     object
metric    float64
You can fix this by modifying your new_row function to return a DataFrame, then concatenating it to your final_df in each loop iteration:
def new_row(item1, item2):
    new_row = {
        'lister': [item1],
        'metric': [item2]
    }
    return pd.DataFrame(new_row)
final_df = pd.DataFrame()
lister = ['a', 'b', 'c']
position = [1.1, 2.3, 4.5]
evaluation_metric = [0, 0.5, 0.2]
for l, p, e in zip(lister, position, evaluation_metric):
    metric = (abs(p) > e)
    nr = new_row(l, metric)
    final_df = pd.concat([final_df, nr])
Output:
>>> final_df
  lister  metric
0      a    True
0      b    True
0      c    True
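If you want to keep the loop, another option is to collect plain dicts in a list and build the DataFrame once at the end. This is only a sketch of an alternative, not the approach used above. It gives a 0, 1, 2 index instead of repeated zeros, and it avoids DataFrame.append, which was deprecated in pandas 1.4 and removed in 2.0. The name rows is my own.

```python
import pandas as pd

lister = ['a', 'b', 'c']
position = [1.1, 2.3, 4.5]
evaluation_metric = [0, 0.5, 0.2]

# Accumulate one dict per row; no DataFrame is touched inside the loop
rows = []
for l, p, e in zip(lister, position, evaluation_metric):
    rows.append({'lister': l, 'metric': abs(p) > e})

# Build the frame in one step; Python bools are stored with dtype bool
final_df = pd.DataFrame(rows)

print(final_df)
print(final_df.dtypes)
```

Building the frame once also tends to be faster than growing it row by row, since each concat copies the data accumulated so far.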