I have a number of objects that I get via an API. Each object consists of several boolean fields.
I am struggling to compute the Euclidean distance between my dataframe (df_survey) and each object that I get from the API (df is the dataframe with all objects, df_first is the first of them):
df_survey = pd.DataFrame([["True", "True", "False", "True", "True"]], columns=columns, index=["survey"])
similarities = np.zeros((data["count"], 1))
dataset = pd.json_normalize(data["results"])
df = pd.DataFrame(dataset, columns=columns, index=dataset.id-1)
df_first = pd.DataFrame(dataset.head(1), columns=columns, index=[0])
euclidean = scipy.spatial.distance.cdist(df_survey, df_first, metric='euclidean')
distance = pd.DataFrame(euclidean, columns=df_survey.index.values, index=df_first.index.values)
With this code I get an error: ValueError: Unsupported dtype object
I also tried scipy.spatial.distance.euclidean, but it expects numeric values, not booleans or strings. I could convert every value to int, but I don't know whether there is a better solution.
Thanks in advance!
CodePudding user response:
You are declaring the booleans as strings rather than actual booleans, since you wrote ["True", "False"]. You should declare them as [True, False], without the quotes. pandas stores strings under the generic object dtype, which is why you see this error.
I suggest you fix this and try to calculate the distance again. If it still doesn't work, convert the values to 0s and 1s.
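As a rough sketch of the 0/1 conversion, something like the following should work. The column names, the sample rows standing in for the API results, and the helper to_float_flags are all made up for illustration; only cdist and the pandas calls are real. The helper accepts both real booleans and the strings "True"/"False", in case the API returns strings.

```python
import pandas as pd
from scipy.spatial.distance import cdist

# Hypothetical column names; replace with your own `columns` list.
columns = ["a", "b", "c", "d", "e"]

# Survey row declared with real booleans (no quotes).
df_survey = pd.DataFrame(
    [[True, True, False, True, True]], columns=columns, index=["survey"]
)

# Hypothetical stand-in for the API results; the last row uses strings
# on purpose to show that the helper below copes with them too.
df = pd.DataFrame(
    [
        [True, True, False, True, True],
        [False, True, True, True, False],
        ["True", "False", "False", "True", "True"],
    ],
    columns=columns,
)


def to_float_flags(frame):
    """Turn True/False (booleans or strings) into 1.0/0.0 floats."""
    return frame.astype(str).eq("True").astype(float)


# cdist needs numeric arrays, so convert both frames before the call.
euclidean = cdist(
    to_float_flags(df_survey).to_numpy(),
    to_float_flags(df).to_numpy(),
    metric="euclidean",
)

# One row for the survey, one column per API object.
distance = pd.DataFrame(euclidean, index=df_survey.index, columns=df.index)
print(distance)
```

Passing the whole df at once gives the distance to every object in a single call, so there is no need to loop over the rows one by one.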