I am trying to make a matrix with my data:
matrix_model1 matrix_model2
7.0 2.0
4.0 4.0
30.0 20.0
4.0 8.0
I am trying to calculate the intersection of the two columns for each row:
m = []
for i in range(0, len(df2)):
    m.append(len(set(df2['matrix_model1'].iloc[i]).intersection(df2['matrix_model2'].iloc[i])))
df2['Model1_Intersection'] = m
df2
But I am getting an error:
TypeError: 'numpy.float64' object is not iterable
I tried converting the columns to integers with .astype('Int64'), but that does not work either. Can anyone tell me where I am going wrong?
CodePudding user response:
You're passing a single value into set(), which causes this error:
>>> set(5.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'float' object is not iterable
Your loop currently tries to create a set from each individual value in matrix_model1 and matrix_model2, which fails because a single float is not iterable.
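To make the failure concrete, here is a minimal sketch in plain Python. The helper name try_set is hypothetical and exists only for illustration; it shows that set() rejects a bare float but accepts the same float wrapped in a list.

```python
# Hypothetical helper for illustration only; not part of the original code.
def try_set(value):
    """Return set(value), or the error message if value is not iterable."""
    try:
        return set(value)
    except TypeError as exc:
        return str(exc)

scalar_result = try_set(5.0)     # a bare float cannot be iterated
wrapped_result = try_set([5.0])  # a one-element list can

print(scalar_result)
print(wrapped_result)
```

The same applies to the numpy.float64 values that .iloc[i] returns from a float column: each one is a scalar, not a collection.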
If you're trying to find the intersection between matrix_model1 and matrix_model2, you may try
set(df2['matrix_model1']) & set(df2['matrix_model2'])
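As a rough sketch of what that expression yields, the snippet below rebuilds the DataFrame from the values shown in the question (the name df2 comes from the question; constructing it this way is an assumption) and takes the column-level intersection.

```python
import pandas as pd

# Rebuild the question's data; assumed to match the real df2.
df2 = pd.DataFrame({
    "matrix_model1": [7.0, 4.0, 30.0, 4.0],
    "matrix_model2": [2.0, 4.0, 20.0, 8.0],
})

# Each column is iterable, so set() works on a whole column.
common = set(df2["matrix_model1"]) & set(df2["matrix_model2"])
print(common)
```

Note that this compares the columns as a whole and ignores row positions, so it answers a different question than a row-by-row comparison.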
Alternatively, if you're trying to create a new column that denotes whether the values in matrix_model1 and matrix_model2 match, then you may try
df2['intersection'] = [row['matrix_model2'] == row['matrix_model1'] for index, row in df2.iterrows()]
CodePudding user response:
As mentioned by @Sanidhya Singh, you cannot take the intersection of two single values like that. Either convert both values to sets, or use np.intersect1d if you want to work with arrays and do not want to convert them to sets:
import pandas as pd
import numpy as np
df = pd.DataFrame({"matrix_model1" : [7.0, 4.0, 30.0, 4.0], "matrix_model2": [2.0, 4.0, 20.0, 8.0]})
m = []
for i in range(len(df)):
    m.append(len(set(np.intersect1d(df['matrix_model1'].iloc[i], df['matrix_model2'].iloc[i]))))
df['Model1_Intersection'] = m
df
Since you are only comparing floats, I highly recommend using np.where instead of looping through your data. It is vectorized and much faster.
df = pd.DataFrame({"matrix_model1" : [7.0, 4.0, 30.0, 4.0], "matrix_model2": [2.0, 4.0, 20.0, 8.0]})
df['Model1_Intersection'] = np.where(df['matrix_model1'].eq(df['matrix_model2']), 1, 0)
df
--------------------------------------------------------
   matrix_model1  matrix_model2  Model1_Intersection
0            7.0            2.0                    0
1            4.0            4.0                    1
2           30.0           20.0                    0
3            4.0            8.0                    0
--------------------------------------------------------
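As a small variation, and only a sketch under the same sample data, the boolean result of the row-wise comparison can also be cast to integers directly. The variable names matches, flag_where and flag_cast are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "matrix_model1": [7.0, 4.0, 30.0, 4.0],
    "matrix_model2": [2.0, 4.0, 20.0, 8.0],
})

# Row-wise equality gives a boolean Series.
matches = df["matrix_model1"].eq(df["matrix_model2"])

flag_where = np.where(matches, 1, 0)  # 1 where the row matches, else 0
flag_cast = matches.astype(int)       # same flags via a dtype cast

print(flag_where.tolist())
print(flag_cast.tolist())
```

Both forms avoid the Python-level loop; which one reads better is a matter of taste.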