TypeError: 'numpy.float64' object is not iterable: for dataframe

I am trying to make a matrix with my data:

matrix_model1   matrix_model2

7.0             2.0   
4.0             4.0
30.0            20.0
4.0             8.0

I am trying to calculate the value of intersection:

m = []
for i in range(0, len(df2)):
    m.append(len(set(df2['matrix_model1'].iloc[i]).intersection(df2['matrix_model2'].iloc[i])))
df2['Model1_Intersection'] = m
df2

But I am getting an error: TypeError: 'numpy.float64' object is not iterable. I tried to change the columns to int with .astype('Int64'), but that does not work either. Can anyone tell me where I am going wrong?

CodePudding user response:

You're passing a single value into set(), which expects an iterable, and that causes this error:

>>> set(5.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'float' object is not iterable

Your loop currently tries to create a set from each individual value in matrix_model1 and matrix_model2, which fails because each value is a scalar float rather than a collection.
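To make the failure concrete, here is a minimal sketch that reproduces the error with a numpy.float64 scalar and shows why casting to an integer does not help: integers are not iterable either. Wrapping each scalar in a one-element list is one possible way to make the per-row set logic run. The variable names a, b and common are only illustrative.

```python
import numpy as np

a = np.float64(4.0)
b = np.float64(4.0)

# set() needs an iterable; a scalar float is not one.
try:
    set(a)
except TypeError as exc:
    float_error = str(exc)

# Casting to int does not help, because an int is not iterable either.
try:
    set(int(a))
except TypeError as exc:
    int_error = str(exc)

# Wrapping each scalar in a list gives set() something to iterate over.
common = set([a]).intersection([b])

print(float_error)
print(int_error)
print(len(common))
```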

If you're trying to find the values that matrix_model1 and matrix_model2 have in common across the whole columns, you could try:

set(df2['matrix_model1']) & set(df2['matrix_model2'])
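As a rough sketch of what that expression gives on the sample data from the question, and assuming you then want to flag the rows of matrix_model1 whose value also appears anywhere in matrix_model2, something like the following could work. The names shared and in_model2 are made up for this example.

```python
import pandas as pd

df2 = pd.DataFrame({
    "matrix_model1": [7.0, 4.0, 30.0, 4.0],
    "matrix_model2": [2.0, 4.0, 20.0, 8.0],
})

# Values that occur in both columns, regardless of row position.
shared = set(df2["matrix_model1"]) & set(df2["matrix_model2"])
print(shared)  # {4.0}

# Flag rows of matrix_model1 whose value appears somewhere in matrix_model2.
df2["in_model2"] = df2["matrix_model1"].isin(shared)
print(df2)
```

Note that this compares the columns as a whole, so the last row is flagged too: its 4.0 matches the 4.0 in the second row of matrix_model2, not the 8.0 next to it.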

Alternatively, if you're trying to create a new column that shows whether the values of matrix_model1 and matrix_model2 match in each row, you could try:

df2['intersection'] = [row['matrix_model2'] == row['matrix_model1'] for index, row in df2.iterrows()]

CodePudding user response:

As mentioned by @Sanidhya Singh, you cannot build a set from a single float like that. Either wrap each value in a collection before converting it to a set, or use np.intersect1d if you want to work with arrays and do not want to convert them to sets:

import pandas as pd
import numpy as np

df = pd.DataFrame({"matrix_model1" : [7.0, 4.0, 30.0, 4.0], "matrix_model2": [2.0, 4.0, 20.0, 8.0]})

m = []
for i in range(len(df)):
    # np.intersect1d already returns the unique common values, so no extra set() is needed
    m.append(len(np.intersect1d(df['matrix_model1'].iloc[i], df['matrix_model2'].iloc[i])))
    
df['Model1_Intersection'] = m
df
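For comparison, np.intersect1d is more commonly applied to whole arrays than to one value at a time. A small sketch on the same sample data, assuming you want the values shared by the two columns overall rather than a per-row count:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "matrix_model1": [7.0, 4.0, 30.0, 4.0],
    "matrix_model2": [2.0, 4.0, 20.0, 8.0],
})

# Sorted, unique values that appear in both columns.
common = np.intersect1d(df["matrix_model1"], df["matrix_model2"])
print(common)  # [4.]
```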

Since you are only comparing one float per row, I highly recommend a vectorized comparison with np.where instead of looping through your data. It is much faster on large frames and produces the same 0/1 column:

df = pd.DataFrame({"matrix_model1" : [7.0, 4.0, 30.0, 4.0], "matrix_model2": [2.0, 4.0, 20.0, 8.0]})
df['Model1_Intersection'] = np.where(df['matrix_model1'].eq(df['matrix_model2']), 1, 0)
df

--------------------------------------------------------
    matrix_model1   matrix_model2   Model1_Intersection
0   7.0             2.0             0
1   4.0             4.0             1
2   30.0            20.0            0
3   4.0             8.0             0
--------------------------------------------------------
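One caveat with exact equality on floats: values that come out of arithmetic can differ in the last bits even when they look the same. If that applies to your data, a tolerance-based comparison with np.isclose may be a safer variant of the np.where approach. The two-row frame below is hypothetical and only there to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 0.1 + 0.2 is not exactly 0.3 in floating point.
df = pd.DataFrame({
    "matrix_model1": [0.1 + 0.2, 4.0],
    "matrix_model2": [0.3, 4.0],
})

# Exact comparison misses the first row.
exact = df["matrix_model1"].eq(df["matrix_model2"]).astype(int)

# Tolerance-based comparison (default tolerances) treats the two as equal.
close = np.isclose(df["matrix_model1"], df["matrix_model2"]).astype(int)

print(exact.tolist())  # [0, 1]
print(close.tolist())  # [1, 1]
```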