Compare size of lists in multiple columns

Time:01-06

I have the following data frame:

import pandas as pd

df = pd.DataFrame(
    [
        ([40.33, 40.34, 40.22], [-71.11, -71.21, -71.14], [12, 45, 10]),
        ([41.23, 41.40, 41.22], [-72.01, -72.01, -72.01], [11, 23, 15]),
        ([43.33, 43.34], [-70.11, -70.21], [12, 40]),
        ([41.23, 41.40], [-72.01, -72.01, -72.01], [11, 23, 15]),
    ],
    columns=['long', 'lat', 'accuracy'],
)

long                       lat                           accuracy
[40.33, 40.34, 40.22]      [-71.11, -71.21, -71.14]     [12, 45, 10]
[41.23, 41.40, 41.22]      [-72.01, -72.01, -72.01]     [11, 23, 15]
[43.33, 43.34]             [-70.11, -70.21]             [12, 40]
[41.23, 41.40]             [-72.01, -72.01, -72.01]     [11, 23, 15]
...

Each cell contains a list of numbers. For each row, I want to check whether the lists in all three columns have the same size. What is the best way to add another column named sanity that is TRUE if all lists in the row have the same size, and FALSE if at least one list has a different size from the rest?

The expected output is:

long                       lat                           accuracy      sanity
[40.33, 40.34, 40.22]      [-71.11, -71.21, -71.14]     [12, 45, 10]    TRUE
[41.23, 41.40, 41.22]      [-72.01, -72.01, -72.01]     [11, 23, 15]    TRUE
[43.33, 43.34]             [-70.11, -70.21]             [12, 40]        TRUE
[41.23, 41.40]             [-72.01, -72.01, -72.01]     [11, 23, 15]    FALSE

CodePudding user response:

You can approach this with applymap and nunique (on pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):

df["sanity"] = df.applymap(len).nunique(axis=1).eq(1)

print(df)

# Output:

                    long                       lat      accuracy  sanity
0  [40.33, 40.34, 40.22]  [-71.11, -71.21, -71.14]  [12, 45, 10]    True
1   [41.23, 41.4, 41.22]  [-72.01, -72.01, -72.01]  [11, 23, 15]    True
2         [43.33, 43.34]          [-70.11, -70.21]      [12, 40]    True
3          [41.23, 41.4]  [-72.01, -72.01, -72.01]  [11, 23, 15]   False
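One thing to watch with the one-liner above: it calls len on every cell, so running it a second time, once the boolean sanity column exists, raises a TypeError. The sketch below is one way around that, assuming the list columns are known by name (list_cols is a name introduced here for illustration). It uses Series.map per column instead of applymap, so it should not depend on which pandas version is installed. The two rows are a shortened copy of the question's data.

```python
import pandas as pd

# Two rows taken from the question: one consistent, one not.
df = pd.DataFrame(
    {
        "long": [[40.33, 40.34, 40.22], [41.23, 41.40]],
        "lat": [[-71.11, -71.21, -71.14], [-72.01, -72.01, -72.01]],
        "accuracy": [[12, 45, 10], [11, 23, 15]],
    }
)

# Assumption: these are the only columns that hold lists.
list_cols = ["long", "lat", "accuracy"]

# Series.map(len) returns the length of each list, one column at a time.
lengths = df[list_cols].apply(lambda col: col.map(len))

# A row is consistent when its lengths collapse to one distinct value.
df["sanity"] = lengths.nunique(axis=1).eq(1)

print(df["sanity"].tolist())  # [True, False]
```

Because only list_cols is inspected, the assignment can be repeated safely after sanity has been added.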

CodePudding user response:

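Alternatively, stack the frame into a single Series, take the length of each list with .str.len(), unstack back to the original shape, and count the distinct lengths per row. Comparing that count with 1 gives the boolean column asked for:

df['sanity'] = df.stack().str.len().unstack().nunique(axis=1).eq(1)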
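To make the intermediate steps of the stack approach visible, here is a small sketch on two rows from the question. It shows that nunique(axis=1) on its own yields the number of distinct lengths per row, so a comparison with 1 is what turns it into a TRUE/FALSE flag. The variable names lengths, counts and sanity are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "long": [[43.33, 43.34], [41.23, 41.40]],
        "lat": [[-70.11, -70.21], [-72.01, -72.01, -72.01]],
        "accuracy": [[12, 40], [11, 23, 15]],
    }
)

# stack()     -> one long Series of lists, indexed by (row, column)
# .str.len()  -> the length of each list
# unstack()   -> back to the row/column layout, now holding lengths
lengths = df.stack().str.len().unstack()

counts = lengths.nunique(axis=1)  # number of distinct lengths per row
sanity = counts.eq(1)             # True only when all lengths agree

print(counts.tolist())  # [1, 2]
print(sanity.tolist())  # [True, False]
```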