Effective way of finding difference between two lists of integers in python

Time:08-04

Are there efficient ways to find the difference between two lists of integers in Python? I need to compare a large number of integer lists of the same length with each other, and calculation time is critical. I tried pandas, but one thing probably slowed my calculations down: after comparing two Series of integers, it returns a DataFrame of floats! As an example:

import pandas as pd
from numpy.random import randint

val_series1 = pd.Series(randint(0, 20, 10))
val_series2 = pd.Series(randint(0, 20, 10))

comp_series = val_series1.compare(val_series2)
comp_series

Output:

    self    other
0   6.0     12.0
1   1.0     12.0
2   17.0    15.0
3   3.0     15.0
5   10.0    5.0
6   17.0    6.0
7   7.0     17.0
8   7.0     14.0
9   18.0    9.0

comp_series.iloc[0]
self      6.0
other    12.0
Name: 0, dtype: float64

After that, every further comparison either has to work with floats, or I lose time converting back with .astype(dtype='uint64').
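One possible way to avoid the float conversion, sketched here as an alternative to Series.compare (not something from the question): build a boolean mask of the positions that differ and index both Series with it. Plain boolean indexing keeps the original integer dtype. The sample values below are made up for illustration.

```python
import pandas as pd

# Made-up sample data; the question uses random integers instead.
s1 = pd.Series([6, 1, 17, 3, 4], dtype="int64")
s2 = pd.Series([12, 12, 15, 15, 4], dtype="int64")

# Boolean mask of positions where the two series differ.
mask = s1.ne(s2)

# Selecting with the mask keeps int64, unlike the float64 columns
# that Series.compare produced in the question.
diff = pd.DataFrame({"self": s1[mask], "other": s2[mask]})
print(diff)
print(diff.dtypes)
```

The index of the result still tells you which positions differed (here 0 to 3; position 4 is equal in both Series and is dropped).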

Answer:

set(val_series1) - set(val_series2)

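Set operations are fast because they are hash-based (roughly linear in the size of the inputs). Note, though, that this returns the distinct values of val_series1 that never appear anywhere in val_series2; it does not report position-by-position differences, and duplicates and order are lost.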
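A small sketch contrasting the two notions of "difference", using the same sample values as the NumPy answer. The value 2 sits at a differing position, but because 2 also occurs elsewhere in the second list, the set difference does not report it.

```python
a = [1, 2, 3, 4, 5]
b = [1, 1, 3, 2, 5]

# Set difference: values of a that never occur anywhere in b.
# Order, duplicates and positions are discarded.
value_diff = set(a) - set(b)

# Position-wise difference: (index, self, other) wherever the lists disagree.
positional_diff = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

print(value_diff)       # {4}
print(positional_diff)  # [(1, 2, 1), (3, 4, 2)]
```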

Answer:

This can be done in NumPy; consider the following simple example:

import numpy as np
arr1 = np.array([1,2,3,4,5])
arr2 = np.array([1,1,3,2,5])
arr = np.vstack([arr1,arr2])[:,arr1!=arr2].T
print(arr)

Output:

[[2 1]
 [4 2]]

Explanation: I use 1D arrays rather than Series. I stack them into a 2D array with two rows, then use a boolean mask (arr1 != arr2) to keep only the columns where the upper element differs from the lower one, and finally transpose (.T) so the output has two columns: the first holds self, the second holds other. Because no missing values are involved, the result keeps the integer dtype.
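A minimal sketch extending this answer, assuming all lists have the same length and fit in memory. np.flatnonzero gives the differing positions directly. The second part is an added illustration, not from the answer: for the question's "many lists" case, it stacks them into one 2D array and counts differing positions for every pair at once via broadcasting. The sample `lists` array is made up.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5], dtype=np.int64)
arr2 = np.array([1, 1, 3, 2, 5], dtype=np.int64)

# Indices where the arrays disagree; the comparison does not change the int dtype.
idx = np.flatnonzero(arr1 != arr2)
# One row per differing position: column 0 = self, column 1 = other.
pairs = np.column_stack((arr1[idx], arr2[idx]))
print(idx)
print(pairs)
print(pairs.dtype)

# Many lists of equal length: one row per list.
# n_diff[i, j] = number of positions where list i and list j differ.
# Memory grows as n_lists**2 * length, so process rows in chunks for large inputs.
lists = np.array([[1, 2, 3, 4, 5],
                  [1, 1, 3, 2, 5],
                  [1, 2, 3, 4, 6]], dtype=np.int64)
n_diff = (lists[:, None, :] != lists[None, :, :]).sum(axis=2)
print(n_diff)
```

The broadcasting version trades memory for speed; if the number of lists is large, looping over blocks of rows and comparing each block against all rows keeps the temporary boolean array bounded.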
