Is there an efficient way to find the differences between two lists of integers in Python? I need to compare a large number of integer lists of the same length with each other, and computation time is critical. I tried pandas, but one thing probably slowed down my calculations: after comparing two Series of integers, it returns floats! As an example:
import pandas as pd
from numpy.random import randint
val_series1 = pd.Series(randint(0, 20, 10))
val_series2 = pd.Series(randint(0, 20, 10))
comp_series = val_series1.compare(val_series2)
comp_series
Output:
self other
0 6.0 12.0
1 1.0 12.0
2 17.0 15.0
3 3.0 15.0
5 10.0 5.0
6 17.0 6.0
7 7.0 17.0
8 7.0 14.0
9 18.0 9.0
comp_series.iloc[0]
self 6.0
other 12.0
Name: 0, dtype: float64
After that, my subsequent comparisons either have to work with floats or lose time calling .astype(dtype='uint64').
CodePudding user response:
set(val_series1) - set(val_series2)
set runs really fast at finding differences.
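One caveat: a set difference compares by value, not by position, and it drops duplicates, so it answers a somewhat different question than Series.compare. A small sketch with made-up lists (the names a and b are only for illustration):

```python
a = [6, 1, 17, 3, 7]
b = [1, 6, 17, 5, 7]

# Values present in a but missing from b; order and positions are ignored.
only_in_a = set(a) - set(b)

# Position-wise differences as (index, a_value, b_value) tuples,
# closer to what Series.compare reports.
positional = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

print(only_in_a)
print(positional)
```

Here 6 and 1 do not appear in the set difference even though they sit at different positions in the two lists, so the set approach probably only fits if positions do not matter.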
CodePudding user response:
This can be done in numpy. Consider the following simple example:
import numpy as np
arr1 = np.array([1,2,3,4,5])
arr2 = np.array([1,1,3,2,5])
arr = np.vstack([arr1,arr2])[:,arr1!=arr2].T
print(arr)
Output:
[[2 1]
 [4 2]]
Explanation: I use 1D arrays rather than Series. I stack them to get a 2D array of height 2, then use boolean indexing to keep only the columns where the upper element is not equal to the lower one, then transpose (.T) to get an output with 2 columns: the 1st column holds self, the 2nd column holds other.
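Since the question is about keeping an integer dtype and comparing many lists, here is a minimal sketch that builds on the same boolean-mask idea. The helper name diff_pairs and the sample data are hypothetical. No NaN padding is involved, so the result keeps the integer dtype of the inputs. The last part assumes all lists fit into one 2D array and counts differing positions for every pair of rows via broadcasting.

```python
import numpy as np

def diff_pairs(a, b):
    """Return (indices, pairs) for the positions where a and b differ."""
    a = np.asarray(a)
    b = np.asarray(b)
    mask = a != b                                # elementwise comparison
    idx = np.flatnonzero(mask)                   # positions that differ
    pairs = np.column_stack((a[mask], b[mask]))  # columns: self, other
    return idx, pairs

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([1, 1, 3, 2, 5])
idx, pairs = diff_pairs(arr1, arr2)
print(idx)
print(pairs)
print(pairs.dtype)  # still an integer dtype, no float conversion

# Many lists at once: one list per row (hypothetical data).
data = np.array([[1, 2, 3],
                 [1, 0, 3],
                 [4, 2, 3]])
# n_diff[i, j] = number of positions where row i and row j differ.
n_diff = (data[:, None, :] != data[None, :, :]).sum(axis=2)
print(n_diff)
```

The broadcasted comparison builds an intermediate boolean array of shape (n, n, length), so for a very large number of lists it may be better to process the rows in chunks.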