I would like to find the index at which two equal-length lists have the second-largest absolute row-wise difference.
import random
import pandas as pd
random.seed(2)
l1 = pd.DataFrame([random.randrange(100) for _ in range(10)])
l2 = pd.DataFrame([random.randrange(100) for _ in range(10)])
l1-l2
0
0 -20
1 -66
2 6
3 -28
4 -66
5 74
6 30
7 -42
8 -18
9 -15
Now, I can use idxmax() to get the index of the largest absolute difference, which is row 5. My question is: how can I get the index of the second-largest absolute difference?
(l1 - l2).abs().idxmax()
0 5
dtype: int64
CodePudding user response:
Option 1: The easy way: sort, then slice (complexity O(n log n)):
(l1 - l2).abs().sort_values([0], ascending=False).index[1]
Option 2: nlargest, then idxmin (complexity O(n)):
(l1 - l2).abs().nlargest(2, columns=[0]).idxmin()
Note that your data actually has two rows with an absolute difference of 66 (rows 1 and 4), so you may get either 1 or 4, depending on how the tie is broken.
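To make the tie explicit, here is a minimal sketch that hardcodes the differences printed in the question instead of regenerating them, and works on a Series rather than a one-column DataFrame (both choices are assumptions made for illustration). It uses the keep parameter of Series.nlargest to choose which of the tied rows is reported; the variable names are made up for this example.

```python
import pandas as pd

# Differences copied from the question's output of l1 - l2.
diff = pd.Series([-20, -66, 6, -28, -66, 74, 30, -42, -18, -15])
abs_diff = diff.abs()

# keep="first" prefers the earliest tied row, keep="last" the latest one.
second_first = abs_diff.nlargest(2, keep="first").index[1]
second_last = abs_diff.nlargest(2, keep="last").index[1]

# All rows that share the second-largest absolute difference.
second_value = abs_diff.nlargest(2).iloc[1]
tied_rows = abs_diff[abs_diff == second_value].index.tolist()

print(second_first, second_last, tied_rows)
```

If every tied row matters, filtering on the second-largest value (as in tied_rows) is probably the safest choice, since it does not depend on any tie-breaking rule.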
CodePudding user response:
You could identify the largest absolute difference with idxmax(), remove it from the list via its index, and then use idxmax() again, which gives you the index of the second-largest absolute difference.
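l = (l1 - l2)[0].abs()  # absolute differences as a Series
largest_index = l.idxmax()  # label of the largest absolute difference
l = l.drop(largest_index)  # remove that row
l.idxmax()  # label of the second-largest absolute difference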
Since it is not quite clear whether you want the index of the second-largest absolute difference in the original (l1 - l2), this option will achieve that:
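l = (l1 - l2)[0].abs()  # absolute differences as a Series
largest_index = l.idxmax()  # label of the largest absolute difference
l.loc[largest_index] = 0  # zero it out instead of removing it
l.idxmax()  # label of the second-largest absolute difference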
Once the largest absolute difference is set to zero, a second call to idxmax() gives you the index of the second-largest absolute difference, without changing the size of (l1 - l2) or altering its order.
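As a rough sketch of the remove-then-repeat idea, the helper below wraps it in a function that leaves its input untouched. The function name second_largest_abs_index is hypothetical, and the data is hardcoded from the differences printed in the question rather than regenerated with random.

```python
import pandas as pd

# Differences copied from the question's output of l1 - l2.
diff = pd.Series([-20, -66, 6, -28, -66, 74, 30, -42, -18, -15])


def second_largest_abs_index(values: pd.Series):
    """Hypothetical helper: label of the second-largest absolute value."""
    abs_values = values.abs()      # abs() returns a new Series, so the input is not modified
    largest = abs_values.idxmax()  # label of the largest absolute value
    # Drop only that label, then take the maximum of what remains.
    return abs_values.drop(largest).idxmax()


second = second_largest_abs_index(diff)
print(second)
```

Because idxmax() returns the first label when several values share the maximum, this sketch reports the earliest of any tied rows.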