I have a data frame consisting of lists as elements. I want to subtract a value from each list and find the index of the minimum. I want to find the value corresponding to each list in another column.
My code:
df = pd.DataFrame({'A':[[1,2,3],[1,3,5,6]]})
df
A B
0 [1, 2, 3] [10, 20, 30]
1 [1, 3, 5, 6] [10, 30, 50, 60]
# lets subtract 2 from A, find index of minimum in this result and find corresponding element in the B column
val = 2
df['A_new_min'] = (df['A'].map(np.array)-val).map(abs).map(np.argmin)
df['B_new'] = df[['A_new_min','B']].apply(lambda x: x[1][x[0]],axis=1)
Present solution: It produces a correct solution but I don't to want to store the A_new_min
and it is unnecessary. I am looking if it is possible to get this result in one line of code?
df =
A B A_new_min B_new
0 [1, 2, 3] [10, 20, 30] 1 20
1 [1, 3, 5, 6] [10, 30, 50, 60] 0 10
Expected solution:
How can I obtain the below solution directly without having to create an additional and unnecessary column A_new_min
? In simple words, I would like to
df =
A B B_new
0 [1, 2, 3] [10, 20, 30] 20
1 [1, 3, 5, 6] [10, 30, 50, 60] 10
CodePudding user response:
With apply
:
df["B_new"] = df.apply(lambda row: row["B"][np.argmin(abs(np.array(row["A"])-val))], axis=1)
>>> df
A B B_new
0 [1, 2, 3] [10, 20, 30] 20
1 [1, 3, 5, 6] [10, 30, 50, 60] 10
CodePudding user response:
IMO, the most efficient is to use only a list comprehension.
B_new only:
df['B_new'] = [b[min(range(len(a)), key=lambda x: abs(a[x]-val))]
for a,b in zip(df['A'], df['B'])]
output:
A B B_new
0 [1, 2, 3] [10, 20, 30] 20
1 [1, 3, 5, 6] [10, 30, 50, 60] 10
both columns:
df2 = pd.DataFrame([[(i:=min(range(len(a)), key=lambda x: abs(a[x]-val))), b[i]]
for a,b in zip(df['A'], df['B'])], columns=['A_new_min', 'B_new'])
df.join(df2)
output:
A B A_new_min B_new
0 [1, 2, 3] [10, 20, 30] 1 20
1 [1, 3, 5, 6] [10, 30, 50, 60] 0 10
timings (on 200k rows)
# @mozway (option #1)
290 ms ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# @enke (list comprehension)
340 ms ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# @enke list comprehension numpy
968 ms ± 17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# @not_speshal
4.12 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
CodePudding user response:
One option is to use a list comprehension:
df['new'] = [arr[i] for i, arr in zip(df['A'].map(np.array).sub(2).abs().map(np.argmin), df['B'])]
Another option is to never convert to numpy array at all and stick with lists:
df['new'] = [b[min(enumerate([abs(x-2) for x in a]), key=lambda x:x[1])[0]] for a,b in zip(df['A'], df['B'])]
Output:
A B new
0 [1, 2, 3] [10, 20, 30] 20
1 [1, 3, 5, 6] [10, 30, 50, 60] 10