I have a dataframe called test_df (below) and I am trying to search for the ID based on two criteria: the Factor and the name.
Here's my code:
test_id = test_df.loc[test_df['Name'].str.contains(test_name, case=False) & test_df['Factor'].str.contains(test_factor, case=False), 'ID'].item()
But I get the following error:
Traceback (most recent call last):
File "C:/Users/v3.py", line 508, in <module>
test_id = test_df.loc[test_df['Name'].str.contains(test_name, case=False) & test_df['Factor'].str.contains(test_factor, case=False), 'ID'].item()
File "C:\Users\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\base.py", line 331, in item
raise ValueError("can only convert an array of size 1 to a Python scalar")
ValueError: can only convert an array of size 1 to a Python scalar
I thought the issue was the Factor column being a float format but I have converted it to a string with the same result.
Can anyone see what the problem is?
FIELD1 | ID | Factor | Status | Speed | Name |
---|---|---|---|---|---|
0 | 49513622 | -4 | ACTIVE | II | |
1 | 11193741 | 4 | ACTIVE | AP | |
2 | 49513622 | -3.75 | ACTIVE | II | |
3 | 11193741 | 3.75 | ACTIVE | AP | |
4 | 49513622 | -3.5 | ACTIVE | II | |
5 | 11193741 | 3.5 | ACTIVE | AP | |
6 | 49513622 | -3.25 | ACTIVE | II | |
7 | 11193741 | 3.25 | ACTIVE | AP | |
8 | 49513622 | -3 | ACTIVE | II | |
9 | 11193741 | 3 | ACTIVE | AP | |
10 | 49513622 | -2.75 | ACTIVE | II | |
11 | 11193741 | 2.75 | ACTIVE | AP | |
12 | 49513622 | -2.5 | ACTIVE | II | |
13 | 11193741 | 2.5 | ACTIVE | AP | |
14 | 49513622 | -2.25 | ACTIVE | II | |
15 | 11193741 | 2.25 | ACTIVE | AP | |
16 | 49513622 | -2 | ACTIVE | II | |
17 | 11193741 | 2 | ACTIVE | AP | |
18 | 49513622 | -1.75 | ACTIVE | II | |
19 | 11193741 | 1.75 | ACTIVE | AP | |
20 | 49513622 | -1.5 | ACTIVE | II | |
21 | 11193741 | 1.5 | ACTIVE | AP | |
22 | 49513622 | -1.25 | ACTIVE | II | |
23 | 11193741 | 1.25 | ACTIVE | AP | |
24 | 49513622 | -1 | ACTIVE | II | |
25 | 11193741 | 1 | ACTIVE | AP | |
26 | 49513622 | -0.75 | ACTIVE | II | |
27 | 11193741 | 0.75 | ACTIVE | 1.02 | AP |
28 | 49513622 | -0.5 | ACTIVE | II | |
29 | 11193741 | 0.5 | ACTIVE | AP | |
30 | 49513622 | -0.25 | ACTIVE | II | |
31 | 11193741 | 0.25 | ACTIVE | AP | |
32 | 49513622 | 0 | ACTIVE | 2.68 | II |
33 | 11193741 | 0 | ACTIVE | 1.03 | AP |
34 | 49513622 | 0.25 | ACTIVE | II | |
35 | 11193741 | -0.25 | ACTIVE | 1.99 | AP |
36 | 49513622 | 0.5 | ACTIVE | II | |
37 | 11193741 | -0.5 | ACTIVE | 2.3 | AP |
38 | 49513622 | 0.75 | ACTIVE | II | |
39 | 11193741 | -0.75 | ACTIVE | AP | |
40 | 49513622 | 1 | ACTIVE | II | |
41 | 11193741 | -1 | ACTIVE | AP | |
42 | 49513622 | 1.25 | ACTIVE | II | |
43 | 11193741 | -1.25 | ACTIVE | AP | |
44 | 49513622 | 1.5 | ACTIVE | II | |
45 | 11193741 | -1.5 | ACTIVE | AP | |
46 | 49513622 | 1.75 | ACTIVE | II | |
47 | 11193741 | -1.75 | ACTIVE | AP | |
48 | 49513622 | 2 | ACTIVE | II | |
49 | 11193741 | -2 | ACTIVE | AP | |
50 | 49513622 | 2.25 | ACTIVE | II | |
51 | 11193741 | -2.25 | ACTIVE | AP | |
52 | 49513622 | 2.5 | ACTIVE | II | |
53 | 11193741 | -2.5 | ACTIVE | AP | |
54 | 49513622 | 2.75 | ACTIVE | II | |
55 | 11193741 | -2.75 | ACTIVE | AP | |
56 | 49513622 | 3 | ACTIVE | II | |
57 | 11193741 | -3 | ACTIVE | AP | |
58 | 49513622 | 3.25 | ACTIVE | II | |
59 | 11193741 | -3.25 | ACTIVE | AP | |
60 | 49513622 | 3.5 | ACTIVE | II | |
61 | 11193741 | -3.5 | ACTIVE | AP | |
62 | 49513622 | 3.75 | ACTIVE | II | |
63 | 11193741 | -3.75 | ACTIVE | AP | |
64 | 49513622 | 4 | ACTIVE | II | |
65 | 11193741 | -4 | ACTIVE | AP |
CodePudding user response:
Problem is no match any value, so DataFrame.loc
return empty Series
.
Possible solution is use next
iter
for assign default value if no match:
mask = test_df['Name'].str.contains(test_name, case=False) &
test_df['Factor'].str.contains(test_factor, case=False)
#if no match assign 'no match'
test_id = next(iter(test_df.loc[mask, 'ID']), 'no match')
#if no match assign None
test_id = next(iter(test_df.loc[mask, 'ID']), None)
Or use if-else
with test if at least one value match:
test_id = test_df.loc[mask, 'ID'].item() if m.any() else 'no match'
Or:
if m.any():
test_id = test_df.loc[mask, 'ID'].item()
EDIT: For testing si possible create helper columns:
m1 = test_df['Name'].str.contains(test_name, case=False)
m2 = test_df['Factor'].str.contains(test_factor, case=False)
test_df = test_df.assign(name_mask = m1, factor_mask = m1, both = m1 & m2)