Table 1
Date | Prescribed |
---|---|
16-05-2017 | Amlodipine [ Amlodipine | 10 mg | Tablet | OD | For 60 Days ], Cetirizine [ Cetirizine | 10 mg | Tablet | OD | For 5 Days ] |
15-05-2017 | CEFUROXIME[ ZINNAT | 500MG | Tablet | BID | For 7 Days ] |
17-05-2017 | Cetirizine [Cetirizine | 5 mg/5 mL | Syrup | BID | For 5 Days] |
Table 2
Name | Category |
---|---|
Lisinopril (Lisinopril | 10 mg | Tablet) | CARDIOVASCULAR AGENT |
Amlodipine (Amlodipine | 10 mg | Tablet) | CARDIOVASCULAR AGENT |
Enoxaparin Sodium (80mg)(clexane 8000 iu | 80mg | 0) | CARDIOVASCULAR AGENT |
I want to be able to compare each item or row within the column['Prescribed'] with the whole of Table 2 column['Name'] to be able to create a column['category']row each row in Table 1. Using pandas dataframes or any possible python method
More Clarification(or Example)
Table 1(From above)
test_text = "Amlodipine [ Amlodipine | 10 mg | Tablet | OD | For 60 Days ], Cetirizine [ Cetirizine | 10 mg | Tablet | OD | For 5 Days ]"
Table 2(From above)
comparison_list = [ 'Amlodipine (Amlodipine | 10 mg | Tablet)' , 'Acetaminophen(Tylenol | 500mg| Tablet)' , 'Ibuprofen(Advil | 400mg | Tablet)']
Expected outcome:
Return True if 'Amlodipine' is in test_text
To have a final table like below | Date | Prescribed | Result | |:---- |:------:|:------:| | 16-05-2017| Amlodipine [ Amlodipine | 10 mg | Tablet | OD | For 60 Days ], Cetirizine [ Cetirizine | 10 mg | Tablet | OD | For 5 Days ]|True| | 15-05-2017 | CEFUROXIME[ ZINNAT | 500MG | Tablet | BID | For 7 Days ] |False| | 17-05-2017 | Cetirizine [Cetirizine | 5 mg/5 mL | Syrup | BID | For 5 Days]|False|
Below are some methods i have tried.
for i in table1['Prescribed']:
split_data = i.split(",")
for b in split_data:
if any(str(b) in s for s in table2['Name']):
print('true')
elif str(b) in table2['Name']:
print('perfect')
else:
print('false')
Output:
false
false
false
false
false
false
false
false
Without splitting text:
for i in table1['Prescribed']:
if any(str(i) in s for s in table2['Name']):
print('true')
elif str(i) in table2['Name']:
print('perfect')
else:
print('false')
Outcome:
false
false
false
false
false
false
false
If there is any solution to this, i would be happy to know. Suggestions on how to do it neater is also appreciated. And if there is a link or book to read concerning how to go about this too, i would be happy to know about them .
CodePudding user response:
IIUC, you want to extract the drug names from the table2['Name']
and then use that as a comparison list to find if ANY of those occur in the table1['Prescription'].
If this is what you want, then try this -
- Use vectorized
str
functions likereplace
,split
andstrip
to extract the unique drug names for your comparison list. - Next use
'|'.join()
to connect these unique drugs with aOR
connector to find if any of those exist in thetable1['Prescription]'
with the use of another vectorizedstr
functionstr.contains
- NOTE 1: Using apply functions for working with string is not as efficient as using
str
methods in pandas.
- NOTE 2: The regex
[\(\[].*?[\)\]]
is for removing the text inside the()
or[]
brackets and returning only the text outside, which is this case is the name of the drugs. Feel free to replace it with anything else.
#STEP 1: Get unique drugs from the table2
unique_drugs = table2['Name'].str.replace('[\(\[].*?[\)\]]','',regex=True)\
.str.split(',')\
.explode()\
.str.strip()\
.unique()
## unique_drugs : array(['Lisinopril', 'Amlodipine', 'Enoxaparin Sodium'], dtype=object)
# STEP 2: FIND MATCHING DRUGS IN THE DATA
table1['flag'] = table1['Prescribed'].str.contains('|'.join(unique_drugs))
print(table1)
Date Prescribed flag
0 16-05-2017 Amlodipine [ Amlodipine | 10 mg | Tablet | OD ... True
1 15-05-2017 CEFUROXIME[ ZINNAT | 500MG | Tablet | BID | Fo... False
2 17-05-2017 Cetirizine [Cetirizine | 5 mg/5 mL | Syrup | B... False