I have a very long text that contains the following paragraph:
"MEDICATIONS: 1. Versed 2 mg IV. 2. Fentanyl 100 mcg IV. 3. Heparin 5000 units IA. 4. Nitroglycerin 200 mcg IA. 5. Verapamil 5 mg IA. 6. Protamine 50 mg IV. CONTRAST: 61 mL Visipaque RADIATION DOSE: 10.1 min; 318 mGy
IMPRESSION: Status post left pterional craniotomy for clipping of a left middle cerebral artery trifurcation aneurysm with no evidence of residual aneurysm"
I would like to split it into 2 or more columns. Phrases to split on:
- Starts with "MEDICATIONS:"
- Starts with "IMPRESSION:"
Is there a way to do that using regex or spaCy in pandas? The desired output is:
MEDICATIONS | IMPRESSION |
---|---|
1. Versed 2 mg IV. 2. Fentanyl 100 mcg IV. 3. Heparin 5000 units IA. 4. Nitroglycerin 200 mcg IA. 5. Verapamil 5 mg IA. 6. Protamine 50 mg IV. CONTRAST: 61 mL Visipaque RADIATION DOSE: 10.1 min; 318 mGy | Status post left pterional craniotomy for clipping of a left middle cerebral artery trifurcation aneurysm with no evidence of residual aneurysm |
Answer:
You could use pandas str.extract with Python named groups to extract only the desired parts of the paragraph.
import pandas as pd
import re
paragraphs = """MEDICATIONS: 1. Versed 2 mg IV. 2. Fentanyl 100 mcg IV. 3. Heparin 5000 units IA. 4. Nitroglycerin 200 mcg IA. 5. Verapamil 5 mg IA. 6. Protamine 50 mg IV. CONTRAST: 61 mL Visipaque RADIATION DOSE: 10.1 min; 318 mGy
IMPRESSION: Status post left pterional craniotomy for clipping of a left middle cerebral artery trifurcation aneurysm with no evidence of residual aneurysm"""
df = pd.DataFrame({'paragraphs':paragraphs}, index=[0])
print(df)
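# Each named group becomes a column; .*? is lazy and, without re.S, stops at the end of the line
df1 = df['paragraphs'].str.extract(
    r'(?:^MEDICATIONS:)\s*(?P<MEDICATIONS>.*?)\n'
    r'(?:^IMPRESSION:)\s*(?P<IMPRESSION>.*?)$', flags=re.M, expand=True)
print(df1)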
Output from df1
index | MEDICATIONS | IMPRESSION |
---|---|---|
0 | 1. Versed 2 mg IV. 2. Fentanyl 100 mcg IV. 3. Heparin 5000 units IA. 4. Nitroglycerin 200 mcg IA. 5. Verapamil 5 mg IA. 6. Protamine 50 mg IV. CONTRAST: 61 mL Visipaque RADIATION DOSE: 10.1 min; 318 mGy | Status post left pterional craniotomy for clipping of a left middle cerebral artery trifurcation aneurysm with no evidence of residual aneurysm |
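If there are more than two headers, or they can appear in a different order, a fixed pattern like the one above gets awkward. A rough sketch of a more general variant is below. It assumes every header sits at the start of a line and is followed by a colon; the HEADERS list and the split_sections helper are made-up names for illustration, and the sample text is a shortened copy of the paragraph from the question.

```python
import re
import pandas as pd

# Shortened sample paragraph, for illustration only.
text = (
    "MEDICATIONS: 1. Versed 2 mg IV. 2. Fentanyl 100 mcg IV. "
    "CONTRAST: 61 mL Visipaque RADIATION DOSE: 10.1 min; 318 mGy\n"
    "IMPRESSION: Status post left pterional craniotomy"
)
df = pd.DataFrame({"paragraphs": [text]})

# Hypothetical list of headers to split on; extend as needed.
HEADERS = ["MEDICATIONS", "IMPRESSION"]


def split_sections(paragraph, headers=HEADERS):
    """Return {header: body} for each listed header found at the start of a line."""
    alternation = "|".join(re.escape(h) for h in headers)
    # A section runs from its header up to the next listed header line,
    # or to the end of the text if no further header follows.
    pattern = re.compile(
        rf"^\s*({alternation}):\s*(.*?)(?=^\s*(?:{alternation}):|\Z)",
        flags=re.M | re.S,
    )
    return {name: body.strip() for name, body in pattern.findall(paragraph)}


# One dict per row, expanded into one column per header.
sections = pd.DataFrame(
    df["paragraphs"].map(split_sections).tolist(), index=df.index
)
print(sections)
```

Labels that are not in HEADERS (CONTRAST, RADIATION DOSE) stay inside the MEDICATIONS column, as in the desired output. A header that is missing from a given row simply produces NaN in that column.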