text = "Page 1 of 28 Medical Policies Archived Policies - Radiology Print Percutaneous Balloon Kyphoplasty, Radiofrequency Kyphoplasty, and Mechanical Page 2/3 Percutaneous radiofrequency kyphoplasty or percutaneous mechanical vertebral augmentation using any other device, including but not limited. Page 38 Percutaneous Balloon Kyphoplasty, Radiofrequency Kyphoplasty, and Me... While radiotherapy and chemotherapy are frequently "
adm = re.sub("(?:(?:Page" [0-9] "of" [0-9] | Page [0-9] | Page [0-9] "/" [0-9] ))", text, re.IGNORECASE)
print(adm)
Is there a way to remove "Page 1 of 28", "Page 2/3", and "Page 38" from the text?
CodePudding user response:
I would use this approach:
import re
text = "Page 1 of 28 Medical Policies Archived Policies - Radiology Print Percutaneous Balloon Kyphoplasty, Radiofrequency Kyphoplasty, and Mechanical Page 2/3 Percutaneous radiofrequency kyphoplasty or percutaneous mechanical vertebral augmentation using any other device, including but not limited. Page 38 Percutaneous Balloon Kyphoplasty, Radiofrequency Kyphoplasty, and Me... While radiotherapy and chemotherapy are frequently "
output = re.sub(r'\s*Page \d+(?:/\d+)?(?: of \d+)?\s*', ' ', text).strip()
print(output)
This prints:
Medical Policies Archived Policies - Radiology Print Percutaneous Balloon Kyphoplasty, Radiofrequency Kyphoplasty, and Mechanical Percutaneous radiofrequency kyphoplasty or percutaneous mechanical vertebral augmentation using any other device, including but not limited. Percutaneous Balloon Kyphoplasty, Radiofrequency Kyphoplasty, and Me... While radiotherapy and chemotherapy are frequently
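The regex pattern used above matches all three page variants seen in the text: "Page 1 of 28", "Page 2/3", and "Page 38". It matches the word Page followed by a number, then an optional "/N" part and an optional " of N" part. Surrounding whitespace is consumed as well and replaced with a single space, and the final strip() removes the leading space left where the text started with a page marker.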
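If the markers may also appear in lower case, or if words that merely end in "page" could be followed by a number, a slightly stricter variant may help. The following is only a sketch under those assumptions: the helper name strip_page_markers and the shortened sample string are made up for illustration, and the added \b word boundaries and re.IGNORECASE flag are not part of the pattern above.

```python
import re

# Assumption: a marker is the whole word "Page" (any case), then a number,
# optionally followed by "/N" or " of N". The \b guards keep words such as
# "Homepage 3" from being treated as page markers.
PAGE_MARKER = re.compile(r'\s*\bPage \d+(?:/\d+| of \d+)?\b\s*', re.IGNORECASE)


def strip_page_markers(text: str) -> str:
    """Replace each page marker (and the whitespace around it) with one space."""
    return PAGE_MARKER.sub(' ', text).strip()


# Shortened, made-up sample in the same shape as the text in the question.
sample = "Page 1 of 28 Medical Policies Page 2/3 Percutaneous kyphoplasty. Page 38 While radiotherapy"
cleaned = strip_page_markers(sample)
print(cleaned)
# Medical Policies Percutaneous kyphoplasty. While radiotherapy
```

Compiling the pattern once is mainly a convenience when the same cleanup runs over many documents; for a single string, a direct re.sub call works just as well.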