Using pandas to extract text between two words

Time:11-10

I am struggling to extract the text between two words. Specifically, I would like to extract the text between "Example" and "Constraints". Here is a sample:

"Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target.\nYou can return the answer in any order.\n Example 1:\nInput: nums = [2,7,11,15], target = 9\nOutput: [0,1]\nExplanation: Because nums[0] + nums[1] == 9, we return [0, 1].\nExample 2:\nInput: nums = [3,2,4], target = 6\nOutput: [1,2]\nExample 3:\nInput: nums = [3,3], target = 6\nOutput: [0,1]\n Constraints:\n2 <= nums.length <= 104\n-109 <= nums[i] <= 109\n-109 <= target <= 109\nOnly one valid answer exists.\n Follow-up: Can you come up with an algorithm that is less than O(n2) time complexity?"

This is one row of a pandas DataFrame.

This is what I have tried:

def extract(example):
    return example.str.extract('(Example.*(?=.Constraints))')

This returns null.

CodePudding user response:

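Did you try the re module? On regex101.com, your pattern seems to work. Note that . does not match a newline by default, so if the text contains real line breaks you need to pass re.DOTALL.

With:

import re

re.search(r'(Example.*(?=.Constraints))', example, flags=re.DOTALL)

will do the trick.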
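To make the point about newlines concrete, here is a minimal sketch using only the standard library. The string `text` is a shortened stand-in for the question's sample (an assumption, not the original data), and it contains real line breaks rather than a literal backslash followed by n.

```python
import re

# Shortened stand-in for the question's text, with real line breaks.
text = (
    "Return indices of the two numbers.\n"
    " Example 1:\nInput: nums = [2,7,11,15], target = 9\nOutput: [0,1]\n"
    "Example 2:\nInput: nums = [3,3], target = 6\nOutput: [0,1]\n"
    " Constraints:\n2 <= nums.length <= 104\n"
)

pattern = r"(Example.*(?=.Constraints))"

# Without DOTALL, '.' stops at every newline, so the match cannot span lines.
no_flag = re.search(pattern, text)

# With DOTALL, '.' also matches '\n', so the match can cross line breaks.
with_flag = re.search(pattern, text, flags=re.DOTALL)

print(no_flag)
print(repr(with_flag.group(1)))
```

If the first call finds nothing while the second returns the block of examples, the missing flag is the likely cause of the null result in the question.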

CodePudding user response:

Why don't you use Python's native str.find method? Something like s[s.find("Example"):s.find("Constraints")] could work, perhaps with some trimming if you want to get rid of the word 'Example' in the resulting string.

EDIT: Here's some sample code:

example = "Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target.\nYou can return the answer in any order.\n Example 1:\nInput: nums = [2,7,11,15], target = 9\nOutput: [0,1]\nExplanation: Because nums[0] + nums[1] == 9, we return [0, 1].\nExample 2:\nInput: nums = [3,2,4], target = 6\nOutput: [1,2]\nExample 3:\nInput: nums = [3,3], target = 6\nOutput: [0,1]\n Constraints:\n2 <= nums.length <= 104\n-109 <= nums[i] <= 109\n-109 <= target <= 109\nOnly one valid answer exists.\n Follow-up: Can you come up with an algorithm that is less than O(n2) time complexity?"
# To get everything between the first 'Example' and the first 'Constraints'
# (the + 7 skips past the 7 characters of the word 'Example' itself):
s1 = example[example.find("Example") + 7:example.find("Constraints")]

# To get each example separately from this:
s2 = s1.split("Example")
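Since the question is about a DataFrame column rather than a single string, here is a hedged sketch of both ideas applied column-wise. The frame, the column name `description`, and the helper `between` are hypothetical and used only for illustration. The sketch assumes pandas is installed and relies on `Series.str.extract` accepting a `flags` argument. The regex here uses a lazy `.*?` instead of the lookahead from the question, and the helper keeps the word 'Example' in the result.

```python
import re
import pandas as pd

# Hypothetical two-row frame; the column name "description" is an assumption.
df = pd.DataFrame({
    "description": [
        "Intro.\n Example 1:\nInput: a\nOutput: b\n"
        "Example 2:\nInput: c\nOutput: d\n Constraints:\n1 <= n\n",
        "A row with neither marker.",
    ]
})

# Regex route: DOTALL lets '.' cross line breaks, the lazy '.*?' stops at the
# first 'Constraints', and '\s*' keeps trailing whitespace out of the group.
df["by_regex"] = df["description"].str.extract(
    r"(Example.*?)\s*Constraints", flags=re.DOTALL, expand=False
)


def between(s, start="Example", end="Constraints"):
    """Return the text from the first `start` up to the next `end`, or None."""
    i = s.find(start)
    if i == -1:  # str.find returns -1 when the marker is absent
        return None
    j = s.find(end, i + len(start))
    if j == -1:
        return None
    return s[i:j].strip()


# str.find route: apply the guarded helper to every row.
df["by_find"] = df["description"].map(between)

print(df[["by_regex", "by_find"]])
```

The guard matters because str.find returns -1 for a missing marker, and slicing with -1 silently produces the wrong substring instead of raising an error.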