Search and get row from large single string

Time:09-22

Hi, I have a single large string. I need to search it for a set of strings, get the rows (lines) that contain them, and create a DataFrame from those rows.
Large string:


This is democracy’s day.

A day of history and hope.

Of renewal and resolve.

Through a crucible for the ages America has been tested anew and America has risen to the challenge.

Today, we celebrate the triumph not of a candidate, but of a cause, the cause of democracy.

The will of the people has been heard and the will of the people has been heeded.

We have learned again that democracy is precious.


Now I want to search for the set of strings below in the text above, and my final output DataFrame should contain the matching rows.


Searching string
democracy’s day
America has been tested
celebrate the triumph
democracy is precious


Thanks in advance

CodePudding user response:

You can build a regex from your search strings and match it against the Large String column using str.extract. Where a row matches, the matched string becomes the value in the Searching String column; otherwise the value is NaN. The DataFrame can then be filtered to keep only the rows where Searching String is not null:

import re

import pandas as pd

df = pd.DataFrame({ 'Large String': ["This is democracy's day.", "A day of history and hope.","Of renewal and resolve.","Through a crucible for the ages America has been tested anew and America has risen to the challenge.","Today, we celebrate the triumph not of a candidate, but of a cause, the cause of democracy.","The will of the people has been heard and the will of the people has been heeded.","We have learned again that democracy is precious."] })

search_strings = ["democracy's day", "America has been tested", "celebrate the triumph", "democracy is precious"]

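# Escape each search string and join them into one alternation pattern
regex = '|'.join(map(re.escape, search_strings))

# Capture the first matching search string in each row (NaN where nothing matches)
df['Searching String'] = df['Large String'].str.extract(f'({regex})')

# Keep only the rows that matched
df = df[df['Searching String'].notna()]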

print(df)

Output:

                                        Large String         Searching String
0                           This is democracy's day.          democracy's day
3  Through a crucible for the ages America has be...  America has been tested
4  Today, we celebrate the triumph not of a candi...    celebrate the triumph
6  We have learned again that democracy is precious.    democracy is precious
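
The code above starts from a DataFrame that already has one sentence per row, while the question starts from a single large string. A minimal sketch of that first step, assuming each non-empty line of the string is one row (the variable names `large_string`, `lines` and `rows` are illustrative, and the text is shortened here):

```python
import re

# Shortened sample of the large string; blank lines separate the rows.
large_string = """This is democracy's day.

A day of history and hope.

Of renewal and resolve.

We have learned again that democracy is precious."""

search_strings = ["democracy's day", "democracy is precious"]
pattern = re.compile('|'.join(map(re.escape, search_strings)))

# Assumption: every non-empty line of the large string is one row.
lines = [line.strip() for line in large_string.splitlines() if line.strip()]

# Keep (row, matched search string) pairs for the rows that match.
rows = []
for line in lines:
    match = pattern.search(line)
    if match:
        rows.append((line, match.group(0)))

print(rows)
```

Passing `rows` to `pd.DataFrame(rows, columns=['Large String', 'Searching String'])` should give a frame with the same two columns as the output above. Alternatively, `lines` can be used directly as the 'Large String' column and filtered with str.extract as shown earlier.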

Note:

  • we use re.escape on the search strings in case they contain characters that are special in a regex, e.g. . or (
  • if one of the search strings is contained in another (in particular, if it is a prefix of it), the list should be sorted by decreasing length so that the longer match is captured, because regex alternation tries the alternatives from left to right
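
The two notes can be illustrated with a small sketch that uses only the standard re module. The helper name `build_pattern` and the extra search string "democracy" are hypothetical, added here just to show the effect of ordering and escaping:

```python
import re

def build_pattern(search_strings):
    """Escape each search string and put longer alternatives first."""
    ordered = sorted(search_strings, key=len, reverse=True)
    return re.compile('|'.join(map(re.escape, ordered)))

text = "We have learned again that democracy is precious."
# Hypothetical list in which one string is a prefix of another.
search_strings = ["democracy", "democracy is precious"]

# Unsorted: the shorter alternative is listed first, so it is the one captured.
naive = re.compile('|'.join(map(re.escape, search_strings)))
naive_match = naive.search(text).group(0)

# Sorted by decreasing length: the longer string is captured instead.
sorted_match = build_pattern(search_strings).search(text).group(0)

# re.escape makes '.' match only a literal full stop.
escaped = re.compile(re.escape("day."))
matches_literal = escaped.search("This is democracy's day.") is not None
matches_other = escaped.search("a day of hope") is not None

print(naive_match)
print(sorted_match)
print(matches_literal, matches_other)
```

Without re.escape, the pattern `day.` would also match "day " in "a day of hope", because an unescaped dot matches any character.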