How to find equivalent text in another with aggregated text?

Given:

const textToFind = "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, "

const paragraph = "Lorem Ipsum has been the industry's [standard](wwww.meh.com) dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book."

I need to output:

Lorem Ipsum has been the industry's [standard](wwww.meh.com) dummy text ever since the 1500s,

i.e. match textToFind against paragraph and then extract the matching span, markdown links included.

I have figured out this regex to find markdown links: /\[([^\]]+)\]\([^)"]+\)/g, but I'm not sure what else to do after that.
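For reference, here is a minimal sketch of what that regex captures. The sample string and the variable names (markdownLink, sample, found) are only illustrative.

```javascript
// Matches [text](url); capture group 1 is the visible link text.
const markdownLink = /\[([^\]]+)\]\([^)"]+\)/g;

const sample = "the industry's [standard](wwww.meh.com) dummy text";

// Collect every link together with its visible text and its position.
const found = [];
let m;
while ((m = markdownLink.exec(sample)) !== null) {
  found.push({ markdown: m[0], text: m[1], index: m.index });
}

console.log(found);
```

Each entry holds the full markdown of the link, its visible text, and the index where the link starts in the string.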

textToFind is originally derived from paragraph, and I need it to calculate the width of each line. That is why I'm not considering replacing standard with some unique identifier (to be swapped back for the real text later): if the characters change, so does the width.

Additional Info:

I am using React Native's Text component,

<Text onTextLayout={....} numberOfLines={x} />

to obtain the lines rendered in a paragraph, but this text has not been converted from markdown (if it were, the links would be lost, since it only handles plain text, not Views, Text properties, etc.).

Currently:

I am thinking of encrypting the plainText of each [plainText](url)

(e.g. reversePlainText().QueueShiftTwoCharacters()),

saving the result in a parallel queue

recordedLinks = Queue<Record<encryptedPlainText, originalUnparsedMarkdown>>()

and consulting it in order.

This way, when going from [plainText](url) to encryptedPlainText (and almost losing the URL and its position), we can match against recordedLinks in order: as the screen renders each ~~of these pieces of cryptic runics~~ line of text, the encryptedPlainTexts get their links back in FIFO order.

CodePudding user response：

The following is a rough and ready solution. It assumes there are no regex special characters in textToFind (if there are, they can be escaped easily enough).

A regex is created from textToFind in which every word may optionally be the link text of a markdown link; for example, Ipsum becomes (?:Ipsum|\[Ipsum\]\([^)"]+\)) in the regex string.

const textToFind = "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, ";

const paragraph = "Lorem Ipsum has been the industry's [standard](wwww.meh.com) dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.";

// Let every word match either itself or itself wrapped in a markdown link: word | [word](url)
const regexString = textToFind.replace(/\w+/g, '(?:$&|\\[$&\\]\\([^)"]+\\))');

const match = paragraph.match(new RegExp(regexString, 'g'));

console.log(match);
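If textToFind may contain regex special characters, the snippet above can be adapted along these lines. This is only a sketch: escapeRegExp and buildLinkTolerantRegExp are hypothetical helper names, and the sample strings are made up to exercise characters such as ( . $ and +.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a pattern in which every word may optionally appear as [word](url).
// Runs of non-word characters are escaped so they only match literally.
function buildLinkTolerantRegExp(textToFind) {
  const source = textToFind.replace(/\w+|\W+/g, (token) =>
    /^\w/.test(token)
      ? `(?:${token}|\\[${token}\\]\\([^)"]+\\))`
      : escapeRegExp(token)
  );
  return new RegExp(source);
}

const textToFind = 'Price (approx.) is $5+ today';
const paragraph = 'Note: Price (approx.) is $5+ [today](example.com) for everyone.';

const match = paragraph.match(buildLinkTolerantRegExp(textToFind));
console.log(match && match[0]);
```

Using a replacer function instead of a '$&' replacement string keeps any literal $ in the text from being read as a replacement pattern. As in the original snippet, a link whose text spans several words is not matched.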

If you explain how textToFind is being used to calculate the width of each line then a more robust solution may be forthcoming.

CodePudding user response：

I'll lay out the "Pythonic" approach I would take to this problem. These are the steps:

1. apply your link-matching regex to the paragraph and collect, for each link, the triple (<link_match>, <word_match>, <start_index>)
2. transform the paragraph into its link-free form and update each triple with the new <start_index> value
3. if text_to_find can be found inside the updated paragraph, then for each triple replace the linked word in text_to_find with its link-like version.

Here's the code:

import re

text_to_find = "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, "

paragraph = "Lorem [Ipsum](lalala) has been the industry's [standard](www.meh.com) dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to [make](hello?) a type [specimen](yes) book."

# generate a triple (<link_match>, <word_match>, <start_index>) for every markdown link
lst = [(m.group(0), m.group(1), m.start(0)) for m in re.finditer(r'\[([^\]]+)\]\([^)"]+\)', paragraph)]

# strip the links from the paragraph and shift each start index to its position in the stripped text
subtract = 0
for idx, (link, match, i) in enumerate(lst):
    paragraph = paragraph.replace(link, match, 1)
    lst[idx] = (link, match, i + subtract)
    subtract += len(match) - len(link)

# update text with links, last link first so that earlier indices stay valid
offset = paragraph.find(text_to_find)
if offset == -1:
    print('no reference')
else:
    for (link, match, i) in lst[::-1]:
        start = i - offset  # position of the linked word inside text_to_find
        if 0 <= start and start + len(match) <= len(text_to_find):
            text_to_find = text_to_find[:start] + link + text_to_find[start + len(match):]

print(text_to_find)

If you have more than one text_to_find across different paragraphs, you can wrap the paragraph conversion and the text_to_find translation in two separate functions and call them in a loop over the paragraphs and the texts to find.
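Since the question is in JavaScript, here is a rough sketch of that two-function split in JavaScript rather than Python. The names stripLinks and restoreLinks are hypothetical, and the texts in textsToFind are just sample slices of the paragraph; treat this as an illustration under those assumptions, not a finished implementation.

```javascript
// Step 1: remove the link markup, remembering where each link text lands
// in the resulting plain text.
function stripLinks(paragraph) {
  const linkPattern = /\[([^\]]+)\]\([^)"]+\)/g;
  const links = [];
  let plain = '';
  let last = 0;
  let m;
  while ((m = linkPattern.exec(paragraph)) !== null) {
    plain += paragraph.slice(last, m.index);
    links.push({ markdown: m[0], text: m[1], start: plain.length });
    plain += m[1];
    last = m.index + m[0].length;
  }
  plain += paragraph.slice(last);
  return { plain, links };
}

// Step 2: locate textToFind in the plain text and put back every link
// that lies completely inside it. Returns null if the text is not found.
function restoreLinks(textToFind, plain, links) {
  const offset = plain.indexOf(textToFind);
  if (offset === -1) return null;
  let result = textToFind;
  // Walk backwards so earlier positions stay valid while the string grows.
  for (const { markdown, text, start } of [...links].reverse()) {
    const local = start - offset;
    if (local >= 0 && local + text.length <= textToFind.length) {
      result = result.slice(0, local) + markdown + result.slice(local + text.length);
    }
  }
  return result;
}

const paragraph = "Lorem [Ipsum](lalala) has been the industry's [standard](www.meh.com) dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to [make](hello?) a type [specimen](yes) book.";

const textsToFind = [
  "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, ",
  'scrambled it to make a type',
];

// Convert the paragraph once, then translate every text against it.
const { plain, links } = stripLinks(paragraph);
const restored = textsToFind.map((text) => restoreLinks(text, plain, links));
console.log(restored);
```

Two limitations of this sketch: indexOf returns the first occurrence, so a text that appears more than once in the paragraph is ambiguous, and a link whose text is cut by the start or end of the searched text is left as plain text.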