How to find an unknown string after a known pattern? Python re.findall

I have an HTML text with strings such as sentence-transformers/paraphrase-MiniLM-L6-v2.

I want to extract all the strings that appear after "sentence-transformers/".

I tried models = re.findall("sentence-transformers/" "(\w+)", text), but it only outputs the first word (paraphrase), while I want the full "paraphrase-MiniLM-L6-v2".

Also, I don't know the length of the string (e.g. len("paraphrase-MiniLM-L6-v2")) a priori.

How can I extract the full string?

Many thanks, Ele

Answer:

The problem with your regex is that - is not a word character: \w matches only letters, digits, and underscores, so \w+ stops at the first hyphen. Putting both \w and - in a character class fixes this. The following regex works on your example:

import re

text = 'sentence-transformers/paraphrase-MiniLM-L6-v2'
# [\w-]+ matches one or more word characters or hyphens
models = re.findall(r'sentence-transformers/([\w-]+)', text)

assert models[0] == 'paraphrase-MiniLM-L6-v2'
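
Since the question mentions an HTML text with several such strings, here is a minimal sketch of the same pattern applied to a longer input. The `html` snippet, the second model name, and the de-duplication step are assumptions made for illustration; they are not part of the original question.

```python
import re

# Hypothetical HTML fragment standing in for the real page (an assumption).
html = (
    '<a href="/sentence-transformers/paraphrase-MiniLM-L6-v2">one</a> '
    '<a href="/sentence-transformers/all-mpnet-base-v2">two</a> '
    '<a href="/sentence-transformers/paraphrase-MiniLM-L6-v2">repeat</a>'
)

# Same idea as above: capture word characters and hyphens after the prefix.
pattern = re.compile(r'sentence-transformers/([\w-]+)')

# findall returns only the captured group, one entry per match, in order.
models = pattern.findall(html)

# dict.fromkeys drops repeats while keeping first-seen order.
unique_models = list(dict.fromkeys(models))

print(unique_models)
```

Each match ends at the closing quote because " is neither a word character nor a hyphen, so the length of the name does not need to be known in advance. If the names can contain other characters, such as dots, those would have to be added to the character class as well.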