I have an HTML text containing strings such as sentence-transformers/paraphrase-MiniLM-L6-v2.
I want to extract every string that appears after "sentence-transformers/".
I tried models = re.findall("sentence-transformers/" "(\w+)", text)
but it only outputs the first word (paraphrase), while I want the full "paraphrase-MiniLM-L6-v2".
Also, I don't know the length of the name (paraphrase-MiniLM-L6-v2) in advance.
How can I extract the full string?
Many thanks, Ele
Answer:
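The problem with your regex is that the hyphen (-) is not considered a word character, and \w+ only matches word characters, so the match stops at the first hyphen. Putting the hyphen in a character class together with \w lets the match continue through it, whatever the length of the name. The following regex works on your example: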
import re

text = 'sentence-transformers/paraphrase-MiniLM-L6-v2'
models = re.findall(r'sentence-transformers/([\w-]+)', text)
assert models[0] == 'paraphrase-MiniLM-L6-v2'
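A slightly larger sketch, assuming a hypothetical HTML snippet in place of your real page text (the markup and the second model name are made up for illustration). It contrasts \w+ with [\w-]+ on several occurrences, shows an alternative pattern that reads up to the next whitespace, quote, or angle bracket (which may help if some names also contain other characters, such as dots), and deduplicates the results. Because each pattern has exactly one capture group, re.findall returns only the captured name for each match.

```python
import re

# Hypothetical HTML snippet; the real page text will differ.
html = (
    '<a href="https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2">'
    'sentence-transformers/paraphrase-MiniLM-L6-v2</a>\n'
    '<li>sentence-transformers/all-mpnet-base-v2</li>'
)

# \w+ stops at the first hyphen, so only the first word of each name survives.
word_only = re.findall(r"sentence-transformers/(\w+)", html)

# [\w-]+ also accepts hyphens, so the whole name is captured whatever its length.
with_hyphens = re.findall(r"sentence-transformers/([\w-]+)", html)

# Alternative: take everything up to a character that cannot be part of the
# name in this markup (whitespace, double/single quotes, or angle brackets).
up_to_delimiter = re.findall(r"sentence-transformers/([^\s\"'<>]+)", html)

# The same name can appear several times (link target and link text);
# dict.fromkeys removes duplicates while keeping first-seen order.
unique_models = list(dict.fromkeys(with_hyphens))

print(word_only)      # ['paraphrase', 'paraphrase', 'all']
print(with_hyphens)   # ['paraphrase-MiniLM-L6-v2', 'paraphrase-MiniLM-L6-v2', 'all-mpnet-base-v2']
print(unique_models)  # ['paraphrase-MiniLM-L6-v2', 'all-mpnet-base-v2']
```

The [\w-]+ version is the stricter of the two: it only accepts letters, digits, underscores, and hyphens. The delimiter-based version is more permissive and relies on the assumption that names are always followed by whitespace, a quote, or a tag.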