I'm trying to combine two strings in a list, where the combination of the terms has a different meaning than the terms have when they are tokenized individually.
An example of this would be:
['I', 'want','to','join','a','football','team','in','2022']
The goal is to join the strings 'football' and 'team' with a _ if the two terms occur one after the other, resulting in the string football_team.
The final list would look like this:
['I', 'want','to','join','a','football_team','in','2022']
Any help is appreciated, as so far I have only managed to join the whole list.
EDIT:
I have been trying to join terms using this: "Is there a way to combine two tokens from a list of tokens?"
I have also tried this, but it joins every element in the list: "How to concatenate items in a list to a single string?"
EDIT 2:
In answer to a question in the comments, "How would one understand which words would hold some meaning?"
I have a pre-defined list of string combinations that need to be merged.
CodePudding user response:
" ".join(['I', 'want','to','join','a','football','team','in','2022']).replace("football team","football_team").split(" ")
CodePudding user response:
This way you can define an arbitrary set of pairs.
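# Maps the first word of each pair to the word that must follow it.
pairs = {"football": "team"}
a_list = ['I', 'want', 'to', 'join', 'a', 'football', 'team', 'in', '2022']
# Iterate over a copy, because a_list is modified inside the loop.
list_copy = a_list.copy()
for one, another in zip(list_copy[:-1], list_copy[1:]):
    if one in pairs and another == pairs[one]:
        # index() returns the first occurrence of `one`, so this assumes
        # `one` does not also appear earlier in the list on its own.
        i = a_list.index(one)
        del a_list[i:i + 2]
        a_list.insert(i, one + "_" + another)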
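Since the loop above relies on a_list.index(one), it may pick the wrong position if the first word of a pair also appears earlier in the list without its partner. A minimal sketch of a single-pass alternative that avoids the lookup, assuming the pre-defined combinations are stored as a set of (first, second) tuples. The name merge_pairs, the sep parameter, and the extra ("ice", "cream") pair are made up for illustration and are not from the question:

```python
def merge_pairs(tokens, pairs, sep="_"):
    """Return a new list in which adjacent tokens found in `pairs` are joined."""
    merged = []
    i = 0
    while i < len(tokens):
        # Look at the current token together with the next one, if there is one.
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in pairs:
            merged.append(tokens[i] + sep + tokens[i + 1])
            i += 2  # skip both halves of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical pre-defined combinations; ("ice", "cream") is only an example.
pairs = {("football", "team"), ("ice", "cream")}
tokens = ['I', 'want', 'to', 'join', 'a', 'football', 'team', 'in', '2022']
print(merge_pairs(tokens, pairs))
```

This builds a new list instead of modifying the original, and a word such as 'football' that appears on its own earlier in the list is left untouched.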