Suppose we have the following list:
['O', 'O', 'O', 'O', 'I-INS', 'I-INS', 'I-INS', 'B-PER', 'I-PER']
I want to change this list so that whenever a run of members of the same subgroup (like INS) starts with I- and has no B- member in front of it, the first element is changed to B-. For example:
O, I-INS, I-INS, I-INS, B-PER, I-PER => O, B-INS, I-INS, I-INS, B-PER, I-PER
If a subgroup already starts with B- or anything other than I-, it should remain unchanged. So far, I have written this code:
temp = []
for i in range(len(iobTags)):
    if iobTags[i].startswith('I'):
        if iobTags[i-1].startswith('I'):
            temp = iobTags[i-1].split('-')
            temp[0] = 'B'
            mem = temp[0] + '-' + temp[1]
            iobTags[i-1] = mem
    else:
        continue
The problem is that this code keeps changing every I- member it sees to B-, not just the first one, like:
I-INS,I-INS,I-INS => B-INS,B-INS,I-INS
I only want the first element to change, and then to move on to checking the first element of the next subgroup. How can I change this code?
CodePudding user response:
You can use itertools.groupby for the task:
from itertools import groupby

l = ["O", "I-INS", "I-INS", "I-INS", "B-PER", "I-PER"]

out = []
# Group consecutive tags by entity type (the part after the last "-")
for v, g in groupby(l, lambda k: k.split("-")[-1]):
    g = list(g)
    if g[0].startswith("I-"):
        # Only promote the first tag if the group has no B- tag at all
        if not any(v.startswith("B-") for v in g):
            g[0] = g[0].replace("I-", "B-")
    out.extend(g)

print(out)
Prints:
['O', 'B-INS', 'I-INS', 'I-INS', 'B-PER', 'I-PER']
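To see why this works, it can help to look at the groups that groupby produces. The snippet below is only an illustrative sketch (the names `tags` and `groups` are mine, not from the question): the key `k.split("-")[-1]` maps both B-PER and I-PER to PER, and leaves O as O, so consecutive tags of the same entity type end up in one group.

```python
from itertools import groupby

tags = ["O", "I-INS", "I-INS", "I-INS", "B-PER", "I-PER"]

# Key: the part after the last "-" (the entity type), or the whole tag for "O".
groups = [(key, list(group)) for key, group in groupby(tags, lambda k: k.split("-")[-1])]

for key, group in groups:
    print(key, group)
```

Keep in mind that groupby only merges consecutive items with equal keys, so two entities of the same type sitting directly next to each other (for example I-INS, I-INS, B-INS, I-INS) fall into a single group. In that case the group contains a B- tag, so its leading I- tag is left unchanged.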
CodePudding user response:
# Named "tags" rather than "list" to avoid shadowing the built-in
tags = ['O', 'O', 'O', 'O', 'I-INS', 'I-INS', 'I-INS', 'B-PER', 'I-PER']
output_list = []
for index in range(len(tags)):
    # First case: there is no previous tag to compare against
    if index == 0:
        if tags[index][0] == "I":
            output_list.append("B" + tags[index][1:])
        else:
            output_list.append(tags[index])
    else:
        # Promote an I- tag only if the previous tag is neither a B- tag nor the same tag
        if (tags[index][0] == "I") and (tags[index-1][0] != "B") and (tags[index-1] != tags[index]) and (output_list[-1][0] != "B"):
            output_list.append("B" + tags[index][1:])
        else:
            output_list.append(tags[index])
print(output_list)
Try this one. It works for the example you provided, and it also works for another random list I created.
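For comparison, here is a minimal single-pass sketch that stays closer to the loop in the question. It is not taken from either answer: the function name `fix_iob_tags` is hypothetical, and it assumes that an I- tag opens a new subgroup whenever the previous original tag is O or has a different entity type (so I-INS followed directly by I-PER would give B-PER). It compares against the original previous tag rather than the already modified one, which avoids the chain of B- replacements described in the question.

```python
def fix_iob_tags(tags):
    """Return a copy of tags in which each I- run with no B- before it starts with B-."""
    fixed = []
    prev = "O"  # treat the position before the list as outside any entity
    for tag in tags:
        # tag[2:] is the entity type, e.g. "INS" for "I-INS"; for "O" it is "".
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            # Assumption: previous tag is "O" or another entity type,
            # so this I- tag is the first element of its subgroup.
            fixed.append("B-" + tag[2:])
        else:
            fixed.append(tag)
        prev = tag  # remember the ORIGINAL tag, not the rewritten one
    return fixed


iob_tags = ['O', 'O', 'O', 'O', 'I-INS', 'I-INS', 'I-INS', 'B-PER', 'I-PER']
print(fix_iob_tags(iob_tags))
```

Because the function builds a new list, the input is left untouched; assign the result back if you want to replace the original.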