I have 2 lists:
tokens = ['[CLS]', 'Thinking', 'historically', 'is', ',', 'first', ',', 'an', 'attitude', 'acknowledging', 'that', 'every', 'event', 'can', 'be', 'meaningful', '##ly', 'understood', 'only', 'in', 'relation', 'to', 'previous', 'events', ',', 'and', ',', 'second', ',', 'the', 'method', '##ical', 'application', 'of', 'this', 'attitude', ',', 'which', 'en', '##tails', 'both', 'analyzing', 'events', 'context', '##ually', '-', '-', 'as', 'having', 'occurred', 'in', 'the', 'midst', 'of', 'pre', '-', 'existing', 'circumstances', '-', '-', 'and', 'comprehend', '##ing', 'them', 'from', 'historical', 'actors', '[SEP]', '[PAD]', '[PAD]', '[PAD]']
labels = [0, 0, 0, 0, 0, 2, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 2, 0, 0, 0]
I also have a dictionary that maps the labels to their meaning:
labels_meaning = {}
labels_meaning[0] = 'subject'
labels_meaning[1] = 'relation'
labels_meaning[2] = 'object'
labels_meaning[3] = 'na'
The goal is to place each string in its corresponding labels list (ignoring 'na'):
subjects = []
relations = []
objects = []
There are 3 conditions:
- Combine tokens that have consecutive identical labels (e.g., 0, 0, 0) into one string. E.g., the first 5 labels are 0s, hence the first string should be "[CLS] Thinking historically is ,", which should be appended to the corresponding labels list: subjects.append(string)
- If a token contains the string "##", it should be concatenated with the previous token without a space, e.g., "meaningful", "##ly" --> "meaningfully", assuming both have the same label. Otherwise, the "##" should be removed and the remaining string appended to the corresponding labels list on its own, e.g., subjects.append("ly")
- A few tokens should be ignored: [CLS], [SEP], [PAD]
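Condition 2 on its own can be sketched as a small helper that joins one run of same-label tokens. This is a minimal sketch, assuming "##" only ever appears as a WordPiece continuation prefix at the start of a token (as in the tokens above); the name join_wordpieces is hypothetical.

```python
def join_wordpieces(run):
    """Join one run of same-label WordPiece tokens into a single string.

    A token starting with "##" is glued to the word before it without a
    space; a "##" token that begins the run has nothing to attach to, so
    only the prefix is dropped.
    """
    words = []
    for tok in run:
        if tok.startswith("##"):
            piece = tok[2:]
            if words:
                words[-1] += piece   # continuation of the previous word
            else:
                words.append(piece)  # run starts mid-word: keep the bare piece
        else:
            words.append(tok)
    return " ".join(words)


print(join_wordpieces(["can", "be", "meaningful", "##ly"]))  # can be meaningfully
print(join_wordpieces(["##ing"]))                           # ing
```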
Update:
Adding my attempt, but I'm stuck on combining the consecutive tokens
labels_meaning = {}
labels_meaning[0] = 'subject'
labels_meaning[1] = 'relation'
labels_meaning[2] = 'object'
labels_meaning[3] = 'na'
ignore = ['[CLS]', '[SEP]', '[PAD]']
def get_sentence_triples_from_token_labels(tokens, token_labels):
    for tok, label in zip(tokens, token_labels):
        current_label = label
        if tok == '[CLS]': # initialize
            previous_label = current_label
            prev = False
            current_string = ''
        if tok not in ignore:
            if previous_label != current_label and prev==True:
                current_string = f'{tok} '
                pass
            else:
                pass
            prev = True
        break
get_sentence_triples_from_token_labels(tokens, labels)
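For the part the attempt is stuck on, combining consecutive tokens that share a label, the standard library's itertools.groupby already groups adjacent items with equal keys. A minimal sketch; the function name label_runs and the IGNORE set are illustrative, not from the question:

```python
from itertools import groupby

IGNORE = {"[CLS]", "[SEP]", "[PAD]"}


def label_runs(tokens, labels):
    """Yield (label, [tokens]) for each maximal run of consecutive equal labels.

    Special tokens are filtered out before grouping, so they neither appear
    in a run nor split one.
    """
    pairs = [(tok, lab) for tok, lab in zip(tokens, labels) if tok not in IGNORE]
    for lab, group in groupby(pairs, key=lambda pair: pair[1]):
        yield lab, [tok for tok, _ in group]


for lab, run in label_runs(["[CLS]", "Thinking", "historically", "is", ",", "first", ",", "an"],
                           [0, 0, 0, 0, 0, 2, 1, 1]):
    print(lab, run)
# 0 ['Thinking', 'historically', 'is', ',']
# 2 ['first']
# 1 [',', 'an']
```

Filtering the special tokens before grouping means they cannot split a run; in the lists above they only appear at the very start and end, so this choice makes no difference there.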
Answer:
Solution
Not sure if this is what you want.
labels_meaning = { 0:'subject', 1:'relation', 2:'object', 3:'na' }
ignore = ['[CLS]', '[SEP]', '[PAD]']
tokens = ['[CLS]', 'Thinking', 'historically', 'is', ',', 'first', ',', 'an', 'attitude', 'acknowledging', 'that', 'every', 'event', 'can', 'be', 'meaningful', '##ly', 'understood', 'only', 'in', 'relation', 'to', 'previous', 'events', ',', 'and', ',', 'second', ',', 'the', 'method', '##ical', 'application', 'of', 'this', 'attitude', ',', 'which', 'en', '##tails', 'both', 'analyzing', 'events', 'context', '##ually', '-', '-', 'as', 'having', 'occurred', 'in', 'the', 'midst', 'of', 'pre', '-', 'existing', 'circumstances', '-', '-', 'and', 'comprehend', '##ing', 'them', 'from', 'historical', 'actors', '[SEP]', '[PAD]', '[PAD]', '[PAD]']
labels = [0, 0, 0, 0, 0, 2, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 2, 0, 0, 0]
subjects = []
relations = []
objects = []

def get_sentence_triples_from_token_labels(tokens, token_labels):
    # one buffer of pending tokens per label (3 = 'na' is collected but never output)
    dicRlt = {lab:[] for lab in [0,1,2,3]}
    last_label = token_labels[0]
    for tok, label in zip(tokens, token_labels):
        if tok not in ignore:
            if last_label != label:
                # a new run of `label` starts: flush that label's previous run
                # (empty the first time the label appears) and reset its buffer
                if label == 0:
                    subjects.append(" ".join(dicRlt[0]).replace(" ##",""))
                elif label == 1:
                    relations.append(" ".join(dicRlt[1]).replace(" ##",""))
                elif label == 2:
                    objects.append(" ".join(dicRlt[2]).replace(" ##",""))
                dicRlt[label]=[]
            dicRlt[label].append(tok)
            last_label = label
    # flush the last run of each label; replacing " ##" glues word pieces together
    subjects.append(" ".join(dicRlt[0]).replace(" ##",""))
    relations.append(" ".join(dicRlt[1]).replace(" ##",""))
    objects.append(" ".join(dicRlt[2]).replace(" ##",""))
    return
Test
get_sentence_triples_from_token_labels(tokens, labels)
print(subjects)
print(relations)
print(objects)
Output:
['Thinking historically is ,', 'attitude', 'that every event can be meaningfully understood only', 'relation', 'previous events ,', ', second', 'methodical', 'attitude', 'entails', 'events contextually -', 'as having occurred in', 'circumstances -', 'comprehend', 'them from', 'actors']
['', ', an', 'acknowledging', 'in', 'to', 'and', ', the', 'application of this', ', which', 'both analyzing', '-', 'the midst of pre - existing', '- and', '##ing', 'historical']
['', 'first']
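The leading '' entries in relations and objects come from flushing a label's buffer the first time that label appears, before anything was collected, and '##ing' survives because replace(" ##", "") only matches a "##" preceded by a space. A possible variant that groups runs with itertools.groupby and strips a leading "##" as condition 2 asks is sketched below. This is not the answerer's code: it returns the three lists instead of appending to globals, and the helper join_wordpieces is hypothetical.

```python
from itertools import groupby

labels_meaning = {0: 'subject', 1: 'relation', 2: 'object', 3: 'na'}
ignore = {'[CLS]', '[SEP]', '[PAD]'}

tokens = ['[CLS]', 'Thinking', 'historically', 'is', ',', 'first', ',', 'an', 'attitude', 'acknowledging', 'that', 'every', 'event', 'can', 'be', 'meaningful', '##ly', 'understood', 'only', 'in', 'relation', 'to', 'previous', 'events', ',', 'and', ',', 'second', ',', 'the', 'method', '##ical', 'application', 'of', 'this', 'attitude', ',', 'which', 'en', '##tails', 'both', 'analyzing', 'events', 'context', '##ually', '-', '-', 'as', 'having', 'occurred', 'in', 'the', 'midst', 'of', 'pre', '-', 'existing', 'circumstances', '-', '-', 'and', 'comprehend', '##ing', 'them', 'from', 'historical', 'actors', '[SEP]', '[PAD]', '[PAD]', '[PAD]']
labels = [0, 0, 0, 0, 0, 2, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 2, 0, 0, 0]


def join_wordpieces(run):
    # glue "##" pieces onto the previous word; a "##" that starts the run is just stripped
    words = []
    for tok in run:
        if tok.startswith('##') and words:
            words[-1] += tok[2:]
        else:
            words.append(tok[2:] if tok.startswith('##') else tok)
    return ' '.join(words)


def get_sentence_triples_from_token_labels(tokens, token_labels):
    # one output list per meaning, 'na' excluded
    result = {m: [] for m in labels_meaning.values() if m != 'na'}
    # drop special tokens, then group consecutive tokens sharing a label
    pairs = [(t, l) for t, l in zip(tokens, token_labels) if t not in ignore]
    for label, group in groupby(pairs, key=lambda p: p[1]):
        meaning = labels_meaning[label]
        if meaning != 'na':
            result[meaning].append(join_wordpieces([t for t, _ in group]))
    return result['subject'], result['relation'], result['object']


subjects, relations, objects = get_sentence_triples_from_token_labels(tokens, labels)
print(subjects)
print(relations)
print(objects)
```

With the tokens and labels above, subjects should match the answer's output, relations loses the leading '' and contains 'ing' instead of '##ing', and objects is ['first'].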