How Do I Debug My Inverted Index?

I'm trying to create an inverted index from a subset of documents, but I'm not getting the expected return values. For 'experiment', the value for d1 should be just [0]; instead I get a list containing the positions of both 'experiment' and 'studi'. When I try to clear the new list, I get empty lists as the return value.

subset={'d1': ['experiment','studi','wing', 'propel', 'slipstream',  'made', 'order', 'determin', 'spanwis',
'distribut','lift', 'increas',  'due','slipstream', 'differ'],'d2':['studi','high-spe','viscou', 'flow', 
'past', 'two-dimension', 'bodi','usual','necessari','consid', 'curv', 'shock', 'wave', 'emit', 'nose', 
'lead', 'studi', 'bodi','.','consequ']}

set_set =['experiment','studi']

new=[]
inv_index={}
final={}
for word in set_set:
    for key, values in subset.items():
        for value in values:
            if word == value:
                new.append(values.index(word))
                inv_index[key]=new
        final[word]=inv_index
final

### Output
# {'experiment': {'d1': [0, 1, 0, 0], 'd2': [0, 1, 0, 0]},
#  'studi': {'d1': [0, 1, 0, 0], 'd2': [0, 1, 0, 0]}}

# Should be:
# {'experiment': {'d1': [0]}, 'studi': {'d1': [1], 'd2': [0, 16]}}

Answer:

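You're tracking a lot of state you don't need, and two things cause the wrong output. First, `new` and `inv_index` are created once, outside the loops, so every document key in `inv_index` points at the same list, and every word in `final` points at the same dict. That is why both words show [0, 1, 0, 0] for both documents; calling `new.clear()` would empty that single shared list everywhere it is referenced, which would explain the empty lists you saw. Second, `index` does not work when there are duplicates: `list.index` always returns the index of the FIRST match, so the second 'studi' in d2 is reported as 0 instead of 16.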
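A small, self-contained illustration of both pitfalls (the toy `words` list and the `shared` name are made up for the demo, not taken from your data):

```python
# Pitfall 1: list.index returns the position of the FIRST match only.
words = ['studi', 'high-spe', 'studi']
first = words.index('studi')
print(first)  # 0, never 2

# enumerate visits every position, so duplicates are all reported.
positions = [i for i, w in enumerate(words) if w == 'studi']
print(positions)  # [0, 2]

# Pitfall 2: storing one list under several keys shares a single object.
shared = []
inv_index = {}
inv_index['d1'] = shared
inv_index['d2'] = shared
shared.append(0)  # mutating the list is visible through every key
print(inv_index)  # {'d1': [0], 'd2': [0]}
print(inv_index['d1'] is inv_index['d2'])  # True
```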

This does what you ask:

subset={'d1': ['experiment','studi','wing', 'propel', 'slipstream',  'made', 'order', 'determin', 'spanwis',
'distribut','lift', 'increas',  'due','slipstream', 'differ'],'d2':['studi','high-spe','viscou', 'flow', 
'past', 'two-dimension', 'bodi','usual','necessari','consid', 'curv', 'shock', 'wave', 'emit', 'nose', 
'lead', 'studi', 'bodi','.','consequ']}

set_set =['experiment','studi']

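final={}
for word in set_set:
    final[word] = {}  # fresh dict per word, so nothing is shared between words
    for key, values in subset.items():
        # enumerate yields every position, so duplicate tokens are all found
        found = [idx for idx,value in enumerate(values) if word == value]
        if found:  # only record documents that actually contain the word
            final[word][key] = found
print(final)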

Output:

{'experiment': {'d1': [0]}, 'studi': {'d1': [1], 'd2': [0, 16]}}
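If you end up querying many words, an alternative worth considering is to build the full index once and then look words up, instead of rescanning every document for each word. The sketch below is not part of the answer above; `build_inverted_index` is a hypothetical helper name, and it assumes you can afford to store positions for every token in memory:

```python
from collections import defaultdict

# Hypothetical helper: one pass over every document, recording each
# token's positions per document.
def build_inverted_index(docs):
    # term -> doc_id -> list of positions; inner lists are created on demand
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, tokens in docs.items():
        for pos, token in enumerate(tokens):
            index[token][doc_id].append(pos)
    # Convert the nested defaultdicts to plain dicts for cleaner output.
    return {term: dict(postings) for term, postings in index.items()}

subset = {'d1': ['experiment', 'studi', 'wing', 'propel', 'slipstream', 'made', 'order',
                 'determin', 'spanwis', 'distribut', 'lift', 'increas', 'due', 'slipstream', 'differ'],
          'd2': ['studi', 'high-spe', 'viscou', 'flow', 'past', 'two-dimension', 'bodi', 'usual',
                 'necessari', 'consid', 'curv', 'shock', 'wave', 'emit', 'nose', 'lead', 'studi',
                 'bodi', '.', 'consequ']}

inv = build_inverted_index(subset)
set_set = ['experiment', 'studi']
# Words absent from every document map to an empty dict, as in the loop version.
result = {word: inv.get(word, {}) for word in set_set}
print(result)
# {'experiment': {'d1': [0]}, 'studi': {'d1': [1], 'd2': [0, 16]}}
```

The trade-off is memory: every token gets an entry, not just the ones in `set_set`, but each later lookup is a dictionary access rather than a scan.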