How can I get a dictionary that contains a word from a sentence in a given list as key and list of i

Time:11-03

I have a list containing a number of sentences. My task is to create a dictionary where the key is a word from these sentences and the value is a list of the indexes of sentences where that word exists.

For example: ['writefile file txt', 'Hello IDSD Good morning How', 'This first exercice', 'This second exercice TP Hello easy exercice'] should return {'writefile': [0], 'file': [0], .....}

Here's my code:

index_dict = {}
for token in after_token:
  print('current line:', after_token.index(token))
  for new in new_after_token:
    print('current word:', new)
    list_of_occurences = []
    if new in token:
      print(new, 'exists in line', after_token.index(token))
      list_of_occurences.append(after_token.index(token))
      print(list_of_occurences)
    else:
      print(new, "doesn't exist in line", after_token.index(token))
    print('*'*10)
print(index_dict)

The problem is that once the program moves on to the next sentence, the dictionary resets and only contains the values for that sentence:

current line: 0
current word: writefile
writefile exists in line 0
[0]
current word: file
file exists in line 0
[0]
current word: txt
txt exists in line 0
[0]
current word: Hello
Hello doesn't exist in line 0
current word: IDSD
IDSD doesn't exist in line 0
current word: Good
Good doesn't exist in line 0
current word: morning
morning doesn't exist in line 0
current word: How
How doesn't exist in line 0
current word: This
This doesn't exist in line 0
current word: first
first doesn't exist in line 0
current word: exercice
exercice doesn't exist in line 0
current word: This
This doesn't exist in line 0
current word: second
second doesn't exist in line 0
current word: exercice
exercice doesn't exist in line 0
current word: TP
TP doesn't exist in line 0
current word: Hello
Hello doesn't exist in line 0
current word: easy
easy doesn't exist in line 0
current word: exercice
exercice doesn't exist in line 0
current word: This
This doesn't exist in line 0
current word: third
third doesn't exist in line 0
current word: exercice
exercice doesn't exist in line 0
current word: Good
Good doesn't exist in line 0
current word: luck
luck doesn't exist in line 0
current word: guys
guys doesn't exist in line 0
current word: Have
Have doesn't exist in line 0
current word: nice
nice doesn't exist in line 0
current word: day
day doesn't exist in line 0
current word: see
see doesn't exist in line 0
current word: next
next doesn't exist in line 0
current word: time
time doesn't exist in line 0
current word: exercices
exercices doesn't exist in line 0
current word: Good
Good doesn't exist in line 0
current word: bye
bye doesn't exist in line 0
{'writefile': [0], 'file': [0], 'txt': [0], 'Hello': [], 'IDSD': [], 'Good': [], 'morning': [], 'How': [], 'This': [], 'first': [], 'exercice': [], 'second': [], 'TP': [], 'easy': [], 'third': [], 'luck': [], 'guys': [], 'Have': [], 'nice': [], 'day': [], 'see': [], 'next': [], 'time': [], 'exercices': [], 'bye': []}
current line: 1
current word: writefile
writefile doesn't exist in line 1
current word: file
file doesn't exist in line 1
current word: txt
txt doesn't exist in line 1
current word: Hello
Hello exists in line 1
[1]
current word: IDSD
IDSD exists in line 1
[1]
current word: Good
Good exists in line 1
[1]
current word: morning
morning exists in line 1
[1]
current word: How
How exists in line 1
[1]
current word: This
This doesn't exist in line 1
current word: first
first doesn't exist in line 1
current word: exercice
exercice doesn't exist in line 1
current word: This
This doesn't exist in line 1
current word: second
second doesn't exist in line 1
current word: exercice
exercice doesn't exist in line 1
current word: TP
TP doesn't exist in line 1
current word: Hello
Hello exists in line 1
[1]
current word: easy
easy doesn't exist in line 1
current word: exercice
exercice doesn't exist in line 1
current word: This
This doesn't exist in line 1
current word: third
third doesn't exist in line 1
current word: exercice
exercice doesn't exist in line 1
current word: Good
Good exists in line 1
[1]
current word: luck
luck doesn't exist in line 1
current word: guys
guys doesn't exist in line 1
current word: Have
Have doesn't exist in line 1
current word: nice
nice doesn't exist in line 1
current word: day
day doesn't exist in line 1
current word: see
see doesn't exist in line 1
current word: next
next doesn't exist in line 1
current word: time
time doesn't exist in line 1
current word: exercices
exercices doesn't exist in line 1
current word: Good
Good exists in line 1
[1]
current word: bye
bye doesn't exist in line 1
{'writefile': [], 'file': [], 'txt': [], 'Hello': [1], 'IDSD': [1], 'Good': [1], 'morning': [1], 'How': [1], 'This': [], 'first': [], 'exercice': [], 'second': [], 'TP': [], 'easy': [], 'third': [], 'luck': [], 'guys': [], 'Have': [], 'nice': [], 'day': [], 'see': [], 'next': [], 'time': [], 'exercices': [], 'bye': []}

CodePudding user response:

This should do the trick. If you don't want multiple indexes for words that occur multiple times in a sentence (e.g. exercice occurring twice in the 4th sentence), you could use for word in set(sentence.split()) instead.

from collections import defaultdict
# List of sample sentences
l = ['writefile file txt', 'Hello IDSD Good morning How', 'This first exercice', 'This second exercice TP Hello easy exercice'] 
# Initiate default dict with list
d = defaultdict(list)

# Loop over each sentence
for n, sentence in enumerate(l):
    # Loop over each word
    for word in sentence.split():
        # Append the index of the sentence to the word-list in the dict
        d[word].append(n)

# Show the result as a plain dict
print(dict(d))
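
To illustrate the de-duplicated variant mentioned above, here is a minimal sketch that wraps the same loop in a function. The name build_index and its unique flag are hypothetical and not part of the original answer. It also uses dict.fromkeys rather than set() to drop repeated words, purely so that the keys keep their first-seen order; set() would give the same index lists.

```python
from collections import defaultdict


def build_index(sentences, unique=False):
    """Map each word to the indexes of the sentences that contain it.

    build_index and the `unique` flag are illustrative names only.
    """
    index = defaultdict(list)
    for n, sentence in enumerate(sentences):
        words = sentence.split()
        if unique:
            # dict.fromkeys drops repeated words but keeps first-seen order
            words = dict.fromkeys(words)
        for word in words:
            index[word].append(n)
    # Return a plain dict so that looking up a missing word raises KeyError
    return dict(index)


sentences = [
    'writefile file txt',
    'Hello IDSD Good morning How',
    'This first exercice',
    'This second exercice TP Hello easy exercice',
]

print(build_index(sentences)['exercice'])               # [2, 3, 3]
print(build_index(sentences, unique=True)['exercice'])  # [2, 3]
print(build_index(sentences, unique=True)['Hello'])     # [1, 3]
```

The key point is that the dictionary is created once, before the loop over sentences, and the lists inside it are only ever appended to, so nothing is reset when the loop moves on to the next sentence.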