I have a list containing a number of sentences. My task is to create a dictionary where the key is a word from these sentences and the value is a list of the indexes of sentences where that word exists.
For example, ['writefile file txt', 'Hello IDSD Good morning How', 'This first exercice', 'This second exercice TP Hello easy exercice'] should return {'writefile': [0], 'file': [0], ...}
Here's my code:
index_dict = {}
for token in after_token:
    print('current line:', after_token.index(token))
    for new in new_after_token:
        print('current word:', new)
        list_of_occurences = []
        if new in token:
            print(new, 'exists in line', after_token.index(token))
            list_of_occurences.append(after_token.index(token))
            print(list_of_occurences)
        else:
            print(new, "doesn't exist in line", after_token.index(token))
    print('*'*10)
    print(index_dict)
The problem is that once the program moves on to the next sentence, the dictionary resets and only holds the values for that sentence. Here is the output:
current line: 0
current word: writefile
writefile exists in line 0
[0]
current word: file
file exists in line 0
[0]
current word: txt
txt exists in line 0
[0]
current word: Hello
Hello doesn't exist in line 0
current word: IDSD
IDSD doesn't exist in line 0
current word: Good
Good doesn't exist in line 0
current word: morning
morning doesn't exist in line 0
current word: How
How doesn't exist in line 0
current word: This
This doesn't exist in line 0
current word: first
first doesn't exist in line 0
current word: exercice
exercice doesn't exist in line 0
current word: This
This doesn't exist in line 0
current word: second
second doesn't exist in line 0
current word: exercice
exercice doesn't exist in line 0
current word: TP
TP doesn't exist in line 0
current word: Hello
Hello doesn't exist in line 0
current word: easy
easy doesn't exist in line 0
current word: exercice
exercice doesn't exist in line 0
current word: This
This doesn't exist in line 0
current word: third
third doesn't exist in line 0
current word: exercice
exercice doesn't exist in line 0
current word: Good
Good doesn't exist in line 0
current word: luck
luck doesn't exist in line 0
current word: guys
guys doesn't exist in line 0
current word: Have
Have doesn't exist in line 0
current word: nice
nice doesn't exist in line 0
current word: day
day doesn't exist in line 0
current word: see
see doesn't exist in line 0
current word: next
next doesn't exist in line 0
current word: time
time doesn't exist in line 0
current word: exercices
exercices doesn't exist in line 0
current word: Good
Good doesn't exist in line 0
current word: bye
bye doesn't exist in line 0
{'writefile': [0], 'file': [0], 'txt': [0], 'Hello': [], 'IDSD': [], 'Good': [], 'morning': [], 'How': [], 'This': [], 'first': [], 'exercice': [], 'second': [], 'TP': [], 'easy': [], 'third': [], 'luck': [], 'guys': [], 'Have': [], 'nice': [], 'day': [], 'see': [], 'next': [], 'time': [], 'exercices': [], 'bye': []}
current line: 1
current word: writefile
writefile doesn't exist in line 1
current word: file
file doesn't exist in line 1
current word: txt
txt doesn't exist in line 1
current word: Hello
Hello exists in line 1
[1]
current word: IDSD
IDSD exists in line 1
[1]
current word: Good
Good exists in line 1
[1]
current word: morning
morning exists in line 1
[1]
current word: How
How exists in line 1
[1]
current word: This
This doesn't exist in line 1
current word: first
first doesn't exist in line 1
current word: exercice
exercice doesn't exist in line 1
current word: This
This doesn't exist in line 1
current word: second
second doesn't exist in line 1
current word: exercice
exercice doesn't exist in line 1
current word: TP
TP doesn't exist in line 1
current word: Hello
Hello exists in line 1
[1]
current word: easy
easy doesn't exist in line 1
current word: exercice
exercice doesn't exist in line 1
current word: This
This doesn't exist in line 1
current word: third
third doesn't exist in line 1
current word: exercice
exercice doesn't exist in line 1
current word: Good
Good exists in line 1
[1]
current word: luck
luck doesn't exist in line 1
current word: guys
guys doesn't exist in line 1
current word: Have
Have doesn't exist in line 1
current word: nice
nice doesn't exist in line 1
current word: day
day doesn't exist in line 1
current word: see
see doesn't exist in line 1
current word: next
next doesn't exist in line 1
current word: time
time doesn't exist in line 1
current word: exercices
exercices doesn't exist in line 1
current word: Good
Good exists in line 1
[1]
current word: bye
bye doesn't exist in line 1
{'writefile': [], 'file': [], 'txt': [], 'Hello': [1], 'IDSD': [1], 'Good': [1], 'morning': [1], 'How': [1], 'This': [], 'first': [], 'exercice': [], 'second': [], 'TP': [], 'easy': [], 'third': [], 'luck': [], 'guys': [], 'Have': [], 'nice': [], 'day': [], 'see': [], 'next': [], 'time': [], 'exercices': [], 'bye': []}
CodePudding user response:
This should do the trick. If you don't want repeated indexes for a word that occurs several times in one sentence (e.g. exercice occurs twice in the 4th sentence), you could use for word in set(sentence.split()) instead.
from collections import defaultdict

# List of sample sentences
l = ['writefile file txt', 'Hello IDSD Good morning How', 'This first exercice', 'This second exercice TP Hello easy exercice']

# Initiate default dict with list
d = defaultdict(list)

# Loop over each sentence
for n, sentence in enumerate(l):
    # Loop over each word
    for word in sentence.split():
        # Append the index of the sentence to the word-list in the dict
        d[word].append(n)
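To make the difference between the two variants concrete, here is a minimal sketch that wraps the loop in a function. The name build_index and the unique flag are my own, made up for illustration. It uses dict.fromkeys rather than set() to drop repeated words, which is a small swap on my part so that the words keep the order they appear in.

```python
from collections import defaultdict


def build_index(sentences, unique=True):
    """Map each word to the list of sentence indexes it occurs in.

    build_index and unique are hypothetical names used for this sketch.
    """
    index = defaultdict(list)
    for n, sentence in enumerate(sentences):
        words = sentence.split()
        if unique:
            # dict.fromkeys drops repeated words but keeps first-seen order
            words = dict.fromkeys(words)
        for word in words:
            index[word].append(n)
    # Convert to a plain dict so missing words raise KeyError as usual
    return dict(index)


sentences = ['writefile file txt',
             'Hello IDSD Good morning How',
             'This first exercice',
             'This second exercice TP Hello easy exercice']

index = build_index(sentences)
print(index['exercice'])                                 # [2, 3]
print(build_index(sentences, unique=False)['exercice'])  # [2, 3, 3]
```

The dictionary is created once, before the loops, and each word's list is only ever appended to. In the question's code, list_of_occurences is recreated as an empty list for every word on every sentence, so indexes found in earlier sentences are lost.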