I wrote the function below to extract trigrams from the reviews based on their part-of-speech tags.
def get_trigram(pos_1, pos_2, pos_3):
    all_trigram = []
    for j in range(len(df)):
        trigram = []
        for i in range(len(df['pos'][j]['pos'])):
            if [value for value in df['pos'][j]['pos']][i-2] == pos_1 and [value for value in df['pos'][j]['pos']][i-1] == pos_2 and [value for value in df['pos'][j]['pos']][i] == pos_3:
                trigram.append([value for value in df['pos'][j]['word']][i-2] + " " + [value for value in df['pos'][j]['word']][i-1] + " " + [value for value in df['pos'][j]['word']][i])
        all_trigram.append(trigram)
    return all_trigram
Defining the function raises no error, but when I call it with
tri_adv_adj_noun = get_trigram('ADV', 'ADJ', 'NOUN')
I get IndexError: list index out of range. Here is the full traceback:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-149-12b4d4ffff3d> in <module>()
----> 1 tri_adv_adj_noun = get_trigram('ADV', 'ADJ', 'NOUN')
      2 tri_noun_adv_adj = get_trigram('NOUN', 'ADV', 'ADJ')
      3
      4 trigram = tri_adv_adj_noun + tri_noun_adv_adj

<ipython-input-148-60ed39e749d0> in get_trigram(pos_1, pos_2, pos_3)
      8         for i in range(len(df_long['pos'][j]['pos'])):
      9
---> 10             if [value for value in df_long['pos'][j]['pos']][i-2] == pos_1 and [value for value in df_long['pos'][j]['pos']][i-1] == pos_2 and [value for value in df_long['pos'][j]['pos']][i] == pos_3:
     11                 trigram.append([value for value in df_long['pos'][j]['word']][i-2] + " " + [value for value in df_long['pos'][j]['word']][i-1] + " " + [value for value in df_long['pos'][j]['word']][i])
     12

IndexError: list index out of range
FYI, df['pos'][0] returns a dictionary containing two lists, 'pos' and 'word'.
CodePudding user response:
I'd guess that your problem lies in this part:
[value for value in df_long['pos'][j]['pos']][i-2]
First, some of the dictionaries in your 'pos' column may hold a 'pos' list that is empty or very short, so you should check that the list has enough elements before indexing into it. On the first iteration i is 0, so i-2 is -2, and Python reads a negative index as counting back from the end of the list. If the list has only one element, there is nothing two places back, and you get the "list index out of range" error. The same error appears if the 'word' list is shorter than the 'pos' list. For example:
if len(df['pos'][j]['pos']) >= 3:
    for i in range(len(df['pos'][j]['pos'])):
        ...
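To see why the length matters, here is a small sketch of how Python treats the negative indices that i-2 and i-1 produce on the first iterations. The tag lists are made up for illustration and are not taken from your data.

```python
# Made-up tag lists, for illustration only.
short_tags = ['NOUN']
long_tags = ['ADV', 'ADJ', 'NOUN', 'VERB']

i = 0  # first iteration of the inner loop

# With enough elements, a negative index silently wraps around to the end.
wrapped = long_tags[i - 2]  # same as long_tags[-2]

# With a single element there is nothing two places back, so Python raises.
try:
    short_tags[i - 2]
    raised = False
except IndexError:
    raised = True

print(wrapped, raised)
```

So the length check stops the exception, but on the first two iterations the comparison still looks at tags from the end of the list. Starting the loop at index 2 avoids that.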
Second, writing the code this way is redundant: [value for value in some_list] just builds a copy of a list you already have. You could simply write:
if df_long['pos'][j]['pos'][i-2] == pos_1 and df_long['pos'][j]['pos'][i-1] == pos_2 etc..
Or make it even more readable by storing the list in a variable with a descriptive name:
for j in range(len(df)):
    trigram = []
    pos_list = df['pos'][j]['pos']
    if len(pos_list) >= 3:
        for i in range(len(pos_list)):
            if pos_list[i-2] == pos_1 and pos_list[i-1] == pos_2 ...
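Putting both points together, a complete version might look like the sketch below. It is one possible rewrite, not the only fix: the name get_trigrams and the rows parameter are my own, and it assumes each entry is a dictionary with parallel 'pos' and 'word' lists of strings, as described in the question. It also starts the inner loop at index 2 so that i-2 is never negative.

```python
def get_trigrams(rows, pos_1, pos_2, pos_3):
    """Return, for each row, the word trigrams whose tags match (pos_1, pos_2, pos_3).

    `rows` is assumed to be an iterable of dicts with parallel 'pos' and
    'word' lists, e.g. the df['pos'] column from the question.
    """
    all_trigrams = []
    for row in rows:
        pos_list = row['pos']
        word_list = row['word']
        # Guard against the two lists having different lengths.
        n = min(len(pos_list), len(word_list))
        trigrams = []
        # Start at 2 so i-2 is never negative; short rows yield no trigrams.
        for i in range(2, n):
            if (pos_list[i - 2], pos_list[i - 1], pos_list[i]) == (pos_1, pos_2, pos_3):
                trigrams.append(" ".join(word_list[i - 2:i + 1]))
        all_trigrams.append(trigrams)
    return all_trigrams


# Made-up sample rows, for illustration only.
sample_rows = [
    {'pos': ['ADV', 'ADJ', 'NOUN'], 'word': ['very', 'good', 'food']},
    {'pos': ['NOUN'], 'word': ['service']},
    {'pos': [], 'word': []},
]
result = get_trigrams(sample_rows, 'ADV', 'ADJ', 'NOUN')
print(result)
```

With the DataFrame from the question you would call it as get_trigrams(df['pos'], 'ADV', 'ADJ', 'NOUN'), since iterating over a pandas column yields its values.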
Hope this helps!