I have defined a function that extracts words from sentences based on their POS tags.
    def get_trigram(pos_1, pos_2, pos_3):
        all_trigram = []
        for j in range(len(df)):
            trigram = []
            if len(df['pos'][j]) >= 2:
                for i in range(len(df['pos'][j])):
                    if df['pos'][j][i-2][1] == pos_1 and df['pos'][j][i-1][1] == pos_2 and df['pos'][j][i][1] == pos_3:
                        trigram.append(df['pos'][j][i-2][0] + " " + df['pos'][j][i-1][0] + " " + df['pos'][j][i][0])
                all_trigram.append(trigram)
        return all_trigram
The function runs, but the length of the list all_trigram is less than the length of the dataframe I am running it on. I suspect this is because of this line inside my function:
    if len(df['pos'][j]) >= 2:
Rows with fewer than 2 tokens are skipped instead of being recorded as empty lists. How can I write the else statement, and where should I place it, so that all_trigram also contains an empty list for each row with fewer than 2 tokens?
Answer:
Append an empty list to all_trigram when the condition len(df['pos'][j]) >= 2 is not satisfied. This ensures that all_trigram has the same length as your dataframe.
    def get_trigram(pos_1, pos_2, pos_3):
        all_trigram = []
        for j in range(len(df)):
            trigram = []
            if len(df['pos'][j]) >= 2:
                for i in range(len(df['pos'][j])):
                    if df['pos'][j][i-2][1] == pos_1 and df['pos'][j][i-1][1] == pos_2 and df['pos'][j][i][1] == pos_3:
                        trigram.append(
                            df['pos'][j][i-2][0] + " " + df['pos'][j][i-1][0] + " " + df['pos'][j][i][0])
                all_trigram.append(trigram)
            else:
                # Row is too short: record an empty list so the output stays aligned with df.
                all_trigram.append([])
        return all_trigram
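As a side note, here is a minimal sketch of a variant that needs no else branch at all, because the per-row list is appended unconditionally. It also starts the inner loop at index 2: in the code above, i starts at 0, so i-2 and i-1 are negative and Python wraps them around to the end of the row, which can produce a match that is not a real trigram. The name get_trigrams, the rows parameter, and the sample data are my own assumptions for illustration; I am assuming each row is a list of (word, tag) pairs, and you would call it with something like get_trigrams(df['pos'], ...).

```python
def get_trigrams(rows, pos_1, pos_2, pos_3):
    """Return one list of matching trigrams per row.

    `rows` is assumed to be an iterable of lists of (word, tag) pairs,
    for example the df['pos'] column from the question.
    """
    all_trigrams = []
    for tags in rows:
        trigrams = []
        # Start at 2 so tags[i - 2] never wraps around to the end of
        # the list; rows with fewer than 3 tokens skip the loop entirely.
        for i in range(2, len(tags)):
            (w1, t1), (w2, t2), (w3, t3) = tags[i - 2], tags[i - 1], tags[i]
            if (t1, t2, t3) == (pos_1, pos_2, pos_3):
                trigrams.append(f"{w1} {w2} {w3}")
        # Appended for every row, so the output has one entry per row.
        all_trigrams.append(trigrams)
    return all_trigrams


# Hypothetical sample rows, for illustration only.
sample_rows = [
    [("the", "DT"), ("quick", "JJ"), ("fox", "NN")],
    [("hello", "UH")],
    [],
    [("fox", "NN"), ("the", "DT"), ("quick", "JJ")],
]
result = get_trigrams(sample_rows, "DT", "JJ", "NN")
print(result)
```

With this version the last sample row gives an empty list, whereas a loop starting at i = 0 would pair its last two tokens with its first one and report "the quick fox". Passing the column in as an argument instead of reading the global df is just a design choice to make the function easier to test.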