I have an array that contains many sentences. I split these sentences into words and made another array. I want to remove the words that start with "[" and end with "]" from my array.
Example:
import numpy as np
from nltk import sent_tokenize

sentences = sent_tokenize(text)
print(sentences[0])
z = np.array(sentences)
sentence: [42] On 20 January 1987, he also turned out as substitute for Imran Khan's side in an exhibition game at Brabourne Stadium in Bombay, to mark the golden jubilee of Cricket Club of India.
words = z[0].split()  # split() already returns a list
print(words)
After splitting into words: ['[42]', 'On', '20', 'January', '1987,', 'he', 'also', 'turned', 'out', 'as', 'substitute', 'for', 'Imran', "Khan's", 'side', 'in', 'an', 'exhibition', 'game', 'at', 'Brabourne', 'Stadium', 'in', 'Bombay,', 'to', 'mark', 'the', 'golden', 'jubilee', 'of', 'Cricket', 'Club', 'of', 'India.']
Now I want to remove [42] from my array and then join the words back into a sentence. How can I do that? I tried the following, but it does not work: it removes the whole array and prints None.
for i in words:
    if i[0] == "[":
        b = words.remove(i)
        print(b)
    else:
        print("")
CodePudding user response:
You may consider using a list comprehension, as below:
sentence = "[42] On 20 January 1987, he also turned out as substitute for Imran Khan's side in an exhibition game at Brabourne Stadium in Bombay, to mark the golden jubilee of Cricket Club of India."
words = sentence.split()
words = [w for w in words if not (w.startswith('[') and w.endswith(']'))]
filtered = ' '.join(words)
print(filtered)
"On 20 January 1987, he also turned out as substitute for Imran Khan's side in an exhibition game at Brabourne Stadium in Bombay, to mark the golden jubilee of Cricket Club of India."
CodePudding user response:
Drop the else branch: it runs for every word that does not start with "[" and only prints an empty line, so it serves no purpose.
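The `None` in the output comes from `list.remove`, which changes the list in place and returns `None`, so `b` is always `None`. Removing items from a list while looping over it can also cause elements to be skipped. A minimal sketch of the loop that collects the wanted words in a new list instead (the names `kept` and `result` are illustrative, and the word list is shortened):

```python
words = ['[42]', 'On', '20', 'January', '1987,', 'he', 'also', 'turned', 'out']

kept = []
for w in words:
    # Skip tokens such as '[42]'; keep everything else.
    if w.startswith('[') and w.endswith(']'):
        continue
    kept.append(w)

result = ' '.join(kept)
print(result)  # On 20 January 1987, he also turned out
```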
CodePudding user response:
Use a regex (you won't need to split the sentence):
import re
sentence = "[42] On 20 January 1987, he also turned out as substitute for Imran Khan's side in an exhibition game at Brabourne Stadium in Bombay, to mark the golden jubilee of Cricket Club of India."
re.sub(r'\[.+?\]', '', sentence)
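Removing only the bracketed part leaves behind the space that followed it, and a greedy pattern would swallow everything between the first `[` and the last `]` when a sentence has several markers. Here is a sketch that assumes the markers are numeric citations like `[42]`, matches them with `\[\d+\]`, and then collapses the leftover whitespace. The sample sentence is shortened and altered for illustration.

```python
import re

sentence = "[42] On 20 January 1987, he played [43] at Brabourne Stadium."

# Assumption: markers are digits in square brackets, e.g. [42].
no_markers = re.sub(r'\[\d+\]', '', sentence)
# Collapse runs of whitespace left behind and trim the ends.
cleaned = ' '.join(no_markers.split())
print(cleaned)  # On 20 January 1987, he played at Brabourne Stadium.
```

If the markers can contain other text, a non-greedy pattern such as `\[.+?\]` is a looser alternative, at the cost of also removing any other bracketed text in the sentence.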