Remove word from string if it's not in List

Time:01-01

I have a list of "tags" and want only the words that appear in this list to remain in the output string:

tags = ['S', 'WHAVP', 'POS', 'RBR', 'TO', 'JJR', 'WDT', 'INTJ', 'PP', 'SINV', 'VBZ', 'NX', 'WP', 'WHADJP', 'RP', 'IN', 'VBN', 'RB', 'UH', 'PRP', 'SBAR', 'LST', 'SBARQ', 'FRAG', 'EX', 'NP', 'NN', 'VP', 'NNPS', 'PRT', 'PDT', 'QP', 'VBG', 'ADJP', 'CONJP', 'VB', 'CD', 'WHPP', 'JJ', 'SYM', 'JJS', 'NNP', 'WHNP', 'WRB', 'FW', 'NNS', 'RBS', 'MD', 'PRN', 'DT', 'LS', 'X', 'ADVP', 'VBD', 'SQ', 'NAC', 'CC', 'UCP', 'RRC', 'VBP', 'WP$', '(',')']

input = "(SBARQ (WHNP (WP What)) (SQ (VBP do) (NP (PRP you)) (VP (VB want)))"

This is the expected output:

(SBARQ(WHNP(WP))(SQ(VBP)(NP(PRP))(VP(VB)))

How do I get this to work?

CodePudding user response:

A brute-force method using a generator expression (my_string is the input string from the question):

out = ''.join((s if s.strip('()') in tags else s.lower().strip('abcdefghijklmnopqrstuvwxyz') for s in my_string.split() ))

Output:

'(SBARQ(WHNP(WP))(SQ(VBP)(NP(PRP))(VP(VB)))'
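Note that the one-liner above only strips ASCII letters from tokens that are not tags, so a word containing digits or punctuation (for example "don't") would leave residue behind. A minimal sketch of a token-based variant, assuming the input is always made of parentheses and whitespace-separated words: split the string into tokens first, then keep only those found in the tag set. The names keep_tags and TAGS and the tokenising pattern are my own, not from the question.

```python
import re

# The question's tag list, stored as a set for fast membership tests.
TAGS = set(
    "S WHAVP POS RBR TO JJR WDT INTJ PP SINV VBZ NX WP WHADJP RP IN VBN RB UH PRP "
    "SBAR LST SBARQ FRAG EX NP NN VP NNPS PRT PDT QP VBG ADJP CONJP VB CD WHPP JJ "
    "SYM JJS NNP WHNP WRB FW NNS RBS MD PRN DT LS X ADVP VBD SQ NAC CC UCP RRC VBP "
    "WP$ ( )".split()
)


def keep_tags(text, tags=TAGS):
    """Return text with every token that is not in tags removed (hypothetical helper)."""
    # A token is either a single parenthesis or a run of characters
    # containing neither whitespace nor parentheses.
    tokens = re.findall(r'[()]|[^\s()]+', text)
    return ''.join(tok for tok in tokens if tok in tags)


my_string = "(SBARQ (WHNP (WP What)) (SQ (VBP do) (NP (PRP you)) (VP (VB want)))"
print(keep_tags(my_string))
```

Like the one-liner, this still keeps a word that happens to be spelled exactly like a tag (an upper-case "IN" or "X" in the sentence, say), because it has no notion of position inside the parse tree.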

CodePudding user response:

Using re:

import re

tags = ['S', 'WHAVP', 'POS', 'RBR', 'TO', 'JJR', 'WDT', 'INTJ', 'PP', 'SINV', 'VBZ', 'NX', 'WP', 'WHADJP', 'RP', 'IN',
        'VBN', 'RB', 'UH', 'PRP', 'SBAR', 'LST', 'SBARQ', 'FRAG', 'EX', 'NP', 'NN', 'VP', 'NNPS', 'PRT', 'PDT', 'QP',
        'VBG', 'ADJP', 'CONJP', 'VB', 'CD', 'WHPP', 'JJ', 'SYM', 'JJS', 'NNP', 'WHNP', 'WRB', 'FW', 'NNS', 'RBS', 'MD',
        'PRN', 'DT', 'LS', 'X', 'ADVP', 'VBD', 'SQ', 'NAC', 'CC', 'UCP', 'RRC', 'VBP', 'WP$',
        # '(', ')'
        ]

str_input = "(SBARQ (WHNP (WP What)) (SQ (VBP do) (NP (PRP you)) (VP (VB want)))"

# Match either a parenthesis or any tag written as a whole word.
out = ''.join(re.findall(r'[\(\)]|' + '|'.join(fr'\b{re.escape(tag)}\b' for tag in tags), str_input))

Output:

(SBARQ(WHNP(WP))(SQ(VBP)(NP(PRP))(VP(VB)))
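One caveat with the \b approach: \b only fires between a word character and a non-word character, so a tag ending in "$" such as 'WP$' is not matched as a whole when it is followed by a space, and the shorter 'WP' matches instead. A possible workaround, sketched here with a small subset of the tag list and a made-up input string, is to replace \b with lookarounds that require the tag to be delimited by whitespace, parentheses, or the string ends. All variable names below are my own.

```python
import re

tags = ['NP', 'WP', 'WP$', 'NN']       # small subset of the question's list
text = "(NP (WP$ whose) (NN book))"    # hypothetical input, not from the question

# Original idea: parentheses, or a tag wrapped in word boundaries.
word_boundary = r'[()]|' + '|'.join(fr'\b{re.escape(t)}\b' for t in tags)
with_word_boundary = ''.join(re.findall(word_boundary, text))

# Alternative: a tag must not be preceded or followed by a character
# that is neither whitespace nor a parenthesis.
alternatives = '|'.join(re.escape(t) for t in tags)
token_boundary = rf'[()]|(?<![^\s()])(?:{alternatives})(?![^\s()])'
with_token_boundary = ''.join(re.findall(token_boundary, text))

print(with_word_boundary)   # (NP(WP)(NN))
print(with_token_boundary)  # (NP(WP$)(NN))
```

For the input in the question the two patterns give the same result, since none of its tags contain "$".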