How to use a regex to remove all letters in a string up to a tab (\t)

I am trying to remove all the English words from a list of 1000 lines that all follow the same pattern:
Englishword Vietnamese word \t
...
Word từ \t
...
Young trẻ tuổi \t this is the last line

I tried:

    dix_words = """Ability    có khả năng
    About    Về
    Above    ở trên
    Abuse    lạm dụng
    Accept    Chấp nhận
    Access    tới gần
    Achieve    Hoàn thành
    Acknowledge    thừa nhận
    Acquire    giành được
    Across    băng qua"""    
    
    lines = dix_words
    list_of_words = lines.splitlines()
    print(list_of_words)    
    
    list_of_vn_words = ""           #  create new final string
    for word in list_of_words:
        word_vn = re.sub(r'.*\t$', '', word)        #  create new word_vn
        list_of_vn_words = list_of_vn_words.append(word_vn) # create new string

My regex is supposed to replace everything (.*) before the \t at the end of the line ($) with nothing.
Evidently the regex and I are not on the same wavelength,
because my word_vn comes out the same as word.
I will find a replacement for append, which doesn't work on strings...

CodePudding user response:

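Your logic is correct; the problem is that the gap between the words in your sample is made of ordinary space characters (which \s matches), not a tab (\t), so your pattern never matches and re.sub returns each line unchanged. Here is the working code: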

import re

dix_words = """Ability    có khả năng
About    Về
Above    ở trên
Abuse    lạm dụng
Accept    Chấp nhận
Access    tới gần
Achieve    Hoàn thành
Acknowledge    thừa nhận
Acquire    giành được
Across    băng qua"""
lines = dix_words
list_of_words = lines.splitlines()
print(list_of_words)    
    
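list_of_vn_words = []       # final list of Vietnamese words
for word in list_of_words:
    word_vn = re.sub(r'.*\t$', '', word)          # original attempt: strip up to a trailing tab
    if word_vn == word:                           # nothing was replaced, so there is no trailing tab
        word_vn = re.sub(r'.*\s{4}', '', word)    # strip everything up to the four-space gap
    list_of_vn_words.append(word_vn)              # append works because this is a list, not a string
print(list_of_vn_words)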
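
As an alternative to the two-step substitution above, here is a minimal sketch that splits each line once on the separator instead of deleting what precedes it. It assumes the English word is followed either by a tab or by a run of two or more whitespace characters, which holds for the sample data but may not hold for all 1000 lines. The helper name vietnamese_part is made up for illustration. Note also that r'.*\t$' can only match when the tab is the very last character of the line, because of the $ anchor.

```python
import re

def vietnamese_part(line):
    """Return the text after the first separator (hypothetical helper).

    Assumption: the separator is a run of 2+ whitespace characters or a tab.
    """
    # strip() drops any trailing tab or spaces before splitting
    parts = re.split(r'\s{2,}|\t', line.strip(), maxsplit=1)
    # If no separator is found, parts has one element and the line is returned as-is
    return parts[-1]

sample = "Ability    có khả năng\nAbout\tVề\nAcross    băng qua"
vn_words = [vietnamese_part(line) for line in sample.splitlines()]
print(vn_words)  # ['có khả năng', 'Về', 'băng qua']
```

Splitting with maxsplit=1 keeps any later spaces inside the Vietnamese phrase intact. If the real file separates the two columns with a single space only, this approach cannot tell the columns apart and a different rule would be needed.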