How to use a regex to remove all letters in a string up to a tab (\t)

Time:03-14

I am trying to remove all the English words from a list of 1000 lines that all follow the same pattern:
Englishword Vietnamese word \t
...
Word từ \t
...
Young trẻ tuổi \t this is the last line

I tried:

     dix_words = """Ability    có khả năng
    About    Về
    Above    ở trên
    Abuse    lạm dụng
    Accept    Chấp nhận
    Access    tới gần
    Achieve    Hoàn thành
    Acknowledge    thừa nhận
    Acquire    giành được
    Across    băng qua"""    
    
    lines = dix_words
    list_of_words = lines.splitlines()
    print(list_of_words)    
    
    list_of_vn_words = ""           #  create new final string
    for word in list_of_words:
        word_vn = re.sub(r'.*\t$', '', word)        #  create new word_vn
        list_of_vn_words = list_of_vn_words.append(word_vn) # create new string

My regex is supposed to replace everything (.*) before the tab at the end of the line (\t$) with nothing.
Evidently the regex and I are not on the same wavelength,
because word_vn comes out identical to word.
I will also have to find a replacement for append, which does not work on strings...

CodePudding user response:

Your logic is correct; the problem is that the words in your data are separated by ordinary space characters (four of them, which \s matches), not by a tab (\t), so your pattern never matches. Here is the working code:

import re

dix_words = """Ability    có khả năng
About    Về
Above    ở trên
Abuse    lạm dụng
Accept    Chấp nhận
Access    tới gần
Achieve    Hoàn thành
Acknowledge    thừa nhận
Acquire    giành được
Across    băng qua"""
lines = dix_words
list_of_words = lines.splitlines()
print(list_of_words)    
    
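list_of_vn_words = []       # collect the Vietnamese parts in a list (str has no append)
for word in list_of_words:
    word_vn = re.sub(r'.*\t$', '', word)         # original pattern: matches only if the line ends with a tab
    if word_vn == word:                          # nothing was removed, so fall back to the 4-space separator
        word_vn = re.sub(r'.*\s{4}', '', word)   # drop everything up to the run of 4 whitespace characters
    list_of_vn_words.append(word_vn)             # add the Vietnamese part to the list

print(list_of_vn_words)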
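
As an alternative to the two-step substitution above, here is a minimal sketch that splits each line once on the first separator, whether that is a tab or a run of two or more spaces. The helper name vietnamese_part and the sample string are made up for illustration and are not part of the original post. It assumes the English entry is a single word with no double spaces inside it.

```python
import re

# Hypothetical helper for illustration; not from the original post.
def vietnamese_part(line: str) -> str:
    """Return the text after the first tab or run of 2+ spaces."""
    # strip() drops a trailing tab or spaces before splitting
    parts = re.split(r"\t+| {2,}", line.strip(), maxsplit=1)
    # If no separator was found, return the line unchanged
    return parts[1] if len(parts) == 2 else line

# Assumed sample data mixing both separator styles
sample = "Ability    có khả năng\nAbout\tVề \t\nYoung    trẻ tuổi"

vn_words = [vietnamese_part(line) for line in sample.splitlines()]

# Strings have no append(); collect into a list and join at the end
result = "\n".join(vn_words)
print(result)
```

Splitting with maxsplit=1 leaves the single spaces inside the Vietnamese phrase untouched, and the final join gives one string if that is what is needed instead of a list.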
