Python regex to get certain pattern

Time:10-25

['/allstar/NBA-allstar-career-stats.html', '/allstar/NBA_2022.html', '/allstar/NBA_2022.html', '/allstar/NBA_2021.html', '/allstar/NBA_2021.html', '/allstar/NBA_2021_voting.html', '/allstar/NBA_2021.html', '/allstar/NBA_2020.html', '/allstar/NBA_2020.html', '/allstar/NBA_2020_voting.html', '/allstar/NBA_2020.html', '/allstar/NBA_2019.html', '/allstar/NBA_2019.html', '/allstar/NBA_2019_voting.html', '/allstar/NBA_2019.html', '/allstar/NBA_2018.html', '/allstar/NBA_2018.html', '/allstar/NBA_2018_voting.html', '/allstar/NBA_2018.html', '/allstar/NBA_2017.html', '/allstar/NBA_2017.html']

I want to get only /allstar/NBA_2017.html, /allstar/NBA_2018.html, and /allstar/NBA_2019.html from this list using re.compile().

Does anyone have an idea?

CodePudding user response:

I'm no expert in regex, but this works.

import re

li = [
    '/allstar/NBA-allstar-career-stats.html', '/allstar/NBA_2022.html',
    '/allstar/NBA_2022.html', '/allstar/NBA_2021.html',
    '/allstar/NBA_2021.html', '/allstar/NBA_2021_voting.html',
    '/allstar/NBA_2021.html', '/allstar/NBA_2020.html',
    '/allstar/NBA_2020.html', '/allstar/NBA_2020_voting.html',
    '/allstar/NBA_2020.html', '/allstar/NBA_2019.html',
    '/allstar/NBA_2019.html', '/allstar/NBA_2019_voting.html',
    '/allstar/NBA_2019.html', '/allstar/NBA_2018.html',
    '/allstar/NBA_2018.html', '/allstar/NBA_2018_voting.html',
    '/allstar/NBA_2018.html', '/allstar/NBA_2017.html',
    '/allstar/NBA_2017.html'
]
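# Escape the dot so it matches a literal "." rather than any character.
prog = re.compile(r'.*201[789]\.html')

def match(x):
    return prog.match(x)

# filter() keeps only the items for which match() returns a match object.
res = list(filter(match, li))
print(res)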

And this yields the following:

[
    '/allstar/NBA_2019.html', '/allstar/NBA_2019.html',
    '/allstar/NBA_2019.html', '/allstar/NBA_2018.html',
    '/allstar/NBA_2018.html', '/allstar/NBA_2018.html',
    '/allstar/NBA_2017.html', '/allstar/NBA_2017.html'
]

Hope this is what you want!
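
A note on strictness: in a regex, an unescaped . matches any character, and match() only anchors at the start of the string. Below is a minimal sketch comparing a loose pattern with a stricter one. It assumes every wanted path has exactly the form /allstar/NBA_<year>.html; the path ending in "xhtml" is made up purely to show the difference and does not come from the question.

```python
import re

# Hypothetical sample: a few paths from the question plus one made-up
# path ("NBA_2019xhtml") to show what an unescaped dot lets through.
paths = [
    '/allstar/NBA_2019.html',
    '/allstar/NBA_2019_voting.html',
    '/allstar/NBA_2016.html',
    '/allstar/NBA_2019xhtml',
]

# Loose: "." before "html" matches any single character.
loose = re.compile(r'.*201[789].html')

# Strict: "\." matches a literal dot, and fullmatch() requires the
# whole string to fit the pattern.
strict = re.compile(r'/allstar/NBA_201[789]\.html')

loose_hits = [p for p in paths if loose.match(p)]
strict_hits = [p for p in paths if strict.fullmatch(p)]

print(loose_hits)   # ['/allstar/NBA_2019.html', '/allstar/NBA_2019xhtml']
print(strict_hits)  # ['/allstar/NBA_2019.html']
```

For the list in the question both patterns select the same items, so the stricter form mostly matters if the input might later contain unexpected paths.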

CodePudding user response:

Python's re module caches recently compiled patterns, so calling re.compile() explicitly is rarely necessary unless a program uses a large number of different expressions. However, since you seem to need a compiled expression anyway, you could do this:

import re

li = ['/allstar/NBA-allstar-career-stats.html', '/allstar/NBA_2022.html', '/allstar/NBA_2022.html', '/allstar/NBA_2021.html', '/allstar/NBA_2021.html', '/allstar/NBA_2021_voting.html', '/allstar/NBA_2021.html', '/allstar/NBA_2020.html', '/allstar/NBA_2020.html', '/allstar/NBA_2020_voting.html',
      '/allstar/NBA_2020.html', '/allstar/NBA_2019.html', '/allstar/NBA_2019.html', '/allstar/NBA_2019_voting.html', '/allstar/NBA_2019.html', '/allstar/NBA_2018.html', '/allstar/NBA_2018.html', '/allstar/NBA_2018_voting.html', '/allstar/NBA_2018.html', '/allstar/NBA_2017.html', '/allstar/NBA_2017.html']
m = re.compile(r'.*NBA_201[789]\.html')
# set() removes the duplicates; the order of the result is arbitrary.
print(list(set(filter(m.match, li))))
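
Because a set has no defined order, the three results may print in any order. If the order matters, here is a minimal sketch of two ways to deduplicate predictably. It uses a shortened sample of the question's list, and the variable names are only illustrative.

```python
import re

# Shortened sample of the list from the question (duplicates kept on purpose).
li = [
    '/allstar/NBA_2019.html', '/allstar/NBA_2019_voting.html',
    '/allstar/NBA_2019.html', '/allstar/NBA_2018.html',
    '/allstar/NBA_2018.html', '/allstar/NBA_2017.html',
    '/allstar/NBA_2020.html',
]

pattern = re.compile(r'.*NBA_201[789]\.html')

# sorted() on the set gives a stable, ascending order.
unique_sorted = sorted(set(filter(pattern.fullmatch, li)))

# dict.fromkeys() drops duplicates but keeps first-seen order instead.
unique_in_order = list(dict.fromkeys(filter(pattern.fullmatch, li)))

print(unique_sorted)
# ['/allstar/NBA_2017.html', '/allstar/NBA_2018.html', '/allstar/NBA_2019.html']
print(unique_in_order)
# ['/allstar/NBA_2019.html', '/allstar/NBA_2018.html', '/allstar/NBA_2017.html']
```

The sorted variant happens to give the 2017, 2018, 2019 order listed in the question.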