['/allstar/NBA-allstar-career-stats.html', '/allstar/NBA_2022.html', '/allstar/NBA_2022.html', '/allstar/NBA_2021.html', '/allstar/NBA_2021.html', '/allstar/NBA_2021_voting.html', '/allstar/NBA_2021.html', '/allstar/NBA_2020.html', '/allstar/NBA_2020.html', '/allstar/NBA_2020_voting.html', '/allstar/NBA_2020.html', '/allstar/NBA_2019.html', '/allstar/NBA_2019.html', '/allstar/NBA_2019_voting.html', '/allstar/NBA_2019.html', '/allstar/NBA_2018.html', '/allstar/NBA_2018.html', '/allstar/NBA_2018_voting.html', '/allstar/NBA_2018.html', '/allstar/NBA_2017.html', '/allstar/NBA_2017.html']
From this list, I want to get only '/allstar/NBA_2017.html', '/allstar/NBA_2018.html', and '/allstar/NBA_2019.html', using re.compile().
Does anyone have an idea?
CodePudding user response:
I'm no expert in regex, but this works.
import re
li = [
'/allstar/NBA-allstar-career-stats.html', '/allstar/NBA_2022.html',
'/allstar/NBA_2022.html', '/allstar/NBA_2021.html',
'/allstar/NBA_2021.html', '/allstar/NBA_2021_voting.html',
'/allstar/NBA_2021.html', '/allstar/NBA_2020.html',
'/allstar/NBA_2020.html', '/allstar/NBA_2020_voting.html',
'/allstar/NBA_2020.html', '/allstar/NBA_2019.html',
'/allstar/NBA_2019.html', '/allstar/NBA_2019_voting.html',
'/allstar/NBA_2019.html', '/allstar/NBA_2018.html',
'/allstar/NBA_2018.html', '/allstar/NBA_2018_voting.html',
'/allstar/NBA_2018.html', '/allstar/NBA_2017.html',
'/allstar/NBA_2017.html'
]
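# Compile once; [789] limits the last digit to 2017, 2018, or 2019,
# and the escaped dot matches a literal '.'.
prog = re.compile(r'.*201[789]\.html')

def match(x):
    # Pattern.match anchors at the start of the string.
    return prog.match(x)

res = list(filter(match, li))
print(res)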
And this yields the following:
[
'/allstar/NBA_2019.html', '/allstar/NBA_2019.html',
'/allstar/NBA_2019.html', '/allstar/NBA_2018.html',
'/allstar/NBA_2018.html', '/allstar/NBA_2018.html',
'/allstar/NBA_2017.html', '/allstar/NBA_2017.html'
]
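One caveat: match() only anchors at the start of the string, so anything trailing after ".html" would still get through. If that matters, a stricter variant could use fullmatch() with the full path spelled out. This is only a minimal sketch on a shortened list; the names sample, strict, and kept are illustrative, and the ".bak" entry is a made-up example, not something from the original data.

```python
import re

# Shortened sample of the list; the last entry is hypothetical and only
# there to show what fullmatch() rejects.
sample = [
    '/allstar/NBA_2019.html',
    '/allstar/NBA_2019_voting.html',
    '/allstar/NBA_2018.html',
    '/allstar/NBA_2017.html',
    '/allstar/NBA_2017.html.bak',
]

# Spell out the whole path and escape the dot so it matches a literal '.'.
strict = re.compile(r'/allstar/NBA_201[789]\.html')

# fullmatch() requires the entire string to match the pattern.
kept = [p for p in sample if strict.fullmatch(p)]
print(kept)
```

With match() instead of fullmatch(), the ".bak" entry would also be kept, because match() does not require the pattern to reach the end of the string.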
Hope this is what you want!
CodePudding user response:
Python's re module caches recently compiled patterns, so calling re.compile() yourself is rarely necessary unless a program uses a very large number of different expressions. However, since you specifically want to use re.compile(), you could do this:
import re
li = ['/allstar/NBA-allstar-career-stats.html', '/allstar/NBA_2022.html', '/allstar/NBA_2022.html', '/allstar/NBA_2021.html', '/allstar/NBA_2021.html', '/allstar/NBA_2021_voting.html', '/allstar/NBA_2021.html', '/allstar/NBA_2020.html', '/allstar/NBA_2020.html', '/allstar/NBA_2020_voting.html',
'/allstar/NBA_2020.html', '/allstar/NBA_2019.html', '/allstar/NBA_2019.html', '/allstar/NBA_2019_voting.html', '/allstar/NBA_2019.html', '/allstar/NBA_2018.html', '/allstar/NBA_2018.html', '/allstar/NBA_2018_voting.html', '/allstar/NBA_2018.html', '/allstar/NBA_2017.html', '/allstar/NBA_2017.html']
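# Escape the dot so it matches a literal '.', and compile the pattern once.
m = re.compile(r'.*NBA_201[789]\.html')
# set() removes duplicates but does not preserve the original order.
print(list(set(filter(m.match, li))))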
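If the order of the results matters, dict.fromkeys() is one way to drop duplicates while keeping first-seen order (dicts preserve insertion order in Python 3.7+). A minimal sketch on a shortened version of the list; the names paths, pattern, and unique are just for illustration:

```python
import re

# Shortened sample of the list from the question (same URL shapes assumed).
paths = [
    '/allstar/NBA_2020.html', '/allstar/NBA_2019.html',
    '/allstar/NBA_2019_voting.html', '/allstar/NBA_2019.html',
    '/allstar/NBA_2018.html', '/allstar/NBA_2018.html',
    '/allstar/NBA_2017.html', '/allstar/NBA_2017.html',
]

pattern = re.compile(r'.*NBA_201[789]\.html')

# dict.fromkeys() keeps the first occurrence of each key, in insertion order.
unique = list(dict.fromkeys(filter(pattern.match, paths)))
print(unique)
```

This should print the 2019, 2018, and 2017 pages once each, in the order they first appear in the list.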