I have a list of strings that I want to filter based on a given year. For example, from the list below I only want the strings whose year is above 2018, plus the strings that don't contain a year. My solution works, I just need a better way to do this.
data = [
'/soccer/zimbabwe/premier-soccer-league/results/',
'/soccer/zimbabwe/premier-soccer-league-2020/results/',
'/soccer/zimbabwe/premier-soccer-league-2019/results/',
'/soccer/zimbabwe/premier-soccer-league-2018/results/',
'/soccer/zimbabwe/premier-soccer-league-2017/results/']
My script:
import re
for i in data:
    match = re.match(r".*([1-3][0-9]{3})", i)
    if match is not None:
        if match.group(1) > '2018':
            print(i)
    else:
        print(i)
Expected output:
data = [
'/soccer/zimbabwe/premier-soccer-league/results/',
'/soccer/zimbabwe/premier-soccer-league-2020/results/',
'/soccer/zimbabwe/premier-soccer-league-2019/results/']
CodePudding user response:
You need to append the values to a list (result in the code below). You can do it like this:
import re
result = []
for i in data:
    match = re.match(r'.*(\d{4})', i)
    if match:
        if int(match.group(1)) > 2018:
            result.append(i)
    else:
        result.append(i)
Output:
['/soccer/zimbabwe/premier-soccer-league/results/',
'/soccer/zimbabwe/premier-soccer-league-2020/results/',
'/soccer/zimbabwe/premier-soccer-league-2019/results/']
EDIT:
An approach without an explicit loop:
def is_match(s, year):
    match = re.match(r'.*(\d{4})', s)
    return match is None or int(match.group(1)) > year
result = list(filter(lambda seq: is_match(seq, 2018), data))
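The same filter could also be written as a list comprehension with a precompiled pattern. This is only a sketch: the names YEAR_PATTERN and keep are made up for illustration, and it assumes that a run of four digits in a path is always a year, as in the sample data.

```python
import re

# Hypothetical names for illustration; the pattern is the same one used above.
# The greedy ".*" means the captured group is the last run of four digits.
YEAR_PATTERN = re.compile(r'.*(\d{4})')

def keep(path, year):
    """Return True if path has no four-digit year, or its year is greater than `year`."""
    match = YEAR_PATTERN.match(path)
    return match is None or int(match.group(1)) > year

data = [
    '/soccer/zimbabwe/premier-soccer-league/results/',
    '/soccer/zimbabwe/premier-soccer-league-2020/results/',
    '/soccer/zimbabwe/premier-soccer-league-2019/results/',
    '/soccer/zimbabwe/premier-soccer-league-2018/results/',
    '/soccer/zimbabwe/premier-soccer-league-2017/results/']

# Keep paths without a year and paths whose year is above 2018.
result = [path for path in data if keep(path, 2018)]
print(result)
```

Compiling the pattern once avoids passing the pattern string to re.match on every call, and the comprehension keeps the original order of data.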