Efficient way to select strings from a list based on a certain condition

Time:10-23

I have a list of strings that I want to filter based on a given year. For example, from the list below I only want the strings with a year above 2018, plus the strings that don't contain a year. My solution works, I just need a better way to do this.

data = [
    '/soccer/zimbabwe/premier-soccer-league/results/',
    '/soccer/zimbabwe/premier-soccer-league-2020/results/',
    '/soccer/zimbabwe/premier-soccer-league-2019/results/',
    '/soccer/zimbabwe/premier-soccer-league-2018/results/',
    '/soccer/zimbabwe/premier-soccer-league-2017/results/']

my script

import re

for i in data:
    match = re.match(r".*([1-3][0-9]{3})",i)
    if match is not None: 
        if match.group(1) > '2018':
            print(i)
    else:
        print(i)

expected output:

data = [
    '/soccer/zimbabwe/premier-soccer-league/results/',
    '/soccer/zimbabwe/premier-soccer-league-2020/results/',
    '/soccer/zimbabwe/premier-soccer-league-2019/results/']

CodePudding user response:

You need to append the matching values to a list (result in the code below). You can do it like this:

import re

result = []
for i in data:
    match = re.match(r'.*(\d{4})', i)
    if match:
        if int(match.group(1)) > 2018:
            result.append(i)
    else:
        result.append(i)

Output:

['/soccer/zimbabwe/premier-soccer-league/results/',
 '/soccer/zimbabwe/premier-soccer-league-2020/results/',
 '/soccer/zimbabwe/premier-soccer-league-2019/results/']

EDIT:

An approach without an explicit loop:

def is_match(s, year):
    match = re.match(r'.*(\d{4})', s)
    return match is None or int(match.group(1)) > year

result = list(filter(lambda seq: is_match(seq, 2018), data))
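
The same filter can also be written as a list comprehension with a precompiled pattern. This is only a sketch: the names YEAR_RE and keep are hypothetical, and the pattern assumes the year always appears as a hyphen followed by four digits right before a slash, as in the sample data. Whether it is measurably faster than the versions above depends on the size of the list.

```python
import re

# Assumption: the year is written as "-YYYY" immediately before a "/",
# as in the sample paths. Compiling once avoids re-parsing the pattern.
YEAR_RE = re.compile(r"-(\d{4})(?=/)")

def keep(path, min_year):
    """Return True if path has no year, or a year greater than min_year."""
    match = YEAR_RE.search(path)
    return match is None or int(match.group(1)) > min_year

data = [
    '/soccer/zimbabwe/premier-soccer-league/results/',
    '/soccer/zimbabwe/premier-soccer-league-2020/results/',
    '/soccer/zimbabwe/premier-soccer-league-2019/results/',
    '/soccer/zimbabwe/premier-soccer-league-2018/results/',
    '/soccer/zimbabwe/premier-soccer-league-2017/results/']

# Keep paths without a year and paths whose year is above 2018.
result = [path for path in data if keep(path, 2018)]
print(result)
```

Anchoring the pattern on the hyphen and the following slash makes it less likely to pick up an unrelated four-digit run elsewhere in the path, and comparing as int avoids relying on string ordering.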