Finding a substring in a list of lists

Time:05-19

I have an issue I can't solve. I have a dynamic list, data (request data from the web), that contains multiple lists, and each of those contains strings, integers, etc. I need the one that contains the specific text StreamCache. Exactly one list in data contains the string StreamCache, and I store that list in a new list. Almost all the time my code works perfectly, but when the list I need contains a string like StreamCache@abnsdj12 or StreamCache*mljsgfn525, my code doesn't work, simply because StreamCache doesn't match StreamCache@kahsgsgh5 (or similar) exactly. I tried list comprehensions and regular expressions, but nothing works. Can someone help me? These are my attempts:

# Solution 1: works only if 'StreamCache' matches an element exactly
temp1 = [i for i in data if 'StreamCache' in i]
# Solution 2: doesn't work at all
import re
search = 'StreamCache'
for element in data:
    if isinstance(element, list):
        new = [i for i in element]
        z = re.compile('|'.join(re.escape(k) for k in new))
        result = re.findall(z, search)

Hope you can help me with this.

CodePudding user response:

You need to check if StreamCache is part of any string in the list, which you could do with something like this:

[l for l in data if any('StreamCache' in s for s in l)]

If StreamCache always occurs at the beginning of the string, this would be more efficient:

[l for l in data if any(s.startswith('StreamCache') for s in l)]
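Since the question says the inner lists also hold integers and other non-string values, both one-liners above would likely raise an error on such elements ('StreamCache' in 5 is a TypeError, and an int has no startswith method). A minimal sketch that skips non-string items first; the function name find_lists_containing and the sample data are made up for illustration:

```python
def find_lists_containing(data, needle):
    """Return every inner list holding at least one string that contains `needle`."""
    return [
        inner
        for inner in data
        if isinstance(inner, list)
        # Only test substring membership on str items; ints, None, etc. are skipped.
        and any(isinstance(item, str) and needle in item for item in inner)
    ]


# Hypothetical sample data mixing strings with other types.
data = [
    ["foo", 1, 2.5],
    [42, "StreamCache@abnsdj12", None],
    ["bar", "baz"],
]

matches = find_lists_containing(data, "StreamCache")
```

If exactly one list is expected, as in the question, matches[0] would be that list. Replacing needle in item with item.startswith(needle) gives the prefix-only variant.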

CodePudding user response:

The second approach you attempted returns [StreamCache] at best, because the content you are searching in is only StreamCache, while the regex object is <element 1>|<element 2>|.... Did you mean to find the StreamCache.* strings inside a string like the example below?

a|abc|StreamCache*mljsgfn777|123|StreamCache|aweafwfa|asfwqwdq|StreamCache@abnsdj12|somestring|StreamCache*mljsgfn525

If so, I think you got the roles reversed by mistake: the regex object (the first parameter) should describe what you are looking for, and the content to search in is the second parameter. Below is an example that seems to give the expected result for me:

import re

search = 'a|abc|StreamCache*mljsgfn777|123|StreamCache|aweafwfa|asfwqwdq|StreamCache@abnsdj12|somestring|StreamCache*mljsgfn525' # search content
# regex object: 'StreamCache' followed by everything up to the next '|'
# (no trailing '|' in the pattern; it would add an empty alternative and produce empty matches)
z = re.compile(r'StreamCache[^|]*')
search_result = z.findall(search)
# search_result here would contain ['StreamCache*mljsgfn777', 'StreamCache', 'StreamCache@abnsdj12', 'StreamCache*mljsgfn525']
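The same idea can probably be applied to the list of lists directly, without joining everything into one string first. This is only a sketch: the helper first_list_matching and the sample data are hypothetical, and it assumes, as the question states, that at most one inner list contains the text.

```python
import re

# Pattern describing what we are looking for; the data is what gets searched.
pattern = re.compile(r"StreamCache")


def first_list_matching(data, pattern):
    """Return the first inner list with a string the pattern matches, else None."""
    for inner in data:
        if isinstance(inner, list) and any(
            # Skip non-string items so pattern.search never receives an int.
            isinstance(item, str) and pattern.search(item)
            for item in inner
        ):
            return inner
    return None


# Hypothetical sample data.
data = [
    ["a", "abc", 123],
    ["somestring", "StreamCache*mljsgfn525", 7],
    ["aweafwfa"],
]

result = first_list_matching(data, pattern)
```

pattern.search finds the text anywhere in a string; pattern.match would restrict it to strings that start with StreamCache.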