Use repeat function in map alongside lambda function

Time:06-29

I have a function named tmp that simply returns its two string arguments. I also have two iterables that I want to pass to tmp: one has about 88000 items and the other has 50. I want to move on to the next item of the second iterable after every 200 iterations, but I can't work out how to iterate over the second iterable this way. Here is what I've done so far.

Code:

from itertools import repeat


url_list = [] # contains over 80000 urls 
files = [] # contains 50 files

def tmp(url, file):
    return url, file
    
# I want to use the file for only 200 URLs and then change it and use the next one in the list(files) provided
list(map(tmp, url_list, map(lambda x: repeat(x, 200), files)))

Expected output:

url1, file1
url2, file1
url3, file1
.
.
url201, file2
url202, file2
.
.
.
url401, file3
url402, file3
.
.

Any help would be highly appreciated.

CodePudding user response:

Rather than repeating each file 200 times, split url_list into chunks of 200. See the question "How do I split a list into equally-sized chunks?" for various ways to code this.

Use itertools.cycle() to go back to the beginning of files when you reach the end.

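import itertools

def chunks(lst, n):  # one way to split lst into n-sized slices
    return (lst[i:i + n] for i in range(0, len(lst), n))

result = []
for url_chunk, file in zip(chunks(url_list, 200), itertools.cycle(files)):
    result.extend([(url, file) for url in url_chunk])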
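
To see the chunk-and-cycle idea on a small input, here is a minimal self-contained sketch. The sample data (7 URLs, 2 files, a chunk size of 3) and the chunks helper are assumptions made up for illustration; they are not from the question.

```python
import itertools


def chunks(lst, n):
    """Yield successive n-sized slices of lst (hypothetical helper)."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


# Illustrative sample data, much smaller than the real lists
url_list = [f"url{i + 1}" for i in range(7)]
files = ["file1", "file2"]
every_n = 3

result = []
# zip() stops when the URL chunks run out; cycle() never runs out of files
for url_chunk, file in zip(chunks(url_list, every_n), itertools.cycle(files)):
    # Pair every URL in this chunk with the current file
    result.extend((url, file) for url in url_chunk)

for pair in result:
    print(pair)
```

With this sample data, url1 to url3 are paired with file1, url4 to url6 with file2, and url7 with file1 again, because cycle() wraps around to the start of files.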

CodePudding user response:

You can try for example this:

import pprint

url_list = ["url" + str(i + 1) for i in range(20)]
file_list = ["file" + str(i + 1) for i in range(5)]
every_n = 3

result = [ (url_list[i], file_list[min(i // every_n, len(file_list)-1)])
           for i in range(len(url_list)) ]

pprint.pprint(result)

The output of the above script:

[('url1', 'file1'),
 ('url2', 'file1'),
 ('url3', 'file1'),
 ('url4', 'file2'),
 ('url5', 'file2'),
 ('url6', 'file2'),
 ('url7', 'file3'),
 ('url8', 'file3'),
 ('url9', 'file3'),
 ('url10', 'file4'),
 ('url11', 'file4'),
 ('url12', 'file4'),
 ('url13', 'file5'),
 ('url14', 'file5'),
 ('url15', 'file5'),
 ('url16', 'file5'),
 ('url17', 'file5'),
 ('url18', 'file5'),
 ('url19', 'file5'),
 ('url20', 'file5')]

Remark:

  • // is integer (floor) division, e.g. 5 // 2 = 2
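
If you would rather keep the map/repeat style from the question, one possible fix is to flatten the repeats into a single stream. In the original attempt the inner map yields 50 repeat objects rather than file names, so tmp receives a repeat object as its second argument and the outer map stops after 50 pairs. The sketch below is only an illustration: the sample data, every_n and file_stream are made-up names and values, and wrapping the stream in cycle() assumes you want to start over from the first file once all files have been used.

```python
from itertools import chain, cycle, repeat


def tmp(url, file):
    return url, file


# Illustrative sample data, much smaller than the real lists
url_list = [f"url{i + 1}" for i in range(7)]
files = ["file1", "file2"]
every_n = 3

# Flatten the per-file repeats into one stream:
# file1, file1, file1, file2, file2, file2
file_stream = chain.from_iterable(repeat(f, every_n) for f in files)

# cycle() restarts the stream once it is exhausted; map() stops as soon as
# the shortest input (url_list here) runs out
result = list(map(tmp, url_list, cycle(file_stream)))

for pair in result:
    print(pair)
```

Here url1 to url3 get file1, url4 to url6 get file2, and url7 gets file1 again. Without cycle(), map would stop as soon as the flattened file stream is exhausted.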