I have a function named tmp that simply returns the two strings it receives. I also have two iterables that I want to pass to tmp: one has about 88,000 items and the other has 50. I want to move to the next item of the second iterable after every 200 iterations, but I can't work out how to iterate over the second iterable that way. Here is what I've done so far.
Code:
from itertools import repeat
url_list = [] # contains over 80000 urls
files = [] # contains 50 files
def tmp(url, file):
    return url, file
# I want to use the file for only 200 URLs and then change it and use the next one in the list(files) provided
list(map(tmp, url_list, map(lambda x: repeat(x, 200), files)))
Expected output:
url1, file1
url2, file1
url3, file1
.
.
url201, file2
url202, file2
.
.
.
url401, file3
url402, file3
.
.
Any help would be highly appreciated.
CodePudding user response:
Rather than repeating files 200 times, split url_list into chunks of 200. See "How do I split a list into equally-sized chunks?" for various ways to code this.
Use itertools.cycle() to go back to the beginning of files when you reach the end.
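import itertools
result = []
# chunks() is a helper that yields successive 200-item slices of url_list (see the linked question)
for url_chunk, file in zip(chunks(url_list, 200), itertools.cycle(files)):
    result.extend([(url, file) for url in url_chunk])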
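The loop above relies on a chunks() helper that is not defined in this answer. Here is a minimal runnable sketch of the same chunk-and-cycle idea, assuming a simple slice-based chunks() and small made-up sample data (7 URLs, 2 files, chunk size 3) in place of the real lists:

```python
import itertools

def chunks(seq, size):
    """Yield successive slices of seq, each with at most `size` items (assumed helper)."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

# Hypothetical sample data, scaled down from the 88000 URLs / 50 files in the question.
url_list = [f"url{i + 1}" for i in range(7)]
files = ["file1", "file2"]

result = []
# Pair each chunk of 3 URLs with the next file; cycle() restarts at files[0]
# once every file has been used.
for url_chunk, file in zip(chunks(url_list, 3), itertools.cycle(files)):
    result.extend((url, file) for url in url_chunk)

print(result)
```

With this sample data the third chunk (just url7) wraps around to file1, because cycle() never runs out. If you would rather stop when the files are exhausted, zip against files directly instead of itertools.cycle(files).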
CodePudding user response:
For example, you can try this:
import pprint
url_list = ["url" + str(i + 1) for i in range(20)]
file_list = ["file" + str(i + 1) for i in range(5)]
every_n = 3
# i // every_n selects the file for URL i; min() keeps the index within file_list
result = [(url_list[i], file_list[min(i // every_n, len(file_list) - 1)])
          for i in range(len(url_list))]
pprint.pprint(result)
The output of the above script:
[('url1', 'file1'),
('url2', 'file1'),
('url3', 'file1'),
('url4', 'file2'),
('url5', 'file2'),
('url6', 'file2'),
('url7', 'file3'),
('url8', 'file3'),
('url9', 'file3'),
('url10', 'file4'),
('url11', 'file4'),
('url12', 'file4'),
('url13', 'file5'),
('url14', 'file5'),
('url15', 'file5'),
('url16', 'file5'),
('url17', 'file5'),
('url18', 'file5'),
('url19', 'file5'),
('url20', 'file5')]
Remark: // is integer (floor) division, for example 5 // 2 = 2.
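To make the integer-division indexing easier to follow, here is a small sketch that isolates the index arithmetic. The function name file_index and its wrap parameter are made up for illustration. With wrap=False it clamps to the last file, as the list comprehension above does; with wrap=True it starts over from the first file instead, which is one possible alternative:

```python
def file_index(i, every_n, n_files, wrap=False):
    """Return the index of the file to use for the i-th URL (0-based)."""
    block = i // every_n              # which group of every_n URLs i falls into
    if wrap:
        return block % n_files        # start again from the first file
    return min(block, n_files - 1)    # stay on the last file once files run out

# Same sizes as the example above: 20 URLs, 5 files, a new file every 3 URLs.
for i in (0, 2, 3, 14, 15, 19):
    print(i, file_index(i, 3, 5), file_index(i, 3, 5, wrap=True))
```

For the sizes in the question (88000 URLs, 50 files, a new file every 200 URLs) there are 440 groups of URLs but only 50 files, so which of the two behaviours you pick matters for every URL after the first 10000.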