Combine a list with a list of varied length within a list

Time:10-01

I am trying to combine historical data from an ancient custom-made email system to create a database with Python. One list (b) contains the email ids, and another list (a) contains the filenames of each email's attachments. An email may have zero, one, or many attachments. There are thousands of records to process.

I have extracted the data in the following format:

a = [[], ['a'], ['b', 'c', 'd']]
b = ['c1', 'c2', 'c3']

I want the empty data in 'a' removed and the remaining data in the following format, but don't care if it is a list or tuple.

x = [[['c2', 'a']], [['c3', 'b'], ['c3', 'c'], ['c4', 'd']]]

I have tried using zip

x = tuple(zip(b, a))

But that only paired each email id with its whole attachment list:

(('c1', []), ('c2', ['a']), ('c3', ['b', 'c', 'd']))

I tried itertools chain

import itertools

op = [list(itertools.chain(*i))
      for i in zip(b, a)]

But that yielded

[['c', '1'], ['c', '2', 'a'], ['c', '3', 'b', 'c', 'd']]

I have also tried using re.findall() to get the data into a more usable format, but there will usually be a mismatched number of email ids to filenames. There is lots of stuff about lists and joining, etc., but I haven't found anything useful regarding a list within a list where there is variable length.

Thanks in advance

CodePudding user response:

I hope I've understood your question right (in your output you have c4 but I think it should be c3):

a = [[], ["a"], ["b", "c", "d"]]
b = ["c1", "c2", "c3"]

# Pair each id with every filename in its list; skip ids whose list is empty.
out = [[[v, s] for s in l] for v, l in zip(b, a) if l]
print(out)

Prints:

[[['c2', 'a']], [['c3', 'b'], ['c3', 'c'], ['c3', 'd']]]
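
If the grouping per email is not actually needed, a flat list with one (email id, filename) pair per attachment may be easier to load into a database. This is only a sketch under that assumption; the names pairs, email_id, names and name are mine, not from the question:

```python
a = [[], ["a"], ["b", "c", "d"]]
b = ["c1", "c2", "c3"]

# One (email_id, filename) tuple per attachment. An email with an
# empty attachment list contributes no tuples, so no filter is needed.
pairs = [(email_id, name) for email_id, names in zip(b, a) for name in names]
print(pairs)
# [('c2', 'a'), ('c3', 'b'), ('c3', 'c'), ('c3', 'd')]
```

Note that zip stops at the shorter of the two lists, so this assumes a and b have the same length, with one attachment list per email id.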