Flatten tuple containing list to multiple tuples

Time:02-05

I have a list of tuples that each contain a "key" and a list "value":

[('ABE', ['ORD', 'ATL', 'DTW'])]

Here, ABE is the "key" and the list ['ORD', 'ATL', 'DTW'] is the "value".

How can I "flatten" this RDD structure by mapping each of the original key-value tuples to three tuples, all with the same "key" but each with a different element of the value list?

My desired output is

[('ABE', 'ORD'), ('ABE','ATL'), ('ABE','DTW')] 

CodePudding user response:

This can be accomplished in a single list comprehension:

data = [('ABE', ['ORD', 'ATL', 'DTW'])]

flattened = [
    (key, elem) 
    for key, value in data 
    for elem in value
]
print(flattened)

outputs

[('ABE', 'ORD'), ('ABE', 'ATL'), ('ABE', 'DTW')]

CodePudding user response:

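With itertools.zip_longest, using the key as the fillvalue so that it is paired with every element of the list. Extending res, rather than appending one tuple per key, keeps the result a flat list of pairs.

from itertools import zip_longest

lst = [('ABE', ['ORD', 'ATL', 'DTW']), ('1', ['A', 'B', 'C'])]

res = []
for key, sublist in lst:
    # [key] has a single item, so fillvalue=key supplies the key for the remaining pairs
    res.extend(zip_longest([key], sublist, fillvalue=key))

print(res)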
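
For comparison, itertools.product pairs the key with every element directly and yields nothing for an empty list, whereas the zip_longest version would emit a (key, key) pair in that case. A minimal sketch on the same sample data; the name flat is only illustrative:

```python
from itertools import product

lst = [('ABE', ['ORD', 'ATL', 'DTW']), ('1', ['A', 'B', 'C'])]

flat = []
for key, sublist in lst:
    # product([key], sublist) yields (key, elem) for every elem in sublist
    flat.extend(product([key], sublist))

print(flat)
```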

CodePudding user response:

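I tested Brian's solution and got "'PipelinedRDD' object is not iterable". An RDD cannot be iterated over directly, so I added a collect() call on the data, which brings the elements back to the driver as a plain list.

from pyspark import SparkContext

sc = SparkContext.getOrCreate()
# data has to be an RDD for collect() to exist; parallelize builds one from the sample list
data = sc.parallelize([('ABE', ['ORD', 'ATL', 'DTW'])])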

flattened = [
    (key, elem)
    for key, value in data.collect()
    for elem in value
]

print(flattened)
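
Since collect() pulls the whole dataset to the driver, on large data it may be preferable to do the flattening inside Spark. A minimal sketch using PySpark's RDD.flatMapValues, which applies a function to each value and emits one (key, item) pair per item of the result. The helper name flatten_pairs is hypothetical, and rdd is assumed to be an RDD of (key, list) pairs:

```python
def flatten_pairs(rdd):
    """Expand each (key, [v1, v2, ...]) pair into (key, v1), (key, v2), ..."""
    # The identity function returns the list itself, so flatMapValues
    # emits one (key, item) pair per list item and leaves the key untouched.
    return rdd.flatMapValues(lambda values: values)
```

With a live SparkContext, flatten_pairs(data).collect() should give the three ('ABE', ...) pairs from the question, and the result stays an RDD until collect() is called.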