I have a list of tuples that each contain a "key" and a list "value":
[('ABE', ['ORD', 'ATL', 'DTW'])]
Here, ABE is the "key" and the list ['ORD', 'ATL', 'DTW'] is the "value".
How can I "flatten" this RDD structure by mapping each original key-value tuple to three tuples, all with the same "key" and each with a different element of the value list?
My desired output is
[('ABE', 'ORD'), ('ABE','ATL'), ('ABE','DTW')]
CodePudding user response:
This can be accomplished in a single list comprehension:
data = [('ABE', ['ORD', 'ATL', 'DTW'])]
flattened = [
    (key, elem)
    for key, value in data
    for elem in value
]
print(flattened)
outputs
[('ABE', 'ORD'), ('ABE', 'ATL'), ('ABE', 'DTW')]
CodePudding user response:
With itertools.zip_longest, using the key as fillvalue:
from itertools import zip_longest
lst = [('ABE', ['ORD', 'ATL', 'DTW']), ('1', ['A', 'B', 'C'])]
res = []
for key, sublist in lst:
    # [key] has a single element, so zip_longest pads it with fillvalue=key
    res.append(tuple(zip_longest([key], sublist, fillvalue=key)))
print(res)
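Note that the snippet above appends one tuple of pairs per key, so the result is grouped by key rather than flat. If the flat list from the question is wanted, a minimal sketch (same sample data; the name flat is only illustrative) could use extend instead of append:

```python
from itertools import zip_longest

lst = [('ABE', ['ORD', 'ATL', 'DTW']), ('1', ['A', 'B', 'C'])]

flat = []
for key, sublist in lst:
    # extend adds each (key, elem) pair individually instead of
    # adding one grouped tuple per key.
    flat.extend(zip_longest([key], sublist, fillvalue=key))

print(flat)
```

One caveat with this approach: if a value list is empty, zip_longest pads the missing element with the fillvalue as well and yields (key, key), whereas the list comprehension would yield nothing for that key.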
CodePudding user response:
I tested Brian's solution and got the error "'PipelinedRDD' object is not iterable".
I therefore called collect() on the data before iterating over it.
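# data must be an RDD here (a plain list has no collect() method);
# sc is assumed to be an existing SparkContext.
data = sc.parallelize([('ABE', ['ORD', 'ATL', 'DTW'])])
flattened = [
    (key, elem)
    for key, value in data.collect()
    for elem in value
]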
print(flattened)
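Since collect() brings every record back to the driver, it may be worth flattening inside Spark instead when the RDD is large. The sketch below is an assumption-laden illustration rather than a tested Spark job: explode is a hypothetical helper name, and the plain-Python part only imitates locally what flatMap does (apply a function to each record and concatenate the results).

```python
from itertools import chain


def explode(pair):
    """Turn one (key, values) record into a list of (key, value) pairs."""
    key, values = pair
    return [(key, value) for value in values]


data = [('ABE', ['ORD', 'ATL', 'DTW'])]

# Local stand-in for flatMap: map explode over the records, then
# chain the per-record lists into a single flat list.
flattened = list(chain.from_iterable(map(explode, data)))
print(flattened)

# On an actual RDD (assumed here to be named rdd), the equivalent would be:
#     rdd.flatMap(explode)
# or, because the records are key-value pairs:
#     rdd.flatMapValues(lambda values: values)
```

Both RDD calls return a new RDD, so the result stays distributed until an action such as collect() or take() is called on it.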