I have lists like this in a column of a data frame:
list1 = [['Petitioner Jae Lee', 'his', 'he'], []]
list2 = [['lee'], ['federal officials']]
list3 = [[], ['lawyer']]
But I want to turn them into:
list1 = ['Petitioner Jae Lee' , 'his','he']
list2 = ['lee' , 'federal officials']
list3 = ['lawyer']
I want to do this for every row of a column in a data frame. How can I do it?
CodePudding user response:
list1 = [['Petitioner Jae Lee','his','he'],[]]
list2 = [['lee'],['federal officials']]
list3 = [[],['lawyer']]
# Flatten each nested list: take every item of every sublist, in order
flat_list1 = [item for sublist in list1 for item in sublist]
flat_list2 = [item for sublist in list2 for item in sublist]
flat_list3 = [item for sublist in list3 for item in sublist]
print(flat_list1)
print(flat_list2)
print(flat_list3)
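To avoid repeating the comprehension for every list, it could be wrapped in a small helper. This is only a sketch: the name flatten and the rows variable are made up for illustration, and the sample values are the ones from the question.

```python
def flatten(nested):
    """Return one flat list with every item of every sublist, in order."""
    return [item for sublist in nested for item in sublist]

# Sample values taken from the question, one nested list per row.
rows = [
    [['Petitioner Jae Lee', 'his', 'he'], []],
    [['lee'], ['federal officials']],
    [[], ['lawyer']],
]

# Apply the helper to each row; empty sublists simply contribute nothing.
flat_rows = [flatten(row) for row in rows]
for flat in flat_rows:
    print(flat)
```

Empty sublists need no special handling, because the inner loop just runs zero times for them.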
CodePudding user response:
Use Series.map to apply the logic row-wise. To concatenate the sublists into a single list, you can use the built-in sum function:
df['col'] = df['col'].map(lambda list_i : sum(list_i, []))
A better alternative is to unpack the sublists and pass them to itertools.chain, which avoids the repeated list copying that sum performs each time it adds another sublist:
import itertools as it
df['col'] = df['col'].map(lambda list_i : list(it.chain(*list_i)))
Or use a nested list comprehension:
df['col'] = df['col'].map(lambda list_i : [string for sublist in list_i for string in sublist])
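A variant of the chain approach is itertools.chain.from_iterable, which takes the list of sublists directly, so no unpacking with * is needed. The sketch below is plain Python so it runs without pandas; flatten_cell and column_values are hypothetical names, and the values stand in for the cells of the column from the question.

```python
from itertools import chain

def flatten_cell(cell):
    # chain.from_iterable iterates over the sublists one after another,
    # so the outer list does not have to be unpacked into arguments.
    return list(chain.from_iterable(cell))

# Stand-in for the cells of the data frame column (values from the question).
column_values = [
    [['Petitioner Jae Lee', 'his', 'he'], []],
    [['lee'], ['federal officials']],
    [[], ['lawyer']],
]

flattened = [flatten_cell(cell) for cell in column_values]
print(flattened)

# Assuming a data frame df with a column 'col' holding such lists,
# the same function could be passed to Series.map:
# df['col'] = df['col'].map(flatten_cell)
```

Passing a named function to map instead of a lambda also makes it easier to test the flattening logic on its own.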