Home > Blockchain >  Pandas: How to merge the ith elements of lists from different DataFrame columns?
Pandas: How to merge the ith elements of lists from different DataFrame columns?

Time:03-20

I have a Panadas dataframe that looks like this:

df = pd.DataFrame({
'a':[['Often'],['Not Often','Not Often','Often'],['Somewhat Often','Never']],
'b':[['j0003'],['j0002','j0005', 'j0006'],['j0009','j0010']],
'c':[['jump'],['skip', 'throw', 'stab'],['walk','sleep']]
})

I want to merge the columns of this dataframe such that we have a single column that has each row with a list of tuples. The length of each row's list varies.

         Merged_Column

0        [('Often','j0003','jump')]
1        [('Not Often','j0002','skip'),('Not Often', 'j0005','throw'),('Often','j0006','stab')]
2        [('Somwhat Often','j0009','walk'),('Never','j0010','sleep')]

I've tried the following code, with the same data sourcing from lists:

lst1 = [['Often'],['Not Often','Not Often','Often'],['Somewhat Often','Never']]
lst2 = [['j0003'],['j0002','j0005', 'j0006'],['j0009','j0010']]
lst3 = [['jump'],['skip', 'throw', 'stab'],['walk','sleep']]

merged = []
while x<len(lst1):
    for i in range(len(lst1[x])):
        merged.append((lst1[x][i], lst2[x][i], lst3[x][i]))
    x =1

which results in the following structure (when we call merged):

[('Often','j0003','jump'), ('Not Often', 'j0002','skip'),('Not Often','j0005','throw'),
('Often','j0006','stab'), ('Somewhat Often','j0009','walk'),('Never','j0010','sleep')]

Thing is, I need an extra level of structure in here, so that instead of getting a list of length 6, I get a list of length 3.

[[('Often','j0003','jump')],[('Not Often','j0002','skip'),('Not Often', 'j0005','throw'),
('Often','j0006','stab')],[('Somwhat Often','j0009','walk'),('Never','j0010','sleep')]]

I figure if I can get a data structure looking like this I can pretty easily do pd.DataFrame() and change my list of lists of tuples into a dataframe/series. But I'm having a lot of trouble getting there. Any tip/suggestions/pointers would be very much appreciated.

CodePudding user response:

This can be done very easily with explode. Just explode all the columns, then convert each row into a tuple, then re-combine the tuples into lists:

merged_df = df.explode(df.columns.tolist()).apply(tuple, axis=1).groupby(level=0).agg(list).to_frame('Merged_Column')

Output:

>>> merged_df
                                                                 Merged_Column
0                                                       [(Often, j0003, jump)]
1  [(Not Often, j0002, skip), (Not Often, j0005, throw), (Often, j0006, stab)]
2                       [(Somewhat Often, j0009, walk), (Never, j0010, sleep)]
  • Related