I have a Panadas dataframe that looks like this:
df = pd.DataFrame({
'a':[['Often'],['Not Often','Not Often','Often'],['Somewhat Often','Never']],
'b':[['j0003'],['j0002','j0005', 'j0006'],['j0009','j0010']],
'c':[['jump'],['skip', 'throw', 'stab'],['walk','sleep']]
})
I want to merge the columns of this dataframe such that we have a single column that has each row with a list of tuples. The length of each row's list varies.
Merged_Column
0 [('Often','j0003','jump')]
1 [('Not Often','j0002','skip'),('Not Often', 'j0005','throw'),('Often','j0006','stab')]
2 [('Somwhat Often','j0009','walk'),('Never','j0010','sleep')]
I've tried the following code, with the same data sourcing from lists:
lst1 = [['Often'],['Not Often','Not Often','Often'],['Somewhat Often','Never']]
lst2 = [['j0003'],['j0002','j0005', 'j0006'],['j0009','j0010']]
lst3 = [['jump'],['skip', 'throw', 'stab'],['walk','sleep']]
merged = []
while x<len(lst1):
for i in range(len(lst1[x])):
merged.append((lst1[x][i], lst2[x][i], lst3[x][i]))
x =1
which results in the following structure (when we call merged):
[('Often','j0003','jump'), ('Not Often', 'j0002','skip'),('Not Often','j0005','throw'),
('Often','j0006','stab'), ('Somewhat Often','j0009','walk'),('Never','j0010','sleep')]
Thing is, I need an extra level of structure in here, so that instead of getting a list of length 6, I get a list of length 3.
[[('Often','j0003','jump')],[('Not Often','j0002','skip'),('Not Often', 'j0005','throw'),
('Often','j0006','stab')],[('Somwhat Often','j0009','walk'),('Never','j0010','sleep')]]
I figure if I can get a data structure looking like this I can pretty easily do pd.DataFrame()
and change my list of lists of tuples into a dataframe/series. But I'm having a lot of trouble getting there. Any tip/suggestions/pointers would be very much appreciated.
CodePudding user response:
This can be done very easily with explode
. Just explode all the columns, then convert each row into a tuple, then re-combine the tuples into lists:
merged_df = df.explode(df.columns.tolist()).apply(tuple, axis=1).groupby(level=0).agg(list).to_frame('Merged_Column')
Output:
>>> merged_df
Merged_Column
0 [(Often, j0003, jump)]
1 [(Not Often, j0002, skip), (Not Often, j0005, throw), (Often, j0006, stab)]
2 [(Somewhat Often, j0009, walk), (Never, j0010, sleep)]