I want to merge several columns that contain list objects into a single column.
The problem is that I also need to remove duplicate elements from the merged list.
How can I get a column with the merged lists, as shown below?
Source:
col_0 col_a col_b col_c
0 aa [1] NaN [2,3]
1 bb [a, b] [b, c] [c]
2 cc NaN NaN NaN
Expected:
col_0 col_a col_b col_c merged_a_to_c
0 aa [1] NaN [2,3] [1,2,3]
1 bb [a, b] [b, c] [c] [a, b, c]
2 cc NaN NaN NaN NaN
CodePudding user response:
import numpy as np

def merge(df):
    merged_a_to_c = []
    for row in range(len(df)):
        merge_tmp = []  # unique elements collected for this row
        for column in range(len(df.columns)):
            cell = df.iloc[row, column]
            if isinstance(cell, list):  # skip NaN and other non-list cells
                for element in cell:
                    if element not in merge_tmp:
                        merge_tmp.append(element)
        # use NaN when the row contains no list elements at all
        merged_a_to_c.append(merge_tmp if merge_tmp else np.nan)
    df['merged_a_to_c'] = merged_a_to_c
    return df
col_0 col_a col_b col_c merged_a_to_c
0 aa [1] NaN [2, 3] [1, 2, 3]
1 bb [a, b] [b, c] [c] [a, b, c]
2 cc NaN NaN NaN NaN
This code works regardless of how many rows or columns the DataFrame has.
Hope this helps.
I edited the code because I had not initially accounted for the duplicates.
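For reference, here is a shorter sketch of the same idea. It assumes the sample data from the question, restricts the merge to the three list columns, and uses a hypothetical helper named merge_row. dict.fromkeys is used to drop duplicates while keeping first-seen order, so the elements need to be hashable.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real frame)
df = pd.DataFrame({
    "col_0": ["aa", "bb", "cc"],
    "col_a": [[1], ["a", "b"], np.nan],
    "col_b": [np.nan, ["b", "c"], np.nan],
    "col_c": [[2, 3], ["c"], np.nan],
})

def merge_row(cells):
    # Hypothetical helper: gather the elements of every list-valued cell
    merged = []
    for cell in cells:
        if isinstance(cell, list):
            merged.extend(cell)
    # dict.fromkeys drops duplicates and keeps first-seen order
    unique = list(dict.fromkeys(merged))
    return unique if unique else np.nan

cols = ["col_a", "col_b", "col_c"]
df["merged_a_to_c"] = [
    merge_row(cells) for cells in df[cols].itertuples(index=False, name=None)
]
print(df)
```

Unlike the function above, this only looks at the columns listed in cols, so other list-valued columns would be ignored.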
CodePudding user response:
You can just concatenate them with +:
df['merged_a_to_c'] = df['col_a'] + df['col_c']
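Or for multiple columns (replacing NaNs with empty lists before concatenating):
df['merged_a_to_c'] = [[] for _ in range(len(df))]
for col in ['col_a', 'col_b', 'col_c']:  # add any further columns here
    df['merged_a_to_c'] = df['merged_a_to_c'] + df[col].apply(lambda d: d if isinstance(d, list) else [])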
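To make the multi-column variant concrete, here is a self-contained sketch run against the question's sample data. The frame is rebuilt by hand, and the names cols, merged and as_lists are only illustrative. It shows that plain concatenation keeps duplicates and leaves an empty list, not NaN, for rows with no lists.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real frame)
df = pd.DataFrame({
    "col_0": ["aa", "bb", "cc"],
    "col_a": [[1], ["a", "b"], np.nan],
    "col_b": [np.nan, ["b", "c"], np.nan],
    "col_c": [[2, 3], ["c"], np.nan],
})

cols = ["col_a", "col_b", "col_c"]

# Start from one empty list per row, then append each column's lists
merged = pd.Series([[] for _ in range(len(df))], index=df.index)
for col in cols:
    # Treat NaN (or any non-list value) as an empty list
    as_lists = df[col].apply(lambda d: d if isinstance(d, list) else [])
    merged = merged + as_lists  # element-wise list concatenation

df["merged_a_to_c"] = merged
print(df["merged_a_to_c"].tolist())
```

Row 1 still contains repeated 'b' and 'c' at this point, so a separate de-duplication step is needed afterwards.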
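If you want to remove duplicates, an efficient way is to filter out the NaNs and then apply set to the remaining rows (note that this stores sets rather than lists, so element order is not guaranteed):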
mask = ~df['merged_a_to_c'].isna()
df.loc[mask, 'merged_a_to_c'] = df.loc[mask, 'merged_a_to_c'].map(set)
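If the order of elements matters, or you need lists rather than sets, one possible alternative is dict.fromkeys. This is only a sketch: the Series merged and the helper dedupe are made-up names for illustration, and the elements must be hashable, as with set.

```python
import numpy as np
import pandas as pd

# Illustrative column of already concatenated lists, with NaN for an empty row
merged = pd.Series([[1, 2, 3], ["a", "b", "b", "c", "c"], np.nan])

def dedupe(cell):
    # Leave NaN (or any other non-list value) untouched
    if not isinstance(cell, list):
        return cell
    # dict.fromkeys keeps the first occurrence of each element, in order
    return list(dict.fromkeys(cell))

deduped = merged.map(dedupe)
print(deduped.tolist())
```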