Pandas: how to merge list columns that contain NaN?

Time:08-17

I want to merge several columns whose cells hold list objects (some cells are NaN).

The catch is that I need to remove duplicate elements from the merged result.

How can I get a new column containing the merged lists, like below?

Source:

     col_0      col_a      col_b      col_c
0      aa        [1]        NaN       [2, 3]
1      bb       [a, b]     [b, c]      [c]
2      cc        NaN        NaN        NaN

Expected:

     col_0      col_a      col_b      col_c     merged_a_to_c
0      aa        [1]        NaN       [2, 3]       [1, 2, 3]
1      bb       [a, b]     [b, c]      [c]         [a, b, c]
2      cc        NaN        NaN        NaN            NaN
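
To make the example reproducible, the source frame can be built roughly as follows. This is only a sketch: the variable name df is an assumption, and the elements in row 1 are taken to be strings.

```python
import numpy as np
import pandas as pd

# Sample data read off the "Source" table; NaN marks cells that hold no list.
df = pd.DataFrame({
    "col_0": ["aa", "bb", "cc"],
    "col_a": [[1], ["a", "b"], np.nan],
    "col_b": [np.nan, ["b", "c"], np.nan],
    "col_c": [[2, 3], ["c"], np.nan],
})

print(df)
```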

CodePudding user response:

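import numpy as np

def merge(df):
    merged_a_to_c = []
    for row in range(len(df)):
        merge_tmp = []
        # Scan every column; only list-valued cells contribute
        # (strings such as col_0 and NaN cells are skipped).
        for col in range(len(df.columns)):
            cell = df.iloc[row, col]
            if isinstance(cell, list):
                for element in cell:
                    # Keep only the first occurrence, preserving order.
                    if element not in merge_tmp:
                        merge_tmp.append(element)

        # Rows without any list values get NaN instead of an empty list.
        merged_a_to_c.append(merge_tmp if merge_tmp else np.nan)

    df['merged_a_to_c'] = merged_a_to_c
    return df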

Result of merge(df):

  col_0   col_a   col_b   col_c merged_a_to_c
0    aa     [1]     NaN  [2, 3]     [1, 2, 3]
1    bb  [a, b]  [b, c]     [c]     [a, b, c]
2    cc     NaN     NaN     NaN           NaN


This code works regardless of the DataFrame's size (number of columns or rows).

Hope this helps.
I edited the code because I hadn't realized at first that duplicates needed to be handled.
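
If only certain columns should be merged, a variant along the same lines might look like the sketch below. The helper name merge_lists and the cols list are made up for illustration, and the dict.fromkeys trick assumes the list elements are hashable.

```python
import numpy as np
import pandas as pd

def merge_lists(cells):
    # Collect elements from list-valued cells only; NaN cells are skipped.
    merged = [x for cell in cells if isinstance(cell, list) for x in cell]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(merged))
    # Rows with nothing to merge get NaN, as in the expected output.
    return unique if unique else np.nan

df = pd.DataFrame({
    "col_0": ["aa", "bb", "cc"],
    "col_a": [[1], ["a", "b"], np.nan],
    "col_b": [np.nan, ["b", "c"], np.nan],
    "col_c": [[2, 3], ["c"], np.nan],
})

cols = ["col_a", "col_b", "col_c"]  # hypothetical choice of columns to merge
df["merged_a_to_c"] = [merge_lists(row) for row in df[cols].itertuples(index=False)]

print(df)
```

Restricting the scan to cols keeps unrelated list columns out of the result, which the all-columns loop above would otherwise pick up.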

CodePudding user response:

You can just concatenate them with +:

df['merged_a_to_c'] = df['col_a'] + df['col_c']

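Or for multiple columns (starting from a column of empty lists and replacing NaNs with empty lists as you go):

df['merged_a_to_c'] = [[] for _ in range(len(df))]
for col in ['col_a', 'col_b', 'col_c']:  # extend the list with any further columns
    df['merged_a_to_c'] += df[col].apply(lambda d: d if isinstance(d, list) else [])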

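If you want to remove duplicates, a simple way is to skip any NaNs and then apply set. Note that this leaves set objects in the column rather than lists, so the original element order is not guaranteed:

mask = df['merged_a_to_c'].notna()
df.loc[mask, 'merged_a_to_c'] = df.loc[mask, 'merged_a_to_c'].map(set)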
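
Putting those steps together so the result matches the expected output (ordered lists, NaN for empty rows) could look something like this. It is a sketch under assumptions: the sample frame and the cols list are reconstructed from the question, dict.fromkeys is swapped in for set to keep the order, and the elements are assumed hashable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_0": ["aa", "bb", "cc"],
    "col_a": [[1], ["a", "b"], np.nan],
    "col_b": [np.nan, ["b", "c"], np.nan],
    "col_c": [[2, 3], ["c"], np.nan],
})
cols = ["col_a", "col_b", "col_c"]  # assumed columns to merge

# Start from empty lists, then concatenate each column row by row,
# treating NaN cells as empty lists.
merged = pd.Series([[] for _ in range(len(df))], index=df.index)
for col in cols:
    merged = merged + df[col].apply(lambda d: d if isinstance(d, list) else [])

# Drop duplicates while keeping first-seen order; empty results become NaN.
df["merged_a_to_c"] = merged.map(
    lambda xs: list(dict.fromkeys(xs)) if xs else np.nan
)

print(df)
```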