Merge 2 dataframes and update column with lists using condtions-CodePudding

I have 2 dataframes with same columns and indexes.

     a              a

1 []             1 [5,2,7]

2 [1,2,3]        2 [1,2,3,4]

3 [7]            3 [7,5]

I want to merge them using condition, when length of list is <=1 then take value and add it to 1st data frame, else left old value.

So after that result is:

What is the best way to do this?

CodePudding user response：

for i, (x,y) in enumerate(zip(dfa['a'], dfb['b'])):
    # apply your logic - 'when length of list is <=1 then take value...' and save it in dfa['a'][i]
    if len(x) <= 1:
        dfa.loc[i]['a'] = y

CodePudding user response：

Here is an approach using pandas.DataFrame.mask.

First, make sure that the values of each dataframe/column are lists :

df1["a"]= df1["a"].str.strip("[]").str.split(",") #skip if already a list
df2["a"]= df2["a"].str.strip("[]").str.split(",") #skip if already a list

Then, use pandas.Series.str.len :

out = df1.mask(df1["a"].str.len().le(1), other=df2["a"], axis=0)

Or use pandas.Series.transform :

out = df1.mask(df1["a"].transform(len).le(1), other=df2["a"], axis=0)

# Output :

print(out)
           a
0  [5, 2, 7]
1  [1, 2, 3]
2     [7, 5]

CodePudding user response：

You can use numpy.where() to evaluate the criteria and return which version you want.

In this solution, I combine the two lists and then convert to a set to remove duplicates, and then convert back to a list because that is what you want at the end, I believe. Note that this does change the elements (you can [2,5,7] instead of [5,2,7])

new_df = np.where(
    df1['a'].apply(len)<=1,
    (df1['a']   df2['a']).apply(set).apply(list),
    df1['a'])