I have 2 dataframes with same columns and indexes.
a a
1 [] 1 [5,2,7]
2 [1,2,3] 2 [1,2,3,4]
3 [7] 3 [7,5]
I want to merge them using condition, when length of list is <=1 then take value and add it to 1st data frame, else left old value.
So after that result is:
a
1 [5,2,7]
2 [1,2,3]
3 [7,5]
What is the best way to do this?
CodePudding user response:
for i, (x,y) in enumerate(zip(dfa['a'], dfb['b'])):
# apply your logic - 'when length of list is <=1 then take value...' and save it in dfa['a'][i]
if len(x) <= 1:
dfa.loc[i]['a'] = y
CodePudding user response:
Here is an approach using pandas.DataFrame.mask
.
First, make sure that the values of each dataframe/column are lists :
df1["a"]= df1["a"].str.strip("[]").str.split(",") #skip if already a list
df2["a"]= df2["a"].str.strip("[]").str.split(",") #skip if already a list
Then, use pandas.Series.str.len
:
out = df1.mask(df1["a"].str.len().le(1), other=df2["a"], axis=0)
Or use pandas.Series.transform
:
out = df1.mask(df1["a"].transform(len).le(1), other=df2["a"], axis=0)
# Output :
print(out)
a
0 [5, 2, 7]
1 [1, 2, 3]
2 [7, 5]
CodePudding user response:
You can use numpy.where()
to evaluate the criteria and return which version you want.
In this solution, I combine the two lists and then convert to a set to remove duplicates, and then convert back to a list because that is what you want at the end, I believe. Note that this does change the elements (you can [2,5,7]
instead of [5,2,7]
)
new_df = np.where(
df1['a'].apply(len)<=1,
(df1['a'] df2['a']).apply(set).apply(list),
df1['a'])