Home > Back-end >  fill only unique value from another dataframe based on condition
fill only unique value from another dataframe based on condition

Time:01-22

How to fill '0' value in df1 from unique value from another dataframe (df2). the expected output is no duplicate in df1.

any reference links for this. thank for helping out.

data1 = {'test' :['b',0,'e',0,0,'f']}
df1 = pd.DataFrame(data=data1)

data2 = {'test' :['a','b','c','d','e','f']}
df2 = pd.DataFrame(data=data2)

df1

   test
0   b  
1   0
2   e  
3   0  
4   0  
5   f  

df2

   test
0   a
1   b
2   c
3   d
4   e
5   f

expected output:

   test
0   b  -- df1
1   a  -- fill with a from df2
2   e  -- df1
3   c  -- fill with c from df2
4   d  -- fill with d from df2
5   f  -- df1

CodePudding user response:

Assuming you have enough unique values in df2 to fill the 0s in df1, extract those unique values, and assign them with boolean indexing:

# which rows are 0?
m = df1['test'].eq(0)

# extract values of df2 that are not in df1
vals = df2.loc[~df2['test'].isin(df1['test']), 'test'].tolist()
# ['b', 'e', 'f']

# assign the values in the limit of the needed number
df1.loc[m, 'test'] = vals[:m.sum()]

print(df1)

Output:

  test
0    b
1    a
2    e
3    c
4    d
5    f

If there is not always enough values in df2 and you want to fill the first possible 0s:

m = df1['test'].eq(0)

vals = df2.loc[~df2['test'].isin(df1['test']), 'test'].unique()
# ['b', 'e', 'f']

m2 = m.cumsum().le(len(vals))

df1.loc[m&m2, 'test'] = vals[:m.sum()]

print(df1)

CodePudding user response:

Solution Assumptions:

  1. number of '0' == unique values in df2
  2. have a column like 'test' to be manipulated
# get the unique values in df1
uni = df1['test'].unique()

# get the unique values in df2 which are not in df1
unique_df2 = df2[~df2['test'].isin(uni)]

# get the index of all the '0' in df1 in a list
index_df1_0 = df1.index[df1['test'] == 0].tolist()

# replace the '0' in df1 with unique values from df1 (assumption #1 imp!)
for val_ in range(len(index_df1_0)):
    df1.iloc[index_df1_0[val_]] = unique_df2.iloc[val_]

print(df1)
  • Related