Home > Back-end >  how compare 2 DataFrames and give unigvalue from one df column to the other
how compare 2 DataFrames and give unigvalue from one df column to the other

Time:12-31

I will need your help please in the following problem I am facing:

I have 2 dataframes (df_big and df_small) with many columns each one. In the df_small each row have unique combination for the columns ‘a’, ‘b’,’c’, ‘d’,’e’ in the df_big exist rows have same values with df_small at ‘a’, ‘b’,’c’, ‘d’,’e’ but exist and rows with same ‘a’, ‘b’,’c’, ‘d’ and different value in ‘e’ column.

I would like in df_big the rows with same values with df_small in columns ‘a’, ‘b’,’c’, ‘d’ have only one value in the column ‘e’ . Must be the value from df_small.

    while i < len(df_small):
    while j < len(df_big):

        if ((df_big.iat[j,df_big.columns.get_loc('a')]  == df_small.iat[i,df_small.columns.get_loc('a')]) &
                        (df_big.iat[j,df_big.columns.get_loc('b')]  == df_small.iat[i,df_small.columns.get_loc('b')]) &
                        (df_big.iat[j,df_big.columns.get_loc('c')]  == df_small.iat[i,df_small.columns.get_loc('c')]) &

                        (df_big.iat[j,df_big.columns.get_loc('d')]  == df_small.iat[i,df_small.columns.get_loc('d')]) ):

                            df_big.iat[j,df_big.columns.get_loc('e')]  = df_small.iat[i,df_small.columns.get_loc('e')]
        else:
                            pass
    j= j   1
i = i  1

i have many tries and with if and iloc but I did not make it. could someone help me please!

CodePudding user response:

Use DataFrame.set_index with DataFrame.update:

np.random.seed(2021)
df_big = pd.DataFrame(np.random.randint(10, size=(7, 6)), columns=list('abcdef'))

df_small = df_big[df_big['a'].le(2)].assign(e = lambda x: x['b'] * x['e'] * 10)
print (df_big)
   a  b  c  d  e  f
0  4  5  9  0  6  5
1  8  6  6  6  6  1
2  5  7  1  1  5  2
3  0  3  1  0  2  6
4  4  8  5  1  6  7
5  5  6  9  5  6  9
6  2  4  3  9  2  8

print (df_small)
   a  b  c  d   e  f
3  0  3  1  0  60  6
6  2  4  3  9  80  8

df_big = df_big.set_index(['a','b','c','d'])
df_small = df_small.set_index(['a','b','c','d'])['e']

#if possible duplicates per a,b,c,d remove them
#df_small = df_small.drop_duplicates(['a','b','c','d']).set_index(['a','b','c','d'])['e']

df_big.update(df_small)
df_big = df_big.reset_index()
print (df_big)
   a  b  c  d     e  f
0  4  5  9  0   6.0  5
1  8  6  6  6   6.0  1
2  5  7  1  1   5.0  2
3  0  3  1  0  60.0  6
4  4  8  5  1   6.0  7
5  5  6  9  5   6.0  9
6  2  4  3  9  80.0  8
  • Related