Home > Blockchain >  Create and populate dataframe column simulating (excel) vlookup function
Create and populate dataframe column simulating (excel) vlookup function

Time:03-02

I am trying to create a new column in a dataframe and polulate it with a value from another data frame column which matches a common column from both data frames columns.

DF1   DF2
A B   W B
———   ——— 
Y 2   X 2
N 4   F 4   
Y 5   T 5

I though the following could do the tick.

df2[‘new_col’] = df1[‘A’] if df1[‘B’] == df2[‘B’] else “Not found”

So result should be:

DF2
W B new_col
X 2 Y        -> Because DF1[‘B’] == 2 and value in same row is Y
F 4 N
T 5 Y

but I get the below error, I believe that is because dataframes are different sizes?

raise ValueError("Can only compare identically-labeled Series objects”)

Can you help me understand what am I doing wrong and what is the best way to achieve what I am after?

Thank you in advance.

CodePudding user response:

Your question is not clear because why is F associated with N and T with Y? Why not F with Y and T with N?

Using merge:

>>> df2.merge(df1, on='B', how='left')
   W  B  A
0  X  2  Y
1  F  4  N  # What you want
2  F  4  Y  # Another solution
3  T  4  N  # What you want
4  T  4  Y  # Another solution

How do you decide on the right value? With row index?

Update

So you need to use the index position:

>>> df2.reset_index().merge(df1.reset_index(), on=['index', 'B'], how='left') \
       .drop(columns='index').rename(columns={'A': 'new_col'})

   W  B new_col
0  X  2       Y
1  F  4       N
2  T  4       Y

In fact you can consider the column B as an additional index of each dataframe.

Using join

>>> df2.set_index('B', append=True).join(df1.set_index('B', append=True)) \
       .reset_index('B').rename(columns={'A': 'new_col'})

   B  W new_col
0  2  X       Y
1  4  F       N
2  4  T       Y
  • Related