I want to perform vlookup on two excel files, and select only certain column in second excel file
Here's Book1.xlsx
BILL_ID APPROVED_BY BILL_DESCRIPTION BILLED_AMOUNT BILL_CONTROL_NUMBER
163191467 111 Loan Repayment 6792731.25 991110749346
167840287 111 Loan Repayment 9899385 991110840923
166618791 111 Loan Repayment 4098285 991110809688
166618796 111 Loan Repayment 960521.25 991110809690
164074843 111 Loan Repayment 18415107 991110772113
173800261 18 LOAN REPAYMENT 93000000 991110852243
163874933 111 Loan Repayment 5845935 991110767830
167256844 111 Loan Repayment 4472520 991110827054
180746273 7 LOAN REPAYMENT 6948435.7 991110852786
Here's Book2.xlsx
Name Control_number Start_date end_date
Tony 991110749346 1/31/2031 23:59 3/24/2021 15:03
Alex 991110759417 7/23/2030 23:59 7/23/2020 19:52
Jamila 991110849017 9/8/2030 23:59 9/10/2020 9:55
Fred 991110812010 8/19/2030 23:59 8/19/2020 0:26
Rosena 991110809690 6/11/2021 23:59 5/12/2021 17:41
Thobias 991110854526 6/10/2021 23:59 5/11/2021 8:00
George 991110822870 8/23/2030 23:59 8/23/2020 16:35
Foreman 991110831014 7/5/2030 23:59 8/26/2020 9:55
Tyson 991110852552 11/12/2030 23:59 11/14/2020 13:55
Genry 991110852243 8/23/2030 23:59 8/23/2020 20:04
Gastory 991110853244 12/28/2030 23:59 2/18/2021 14:16
Shebby 991110827054 6/6/2021 23:59 5/7/2021 11:11
Here's desired output
BILL_ID APPROVED_BY BILL_DESCRIPTION BILLED_AMOUNT BILL_CONTROL_NUMBER Name Control_number
163191467 111 Loan Repayment 6792731.25 991110749346 Tony 991110749346
166618796 111 Loan Repayment 960521.25 991110809690 Rosena 991110809690
173800261 18 LOAN REPAYMENT 93000000 991110852243 Genry 991110852243
167256844 111 Loan Repayment 4472520 991110827054 Shebby 991110827054
Here's my code
import pandas as pd
df1 = pd.read_excel('Book1.xlsx')
df2 = pd.read_excel('Book2.xlsx')
df1.head()
df2.head()
inner_join = pd.merge(df1,df2,left_on='BILL_CONTROL_NUMBER',right_on='Control_number')
CodePudding user response:
Change the line
inner_join = pd.merge(df1,df2,left_on='BILL_CONTROL_NUMBER',right_on='Control_number')
to this
inner_join = pd.merge(df1,df2[['Name','Control_number']],left_on='BILL_CONTROL_NUMBER',right_on='Control_number')
You can specify columns you want using [['column1','column2','etc..']]