I have 2 data frame , Need to fetch system_type column based on "Name" column of both dataframe .
I have 500000 rows of df1 as format
Name Timestamp usage AXCS 2022-01-01 5 BGXD 2022-02-01 70 HFSD 2022-03-01 45 AEVC 2022-01-01 25 BHRF 2022-02-01 12
and 550000 rows of df2 as
Name System_Type HFSD Dev BHRF Test BGXD Prod AEVC Prod AXCS Test
I used the following coding
pd.merge(df1, df2, on="Name")
It is taking much time to process , Is there another way/method to process it . Please advise
CodePudding user response:
You can do like this :
import pandas as pd
df1 = pd.DataFrame({
'System_Name':['AXCS','BGXD','HFSD','AEVC', 'BHRF'],
'Timestamp':['2022-01-01','2022-02-01','2022-03-01','2022-01-01', '2022-01-01'],
'usuage ':[5,70,45,25,12],
})
df2 = pd.DataFrame({
'System_Name':['HFSD','BHRF','BGXD','AEVC', 'AXCS'],
'System_Type':['Dev','Test','Prod','Prod', 'Test'],
})
# Get all diferent values
df3 = pd.merge(df1, df2, how='outer', indicator='Exist')
df3 = df3.loc[df3['Exist'] == 'both']
# If you like to filter by a System_Name
df3 = pd.merge(df1, df2, on="System_Name", how='outer', indicator='Exist')
df3 = df3.loc[df3['Exist'] == 'both']
print(df3)
#output :
System_Name Timestamp usuage System_Type Exist
0 AXCS 2022-01-01 5 Test both
1 BGXD 2022-02-01 70 Prod both
2 HFSD 2022-03-01 45 Dev both
3 AEVC 2022-01-01 25 Prod both
4 BHRF 2022-01-01 12 Test both
CodePudding user response:
You can use df2
as a dict mapping:
df1['System_Type'] = df1['Name'].map(df2.set_index('Name')['System_Type'])
print(df1)
# Output
Name Timestamp usage System_Type
0 AXCS 2022-01-01 5 Test
1 BGXD 2022-02-01 70 Prod
2 HFSD 2022-03-01 45 Dev
3 AEVC 2022-01-01 25 Prod
4 BHRF 2022-02-01 12 Test