Home > Software design >  Compare data in two excels to third excel sheet using python
Compare data in two excels to third excel sheet using python

Time:02-22

I have 3 excels A,B and C with some data as below.

Characteristics of data in Excel A,B

  1. Each title is unique

  2. For each title, there is a corresponding ID. Please find sample data below.

     Excel A                                     Excel B
     > > **New_Title**   **New_ID**         **Existing_Title** **Existing_ID**
     >   Title1            W-1                     Title3              W-3
     >   Title2            W-2                     Title4              W-4
    

    Excel C

     > > **IncidentTitle**                   
     >   Title1
         Title1
         Title1       
         Title2
         Title2      
         Title3
         Title4
    

Excel C can have multiple rows with same title.

I am trying to compare data in A, B to C and write ID column to excel C. So that for each row that has same Title, it gets written with respective ID. For eg: Title1 is appeared 3 times in Excel C, so it's ID: W-1 has to be written thrice in the Excel C and so on. I have tried with some approaches but they were unsuccessful. Any help in identifying where I am going wrong is appreciated.

for i in range(length):
\\Omitted some lines of code here.
    for j in range(c_len): //length of excel C.
        Title = id_map.at[j, 'IncidentTitle']   //id_map is the dataframe containing the data of Excel C
        if Title == New_Title:
            Jira_ID.insert(j, New_ID)
        elif Title == Existing_Title:
            Jira_ID.insert(j, Existing_ID)
        else:
            continue // Can be any placeholder.

CodePudding user response:

Defining the dataframes:

dfA = pd.DataFrame({
    'New_Title': ['Title1', 'Title2'],
    'New_ID': ['W-1', 'W-2'],
})

dfB = pd.DataFrame({
    'Existing_Title': ['Title3', 'Title4'],
    'Existing_ID': ['W-3', 'W-4'],
})

dfC = pd.DataFrame({
    'IncidentTitle': ['Title1', 'Title1', 'Title1', 'Title2', 'Title2', 'Title3', 'Title4']
})

First we can union the dfA and dfB by pd.concat,

dfAB = pd.concat([
    dfA.rename(columns={'New_Title': 'Title', 'New_ID': 'ID'}),
    dfB.rename(columns={'Existing_Title': 'Title', 'Existing_ID': 'ID'}),
])

print(dfAB)

    Title   ID
0  Title1  W-1
1  Title2  W-2
0  Title3  W-3
1  Title4  W-4

Then one of the pandas ways to do your job is by merge, it finds matching keys among the two dataframes dfC and dfAB, where 'IncidentTitle' is used for dfC and 'Title' is used for dfAB, and in case there is a match it merges data from dfAB to dfC (this merging direction is by using how='left')

dfC.merge(dfAB, left_on='IncidentTitle', right_on='Title', how='left').drop(columns='Title')

Final outcome:

  IncidentTitle   ID
0        Title1  W-1
1        Title1  W-1
2        Title1  W-1
3        Title2  W-2
4        Title2  W-2
5        Title3  W-3
6        Title4  W-4

Note that if you have duplicated Titles in dfAB, the outcome may not be as expected.

  • Related