Compare data in two excels to third excel sheet using python-CodePudding

I have 3 excels A,B and C with some data as below.

Characteristics of data in Excel A,B

Each title is unique

For each title, there is a corresponding ID. Please find sample data below.

 Excel A                                     Excel B
 > > **New_Title**   **New_ID**         **Existing_Title** **Existing_ID**
 >   Title1            W-1                     Title3              W-3
 >   Title2            W-2                     Title4              W-4

Excel C

 > > **IncidentTitle**                   
 >   Title1
     Title1
     Title1       
     Title2
     Title2      
     Title3
     Title4

Excel C can have multiple rows with same title.

I am trying to compare data in A, B to C and write ID column to excel C. So that for each row that has same Title, it gets written with respective ID. For eg: Title1 is appeared 3 times in Excel C, so it's ID: W-1 has to be written thrice in the Excel C and so on. I have tried with some approaches but they were unsuccessful. Any help in identifying where I am going wrong is appreciated.

for i in range(length):
\\Omitted some lines of code here.
    for j in range(c_len): //length of excel C.
        Title = id_map.at[j, 'IncidentTitle']   //id_map is the dataframe containing the data of Excel C
        if Title == New_Title:
            Jira_ID.insert(j, New_ID)
        elif Title == Existing_Title:
            Jira_ID.insert(j, Existing_ID)
        else:
            continue // Can be any placeholder.

CodePudding user response：

Defining the dataframes:

dfA = pd.DataFrame({
    'New_Title': ['Title1', 'Title2'],
    'New_ID': ['W-1', 'W-2'],
})

dfB = pd.DataFrame({
    'Existing_Title': ['Title3', 'Title4'],
    'Existing_ID': ['W-3', 'W-4'],
})

dfC = pd.DataFrame({
    'IncidentTitle': ['Title1', 'Title1', 'Title1', 'Title2', 'Title2', 'Title3', 'Title4']
})

First we can union the dfA and dfB by pd.concat,

dfAB = pd.concat([
    dfA.rename(columns={'New_Title': 'Title', 'New_ID': 'ID'}),
    dfB.rename(columns={'Existing_Title': 'Title', 'Existing_ID': 'ID'}),
])

print(dfAB)

    Title   ID
0  Title1  W-1
1  Title2  W-2
0  Title3  W-3
1  Title4  W-4

Then one of the pandas ways to do your job is by merge, it finds matching keys among the two dataframes dfC and dfAB, where 'IncidentTitle' is used for dfC and 'Title' is used for dfAB, and in case there is a match it merges data from dfAB to dfC (this merging direction is by using how='left')

dfC.merge(dfAB, left_on='IncidentTitle', right_on='Title', how='left').drop(columns='Title')

Final outcome:

  IncidentTitle   ID
0        Title1  W-1
1        Title1  W-1
2        Title1  W-1
3        Title2  W-2
4        Title2  W-2
5        Title3  W-3
6        Title4  W-4

Note that if you have duplicated Titles in dfAB, the outcome may not be as expected.