Compare columns (per row) of two DataFrames in Python

Time:05-12

First of all, I'm quite new to programming overall (less than two months), so I'm sorry if this is a 'simple, no need to ask for help, try it yourself until you get it done' problem.

I have two DataFrames with partially overlapping content: a general overview of mobile numbers including their cost centers in the company, and monthly invoices listing the affected mobile numbers and their invoice amounts.

I'd like to compare each value in the 'mobile-numbers' column of the monthly invoices DF with the 'mobile-numbers' column of the general overview DF and, if there is a match, assign the corresponding cost center to that mobile number in the monthly invoices DF.
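One common way to express this kind of lookup in pandas is a left join with `DataFrame.merge`. The sketch below is not the answerer's code; it assumes the column names `mobile_number` and `cost_center` and reuses the sample values from the answer further down. `how="left"` keeps every invoice row, and `indicator=True` adds a `_merge` column showing whether each invoice number was found in the overview.

```python
import pandas as pd

# Sample data (same values as in the answer); column names are assumptions
overview_df = pd.DataFrame({"mobile_number": [1234, 3456, 6545, 4534, 9874],
                            "cost_center": ["23F", "67F", "32W", "42W", "98W"]})
invoice_df = pd.DataFrame({"mobile_number": [4534, 5567, 1234, 4871, 1298],
                           "invoice_amount": ["19,99E"] * 5})

# Left join on the mobile number: every invoice row is kept, and the cost center
# is attached where the number exists in the overview (NaN otherwise).
# _merge is "both" for matched rows and "left_only" for unmatched ones.
result = invoice_df.merge(overview_df, on="mobile_number", how="left", indicator=True)
print(result)
```

If a number appears more than once in the overview, the merge repeats that invoice row once per match, so it may be worth checking `overview_df["mobile_number"].is_unique` before joining.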

I'd love to share my code with you, but unfortunately I have absolutely no clue how to solve this problem.

Thanks

Edit: I'm from Germany and tried my best to explain the problem in English. If anything is unclear, just tell me :)

Example of desired result

Answer:

The program below should meet your needs. In the invoice DataFrame I put the value '40' to show that cost centers which are already filled in are not cleared: a value is only replaced when its mobile number has a match in the general overview DataFrame. I also added some prints for a better view, but they are not strictly necessary. If you'd like a more detailed explanation of the program, comment below, and don't forget to vote and mark as solved.

import pandas as pd

general_df = pd.DataFrame({"mobile_number": [1234,3456,6545,4534,9874],
                           "cost_center": ['23F','67F','32W','42W','98W']})

invoice_df = pd.DataFrame({"mobile_number": [4534,5567,1234,4871,1298],
                           "invoice_amount": ['19,99E','19,99E','19,99E','19,99E','19,99E'],
                           "cost_center": ['','','','','40']})
print(f"""GENERAL OVERVIEW DF

{general_df}

________________________________________
INVOICE DF


{invoice_df}


_________________________________________
INVOICE RESULT
""")
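def func(line):
    # Cost centers in the overview whose mobile number matches this invoice row
    match = general_df.loc[general_df['mobile_number'] == line['mobile_number'], 'cost_center']
    if match.empty:
        # No match: keep the value already in the invoice (e.g. '' or '40')
        return line['cost_center']
    # Match found: use the first matching cost center
    return match.iloc[0]

invoice_df['cost_center'] = invoice_df.apply(func, axis=1)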

print(invoice_df)
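The row-by-row `apply` above can also be written as a vectorised lookup. This is a swapped-in technique, not the answerer's code: it builds a Series mapping `mobile_number` to `cost_center` (keeping the first occurrence of each number, like `.iloc[0]` in `func`), looks every invoice number up with `Series.map`, and falls back to the existing `cost_center` value where there is no match, so the `'40'` is preserved.

```python
import pandas as pd

general_df = pd.DataFrame({"mobile_number": [1234, 3456, 6545, 4534, 9874],
                           "cost_center": ['23F', '67F', '32W', '42W', '98W']})
invoice_df = pd.DataFrame({"mobile_number": [4534, 5567, 1234, 4871, 1298],
                           "invoice_amount": ['19,99E'] * 5,
                           "cost_center": ['', '', '', '', '40']})

# Lookup Series: index = mobile number, values = cost center.
# drop_duplicates keeps the first row per number so the index is unique,
# which Series.map requires when mapping through another Series.
lookup = (general_df.drop_duplicates("mobile_number")
                    .set_index("mobile_number")["cost_center"])

# map() returns the matching cost center, or NaN if the number is not in the overview;
# fillna() then falls back to whatever the invoice already contained ('' or '40').
invoice_df["cost_center"] = (invoice_df["mobile_number"].map(lookup)
                             .fillna(invoice_df["cost_center"]))
print(invoice_df)
```

For five rows the difference does not matter, but `map` avoids filtering `general_df` once for every invoice row.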

