Home > front end >  Merging data frames together
Merging data frames together

Time:11-10

I have two tables, on is called df_information. This table just have information about each user. Here is how it looks:

id_user job_function job_area title
1123 Compensation / Benefit Policy Human Resources personnel
1124 HR Systems Administration Human Resources personnel
1124 Cyber Security / Information Security IT / Computers / Electronic personnel
1125 Equipment Manufacturing / Production / Operations manager
1602 Product Development IT / Computers / Electronics director

And I have second table that is called df_all_possible_job_function_job_area. This table has job_function and job_area combinations that are not included in df_information for each title.

job_function job_area title
ASIC / Layout Design TwoIT / Computers / Electronics director
Accounting Finance / Accounting personnel
Accounts Payable Finance / Accounting personnel
Account Management Customer Support / Client Services personnel

I want to merge df_information with df_all_possible_job_function_job_area, but the thing is each job_function, job_area combination for each title in df_information should be merged for every id_user, for example if merging two small dataframe above, something like this should be gotten in result:

id_user job_function job_area title
1123 Compensation / Benefit Policy Human Resources personnel
1123 Accounting Finance / Accounting personnel
1123 Accounting Finance / Accounting personnel
1123 Accounts Payable Finance / Accounting personnel
1123 Account Management Customer Support / Client Services personnel
1124 HR Systems Administration Human Resources personnel
1124 Cyber Security / Information Security IT / Computers / Electronic personnel
1124 Accounting Finance / Accounting personnel
1124 Accounting Finance / Accounting personnel
1124 Accounts Payable Finance / Accounting personnel
1124 Account Management Customer Support / Client Services personnel
1125 Equipment Manufacturing / Production / Operations manager
1602 Product Development IT / Computers / Electronics director
1602 ASIC / Layout Design TwoIT / Computers / Electronics director

CodePudding user response:

not the most elegant way, but Outer join will solve some part of the question. and because your second table does not have id_user, the value will show as NaN. for that table.

merged = df_information.merge(df_all_possible_job_function_job_area, on=['job_function','job_area','title'], how="outer")

merged table will look like this: enter image description here

CodePudding user response:

You would have to specify a user code in the dataframe if you wanted to do this simply through a merge. Also you want duplicates of Accounting through the merge?

If you had a user_code column within df_all_possible_job_function_job_area you could do:

df = pd.concat([df_information,df_all_possible_job_function_job_area],axis=0)
df = df.sort_values(by=['id_user'],axis=0)

This would get you:

   id_user                           job_function                                 job_area      title
0     1123          Compensation / Benefit Policy                          Human Resources  personnel
1     1123                             Accounting                     Finance / Accounting  personnel
2     1123                       Accounts Payable                     Finance / Accounting  personnel
1     1124              HR Systems Administration                          Human Resources  personnel
2     1124  Cyber Security / Information Security              IT / Computers / Electronic  personnel
3     1124                     Account Management       Customer Support / Client Services  personnel
3     1125                              Equipment  Manufacturing / Production / Operations    manager
4     1602                    Product Development             IT / Computers / Electronics   director
0     1602                   ASIC / Layout Design          TwoIT / Computers / Electronics   director

Otherwise without the id_user column, you are looking at:

0   1123.0          Compensation / Benefit Policy                          Human Resources  personnel
1   1124.0              HR Systems Administration                          Human Resources  personnel
2   1124.0  Cyber Security / Information Security              IT / Computers / Electronic  personnel
3   1125.0                              Equipment  Manufacturing / Production / Operations    manager
4   1602.0                    Product Development             IT / Computers / Electronics   director
0      NaN                   ASIC / Layout Design          TwoIT / Computers / Electronics   director
1      NaN                             Accounting                     Finance / Accounting  personnel
2      NaN                       Accounts Payable                     Finance / Accounting  personnel
3      NaN                     Account Management       Customer Support / Client Services  personnel

Because there is not common values to connect this data with.

For the duplication, you would need 1124,"Accounting","Finance / Accounting","personnel" as a row in the first dataframe. There are then other documentation on duplicating certain rows such as: Python Pandas replicate rows in dataframe which can help you with that.

  • Related