I have two tables, on is called df_information. This table just have information about each user. Here is how it looks:
id_user | job_function | job_area | title |
---|---|---|---|
1123 | Compensation / Benefit Policy | Human Resources | personnel |
1124 | HR Systems Administration | Human Resources | personnel |
1124 | Cyber Security / Information Security | IT / Computers / Electronic | personnel |
1125 | Equipment | Manufacturing / Production / Operations | manager |
1602 | Product Development | IT / Computers / Electronics | director |
And I have second table that is called df_all_possible_job_function_job_area. This table has job_function and job_area combinations that are not included in df_information for each title.
job_function | job_area | title |
---|---|---|
ASIC / Layout Design | TwoIT / Computers / Electronics | director |
Accounting | Finance / Accounting | personnel |
Accounts Payable | Finance / Accounting | personnel |
Account Management | Customer Support / Client Services | personnel |
I want to merge df_information with df_all_possible_job_function_job_area, but the thing is each job_function, job_area combination for each title in df_information should be merged for every id_user, for example if merging two small dataframe above, something like this should be gotten in result:
id_user | job_function | job_area | title |
---|---|---|---|
1123 | Compensation / Benefit Policy | Human Resources | personnel |
1123 | Accounting | Finance / Accounting | personnel |
1123 | Accounting | Finance / Accounting | personnel |
1123 | Accounts Payable | Finance / Accounting | personnel |
1123 | Account Management | Customer Support / Client Services | personnel |
1124 | HR Systems Administration | Human Resources | personnel |
1124 | Cyber Security / Information Security | IT / Computers / Electronic | personnel |
1124 | Accounting | Finance / Accounting | personnel |
1124 | Accounting | Finance / Accounting | personnel |
1124 | Accounts Payable | Finance / Accounting | personnel |
1124 | Account Management | Customer Support / Client Services | personnel |
1125 | Equipment | Manufacturing / Production / Operations | manager |
1602 | Product Development | IT / Computers / Electronics | director |
1602 | ASIC / Layout Design | TwoIT / Computers / Electronics | director |
CodePudding user response:
not the most elegant way, but Outer join will solve some part of the question. and because your second table does not have id_user, the value will show as NaN. for that table.
merged = df_information.merge(df_all_possible_job_function_job_area, on=['job_function','job_area','title'], how="outer")
merged table will look like this:
CodePudding user response:
You would have to specify a user code in the dataframe if you wanted to do this simply through a merge. Also you want duplicates of Accounting through the merge?
If you had a user_code
column within df_all_possible_job_function_job_area
you could do:
df = pd.concat([df_information,df_all_possible_job_function_job_area],axis=0)
df = df.sort_values(by=['id_user'],axis=0)
This would get you:
id_user job_function job_area title
0 1123 Compensation / Benefit Policy Human Resources personnel
1 1123 Accounting Finance / Accounting personnel
2 1123 Accounts Payable Finance / Accounting personnel
1 1124 HR Systems Administration Human Resources personnel
2 1124 Cyber Security / Information Security IT / Computers / Electronic personnel
3 1124 Account Management Customer Support / Client Services personnel
3 1125 Equipment Manufacturing / Production / Operations manager
4 1602 Product Development IT / Computers / Electronics director
0 1602 ASIC / Layout Design TwoIT / Computers / Electronics director
Otherwise without the id_user
column, you are looking at:
0 1123.0 Compensation / Benefit Policy Human Resources personnel
1 1124.0 HR Systems Administration Human Resources personnel
2 1124.0 Cyber Security / Information Security IT / Computers / Electronic personnel
3 1125.0 Equipment Manufacturing / Production / Operations manager
4 1602.0 Product Development IT / Computers / Electronics director
0 NaN ASIC / Layout Design TwoIT / Computers / Electronics director
1 NaN Accounting Finance / Accounting personnel
2 NaN Accounts Payable Finance / Accounting personnel
3 NaN Account Management Customer Support / Client Services personnel
Because there is not common values to connect this data with.
For the duplication, you would need
1124,"Accounting","Finance / Accounting","personnel"
as a row in the first dataframe. There are then other documentation on duplicating certain rows such as: Python Pandas replicate rows in dataframe which can help you with that.