Problem Statment:
I have two tables with sample inputs below:
Baseline_Cars:
Name | Fname | FW_Base | Seq | FP_Base | FW_Prop | FP_Prop | isChanged | changeType |
---|---|---|---|---|---|---|---|---|
Audi | A1 | 2 | 0 | 2 | 0 | 0 | 0 | "" |
Audi | A2 | 3 | 0 | 3 | 0 | 0 | 0 | "" |
Audi | A3 | 4 | 0 | 4 | 0 | 0 | 0 | "" |
BMW | X1 | 5 | 0 | 5 | 0 | 0 | 0 | "" |
BMW | X2 | 6 | 0 | 6 | 0 | 0 | 0 | "" |
Merc | M4 | 7 | 0 | 7 | 0 | 0 | 0 | "" |
Merc | M5 | 8 | 0 | 8 | 0 | 0 | 0 | "" |
Proposed_Cars:
1 | 2 | 3 | 4(FW_Base) | 5(FW_Prop) | 6(FP_Base) | 7(FP_Prop) | 8(isChanged) | 9(changeType) |
---|---|---|---|---|---|---|---|---|
144 | Audi | A1 | 2 | 1 | 1 | 1 | 1 | W |
144 | Audi | A2 | 3 | 3 | 3 | 1 | 1 | P |
144 | BMW | X1 | 5 | 3 | 3 | 3 | 1 | W |
144 | BMW | X2 | 6 | 4 | 4 | 4 | 1 | W |
Expected Solution:
1 | 2 | 3 | 4(FW_Base) | Seq | 5(FW_Prop) | 6(FP_Base) | 7(FP_Prop) | 8(isChanged) | 9(changeType) |
---|---|---|---|---|---|---|---|---|---|
144 | Audi | A1 | 2 | 0 | 1 | 1 | 1 | 1 | W |
144 | Audi | A2 | 3 | 0 | 3 | 3 | 1 | 1 | P |
144 | Audi | A3 | 4 | 0 | 0 | 4 | 0 | 0 | NULL |
144 | BMW | X1 | 5 | 0 | 3 | 3 | 3 | 1 | W |
144 | BMW | X2 | 6 | 0 | 4 | 4 | 4 | 1 | W |
Explanation of the expected solution:
Baseline cars contain names of car brands with Fnames, baseline weights(FW_Base),baseline Parameters(FP_Base).
Proposed cars contain names of cars with Proposed new weights(FW_PROP) and new Parameters(FP_PROP).
By default the FW_PROP, FP_PROP, isChanged are 0 in Baseline cars and changeType column is empty string.
isChanged=1 in Proposed cars tells that new weights or parameters have been suggested(if changeType = W, new weight change or changeType=P, new Parameter change)
I wish to merge the tables on the condition that if the Fname in Proposed cars exists in Baseline cars, we substitute the respective columns in Baseline_Cars, along with imputing NULL for changeType incase no change were made. (e.g Audi A3 row)
Snippet code to generate the above sample input data:
baseline_cars = pd.DataFrame(columns=['Name','Fname','FW_Base','Seq','FP_Base','FW_Prop','FP_Prop','isChanged','ChangeType'])
proposed_cars = pd.DataFrame(columns=['1','2','3','4(FW_Base)','5(FW_Prop)','6(FP_Base)','7(FP_Prop)','8(isChanged)','9(ChangeType)'])
baseline_cars = baseline_cars.append({'Name':'Audi','Fname':'A1','FW_Base':2,'Seq':0,'FP_Base':2,'FW_Prop':0,'FP_Prop':0,'isChanged':0,'ChangeType':""}, ignore_index=True)
baseline_cars = baseline_cars.append({'Name':'Audi','Fname':'A2','FW_Base':3,'Seq':0,'FP_Base':3,'FW_Prop':0,'FP_Prop':0,'isChanged':0,'ChangeType':""}, ignore_index=True)
baseline_cars = baseline_cars.append({'Name':'Audi','Fname':'A3','FW_Base':4,'Seq':0,'FP_Base':4,'FW_Prop':0,'FP_Prop':0,'isChanged':0,'ChangeType':""}, ignore_index=True)
baseline_cars = baseline_cars.append({'Name':'BMW','Fname':'X1','FW_Base':5,'Seq':0,'FP_Base':5,'FW_Prop':0,'FP_Prop':0,'isChanged':0,'ChangeType':""}, ignore_index=True)
baseline_cars = baseline_cars.append({'Name':'BMW','Fname':'X2','FW_Base':6,'Seq':0,'FP_Base':6,'FW_Prop':0,'FP_Prop':0,'isChanged':0,'ChangeType':""}, ignore_index=True)
baseline_cars = baseline_cars.append({'Name':'Merc','Fname':'M4','FW_Base':7,'Seq':0,'FP_Base':7,'FW_Prop':0,'FP_Prop':0,'isChanged':0,'ChangeType':""}, ignore_index=True)
baseline_cars = baseline_cars.append({'Name':'Merc','Fname':'M5','FW_Base':8,'Seq':0,'FP_Base':8,'FW_Prop':0,'FP_Prop':0,'isChanged':0,'ChangeType':""}, ignore_index=True)
proposed_cars = proposed_cars.append({'1':144,'2':'Audi','3':'A1','4(FW_Base)':2,'5(FW_Prop)':1,'6(FP_Base)':1,'7(FP_Prop)':1,'8(isChanged)':1,'9(ChangeType)':"W"}, ignore_index=True)
proposed_cars = proposed_cars.append({'1':144,'2':'Audi','3':'A2','4(FW_Base)':3,'5(FW_Prop)':3,'6(FP_Base)':3,'7(FP_Prop)':1,'8(isChanged)':1,'9(ChangeType)':"P"}, ignore_index=True)
proposed_cars = proposed_cars.append({'1':144,'2':'BMW','3':'X1','4(FW_Base)':5,'5(FW_Prop)':3,'6(FP_Base)':3,'7(FP_Prop)':3,'8(isChanged)':1,'9(ChangeType)':"W"}, ignore_index=True)
proposed_cars = proposed_cars.append({'1':144,'2':'BMW','3':'X2','4(FW_Base)':6,'5(FW_Prop)':4,'6(FP_Base)':4,'7(FP_Prop)':4,'8(isChanged)':1,'9(ChangeType)':"W"}, ignore_index=True)
CodePudding user response:
If I understand what you are asking, there is no need to try and complete this in a one-liner. This gives you a lot more control over what happens and how to modify specific changes in the future.
You can start by just merging the baseline cars to the proposed cars.
df = pd.merge(baseline_cars, proposed_cars, how = 'left', right_on = ['2', '3'], left_on= ['Name','Fname'])
Now, this dataframe is the joined table you are after with a few differences, the columns from both are in the join, so you can equate table rows to keep consistency
df['4(FW_Base)'] = df['FW_Base']
df['5(FW_Prop)'] = df['FW_Prop']
df['6(FP_Base)'] = df['FP_Base']
df['7(FP_Prop)'] = df['FP_Prop']
Then fill the empty rows in isChanged
and changeType
.
df['8(isChanged)'].fillna(0, inplace = True)
df['9(ChangeType)'].fillna('NULL', inplace = True)
And then the output is jsut the selected columns that you have in your output
df = df[['1','Name','Fname','4(FW_Base)','Seq','5(FW_Prop)','6(FP_Base)','7(FP_Prop)','8(isChanged)','9(ChangeType)']]
Note, in your solution you have Column 1
, which is always 144
in your sample set, but there is no explanation for what it is or where the Audi, BMW, or Merc that aren't in the proposed subset would have got 144 from as its output as well since it is not in the baseline, therefore, I left it unchanged.