Dataframe1
DF1:
H Code Parent Days Product Facility Area A B C D E F G H I J K
D 620LAKEV Samsung 0 Phone 0 0 0 0 0 0 0 0 0
D 620LAKEV Samsung 13 Phone 6916.1 0 6916.1 6916.1 0 6916.1 6916.1 0 6916.1
D 620LAKEV Samsung 14 Phone 6916.1 0 6916.1 6916.1 0 6916.1 6916.1 0 6916.1
D 620LAKEV Samsung 92 Phone 20597.91 0 20597.91 20597.91 0 20597.91 20597.91 0 20597.91
D 620LAKEV Samsung 184 Phone 25164.01 0 25164.01 25164.01 0 25164.01 25164.01 0 25164.01
D 620LAKEV Samsung 365 Phone 30276.4 0 30276.4 30276.4 0 30276.4 30276.4 0 30276.4
D 620LAKEV Samsung 1096 Phone 29975.58 0 29975.58 29975.58 0 29975.58 29975.58 0 29975.58
D 1011BC Apple 0 Phone 0 0 0 0 0 0 0 0 0
D 1011BC Apple 13 Phone 0 0 0 0 0 0 0 0 0
D 1011BC Apple 14 Phone 0 0 0 0 0 0 0 0 0
D 1011BC Apple 92 Phone 0 0 0 0 0 0 0 0 0
D 1011BC Apple 184 Phone 0 0 0 0 0 0 0 0 0
D 1011BC Apple 365 Phone 611318.5 0 611318.5 611318.5 0 611318.5 611318.5 0 611318.5
D 1011BC Apple 1096 Phone 2341752.75 0 2341752.75 2341752.75 0 2341752.75 2341752.75 0 2341752.75
D 1011BC Apple 0 Phone 1607774 0 0 0 0 0 0 0 0 0
D 1011BC Apple 13 Phone 1607774 0 0 0 0 0 0 0 0 0
D 1011BC Apple 14 Phone 1607774 0 0 0 0 0 0 0 0 0
D 1011BC Apple 92 Phone 1607774 0 0 0 0 0 0 0 0 0
D 1011BC Apple 184 Phone 1607774 0 0 0 0 0 0 0 0 0
D 1011BC Apple 365 Phone 1607774 611318.5 0 611318.5 611318.5 0 611318.5 611318.5 0 611318.5
D 1011BC Apple 1096 Phone 1607774 2341752.75 0 2341752.75 2341752.75 0 2341752.75 2341752.75 0 2341752.75
D 1012SAS Apple 0 Phone 1572904 0 0 0 0 0 0 0 0 0
D 1012SAS Apple 13 Phone 1572904 0 0 0 0 0 0 0 0 0
D 1012SAS Apple 14 Phone 1572904 0 0 0 0 0 0 0 0 0
D 1012SAS Apple 92 Phone 1572904 34894.82 0 34894.82 34894.82 0 34894.82 34894.82 0 34894.82
D 1012SAS Apple 184 Phone 1572904 87095.26 0 87095.26 87095.26 0 87095.26 87095.26 0 87095.26
D 1012SAS Apple 365 Phone 1572904 128485.89 0 128485.89 128485.89 0 128485.89 128485.89 0 128485.89
D 1012SAS Apple 1096 Phone 1572904 141775.97 0 141775.97 141775.97 0 141775.97 141775.97 0 141775.97
Dataframe2:
DF2:
H Code Parent Days Product Facility Area
D 4AAK27778 Samsung 0 Phone
D 4AAK27778 Samsung 13 Phone
D 4AAK27778 Samsung 14 Phone
D 4AAK27778 Samsung 92 Phone
D 4AAK27778 Samsung 184 Phone
D 4AAK27778 Samsung 365 Phone
D 4AAK27778 Samsung 1096 Phone
D 3G251866 Samsung 0 Phone
D 3G251866 Samsung 13 Phone
D 3G251866 Samsung 14 Phone
D 3G251866 Samsung 92 Phone
D 3G251866 Samsung 184 Phone
D 3G251866 Samsung 365 Phone
D 3G251866 Samsung 1096 Phone
D 3W809327 Samsung 0 Phone
D 3W809327 Samsung 13 Phone
D 3W809327 Samsung 14 Phone
D 3W809327 Samsung 92 Phone
D 3W809327 Samsung 184 Phone
D 3W809327 Samsung 365 Phone
D 3W809327 Samsung 1096 Phone
D 620LAKEV Samsung 0 Phone
D 620LAKEV Samsung 13 Phone
D 620LAKEV Samsung 14 Phone
D 620LAKEV Samsung 92 Phone
D 620LAKEV Samsung 184 Phone
D 620LAKEV Samsung 365 Phone
D 620LAKEV Samsung 1096 Phone
Expected Output:
H Code Parent Days Product Facility Area A B C D E F G H I J K
D 620LAKEV Samsung 0 Phone 0 0 0 0 0 0 0 0 0
D 620LAKEV Samsung 13 Phone 6916.1 0 6916.1 6916.1 0 6916.1 6916.1 0 6916.1
D 620LAKEV Samsung 14 Phone 6916.1 0 6916.1 6916.1 0 6916.1 6916.1 0 6916.1
D 620LAKEV Samsung 92 Phone 20597.91 0 20597.91 20597.91 0 20597.91 20597.91 0 20597.91
D 620LAKEV Samsung 184 Phone 25164.01 0 25164.01 25164.01 0 25164.01 25164.01 0 25164.01
D 620LAKEV Samsung 365 Phone 30276.4 0 30276.4 30276.4 0 30276.4 30276.4 0 30276.4
D 620LAKEV Samsung 1096 Phone 29975.58 0 29975.58 29975.58 0 29975.58 29975.58 0 29975.58
[![enter image description here][1]][1]
[![enter image description here][2]][2]
[![enter image description here][3]][3]
Dataframe1 [1]: https://i.stack.imgur.com/PPjsM.png
Dataframe2: [2]: https://i.stack.imgur.com/8OQxl.png
Expected Output: [3]: https://i.stack.imgur.com/KPwrh.png
In Dataframe1, Column AREA, J and K are empty, while in datafarme2, Column Facility and Area are empty
I want to filter Dataframe1 based on the values in Dataframe2, I tried the below code but it did not work, I am not sure what is wrong, can you please help me to filter the dataframe1 data.
keys = list(df2.columns.values)
i1 = df1.set_index(keys).index
i2 = df1.set_index(keys).index
filtered_df=df1[~i1.isin(i2)]
print(filtered_df)
CodePudding user response:
You can merge
on the non empty columns:
# select non-empty columns from df2
cols2 = list(df2.dropna(how='all', axis=1))
# remove df2's empty columns from df1
cols1 = df1.columns.difference(df2.columns.difference(cols2))
out = df2.merge(df1[cols1], on=cols2).to_string()
output:
H Code Parent Days Product Facility Area A B C D E F G H.1 I J K
0 D 620LAKEV Samsung 0 Phone NaN NaN 0.00 0.00 0.0 0.00 0.00 0.0 0.00 NaN NaN NaN NaN
1 D 620LAKEV Samsung 13 Phone NaN NaN 6916.10 6916.10 0.0 6916.10 6916.10 0.0 6916.10 NaN NaN NaN NaN
2 D 620LAKEV Samsung 14 Phone NaN NaN 6916.10 6916.10 0.0 6916.10 6916.10 0.0 6916.10 NaN NaN NaN NaN
3 D 620LAKEV Samsung 92 Phone NaN NaN 20597.91 20597.91 0.0 20597.91 20597.91 0.0 20597.91 NaN NaN NaN NaN
4 D 620LAKEV Samsung 184 Phone NaN NaN 25164.01 25164.01 0.0 25164.01 25164.01 0.0 25164.01 NaN NaN NaN NaN
5 D 620LAKEV Samsung 365 Phone NaN NaN 30276.40 30276.40 0.0 30276.40 30276.40 0.0 30276.40 NaN NaN NaN NaN
6 D 620LAKEV Samsung 1096 Phone NaN NaN 29975.58 29975.58 0.0 29975.58 29975.58 0.0 29975.58 NaN NaN NaN NaN