Home > Software engineering >  How to filter a dataframe based on the values of another dataframe in python
How to filter a dataframe based on the values of another dataframe in python

Time:08-21

Dataframe1

DF1:

H   Code     Parent  Days   Product Facility    Area    A   B   C   D   E   F   G   H   I   J   K
D   620LAKEV Samsung 0      Phone                       0   0   0   0   0   0   0   0   0       
D   620LAKEV Samsung 13     Phone                       6916.1  0   6916.1  6916.1  0   6916.1  6916.1  0   6916.1      
D   620LAKEV Samsung 14     Phone                       6916.1  0   6916.1  6916.1  0   6916.1  6916.1  0   6916.1      
D   620LAKEV Samsung 92     Phone                       20597.91    0   20597.91    20597.91    0   20597.91    20597.91    0   20597.91        
D   620LAKEV Samsung 184    Phone                       25164.01    0   25164.01    25164.01    0   25164.01    25164.01    0   25164.01        
D   620LAKEV Samsung 365    Phone                       30276.4 0   30276.4 30276.4 0   30276.4 30276.4 0   30276.4     
D   620LAKEV Samsung 1096   Phone                       29975.58    0   29975.58    29975.58    0   29975.58    29975.58    0   29975.58        
D   1011BC  Apple    0      Phone                       0   0   0   0   0   0   0   0   0       
D   1011BC  Apple    13     Phone                       0   0   0   0   0   0   0   0   0       
D   1011BC  Apple    14     Phone                       0   0   0   0   0   0   0   0   0       
D   1011BC  Apple    92     Phone                       0   0   0   0   0   0   0   0   0       
D   1011BC  Apple    184    Phone                       0   0   0   0   0   0   0   0   0       
D   1011BC  Apple    365    Phone                       611318.5    0   611318.5    611318.5    0   611318.5    611318.5    0   611318.5        
D   1011BC  Apple    1096   Phone                       2341752.75  0   2341752.75  2341752.75  0   2341752.75  2341752.75  0   2341752.75      
D   1011BC  Apple    0      Phone   1607774             0   0   0   0   0   0   0   0   0       
D   1011BC  Apple    13     Phone   1607774             0   0   0   0   0   0   0   0   0       
D   1011BC  Apple    14     Phone   1607774             0   0   0   0   0   0   0   0   0       
D   1011BC  Apple    92     Phone   1607774             0   0   0   0   0   0   0   0   0       
D   1011BC  Apple    184    Phone   1607774             0   0   0   0   0   0   0   0   0       
D   1011BC  Apple    365    Phone   1607774             611318.5    0   611318.5    611318.5    0   611318.5    611318.5    0   611318.5        
D   1011BC  Apple    1096   Phone   1607774             2341752.75  0   2341752.75  2341752.75  0   2341752.75  2341752.75  0   2341752.75      
D   1012SAS Apple    0      Phone   1572904             0   0   0   0   0   0   0   0   0       
D   1012SAS Apple    13     Phone   1572904             0   0   0   0   0   0   0   0   0       
D   1012SAS Apple    14     Phone   1572904             0   0   0   0   0   0   0   0   0       
D   1012SAS Apple    92     Phone   1572904             34894.82    0   34894.82    34894.82    0   34894.82    34894.82    0   34894.82        
D   1012SAS Apple    184    Phone   1572904             87095.26    0   87095.26    87095.26    0   87095.26    87095.26    0   87095.26        
D   1012SAS Apple    365    Phone   1572904             128485.89   0   128485.89   128485.89   0   128485.89   128485.89   0   128485.89       
D   1012SAS Apple    1096   Phone   1572904             141775.97   0   141775.97   141775.97   0   141775.97   141775.97   0   141775.97   

Dataframe2:

DF2:

H   Code        Parent  Days    Product Facility    Area
D   4AAK27778   Samsung 0       Phone       
D   4AAK27778   Samsung 13      Phone       
D   4AAK27778   Samsung 14      Phone       
D   4AAK27778   Samsung 92      Phone       
D   4AAK27778   Samsung 184     Phone       
D   4AAK27778   Samsung 365     Phone       
D   4AAK27778   Samsung 1096    Phone       
D   3G251866    Samsung 0       Phone       
D   3G251866    Samsung 13      Phone       
D   3G251866    Samsung 14      Phone       
D   3G251866    Samsung 92      Phone       
D   3G251866    Samsung 184     Phone       
D   3G251866    Samsung 365     Phone       
D   3G251866    Samsung 1096    Phone       
D   3W809327    Samsung 0       Phone       
D   3W809327    Samsung 13      Phone       
D   3W809327    Samsung 14      Phone       
D   3W809327    Samsung 92      Phone       
D   3W809327    Samsung 184     Phone       
D   3W809327    Samsung 365     Phone       
D   3W809327    Samsung 1096    Phone       
D   620LAKEV    Samsung 0       Phone       
D   620LAKEV    Samsung 13      Phone       
D   620LAKEV    Samsung 14      Phone       
D   620LAKEV    Samsung 92      Phone       
D   620LAKEV    Samsung 184     Phone       
D   620LAKEV    Samsung 365     Phone       
D   620LAKEV    Samsung 1096    Phone

    

Expected Output:

H   Code        Parent  Days    Product Facility    Area    A   B   C   D   E   F   G   H   I   J   K
D   620LAKEV    Samsung 0       Phone                       0   0   0   0   0   0   0   0   0       
D   620LAKEV    Samsung 13      Phone                       6916.1  0   6916.1  6916.1  0   6916.1  6916.1  0   6916.1      
D   620LAKEV    Samsung 14      Phone                       6916.1  0   6916.1  6916.1  0   6916.1  6916.1  0   6916.1      
D   620LAKEV    Samsung 92      Phone                       20597.91    0   20597.91    20597.91    0   20597.91    20597.91    0   20597.91        
D   620LAKEV    Samsung 184     Phone                       25164.01    0   25164.01    25164.01    0   25164.01    25164.01    0   25164.01        
D   620LAKEV    Samsung 365     Phone                       30276.4 0   30276.4 30276.4 0   30276.4 30276.4 0   30276.4     
D   620LAKEV    Samsung 1096    Phone                       29975.58    0   29975.58    29975.58    0   29975.58    29975.58    0   29975.58        

[![enter image description here][1]][1]

[![enter image description here][2]][2]

[![enter image description here][3]][3]

Dataframe1 [1]: https://i.stack.imgur.com/PPjsM.png

Dataframe2: [2]: https://i.stack.imgur.com/8OQxl.png

Expected Output: [3]: https://i.stack.imgur.com/KPwrh.png

In Dataframe1, Column AREA, J and K are empty, while in datafarme2, Column Facility and Area are empty

I want to filter Dataframe1 based on the values in Dataframe2, I tried the below code but it did not work, I am not sure what is wrong, can you please help me to filter the dataframe1 data.

keys = list(df2.columns.values)
i1 = df1.set_index(keys).index
i2 = df1.set_index(keys).index
filtered_df=df1[~i1.isin(i2)]
print(filtered_df)

CodePudding user response:

You can merge on the non empty columns:

# select non-empty columns from df2
cols2 = list(df2.dropna(how='all', axis=1))

# remove df2's empty columns from df1
cols1 = df1.columns.difference(df2.columns.difference(cols2))

out = df2.merge(df1[cols1], on=cols2).to_string()

output:

   H      Code   Parent  Days Product  Facility  Area         A         B    C         D         E    F         G  H.1   I   J   K
0  D  620LAKEV  Samsung     0   Phone       NaN   NaN      0.00      0.00  0.0      0.00      0.00  0.0      0.00  NaN NaN NaN NaN
1  D  620LAKEV  Samsung    13   Phone       NaN   NaN   6916.10   6916.10  0.0   6916.10   6916.10  0.0   6916.10  NaN NaN NaN NaN
2  D  620LAKEV  Samsung    14   Phone       NaN   NaN   6916.10   6916.10  0.0   6916.10   6916.10  0.0   6916.10  NaN NaN NaN NaN
3  D  620LAKEV  Samsung    92   Phone       NaN   NaN  20597.91  20597.91  0.0  20597.91  20597.91  0.0  20597.91  NaN NaN NaN NaN
4  D  620LAKEV  Samsung   184   Phone       NaN   NaN  25164.01  25164.01  0.0  25164.01  25164.01  0.0  25164.01  NaN NaN NaN NaN
5  D  620LAKEV  Samsung   365   Phone       NaN   NaN  30276.40  30276.40  0.0  30276.40  30276.40  0.0  30276.40  NaN NaN NaN NaN
6  D  620LAKEV  Samsung  1096   Phone       NaN   NaN  29975.58  29975.58  0.0  29975.58  29975.58  0.0  29975.58  NaN NaN NaN NaN
  • Related