Remove rows from dataframes not present in dataframes in dictionary-CodePudding

I have a hundreds pandas dataframes such as these:

Df1 = 
    MD    A    B    C  
    r1    6    3    9   
    r2    2    1    1  
    r3    5    7    2  
    r4    8    2    0

Df2 = 
    MD    A    B    C  
    r1    1    7    1  
    r2    6    3    0  
    r3    3    1    8
    r4    1    7    4

I also have one dictionary like this (except with a number of keys and values to reflect the number of dataframes):

Dict = {'D1', MD
              r1
              r4
        [2 rows x 1 columns]
        'D2', MD
              r1
              r2
              r3
        [3 rows x 1 columns]}

What I need is a way to go through the dictionary, and for each dataframe and corresponding key (I.e. Df1 and D1, and Df2 and D2), remove the rows not present in the "dictionary-value dataframe".

So for Df1, I need to go to the key D1 and remove the rows that are not in the dataframe for this key. That is, remove r2 and r3 from Df1.

So the result I want would be:

Df1 = 
    MD    A    B    C  
    r1    6    3    9   
    r4    8    2    0

Df2 = 
    MD    A    B    C  
    r1    1    7    1  
    r2    6    3    0  
    r3    3    1    8

Any help is greatly appreciated!

PS. If it is easier to keep the rows instead of removing them, that would also be helpful!

CodePudding user response：

Here I just print the values, but you could reassign the DataFrame.

>>> df1 = pd.DataFrame.from_dict({"r1": [1,2,3], "r2": [4,5,6], "r3": [7,8,9]}, columns=["A", "B", "C"], orient="index")
>>> df2 = pd.DataFrame.from_dict({"r1": [1,2,3], "r2": [4,5,6], "r3": [7,8,9]}, columns=["A", "B", "C"], orient="index")

>>> dfs = [df1, df2]
>>> dfs

[    A  B  C
 r1  1  2  3
 r2  4  5  6
 r3  7  8  9,
     A  B  C
 r1  1  2  3
 r2  4  5  6
 r3  7  8  9]

lookup = {"d1": ["r1", 'r2'], 'd2': ["r2", "r3"]}

for rows, df in zip(lookup.values(), dfs):
    print(df.loc[rows])

    A  B  C
r1  1  2  3
r2  4  5  6
    A  B  C
r2  4  5  6
r3  7  8  9

CodePudding user response：

Your dictionary is a bit weird, so I edited it to dct below. Basically each key has a corresponding dataframe. Then you can zip dct.values() with the list of dictionaries and index out your relevant dataframes.

dct = {'D1': {'MD': ['r1', 'r4']},
       'D2': {'MD': ['r1', 'r2', 'r3']}
      }
df1_new, df2_new = [df[df['MD'].isin(v['MD'])] for v, df in zip(dct.values(),[df1,df2])]

Output:

   MD  A  B  C
 0  r1  6  3  9
 3  r4  8  2  0

    MD  A  B  C
 0  r1  1  7  1
 1  r2  6  3  0
 2  r3  3  1  8