Home > front end >  How to get dynamically subset of data frame by list comprehension?
How to get dynamically subset of data frame by list comprehension?

Time:10-21

My intend

I like to filter a data frame dynamically by a dict that is holding a list of dicts with column and value information to filter on.

Expected result should look like this:

result = {'brand': ['Honda','Honda','Toyota'],
              'model': ['a','b','d']
            }

My data frame

cars = {'brand': ['Honda','Honda','Honda','Toyota','Toyota','Audi'],
              'model': ['a','b','c','d','a','b'],
            }

My dict

Is holding information about relation between cars:

relation = {
    'relation1':[
        {'brand':'Honda','model':'a'},
        {'brand':'Honda','model':'b'},
        {'brand':'Toyota','model':'d'}
    ],
    'relation2':[
        {'brand':'...','model':'...'},
        {'brand':'...','model':'...'}
    ]
}

My try to setup list comprehension

' | '.join([f'{repr(k)}=={repr(v)}' for d in relation['relation1'] for k,v in d.items()])

Issues

  1. Implement filter will give an error:

    KeyError: "brand=='Honda' | model=='a' | brand=='Honda' | model=='b' | brand=='Toyota' | model=='d'"
    
  2. Adjusting filter to combine the dict values inside with & and outside with | so filter looks like:

    (brand=='Honda' & model=='a') | (brand=='Honda' & model=='b') | (brand=='Toyota' & model=='d')
    

CodePudding user response:

IIUC you can simply do a merge:

print (df_cars.merge(pd.DataFrame(relation["relation1"])))

   brand model
0  Honda     a
1  Honda     b

CodePudding user response:

Use nested list comprehension:

q = ' | '.join([f"({' & '.join(f'{k}=={repr(v)}' for k,v in d.items())})" 
                for d in relation['relation1']])
print (q)
(brand=='Honda' & model=='a') | 
(brand=='Honda' & model=='b') | 
(brand=='Toyota' & model=='b')


df = df.query(q)
print (df)
    brand model
0   Honda     a
1   Honda     b
3  Toyota     d
    
  • Related