Compare two dictionaries and filter

Time:01-20

I have a dictionary, dictionary1, that contains:

{'A': Timestamp('2022-05-23 00:00:00'), 'L': Timestamp('2017-06-21 00:00:00'), 'S': Timestamp('2021-11-02 00:00:00'), 'D': Timestamp('2021-11-08 00:00:00')}

Then I have another dictionary, dictionary2, that looks like:

{'A': [Timestamp('2022-01-16 00:00:00'),
  Timestamp('2022-01-13 00:00:00'),
  Timestamp('2022-01-12 00:00:00'),
  Timestamp('2023-01-10 00:00:00')],
'L': [Timestamp('2023-01-16 00:00:00'),
  Timestamp('2023-01-13 00:00:00'),
  Timestamp('2023-01-12 00:00:00')],
'S': [Timestamp('2021-01-16 00:00:00'),
  Timestamp('2022-01-13 00:00:00'),
  Timestamp('2023-01-12 00:00:00')],
 'D': [Timestamp('2023-01-16 00:00:00'),
  Timestamp('2022-10-18 00:00:00')]}

For each key (A, L, S, D), I would like to keep only the dates in dictionary2 that are GREATER than the corresponding date in dictionary1.

So my desired output would be:

{'A': [Timestamp('2023-01-10 00:00:00')],
'L': [Timestamp('2023-01-16 00:00:00'),
  Timestamp('2023-01-13 00:00:00'),
  Timestamp('2023-01-12 00:00:00')],
'S': [Timestamp('2022-01-13 00:00:00'),
  Timestamp('2023-01-12 00:00:00')],
 'D': [Timestamp('2023-01-16 00:00:00'),
  Timestamp('2022-10-18 00:00:00')]}

CodePudding user response:

Given your two data sources, you might use a comprehension to build the filtered result:

import datetime

# Stand-in for pandas.Timestamp so the example runs with the standard library only.
Timestamp = lambda s: datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")

lookup = {
    'A': Timestamp('2022-05-23 00:00:00'),
    'L': Timestamp('2017-06-21 00:00:00'),
    'S': Timestamp('2021-11-02 00:00:00'),
    'D': Timestamp('2021-11-08 00:00:00')
}

data_in = {
    'A': [
        Timestamp('2023-01-10 00:00:00')
    ],
    'L': [
        Timestamp('2023-01-16 00:00:00'),
        Timestamp('2023-01-13 00:00:00'),
        Timestamp('2023-01-12 00:00:00')
    ],
    'S': [
        Timestamp('2022-01-13 00:00:00'),
        Timestamp('2023-01-12 00:00:00')
    ],
    'D': [
        Timestamp('2023-01-16 00:00:00'),
        Timestamp('2022-10-18 00:00:00')
    ]
}

# Build a dict (not a list of one-key dicts) so the result matches the desired output.
data_out = {
    key: [v for v in value if v > lookup[key]]
    for key, value
    in data_in.items()
}

print(data_out)
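
If some keys in the data might be missing from the lookup, the same comprehension can be wrapped in a small helper. This is only a sketch: the name filter_after and the choice to keep every date when a key has no cutoff are assumptions, not something stated in the question.

```python
import datetime


def filter_after(lookup, data):
    """Keep, for each key, only the dates strictly later than lookup[key].

    Hypothetical helper: keys missing from `lookup` keep all their dates.
    """
    result = {}
    for key, dates in data.items():
        cutoff = lookup.get(key)
        if cutoff is None:
            # Assumption: no cutoff means nothing is filtered out.
            result[key] = list(dates)
        else:
            result[key] = [d for d in dates if d > cutoff]
    return result


lookup = {"A": datetime.datetime(2022, 5, 23), "S": datetime.datetime(2021, 11, 2)}
data = {
    "A": [datetime.datetime(2022, 1, 16), datetime.datetime(2023, 1, 10)],
    "S": [datetime.datetime(2021, 1, 16), datetime.datetime(2022, 1, 13)],
    "X": [datetime.datetime(2020, 1, 1)],  # no entry in lookup
}

filtered = filter_after(lookup, data)
print(filtered)
```

If a missing key should instead be treated as an error, replacing lookup.get(key) with lookup[key] would raise a KeyError, which is what the plain comprehension does.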

CodePudding user response:

I don't know what Timestamp is, but if it has a function that returns the dates as a string (or any other data structure with a > defined), you can do

# This is some class that knows its stamp value (the "date")
class Timestamp:

    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        return self._value

# This is your reference dict.
d1 = {
    'A': Timestamp('2022-05-23 00:00:00'),
    'L': Timestamp('2017-06-21 00:00:00'),
    'S': Timestamp('2021-11-02 00:00:00'),
    'D': Timestamp('2021-11-08 00:00:00')
}

# This is the data you want to clean.
d2 = {
    'A': [
        Timestamp('2022-01-16 00:00:00'),
        Timestamp('2022-01-13 00:00:00'),
        Timestamp('2022-01-12 00:00:00'),
        Timestamp('2023-01-10 00:00:00')
    ],
    'L': [
        Timestamp('2023-01-16 00:00:00'),
        Timestamp('2023-01-13 00:00:00'),
        Timestamp('2023-01-12 00:00:00')
    ],
    'S': [
        Timestamp('2021-01-16 00:00:00'),
        Timestamp('2022-01-13 00:00:00'),
        Timestamp('2023-01-12 00:00:00')
    ],
    'D': [Timestamp('2023-01-16 00:00:00'),
          Timestamp('2022-10-18 00:00:00')]
}

# This is the new dict you want.
d3 = {
    key: [stamp for stamp in stamplist if stamp.value > d1[key].value]
    for (key, stamplist) in d2.items()
}

# Check it:
for key, stamplist in d3.items():
    for stamp in stamplist:
        print(stamp.value)
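
Comparing the raw strings works here only because the 'YYYY-MM-DD HH:MM:SS' format is zero-padded and ordered from the largest unit to the smallest, so alphabetical order matches chronological order. A quick check, as a sketch using the standard datetime module (not part of the original answer):

```python
from datetime import datetime

stamps = [
    "2022-01-16 00:00:00",
    "2023-01-10 00:00:00",
    "2021-11-02 00:00:00",
]

# Sort once as plain strings and once as parsed datetimes.
by_string = sorted(stamps)
by_datetime = sorted(stamps, key=lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"))

# Both orderings agree for this zero-padded format.
print(by_string == by_datetime)  # prints True

# Without zero padding, text order no longer follows date order:
# October sorts before September here.
print("2022-10-01" > "2022-9-01")  # prints False
```

If the stored strings are not guaranteed to be zero-padded, parsing them into datetime objects before comparing is the safer route.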

CodePudding user response:

With pandas, one way is to use the pandas.Series constructor with a dict/list comprehension:

import pandas as pd
from pandas import Timestamp

s1 = pd.Series(dictionary1)
s2 = pd.Series(dictionary2)

# Compare each value v (not the key k) against the cutoff stored for that key.
out = {k: [v for v in s2[k] if v > s1[k]] for k in s2.index}

Output:

{'A': [Timestamp('2023-01-10 00:00:00')],
 'L': [Timestamp('2023-01-16 00:00:00'),
  Timestamp('2023-01-13 00:00:00'),
  Timestamp('2023-01-12 00:00:00')],
 'S': [Timestamp('2022-01-13 00:00:00'), Timestamp('2023-01-12 00:00:00')],
 'D': [Timestamp('2023-01-16 00:00:00'), Timestamp('2022-10-18 00:00:00')]}