I have a dictionary, dictionary1, that contains:
{'A': Timestamp('2022-05-23 00:00:00'), 'L': Timestamp('2017-06-21 00:00:00'), 'S': Timestamp('2021-11-02 00:00:00'), 'D': Timestamp('2021-11-08 00:00:00')}
Then I have another dictionary, dictionary2, that looks like:
{'A': [Timestamp('2022-01-16 00:00:00'),
Timestamp('2022-01-13 00:00:00'),
Timestamp('2022-01-12 00:00:00'),
Timestamp('2023-01-10 00:00:00')],
'L': [Timestamp('2023-01-16 00:00:00'),
Timestamp('2023-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')],
'S': [Timestamp('2021-01-16 00:00:00'),
Timestamp('2022-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')],
'D': [Timestamp('2023-01-16 00:00:00'),
Timestamp('2022-10-18 00:00:00')]}
For each key (A, L, S, D), I would like to keep only those dates in dictionary2 that are GREATER (later) than the corresponding date in dictionary1.
So my desired output would be:
{'A': [Timestamp('2023-01-10 00:00:00')],
'L': [Timestamp('2023-01-16 00:00:00'),
Timestamp('2023-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')],
'S': [Timestamp('2022-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')],
'D': [Timestamp('2023-01-16 00:00:00'),
Timestamp('2022-10-18 00:00:00')]}
CodePudding user response:
Given your two data sources, you might use a comprehension to build, for each key, a new list of the values that meet the criterion:
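import datetime
# Stand-in for pandas.Timestamp so the example runs without pandas
Timestamp = lambda s: datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
# Cutoff date per key (dictionary1 in the question)
lookup = {
    'A': Timestamp('2022-05-23 00:00:00'),
    'L': Timestamp('2017-06-21 00:00:00'),
    'S': Timestamp('2021-11-02 00:00:00'),
    'D': Timestamp('2021-11-08 00:00:00')
}
# Dates to filter (dictionary2 in the question)
data_in = {
    'A': [
        Timestamp('2022-01-16 00:00:00'),
        Timestamp('2022-01-13 00:00:00'),
        Timestamp('2022-01-12 00:00:00'),
        Timestamp('2023-01-10 00:00:00')
    ],
    'L': [
        Timestamp('2023-01-16 00:00:00'),
        Timestamp('2023-01-13 00:00:00'),
        Timestamp('2023-01-12 00:00:00')
    ],
    'S': [
        Timestamp('2021-01-16 00:00:00'),
        Timestamp('2022-01-13 00:00:00'),
        Timestamp('2023-01-12 00:00:00')
    ],
    'D': [
        Timestamp('2023-01-16 00:00:00'),
        Timestamp('2022-10-18 00:00:00')
    ]
}
# For each key, keep only the dates later than that key's cutoff
data_out = {
    key: [v for v in value if v > lookup[key]]
    for key, value in data_in.items()
}
print(data_out)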
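The comprehension above can also be wrapped in a small helper for reuse. This is only a minimal sketch and not part of the original answer: the name filter_after is hypothetical, plain datetime objects stand in for pandas Timestamps, and it assumes that keys with no cutoff in the lookup should simply be dropped from the result.

```python
from datetime import datetime


def filter_after(data, lookup):
    """Keep, per key, only the values strictly greater than lookup[key].

    Hypothetical helper; keys absent from `lookup` are skipped (an assumption).
    """
    return {
        key: [v for v in values if v > lookup[key]]
        for key, values in data.items()
        if key in lookup
    }


# Stand-in data: plain datetimes instead of pandas Timestamps.
lookup = {"A": datetime(2022, 5, 23), "S": datetime(2021, 11, 2)}
data_in = {
    "A": [datetime(2022, 1, 16), datetime(2023, 1, 10)],
    "S": [datetime(2021, 1, 16), datetime(2022, 1, 13), datetime(2023, 1, 12)],
    "X": [datetime(2020, 1, 1)],  # no cutoff in lookup, so it is dropped
}

result = filter_after(data_in, lookup)
print(result)
```

If a missing cutoff should raise an error instead, removing the `if key in lookup` clause lets the KeyError surface.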
CodePudding user response:
I don't know what Timestamp is, but if it has a function that returns the date as a string (or as any other data structure with > defined), you can do:
# This is some class that knows its stamp value (the "date")
class Timestamp:
    def __init__(self, value):
        self._value = value
    @property
    def value(self):
        return self._value
# This is your reference dict.
d1 = {
'A': Timestamp('2022-05-23 00:00:00'),
'L': Timestamp('2017-06-21 00:00:00'),
'S': Timestamp('2021-11-02 00:00:00'),
'D': Timestamp('2021-11-08 00:00:00')
}
# This is the data you want to clean.
d2 = {
'A': [
Timestamp('2022-01-16 00:00:00'),
Timestamp('2022-01-13 00:00:00'),
Timestamp('2022-01-12 00:00:00'),
Timestamp('2023-01-10 00:00:00')
],
'L': [
Timestamp('2023-01-16 00:00:00'),
Timestamp('2023-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')
],
'S': [
Timestamp('2021-01-16 00:00:00'),
Timestamp('2022-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')
],
'D': [Timestamp('2023-01-16 00:00:00'),
Timestamp('2022-10-18 00:00:00')]
}
# This is the new dict you want.
d3 = {
    key: [stamp for stamp in stamplist if stamp.value > d1[key].value]
    for (key, stamplist) in d2.items()
}
# Check it:
for key, stamplist in d3.items():
    for stamp in stamplist:
        print(stamp.value)
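If the class is your own, another option is to define the comparison on the class itself, so the stamps can be compared with > directly. A minimal sketch, assuming the value is a string in "%Y-%m-%d %H:%M:%S" format; the class name Stamp and its attributes are hypothetical, and functools.total_ordering derives the remaining comparison operators from __eq__ and __lt__.

```python
from datetime import datetime
from functools import total_ordering


@total_ordering
class Stamp:
    """Hypothetical wrapper that parses its string value once and compares by date."""

    def __init__(self, value):
        self.value = value
        self._when = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")

    def __eq__(self, other):
        if not isinstance(other, Stamp):
            return NotImplemented
        return self._when == other._when

    def __lt__(self, other):
        if not isinstance(other, Stamp):
            return NotImplemented
        return self._when < other._when


cutoffs = {"S": Stamp("2021-11-02 00:00:00")}
stamps = {
    "S": [
        Stamp("2021-01-16 00:00:00"),
        Stamp("2022-01-13 00:00:00"),
        Stamp("2023-01-12 00:00:00"),
    ]
}

# `>` now works on the objects themselves, so no `.value` access is needed.
kept = {key: [s for s in lst if s > cutoffs[key]] for key, lst in stamps.items()}
kept_values = {key: [s.value for s in lst] for key, lst in kept.items()}
print(kept_values)
```

Parsing into a datetime once in the constructor means the comparison does not depend on the string format sorting correctly.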
CodePudding user response:
With pandas, one way is to use the pandas.Series constructor with a dict/list comprehension:
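import pandas as pd
from pandas import Timestamp  # needed to build dictionary1 and dictionary2
s1 = pd.Series(dictionary1)
s2 = pd.Series(dictionary2)
# Compare each value v (not the key k) with the cutoff for that key
out = {k: [v for v in s2[k] if v > s1[k]] for k in s2.index}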
Output:
{'A': [Timestamp('2023-01-10 00:00:00')],
'L': [Timestamp('2023-01-16 00:00:00'),
Timestamp('2023-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')],
'S': [Timestamp('2022-01-13 00:00:00'), Timestamp('2023-01-12 00:00:00')],
'D': [Timestamp('2023-01-16 00:00:00'), Timestamp('2022-10-18 00:00:00')]}