I am trying to mask datetime values by adding uniform random noise to them.
I currently use the Cape Python module to add noise to my data. However, I'd like to write my own custom function similar to the one Cape Python provides, shown below.
!pip install cape-privacy
from cape_privacy.pandas.transformations import *
import datetime
import pandas as pd
s = pd.Series([datetime.date(year=2020, month=2, day=15)])
perturb = DatePerturbation(frequency="MONTH", min=-10, max=10)
perturb(s)
# Returns 2019-07-20
Is there a way I can add noise (between min and max) to the DAY, MONTH, or YEAR, or a combination of these, given a datetime value, and still make the result look credible?
# Input
2021-09-23
# Expected Output when noise is added to DAY between -10 and 10
2021-09-20
CodePudding user response:
I don't know Cape Python, so this might be off...
Here's a way to do that:
from datetime import date, timedelta
from random import randint

def date_pertubation(d, attribs, minimum, maximum):
    # Accept a single attribute name or a list of names, case-insensitively.
    if isinstance(attribs, str):
        attribs = [attribs]
    attribs = [attrib.casefold() for attrib in attribs]
    year = d.year
    if "year" in attribs:
        year += randint(minimum, maximum)
    # Work with a zero-based month so divmod can carry into the year.
    month = d.month - 1
    if "month" in attribs:
        month += randint(minimum, maximum)
    year_delta, month = divmod(month, 12)
    year += year_delta
    month += 1
    # Express the day as an offset from the first of the month.
    day_delta = d.day - 1
    if "day" in attribs:
        day_delta += randint(minimum, maximum)
    return date(year, month, 1) + timedelta(days=day_delta)
This
d = date(year=2020, month=2, day=15)
for _ in range(5):
    print(date_pertubation(d, "DAY", -20, 20).strftime("%Y-%m-%d"))
for _ in range(5):
    print(date_pertubation(d, "YEAR", -3, 3).strftime("%Y-%m-%d"))
for _ in range(5):
    print(date_pertubation(d, ["YEAR", "MONTH", "DAY"], -3, 3).strftime("%Y-%m-%d"))
will produce something like
2020-02-11
2020-02-09
2020-01-29
2020-02-29
2020-03-01
2022-02-15
2022-02-15
2020-02-15
2017-02-15
2017-02-15
2016-12-12
2016-12-14
2021-01-13
2019-11-15
2021-05-14
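One caveat about the function above: because it builds the result as the first of the month plus a day offset, a day that does not exist in the target month rolls over into the next one (for example, 2021-01-31 shifted by one month lands in early March rather than in February). If that matters for credibility, a possible variant is to clamp the day to the length of the target month before applying the day noise. The sketch below is only one way to do it; the name `perturb_date_clamped` and the optional `rng` parameter are my own additions for illustration, not anything from Cape Python.

```python
import calendar
import random
from datetime import date, timedelta


def perturb_date_clamped(d, attribs, minimum, maximum, rng=None):
    """Add uniform integer noise in [minimum, maximum] to the chosen parts of d.

    Hypothetical variant: after a year/month shift, the day is clamped to the
    last valid day of the target month instead of rolling over.
    """
    rng = rng or random.Random()  # pass random.Random(seed) for repeatable output
    if isinstance(attribs, str):
        attribs = [attribs]
    attribs = {a.casefold() for a in attribs}

    year = d.year
    if "year" in attribs:
        year += rng.randint(minimum, maximum)

    # Zero-based month index so divmod can carry over into the year.
    month_index = d.month - 1
    if "month" in attribs:
        month_index += rng.randint(minimum, maximum)
    year_delta, month_index = divmod(month_index, 12)
    year += year_delta
    month = month_index + 1

    # Clamp the day to the target month's length (e.g. 31 -> 28 in Feb 2021).
    last_day = calendar.monthrange(year, month)[1]
    result = date(year, month, min(d.day, last_day))

    # Day noise is applied last and may legitimately cross month boundaries.
    if "day" in attribs:
        result += timedelta(days=rng.randint(minimum, maximum))
    return result
```

Calling it as `perturb_date_clamped(date(2021, 9, 23), "DAY", -10, 10)` gives a date within ten days of the input, and passing `rng=random.Random(42)` (or any other seed) makes the noise repeatable for testing. Whether clamping or rolling over is preferable depends on what "credible" means for your data.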