check if my whole year dataset contains all months and days

Time:11-19

I have one year of data and I would like to check whether it contains observations for every day of every month, basically to validate that everything has been collected. The dataset contains day, month and year columns. My idea was to plot this and see whether all days of each month are there. I have tried the following:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(earth2019['month'], earth2019['day'])

plt.show()

but the chart doesn't really confirm what I wanted to know.

My question is: how can I validate that my data contains all the observations? It should have some observations for each day of each month, so I basically want to know whether all the data has been collected in that dataset. Is there some way to check this using Python code?

CodePudding user response:

Without a sample, it's difficult but you can try:

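import pandas as pd

# Every calendar day of 2019 (pandas >= 1.4 uses inclusive=; older versions used closed=)
ref19 = pd.date_range('2019', '2020', inclusive='left', freq='D')
# Build dates from the columns; assign(year=2019) sets the year explicitly
dti19 = pd.to_datetime(earth2019.assign(year=2019)[['year', 'month', 'day']])

out = ref19.difference(dti19)  # missing dates here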

Sample output:

>>> out
DatetimeIndex(['2019-02-20', '2019-04-02', '2019-04-13', '2019-04-26',
               '2019-05-08', '2019-07-19', '2019-09-21', '2019-10-09',
               '2019-10-11', '2019-12-22'],
              dtype='datetime64[ns]', freq=None)
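
To see the same idea end to end, here is a minimal sketch on made-up data. The `earth2019` frame below is a hypothetical stand-in (every day of 2019 with two dates removed on purpose), and the names `is_complete` and `missing_per_month` are only illustrative:

```python
import pandas as pd

# Hypothetical stand-in for earth2019: all of 2019 minus two dates.
all_days = pd.date_range("2019-01-01", "2019-12-31", freq="D")
kept = all_days.drop(pd.to_datetime(["2019-02-20", "2019-04-02"]))
earth2019 = pd.DataFrame({"year": kept.year, "month": kept.month, "day": kept.day})

# Build real dates from the three columns, then compare with the full calendar.
dates = pd.DatetimeIndex(pd.to_datetime(earth2019[["year", "month", "day"]]))
missing = all_days.difference(dates)

# True only when no calendar day is absent from the data.
is_complete = len(missing) == 0

# Number of missing days per month (only months with gaps appear).
missing_per_month = pd.Series(missing.month).value_counts().sort_index()

print(missing)
print(missing_per_month)
```

If `is_complete` is `True`, every day of the year has at least one observation; otherwise `missing` lists the gaps.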

CodePudding user response:

1. The dataset contains day, month and year columns.

2. Check that it contains observations from every day of every month.

3. Validate that everything has been collected.

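# Assuming Yr_Mo_Dy is a single column holding the combined date
df.Yr_Mo_Dy.head()  # preview the first dates

df.Yr_Mo_Dy.value_counts()  # number of observations per date

df.Yr_Mo_Dy.isnull().sum()  # number of rows with a missing date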
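
The question only mentions separate day, month and year columns, so a combined date column would have to be built first. A rough sketch, assuming those three column names and using tiny made-up data (the `df` contents, the three-day `calendar` and the name `empty_days` are hypothetical):

```python
import pandas as pd

# Hypothetical data: one row per observation, including an impossible date (Feb 30).
df = pd.DataFrame({
    "year":  [2019, 2019, 2019, 2019],
    "month": [1, 1, 1, 2],
    "day":   [1, 1, 3, 30],
})

# errors="coerce" turns impossible dates into NaT instead of raising.
df["Yr_Mo_Dy"] = pd.to_datetime(df[["year", "month", "day"]], errors="coerce")
n_invalid = df["Yr_Mo_Dy"].isnull().sum()

# Count observations per day, then align with the calendar so that
# days without any observation show up with a count of 0.
calendar = pd.date_range("2019-01-01", "2019-01-03", freq="D")
per_day = df["Yr_Mo_Dy"].value_counts().reindex(calendar, fill_value=0)
empty_days = per_day[per_day == 0].index

print(per_day)
print(empty_days)
```

For a full year, `calendar` would span 2019-01-01 to 2019-12-31, and `empty_days` would then hold the days with no observations at all.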