Days are missing in my list of dates. How can I run a function to add custom days in the missing rows?

Time:04-12

I have a list of dates, some of which are missing the day. The list is stored as a pandas Series. Here's what it looks like:

0     11-04-2022
1       -03-2022
2       -03-2022
3       -03-2022
4       -03-2022
5     12-04-2022
6       -11-2021
7      9-04-2022
8      8-04-2022
9      8-04-2022
10      -03-2022
11      -02-2022
12      -11-2021
13      -11-2021
14      -11-2021
15     7-04-2022
16     6-04-2022
17     5-04-2022

I'm using the following code to replace the missing days with '01':

for xyz in dates_wo_space:
    if xyz.startswith(('-' ' ')):
        dates_final = ('01') + str(dates_wo_space)
print(dates_final)

And here's the error I get:

NameError: name 'dates_final' is not defined

Can someone please show me how I can add '01' to rows with missing days?
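For reference, here is a minimal plain-Python sketch of the loop approach. It assumes dates_wo_space is a list of strings, and the sample values are a hypothetical subset of the data above. It shows why the NameError occurs: ('-' ' ') is not a tuple but two adjacent string literals concatenated into '- ', so no value matched and dates_final was never assigned. It then builds the result in a list that is defined before the loop:

```python
# Hypothetical subset of the question's data, as a plain list of strings.
dates_wo_space = ["11-04-2022", "-03-2022", "9-04-2022", "-11-2021"]

# Adjacent string literals are concatenated, so this is the single string '- ',
# not the tuple ('-', ' '); none of the values start with '- '.
print(repr(('-' ' ')))

dates_final = []  # define the result before the loop so it always exists
for xyz in dates_wo_space:
    if xyz.startswith('-'):
        dates_final.append('01' + xyz)  # prepend the default day to this value only
    else:
        dates_final.append(xyz)         # keep values that already have a day

print(dates_final)
# ['11-04-2022', '01-03-2022', '9-04-2022', '01-11-2021']
```

With pandas, the vectorised .str methods avoid the explicit loop.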

Answer:

Assuming s is the input Series of strings, you could use a regex (^- matches a - only at the beginning of the string):

s = s.str.replace(r'^-', '01-', regex=True)

You can also take the opportunity to zero-pad single-digit days (a complete date has 10 characters):

s = s.str.replace(r'^-', '01-', regex=True).str.zfill(10)

NB. If there can be spaces before the -, use:

s = s.str.replace(r'^\s*-', '01-', regex=True)

Output:

0     11-04-2022
1     01-03-2022
2     01-03-2022
3     01-03-2022
4     01-03-2022
5     12-04-2022
6     01-11-2021
7     09-04-2022
8     08-04-2022
9     08-04-2022
10    01-03-2022
11    01-02-2022
12    01-11-2021
13    01-11-2021
14    01-11-2021
15    07-04-2022
16    06-04-2022
Name: date, dtype: object
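The replace and zfill steps can be combined into one runnable sketch. The sample Series is hypothetical. Stripping whitespace first with .str.strip() is an added choice, not part of the answer above. It handles a space before the "-", and it also keeps zfill from being fooled by a leading space before a single-digit day:

```python
import pandas as pd

# Hypothetical sample; one value has a leading space before the "-".
s = pd.Series(["11-04-2022", " -03-2022", "-11-2021", "9-04-2022"], name="date")

s = (
    s.str.strip()                              # drop surrounding whitespace first
     .str.replace(r"^-", "01-", regex=True)    # missing day at the start -> "01"
     .str.zfill(10)                            # "9-04-2022" -> "09-04-2022"
)
print(s.tolist())
# ['11-04-2022', '01-03-2022', '01-11-2021', '09-04-2022']
```

The replace runs before zfill, so zfill never sees a value that starts with "-".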

Answer:

Prepend "01" to values that start with "-":

>>> srs.where(~srs.str.startswith("-"), "01" + srs)

0     11-04-2022
1     01-03-2022
2     01-03-2022
3     01-03-2022
4     01-03-2022
5     12-04-2022
6     01-11-2021
7      9-04-2022
8      8-04-2022
9      8-04-2022
10    01-03-2022
11    01-02-2022
12    01-11-2021
13    01-11-2021
14    01-11-2021
15     7-04-2022
16     6-04-2022
17     5-04-2022
dtype: object
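Series.where keeps values where the condition is True and substitutes the other argument elsewhere, which is why the condition is negated with ~. The sketch below uses a hypothetical three-value sample. It shows that Series.mask, the inverse of where, gives the same result without the negation:

```python
import pandas as pd

# Hypothetical sample Series of date strings.
srs = pd.Series(["11-04-2022", "-03-2022", "9-04-2022"])

missing_day = srs.str.startswith("-")  # True where the day is absent

# where(): keep where the condition is True, replace where it is False.
via_where = srs.where(~missing_day, "01" + srs)

# mask(): replace where the condition is True (the inverse of where).
via_mask = srs.mask(missing_day, "01" + srs)

print(via_where.tolist())
# ['11-04-2022', '01-03-2022', '9-04-2022']
print(via_where.equals(via_mask))
# True
```

As in the output above, single-digit days such as "9" are not zero-padded by this approach.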

Alternatively, with to_datetime:

srs = pd.to_datetime(srs, format="%d-%m-%Y", errors="coerce").fillna(
    pd.to_datetime(srs, format="-%m-%Y", errors="coerce"))
# Convert back to strings if needed
srs = srs.dt.strftime("%d-%m-%Y")

>>> srs
0     11-04-2022
1     01-03-2022
2     01-03-2022
3     01-03-2022
4     01-03-2022
5     12-04-2022
6     01-11-2021
7     09-04-2022
8     08-04-2022
9     08-04-2022
10    01-03-2022
11    01-02-2022
12    01-11-2021
13    01-11-2021
14    01-11-2021
15    07-04-2022
16    06-04-2022
17    05-04-2022
dtype: object
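Step by step, the to_datetime approach parses each value with two formats and merges the results. The sample Series is hypothetical. Both calls use errors="coerce" so that values which don't fit a format become NaT and can be filled from the other parse. When "%d" is absent from the format, the day defaults to 1:

```python
import pandas as pd

# Hypothetical sample Series of date strings.
srs = pd.Series(["11-04-2022", "-03-2022", "9-04-2022", "-11-2021"])

# Values that include a day; the rest become NaT.
full = pd.to_datetime(srs, format="%d-%m-%Y", errors="coerce")
# Values that start with "-"; with no %d in the format, the day defaults to 1.
month_only = pd.to_datetime(srs, format="-%m-%Y", errors="coerce")

parsed = full.fillna(month_only)                 # datetime64 Series, no NaT left
result = parsed.dt.strftime("%d-%m-%Y")          # back to zero-padded strings
print(result.tolist())
# ['11-04-2022', '01-03-2022', '09-04-2022', '01-11-2021']
```

Keeping parsed as datetimes rather than strings may be more useful if the dates will be sorted or compared later.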