I have a dataframe as below. My question is, i want to assign the missing dates based on the first day of the Month available in the dataset. The dates will only be for a single month per file. It wont be multiple months in a single file. One cell will always have a date.
Please note that the files could be from a previous month so i cant use Today's date. I have used
fillna(method='ffill')
But this only works if the first cell has a date in it. If its blank itll make the rest blank as well.
EDIT: The filled dates can be the first of the month or ANY previously populated dates in the column. As long as they are from the same month, it should be fine.
Sample Data:
Other Columns | Date | Invoice | Other Columns |
---|---|---|---|
... | ABC232 | ... | |
... | 5-27-2022 | ABC232 | ... |
... | 5-27-2022 | ANBFN2323 | ... |
... | SADNF343 | ... | |
... | 1232HHH | ... |
Expected Output:
Other Columns | Date | Invoice | Other Columns |
---|---|---|---|
... | 5-01-2022 | ABC232 | ... |
... | 5-27-2022 | ABC232 | ... |
... | 5-27-2022 | ANBFN2323 | ... |
... | 5-01-2022 | SADNF343 | ... |
... | 5-01-2022 | 1232HHH | ... |
Please let me know if you have any questions. I cant find anything to even get me started.
Thanks
CodePudding user response:
import re
# Get the index of the first non-NA Date
idx = df["Date"].first_valid_index()
# Replace the day with 01
fill_value = re.sub(r"(\d )-(\d )-(\d )", r"\1-01-\3", df.loc[idx, "Date"])
# Fill
df["Date"] = df["Date"].fillna(fill_value)