How to automate the generation of dates in this special situation?

I have a result file generated from software and it looks like this:

,0,1,2,3,4,5,6,7,8,9
0,Month,Decade,Stage,Kc,ETc,ETc,Eff,rain,Irr.,Req.
1,coeff,mm/day,mm/dec,mm/dec,mm/dec,,,,,
2,Sep,1,Init,0.50,1.85,18.5,21.8,0.0,,
3,Sep,2,Init,0.50,1.77,17.7,30.3,0.0,,
4,Sep,3,Init,0.50,1.72,17.2,37.1,0.0,,
5,Oct,1,Deve,0.61,2.05,20.5,49.5,0.0,,
6,Oct,2,Deve,0.82,2.66,26.6,59.3,0.0,,
7,Oct,3,Deve,1.03,3.24,35.6,43.0,0.0,,
8,Nov,1,Mid,1.20,3.63,36.3,20.9,15.4,,
9,Nov,2,Mid,1.21,3.53,35.3,6.0,29.2,,
10,Nov,3,Mid,1.21,3.70,37.0,4.0,33.0,,
11,Dec,1,Mid,1.21,3.87,38.7,0.1,38.6,,
12,Dec,2,Late,1.18,3.92,39.2,0.0,39.2,,
13,Dec,3,Late,1.00,3.58,39.4,0.0,39.4,,
14,Jan,1,Late,0.88,3.36,10.1,0.0,10.1,,
15,,,,,,,,,,
16,372.1,272.2,204.9,,,,,,,

As you can see, the months run from September to January, and each month is divided into three decades (10-day periods). To be exact, the data run from September 2017 to the first decade of January 2018. I need to generate the starting date of each decade in this format: 01-Sep-2017. So I should get 01-Sep-2017, 11-Sep-2017, 21-Sep-2017, ..., 01-Jan-2018. How can I generate these dates? Here is the code I have written so far.

import pandas as pd

years = [2017, 2018, 2019]
Dates = []
temp = pd.read_csv(folder_link) # Reading the particular result file
Month = temp['0'][2:] # First column = Month (Jul, Aug, ..)
Decade = temp['1'][2:]
for year in years:
    for j in range(2,len(Decade)): # First two lines are headers, so removed them
        if(int(Decade[j]) == 1): # First decade = 1-10 days of month
            Date = "1" + "-" + Month[j] + "-" + str(year) # Writing the date as 1-Jan-2017
            Dates.append(Date)
        if(int(Decade[j]) == 2): # Second decade = 11-20 days of month
            Date = "11" + "-" + Month[j] + "-" + str(year)
            Dates.append(Date)
        if(int(Decade[j]) == 3): # Third decade = 21-28 or 21-30 or 21-31 days of month
            Date = "21" + "-" + Month[j] + "-" + str(year)
            Dates.append(Date)

The problem with this code is that I get 01-Sep-2017, 11-Sep-2017, 21-Sep-2017, ..., 01-Jan-2017 (instead of 2018). I need a generalized solution that works for all months, not just for January. Some of my results range from Sep 2017 to Aug 2018. Any help?

CodePudding user response：

If you want to stay with the iteration approach (there may be a more efficient one using pandas functions), here is a simple way to do it:


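import pandas as pd

dates = []
year = 2017
# All twelve abbreviations, so the same check works for results covering other months
month_list = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
previous_month = None
temp = pd.read_csv("data.csv") # Reading the particular result file
for index, row in temp.iterrows():
    month, decade = row.iloc[1], row.iloc[2] # Month and Decade columns, by position
    # First two lines are headers, so skip them. Same for last two lines.
    if index > 1 and month in month_list:
        # Increment the year once, on the first Jan row that follows another month
        if month == 'Jan' and previous_month not in (None, 'Jan'):
            year += 1
        previous_month = month
        if(int(decade) == 1): # First decade = 1-10 days of month
            date = "1" + "-" + month + "-" + str(year) # Writing the date as 1-Jan-2017
            dates.append(date)
        elif(int(decade) == 2): # Second decade = 11-20 days of month
            date = "11" + "-" + month + "-" + str(year)
            dates.append(date)
        elif(int(decade) == 3): # Third decade = 21-28 or 21-30 or 21-31 days of month
            date = "21" + "-" + month + "-" + str(year)
            dates.append(date)
        else:
            print("Unrecognized value for decade {}".format(decade))
print(dates)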

Explanation :

use iterrows to iterate over your dataframe rows
then, skip the headers and check that you are parsing actual data by looking at the month value (using a predefined list)
finally, increment year when the month rolls over to Jan

*Note: this solution assumes that your data is a time series with rows ordered in time.

P.S.: by convention, capitalized names are used for classes in Python, not for variables.
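
For reference, the "pandas functions" route mentioned above could look roughly like the sketch below. It is not part of the original answer: the inline `raw` sample, `start_year`, the `MONTHS` lookup, and the `Date` column name are all assumptions made for illustration. It maps the month abbreviations to numbers, treats every drop in month number as the start of a new year, and builds the dates without an explicit loop.

```python
import calendar
import io

import pandas as pd

# Inline sample in the same layout as the result file (shortened for illustration)
raw = """,0,1,2,3,4,5,6,7,8,9
0,Month,Decade,Stage,Kc,ETc,ETc,Eff,rain,Irr.,Req.
1,coeff,mm/day,mm/dec,mm/dec,mm/dec,,,,,
2,Nov,3,Mid,1.21,3.70,37.0,4.0,33.0,,
3,Dec,1,Mid,1.21,3.87,38.7,0.1,38.6,,
4,Dec,2,Late,1.18,3.92,39.2,0.0,39.2,,
5,Dec,3,Late,1.00,3.58,39.4,0.0,39.4,,
6,Jan,1,Late,0.88,3.36,10.1,0.0,10.1,,
7,,,,,,,,,,
8,372.1,272.2,204.9,,,,,,,
"""

start_year = 2017  # assumption: the year of the first data row

# 'Jan' -> 1, ..., 'Dec' -> 12 (English abbreviations in the default locale)
MONTHS = {abbr: num for num, abbr in enumerate(calendar.month_abbr) if abbr}

df = pd.read_csv(io.StringIO(raw), header=1, index_col=0)

# Rows whose Month is not a month abbreviation (units, blank, totals) map to NaN
month_num = df["Month"].map(MONTHS)
data = df[month_num.notna()].copy()
month_num = month_num.dropna().astype(int)

# Each drop in month number (e.g. Dec -> Jan) starts a new year
year = start_year + (month_num.diff() < 0).cumsum()
day = (data["Decade"].astype(int) - 1) * 10 + 1  # decade 1, 2, 3 -> day 1, 11, 21

dates = pd.to_datetime(pd.DataFrame({"year": year, "month": month_num, "day": day}))
data["Date"] = dates.dt.strftime("%d-%b-%Y")
print(data["Date"].tolist())
```

Like the loop version, this relies on the rows being in chronological order. Because the dates pass through real datetime values, the day comes out zero-padded (01-Dec-2017), which matches the format asked for in the question.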

CodePudding user response：

First, you could start by setting your columns and index correctly while reading the CSV file. Then you can use a formula to deduce the day from the decade.
Increment the year only when switching from December to January (you can extend this condition if there are cases where January and/or December are missing).

The code becomes much easier to read and understand once you apply these:

import pandas as pd

temp = pd.read_csv(folder_link, header=1, index_col=0)
Dates = []
year = 2017
# Skip the units row at the top and the blank and totals rows at the bottom
for index, row in temp.iloc[1:-2].iterrows():
    month = row["Month"]
    if month == "Jan" and temp.at[index-1, "Month"] == "Dec":
        year += 1   # incrementing year if row is january while preceding row is december
    day = (int(row["Decade"]) - 1) * 10 + 1
    Dates.append(f"{day}-{month}-{year}")
print(Dates)

Output:

['1-Sep-2017', '11-Sep-2017', '21-Sep-2017', '1-Oct-2017', '11-Oct-2017', '21-Oct-2017', '1-Nov-2017', '11-Nov-2017', '21-Nov-2017', '1-Dec-2017', '11-Dec-2017', '21-Dec-2017', '1-Jan-2018']
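
Two loose ends can be sketched in plain Python: the days above are not zero-padded (the question asked for 01-Sep-2017), and the December-to-January test misses a year change when one of those months is absent. The helper below is hypothetical, not code from either answer; `decade_start_dates`, `MONTHS`, and the sample `rows` are names invented for illustration. It assumes chronologically ordered (month, decade) pairs and bumps the year whenever the month number drops.

```python
# English month abbreviations as they appear in the result file
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def decade_start_dates(rows, start_year):
    """Return 'DD-Mon-YYYY' strings for (month_abbr, decade) pairs in time order."""
    dates = []
    year = start_year
    previous_month = None
    for month_abbr, decade in rows:
        month = MONTHS.index(month_abbr) + 1  # 'Sep' -> 9
        # A drop in month number (Dec -> Jan, but also Nov -> Feb) means a new year
        if previous_month is not None and month < previous_month:
            year += 1
        previous_month = month
        day = (int(decade) - 1) * 10 + 1  # decade 1, 2, 3 -> day 1, 11, 21
        dates.append(f"{day:02d}-{month_abbr}-{year}")
    return dates


rows = [("Dec", "2"), ("Dec", "3"), ("Jan", "1"), ("Jan", "2")]
print(decade_start_dates(rows, 2017))
# ['11-Dec-2017', '21-Dec-2017', '01-Jan-2018', '11-Jan-2018']
```

Comparing month numbers rather than names also keeps a January with several decades from advancing the year more than once.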