Home > Software engineering >  Why it changes the day/month position on my datetime column
Why it changes the day/month position on my datetime column

Time:10-08

I have the following data:

Date;Time;GHI;DNI;DIF;flagR;SE;SA;TEMP;AP;RH;WS;WD;PWAT
01.01.1994;00:07;0;0;0;0;-41.92;-19.43;14.3;1004.4;93.4;0.3;189;17.7
01.01.1994;00:22;0;0;0;0;-40.65;-23.70;14.3;1004.4;93.6;0.1;186;17.8
01.01.1994;00:37;0;0;0;0;-39.14;-27.75;14.3;1004.3;93.7;0.0;10;18.0
01.01.1994;00:52;0;0;0;0;-37.43;-31.57;14.3;1004.1;93.7;0.2;4;18.2
...
31.08.2021;23:07;0;0;0;0;-75.77;-5.76;11.5;1004.0;71.8;0.8;261;9.0
31.08.2021;23:22;0;0;0;0;-74.99;-20.16;11.5;1003.8;71.2;1.1;237;8.3
31.08.2021;23:37;0;0;0;0;-73.45;-32.59;11.6;1003.7;70.5;1.5;224;7.5
31.08.2021;23:52;0;0;0;0;-71.33;-42.67;11.6;1003.5;69.9;2.0;217;6.6

I sort it and make a new csv file for each year. However, in each new csv file, my Date_Time column changes the day/month order on the same csv file.

file = pd.read_csv('datta.csv', 
                    sep = ';', 
                    skiprows = 56,
                    parse_dates = [['Date','Time']])


years_pre = pd.to_datetime(file['Date_Time']).dt.year

#* Create an excel file time (15 min) for each year.
for year_XX in range(pd.to_datetime(file['Date_Time']).dt.year.min(), 
                     pd.to_datetime(file['Date_Time']).dt.year.max() 1):
    print(f"Creating file (15 min) for the year: {year_XX}")
    df_subset_15 = file[years_pre == year_XX]
    df_subset_15['Date_Time'] = df_subset_15['Date_Time'
                                    ].dt.strftime('%d/%m/%Y %H:%M')
    df_subset_15.to_csv(f"data_(15 min)_year_{year_XX}.csv")

Output: (-Ex: 1994)

2970,31/01/1994 22:37,0,0,0,0,-48.94,13.69,16.9,1001.2,85.9,1.7,251,15.6
2971,31/01/1994 22:52,0,0,0,0,-49.6,8.3,16.9,1001.2,86.2,1.7,250,15.6
2972,31/01/1994 23:07,0,0,0,0,-49.94,2.79,16.9,1001.3,86.4,1.7,249,15.6
2973,31/01/1994 23:22,0,0,0,0,-49.94,-2.77,16.9,1001.3,86.6,1.7,246,15.6
2974,31/01/1994 23:37,0,0,0,0,-49.6,-8.29,16.9,1001.3,86.9,1.7,244,15.7
2975,31/01/1994 23:52,0,0,0,0,-48.94,-13.68,16.9,1001.3,87.1,1.7,241,15.7
2976,02/01/1994 00:07,0,0,0,0,-48.25,-18.95,16.9,1001.3,87.4,1.7,239,15.7
2977,02/01/1994 00:22,0,0,0,0,-46.99,-23.92,17.0,1001.1,87.2,1.8,234,15.7
2978,02/01/1994 00:37,0,0,0,0,-45.46,-28.6,17.0,1001.0,86.9,1.9,229,15.7
2979,02/01/1994 00:52,0,0,0,0,-43.68,-32.98,17.1,1000.8,86.7,2.1,225,15.8
2980,02/01/1994 01:07,0,0,0,0,-41.7,-37.03,17.2,1000.6,86.5,2.2,222,15.8
2981,02/01/1994 01:22,0,0,0,0,-39.52,-40.79,17.2,1000.5,86.7,2.3,218,15.7
2982,02/01/1994 01:37,0,0,0,0,-37.19,-44.25,17.2,1000.3,86.9,2.4,215,15.7

You can see how it changes from '%d/%m/%Y %H:%M' to '%m/%d/%Y %H:%M' later it returns to do the original order and much later it changes again.

CodePudding user response:

Here is necessary add dayfirst=True for convert datetimes in read_csv:

file = pd.read_csv('datta.csv', 
                    sep = ';', 
                    skiprows = 56,
                    dayfirst=True,
                    parse_dates = [['Date','Time']])

Difference is in datetimes with days less like 13:

temp="""Date;Time
04.07.1994;00:07
01.01.1994;00:22
31.01.1994;00:37"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(StringIO(temp), sep=";", dayfirst=True,parse_dates = [['Date','Time']])
    
print (df)
            Date_Time
0 1994-07-04 00:07:00
1 1994-01-01 00:22:00
2 1994-01-31 00:37:00

temp="""Date;Time
04.07.1994;00:07
01.01.1994;00:22
31.01.1994;00:37"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(StringIO(temp), sep=";", parse_dates = [['Date','Time']])
    
print (df)
            Date_Time
0 1994-04-07 00:07:00
1 1994-01-01 00:22:00
2 1994-01-31 00:37:00
  • Related