I have been working with a huge text file that I want to read and cut with pandas.
Here is a sample of the raw file:
Date;Time;GHI;DNI;DIF;flagR;SE;SA;TEMP;AP;RH;WS;WD;PWAT
01.01.1994;00:07;0;0;0;0;-41.92;-19.43;14.3;1004.4;93.4;0.3;189;17.7
01.01.1994;00:22;0;0;0;0;-40.65;-23.70;14.3;1004.4;93.6;0.1;186;17.8
01.01.1994;00:37;0;0;0;0;-39.14;-27.75;14.3;1004.3;93.7;0.0;10;18.0
To do that, I have the date format %d.%m.%Y, and I changed it to %d/%m/%Y. Then, because I need to sort, I looked at the result in the VSCode Data Viewer and saw that it was shown as %Y-%m-%d followed by a time. This time part is always T00:00:00, and I do not need it because I already have a Time column. Why is this text appearing in the VSCode Data Viewer? Is this time always generated? Is it ignored by Python? Why is the date format I wrote not working?
import pandas as pd
import numpy as np
import datetime
# Read the file: fields are separated by semicolons,
# and the first 56 rows are skipped.
file = pd.read_csv('file.txt',
sep = ';',
skiprows = 56)
# Replace "." with "/" in the "Date" column so that the
# dates can be parsed, then parse the whole column
# as [day/month/year].
file["Date"] = file["Date"].str.replace('.','/').apply(lambda x: datetime.datetime.strptime(x, "%d/%m/%Y").date())
I used the code snippet above, but it doesn't work with the format %d/%m/%Y and .date().
This is the file contents when I print it:
Date Time GHI DNI DIF flagR SE SA TEMP AP RH WS WD PWAT
0 1994-01-01 00:07 0 0 0 0 -41.92 -19.43 14.3 1004.4 93.4 0.3 189 17.7
1 1994-01-01 00:22 0 0 0 0 -40.65 -23.70 14.3 1004.4 93.6 0.1 186 17.8
2 1994-01-01 00:37 0 0 0 0 -39.14 -27.75 14.3 1004.3 93.7 0.0 10 18.0
This is the file contents when I look at it using the VSCode Data Viewer:
Date Time GHI DNI DIF flagR SE SA TEMP AP RH WS WD PWAT
0 1994-01-01T00:00:00 00:07 0 0 0 0 -41.92 -19.43 14.3 1004.4 93.4 0.3 189 17.7
1 1994-01-01T00:00:00 00:22 0 0 0 0 -40.65 -23.70 14.3 1004.4 93.6 0.1 186 17.8
2 1994-01-01T00:00:00 00:37 0 0 0 0 -39.14 -27.75 14.3 1004.3 93.7 0.0 10 18.0
Thank you
CodePudding user response:
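That's just how the VSCode Data Viewer displays dates; it doesn't mean the data is actually stored that way. Your print output shows that the column still holds plain dates. The format you pass to strptime only controls how the text is parsed, and .date() returns a date object that has no display format of its own, which is why your %d/%m/%Y does not show up.
So, you can change the format of your Date column by converting it to formatted strings. Replace that line with this: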
file["Date"] = pd.to_datetime(file['Date'], format='%d.%m.%Y').dt.strftime('%d/%m/%Y')
# write dataframe to CSV file
file.to_csv("out.csv", index=False)
And this is the content of the CSV file:
Date | Time | GHI | DNI | DIF | flagR | SE | SA | TEMP | AP | RH | WS | WD | PWAT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
01/01/1994 | 00:07 | 0 | 0 | 0 | 0 | -41.92 | -19.43 | 14.3 | 1004.4 | 93.4 | 0.3 | 189 | 17.7 |
01/01/1994 | 00:22 | 0 | 0 | 0 | 0 | -40.65 | -23.7 | 14.3 | 1004.4 | 93.6 | 0.1 | 186 | 17.8 |
01/01/1994 | 00:37 | 0 | 0 | 0 | 0 | -39.14 | -27.75 | 14.3 | 1004.3 | 93.7 | 0.0 | 10 | 18.0 |
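If it helps to see the difference between parsing and display, here is a minimal sketch. It assumes an inline copy of the three sample rows (only the first three columns) in place of your real file.txt, and the variable names are just for illustration:

```python
import io

import pandas as pd

# Inline stand-in for file.txt: the three sample rows, first three columns only.
raw = io.StringIO(
    "Date;Time;GHI\n"
    "01.01.1994;00:07;0\n"
    "01.01.1994;00:22;0\n"
    "01.01.1994;00:37;0\n"
)
df = pd.read_csv(raw, sep=";")

# The format string only tells pandas how to READ the text; %m is the month.
parsed = pd.to_datetime(df["Date"], format="%d.%m.%Y")

# A datetime column always carries a time of day. With no time in the input
# it is midnight, which is the T00:00:00 a viewer may choose to display.
is_midnight = bool((parsed == parsed.dt.normalize()).all())

# A plain date object has no custom format; its text form is always ISO.
first_date = parsed.dt.date.iloc[0]
print(str(first_date))  # 1994-01-01

# strftime turns the timestamps into plain strings in the layout you want.
as_text = parsed.dt.strftime("%d/%m/%Y")
print(as_text.tolist())  # ['01/01/1994', '01/01/1994', '01/01/1994']
```

Keep in mind that after strftime the column holds strings, so sorting it compares text, and dd/mm/yyyy text does not sort chronologically. If you need to sort, it is probably safer to sort on the parsed datetime column and only format it when you write the file.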