I would like to know whether the problem I have with a particular CSV file is a general pandas bug or something specific to the file. I used pandas read_csv to load the data ... but unfortunately it does not load all the values. I noticed the problem because I was sure the file had data for a particular period (2017/04/01 - 2017/04/02), so I checked the file in Excel and, as I expected, the values are there. I then saved the file as .xlsx and read it with read_excel, and the data loaded successfully. The strangest part is that the problem affects only some dates, without any visible pattern: read_csv loads some of the information, but not all of it.
It is the same file. Initially, when processing the data, the file was saved as .csv. Later, I created the .xlsx from that .csv in Excel.
csv file: https://drive.google.com/file/d/1VCte8jCu8dB-Qp4KHClZb5cEAUTzZ5lC/view?usp=sharing
excel case:
import pandas as pnd

# Read the Excel copy, parsing the first column ("Fecha") as datetimes
resume = pnd.read_excel("/content/gdrive/hcln/h_RiA_0.50_full_time.xlsx", sheet_name="h_RiA_0.50_full_time (1)", parse_dates=[0])
resume = resume.set_index(["Fecha"])
resume.loc["2017/04/01 23"]
h50
Fecha
2017-04-01 23:00:00 309.0
2017-04-01 23:05:00 287.0
2017-04-01 23:10:00 315.0
2017-04-01 23:15:00 324.0
2017-04-01 23:20:00 325.0
2017-04-01 23:25:00 340.0
2017-04-01 23:30:00 323.0
2017-04-01 23:35:00 330.0
2017-04-01 23:40:00 332.0
2017-04-01 23:45:00 308.0
2017-04-01 23:50:00 319.0
2017-04-01 23:55:00 289.0
csv case:
# Read the CSV copy with the same date parsing
resume = pnd.read_csv("/content/gdrive/MyDrive/hcln/h_RiA_0.50_full_time.csv", parse_dates=[0])
resume = resume.set_index(["Fecha"])
resume.loc["2017/04/01 23"]
h50
Fecha
2017-04-01 23:00:00 NaN
2017-04-01 23:05:00 NaN
2017-04-01 23:10:00 NaN
2017-04-01 23:15:00 NaN
2017-04-01 23:20:00 NaN
2017-04-01 23:25:00 NaN
2017-04-01 23:30:00 NaN
2017-04-01 23:35:00 NaN
2017-04-01 23:40:00 NaN
2017-04-01 23:45:00 NaN
2017-04-01 23:50:00 NaN
2017-04-01 23:55:00 NaN
If anyone can see what the error is, I would appreciate an answer. Here you can see the view that I got in Google Colab.
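To pin down exactly which timestamps are affected, the two loads can be compared directly. The following is only a minimal sketch: the two small frames are made-up stand-ins for the results of read_excel and read_csv, and missing_only_in_csv is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the two loads, both indexed by "Fecha"
idx = pd.date_range("2017-04-01 23:00", periods=4, freq="5min", name="Fecha")
from_excel = pd.DataFrame({"h50": [309.0, 287.0, 315.0, 324.0]}, index=idx)
from_csv = pd.DataFrame({"h50": [309.0, np.nan, np.nan, 324.0]}, index=idx)

def missing_only_in_csv(csv_df, excel_df, column="h50"):
    """Return the timestamps that are NaN in the CSV load but present in the Excel load."""
    # Align both columns on the union of their indexes before comparing
    csv_col, excel_col = csv_df[column].align(excel_df[column], join="outer")
    mask = csv_col.isna() & excel_col.notna()
    return mask[mask].index

lost = missing_only_in_csv(from_csv, from_excel)
print(lost)
```

With the real data, the returned index would show whether the missing values cluster around particular dates or are scattered through the file.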
CodePudding user response:
I found the answer. Some time ago I renamed the column in the CSV to "h50" using Excel. Excel showed no warning at the time, so I assumed the change would not affect the stored values. Apparently that was the cause: I reran the process that generates h_RiA_0.50_full_time.csv, and with the regenerated file all the values load correctly with read_csv.
I suppose that editing the column name in Excel and saving the file somehow altered it, and that this is what caused the values to fail to load.
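For anyone hitting something similar, one way to check what a CSV really contains is to read every column as text and see which cells fail to convert to numbers. This is only a sketch under assumptions: the sample text below is invented (one cell is left blank to mimic a lost value), and find_unparsed is a hypothetical helper name.

```python
import io
import pandas as pd

# Invented sample standing in for the real file; the second value is blank
raw_csv = io.StringIO(
    "Fecha,h50\n"
    "2017-04-01 23:00:00,309.0\n"
    "2017-04-01 23:05:00,\n"
    "2017-04-01 23:10:00,315.0\n"
)

def find_unparsed(source, column="h50"):
    """Return the rows whose raw text in `column` is not a valid number."""
    # dtype=str and keep_default_na=False keep the cells exactly as written
    df = pd.read_csv(source, dtype=str, keep_default_na=False)
    numeric = pd.to_numeric(df[column], errors="coerce")
    return df[numeric.isna()]

bad_rows = find_unparsed(raw_csv)
print(bad_rows)
```

If the rows returned here show empty or oddly formatted text, the values were already missing or altered in the file itself, rather than being dropped by read_csv.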
CodePudding user response:
This may be completely off track.
But usually, whenever I have mounted the drive in Google Colab, I have used:
/content/drive/MyDrive/Maestría/Tesis/Read_f/hcln/h_RiA_0.50_full_time.xlsx
instead of:
/content/gdrive/MyDrive/Maestría/Tesis/Read_f/hcln/h_RiA_0.50_full_time.xlsx
Another possibility is that the reader expects the file to be in .csv format, not .xlsx.
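Both points can be checked in code before reading anything. This is just a sketch: load_table is a hypothetical helper, and the temporary sample file exists only to make the example runnable.

```python
import tempfile
from pathlib import Path

import pandas as pd

def load_table(path):
    """Check that the path exists, then pick the pandas reader from the extension."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"No file at {path}; check the Colab mount point")
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path, parse_dates=[0])
    if suffix in (".xlsx", ".xls"):
        return pd.read_excel(path, parse_dates=[0])
    raise ValueError(f"Unsupported extension: {suffix}")

# Throwaway sample file, used only to demonstrate the helper
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text("Fecha,h50\n2017-04-01 23:00:00,309.0\n")
    demo = load_table(sample)

print(demo)
```

A wrong mount point would surface as a FileNotFoundError here, which is different from the symptom in the question, where the file loads but some values come back as NaN.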