Why is pandas read_csv not reading all the data?

I would like to know whether the problem I have with a particular csv file is a general pandas error or something specific to the file. I used pandas read_csv to load the data, but it does not load all the values. I noticed because I was sure the file contains data for a particular range (2017/04/01 - 2017/04/02), so I checked the file in Excel and, as I thought, the values are there. I then saved the file as .xlsx and read it with read_excel, and the data loaded successfully. The strangest part is that the problem affects only some dates, with no visible pattern: read_csv loads some of the information, but not all of it.

It is the same file. It was originally saved as .csv when the data was processed. Later, I created the .xlsx from that .csv using Excel.

csv file: https://drive.google.com/file/d/1VCte8jCu8dB-Qp4KHClZb5cEAUTzZ5lC/view?usp=sharing

excel file: https://docs.google.com/spreadsheets/d/1p5zJuDhS7PvLwSJMtexRrUHOvC6qexMs/edit?usp=sharing&ouid=112818913372395231829&rtpof=true&sd=true

excel case:

import pandas as pnd
resume = pnd.read_excel("/content/gdrive/hcln/h_RiA_0.50_full_time.xlsx", sheet_name="h_RiA_0.50_full_time (1)", parse_dates=[0])
resume = resume.set_index(["Fecha"])
resume.loc["2017/04/01 23"]

                     h50
Fecha   
2017-04-01 23:00:00 309.0
2017-04-01 23:05:00 287.0
2017-04-01 23:10:00 315.0
2017-04-01 23:15:00 324.0
2017-04-01 23:20:00 325.0
2017-04-01 23:25:00 340.0
2017-04-01 23:30:00 323.0
2017-04-01 23:35:00 330.0
2017-04-01 23:40:00 332.0
2017-04-01 23:45:00 308.0
2017-04-01 23:50:00 319.0
2017-04-01 23:55:00 289.0

csv case:

resume = pnd.read_csv("/content/gdrive/MyDrive/hcln/h_RiA_0.50_full_time.csv", parse_dates=[0])
resume = resume.set_index(["Fecha"])
resume.loc["2017/04/01 23"]

                    h50
Fecha   
2017-04-01 23:00:00 NaN
2017-04-01 23:05:00 NaN
2017-04-01 23:10:00 NaN
2017-04-01 23:15:00 NaN
2017-04-01 23:20:00 NaN
2017-04-01 23:25:00 NaN
2017-04-01 23:30:00 NaN
2017-04-01 23:35:00 NaN
2017-04-01 23:40:00 NaN
2017-04-01 23:45:00 NaN
2017-04-01 23:50:00 NaN
2017-04-01 23:55:00 NaN

If anyone can spot the error, I would appreciate an answer. Here are the views I got in Google Colab.

[Screenshots: csv view, excel view]

CodePudding user response：

I found the answer. Some time ago I renamed the column in the csv to "h50" using Excel. Excel showed no warning at the time, so I assumed the change would not affect the values. Apparently that was the cause: I re-ran the process that generates h_RiA_0.50_full_time.csv, and with the regenerated file read_csv loads all the values.

I suppose that changing the column name in Excel and saving the file altered it in some way that prevents some values from loading.
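One way to tell whether the file or pandas is at fault is to look at the raw text without going through pandas. This is a minimal sketch using the standard csv module, assuming a comma-delimited file with the Fecha and h50 columns shown in the question. The helper name find_non_numeric_cells and the sample rows are made up for illustration and are not taken from the real file.

```python
import csv
import io


def find_non_numeric_cells(lines, column):
    """Return (line_number, raw_text) for cells in `column` that float() cannot parse."""
    reader = csv.DictReader(lines)
    problems = []
    for row in reader:
        raw = row[column]  # None if the row has fewer fields than the header
        try:
            float(raw)
        except (TypeError, ValueError):
            problems.append((reader.line_num, raw))
    return problems


# Hypothetical sample: one empty cell and one value with a decimal comma.
sample = io.StringIO(
    "Fecha,h50\n"
    "2017-04-01 23:00:00,309.0\n"
    "2017-04-01 23:05:00,\n"
    '2017-04-01 23:10:00,"315,0"\n'
)
print(find_non_numeric_cells(sample, "h50"))
# prints [(3, ''), (4, '315,0')]
```

To run it against the real file, pass open(path, newline="") instead of the StringIO object. If the flagged lines match the dates that come back as NaN, the file itself was changed when it was saved, and read_csv is only reporting what is there.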

CodePudding user response：

This may be completely off the mark, but whenever I have mounted the drive in Google Colab, I have used:

/content/drive/MyDrive/Maestría/Tesis/Read_f/hcln/h_RiA_0.50_full_time.xlsx

instead of:

/content/gdrive/MyDrive/Maestría/Tesis/Read_f/hcln/h_RiA_0.50_full_time.xlsx

Another possibility is that the file being read needs to be the .csv, not the .xlsx.
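A quick way to rule out a wrong mount point is to check which of the two paths actually exists before reading anything. This is only a sketch: first_existing is a hypothetical helper, and the paths are the two quoted above, so outside that Colab session neither will be found.

```python
from pathlib import Path


def first_existing(candidates):
    """Return the first candidate path that points to an existing file."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    raise FileNotFoundError("none of the candidate paths exist")


# The two mount points mentioned above; which one exists depends on the
# directory that was passed to drive.mount() in the notebook.
candidates = [
    "/content/drive/MyDrive/Maestría/Tesis/Read_f/hcln/h_RiA_0.50_full_time.xlsx",
    "/content/gdrive/MyDrive/Maestría/Tesis/Read_f/hcln/h_RiA_0.50_full_time.xlsx",
]

try:
    print("found:", first_existing(candidates))
except FileNotFoundError:
    # Expected when this runs outside the Colab session in question.
    print("no candidate path exists here")
```

A wrong path would normally raise FileNotFoundError in read_csv or read_excel rather than produce NaN values, so this check mainly confirms that both reads point at the files you think they do.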