Pandas is not Reading Entire .CSV File

Time:06-18

I have been testing some issues I've had with Pandas. My end goal here is to add data to a .csv. While figuring out ways to change a .csv, I settled on this method:

import pandas
data = pandas.read_csv('path/to/my/script/test.csv')

data.iat[1,1] = 'DataHere'

data.to_csv('path/to/my/script/test.csv', index=False, header=False)

This code worked mostly as intended. DataHere goes to the second row and second column, which is correct, because [0,0] is the first row and first column. (Note: these are not the usual x,y coordinates; the order is row,column, so it is more like y,x.)

test.csv before code (6x6):

yes,yes,yes,yes,yes,yes
yes,yes,yes,yes,yes,yes
yes,yes,yes,yes,yes,yes
yes,yes,yes,yes,yes,yes
yes,yes,yes,yes,yes,yes
yes,yes,yes,yes,yes,yes

test.csv after code (6x5):

yes,yes,yes,yes,yes,yes
yes,DataHere,yes,yes,yes,yes
yes,yes,yes,yes,yes,yes
yes,yes,yes,yes,yes,yes
yes,yes,yes,yes,yes,yes

It gets rid of the lowermost row for some reason! So I did some messing with the parameters of pandas.read_csv('path/to/my/script/test.csv') to fix this problem, and got this:

data = pandas.read_csv('path/to/my/script/test.csv', nrows=6, skip_blank_lines=False)

I added nrows=6 to make it read 6 rows, although I do intend to make it higher in the future. I added skip_blank_lines=False because I want to be able to add data to blank cells.

When I ran this new code (after changing the csv to its previous 6x6 state), it didn't help. It still erases the 6th row.

import pandas
data = pandas.read_csv('path/to/my/script/test.csv', nrows=6, skip_blank_lines=False)

data.iat[1,1] = 'DataHere'

data.to_csv('path/to/my/script/test.csv', index=False, header=False)

I also tried data.iat[6,3] = 'DataHere' instead of data.iat[1,1] = 'DataHere', which returned this error:

IndexError: index 6 is out of bounds for axis 0 with size 5

This shows that not only is it erasing the last row, but it also cannot add data to a blank cell. To make sure that this line was at fault: data = pandas.read_csv('path/to/my/script/test.csv', nrows=6, skip_blank_lines=False), I put print(data) on the line immediately after it and got this output (plus the previously stated errors). There should be a 5th row of 'yes' there. So my two problems are:

  1. Deletion of a row.
  2. Not being able to add data to a blank cell.

CodePudding user response:

By default, pandas.read_csv('path/to/my/script/test.csv') uses the first row as a header row. Your test.csv does not have a header row, so the first data row in test.csv is most likely being read as the header. That gives you 5 data rows and not the 6 you expect.

This is what could be happening:

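import io
import pandas as pd

# Simulate test.csv in memory: six data rows, no header row
sim_csv = io.StringIO(
'''yes,yes,yes,yes,yes,yes
yes,yes,yes,yes,yes,yes
yes,yes,yes,yes,yes,yes
yes,yes,yes,yes,yes,yes
yes,yes,yes,yes,yes,yes
yes,yes,yes,yes,yes,yes'''
)

# With the default settings, the first line becomes the column names
data = pd.read_csv(sim_csv)
print(data)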

   yes yes.1 yes.2 yes.3 yes.4 yes.5
0  yes   yes   yes   yes   yes   yes
1  yes   yes   yes   yes   yes   yes
2  yes   yes   yes   yes   yes   yes
3  yes   yes   yes   yes   yes   yes
4  yes   yes   yes   yes   yes   yes

Then, when you write out the CSV with to_csv(header=False), the column names are not written, so you lose that first row of data.

To get around this you could do:

pandas.read_csv('path/to/my/script/test.csv', header=None)
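
As a rough check of that fix, here is a minimal sketch of the full read, edit, and write cycle. It uses in-memory io.StringIO buffers as a stand-in for test.csv, so the names csv_text, out and result are placeholders and not part of the original script.

```python
import io
import pandas as pd

# Stand-in for test.csv: six data rows, no header row (assumed layout).
csv_text = "\n".join(["yes,yes,yes,yes,yes,yes"] * 6) + "\n"

# header=None tells pandas that every line is data; columns are labeled 0..5.
data = pd.read_csv(io.StringIO(csv_text), header=None)

# Position [1, 1] is the second row, second column.
data.iat[1, 1] = "DataHere"

# Write back without index or header, matching the original file layout.
out = io.StringIO()
data.to_csv(out, index=False, header=False)
result = out.getvalue()
print(result)
```

With header=None on the read and header=False on the write, all six rows should survive the round trip.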

Adding a row (cell to a new row)

You can add a new row (or a single cell in a new row) by assigning to an index label that does not exist yet. For example, if you had:

   yes yes.1 yes.2 yes.3 yes.4 yes.5
0  yes   yes   yes   yes   yes   yes
1  yes   yes   yes   yes   yes   yes
2  yes   yes   yes   yes   yes   yes
3  yes   yes   yes   yes   yes   yes
4  yes   yes   yes   yes   yes   yes  

Then you could do the following (note that this is .at and not .iat):

df.at[5,'yes'] = 'yes'

Which will give you:

   yes yes.1 yes.2 yes.3 yes.4 yes.5
0  yes   yes   yes   yes   yes   yes
1  yes   yes   yes   yes   yes   yes
2  yes   yes   yes   yes   yes   yes
3  yes   yes   yes   yes   yes   yes
4  yes   yes   yes   yes   yes   yes
5  yes   NaN   NaN   NaN   NaN   NaN
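
The same idea should carry over to a frame read with header=None, where the column labels are the integers 0 to 5 instead of names like 'yes'. The sketch below assumes that layout and again uses an in-memory stand-in for test.csv (csv_text is a placeholder); it contrasts .iat, which is positional and cannot grow the frame, with .at, which is label based.

```python
import io
import pandas as pd

# Stand-in for test.csv: six data rows, no header row (assumed layout).
csv_text = "\n".join(["yes,yes,yes,yes,yes,yes"] * 6) + "\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)

# .iat is purely positional, so a row position past the end fails.
try:
    df.iat[6, 3] = "DataHere"
except IndexError as exc:
    print("iat failed:", exc)

# .at is label based; assigning to a missing row label adds that row.
# The other cells in the new row are left as NaN.
df.at[6, 3] = "DataHere"
print(df)
```

When this frame is written with to_csv, NaN cells are emitted as empty fields by default, which is one way to end up with "blank" cells around the new value.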