Home > Mobile >  Dealing with multiline fields in Google Sheet when exported to csv or tsv and retrieved using Python
Dealing with multiline fields in Google Sheet when exported to csv or tsv and retrieved using Python

Time:11-05

Lets say I have a Google Sheet that looks like this.

enter image description here

You can export it so it can be reached with a URL similar to this

# CSV
https://docs.google.com/spreadsheets/d/e/Eis4Ya-Le9Py/pub?gid=0&single=true&output=csv
# TSV
https://docs.google.com/spreadsheets/d/e/Eis4Ya-Le9Py/pub?gid=0&single=true&output=tsv

If you download the file and open it on Open Office, you can clearly see that it recognize the multilines.

enter image description here

And that is because the field with multiple lines get enclosed in "".

In a plain text editor it looks like

enter image description here

However, and here is the problem, if I get the file using python requests library, the double quotes are removed.

import requests

r=requests.get(url)

print(r.text)
print(r.content)
print(r.headers)


id  description
1   one line
2   line1 line2
3   l1 l2 empty line below  end
4   normal

b'id\tdescription\r\n1\tone line\r\n2\tline1 line2\r\n3\tl1 l2 empty line below  end\r\n4\tnormal'

{'Content-Type': 'text/tab-separated-values', 'X-Frame-Options': 'ALLOW-FROM https://docs.google.com', ... , 'Transfer-Encoding': 'chunked'}

Why?

How can I change that behavior?

I know there is a library for dealing with csv files, but I cannot use it in the environment I am in.

CodePudding user response:

Thanks to @Lorena Gomez I found the answer.

The behavior is different whether you request a tsv / csv file.

In code:

url_base="https://docs.google.com/spreadsheets/d/e/2PA...be/pub?gid=0&single=true&output="

import io
import requests

# simple test
# file exported as csv
url=url_base "csv"

s=requests.get(url)
print("-------- CSV -----------")
print(s.text)
print(s.content)

url=url_base "tsv"

s=requests.get(url)
print("-------- TSV -----------")
print(s.text)
print(s.content)

Produces

-------- CSV -----------
,
1,one line
2,"two
lines"
3,"three

lines"
4,"empty below

end"
b',\r\n1,one line\r\n2,"two\nlines"\r\n3,"three\n\nlines"\r\n4,"empty below\n\nend"'
-------- TSV -----------
    
1   one line
2   two lines
3   three  lines
4   empty below  end
b'\t\r\n1\tone line\r\n2\ttwo lines\r\n3\tthree  lines\r\n4\tempty below  end'

Opening the 2 in a text editor.

enter image description here

CodePudding user response:

Adding to @Rub's answer, the issue is not the requests library. Based on this information:

A tab-separated values (TSV) file is a simple text format for storing data in a tabular structure... Each record in the table is one line of the text file. Each field value of a record is separated from the next by a tab character.

So that means it is expected that line breaks will not be preserved once you export to a .tsv file.

  • Related