Lets say I have a Google Sheet that looks like this.
You can export it so it can be reached with a URL similar to this
# CSV
https://docs.google.com/spreadsheets/d/e/Eis4Ya-Le9Py/pub?gid=0&single=true&output=csv
# TSV
https://docs.google.com/spreadsheets/d/e/Eis4Ya-Le9Py/pub?gid=0&single=true&output=tsv
If you download the file and open it on Open Office, you can clearly see that it recognize the multilines.
And that is because the field with multiple lines get enclosed in "".
In a plain text editor it looks like
However, and here is the problem, if I get the file using python requests library, the double quotes are removed.
import requests
r=requests.get(url)
print(r.text)
print(r.content)
print(r.headers)
id description
1 one line
2 line1 line2
3 l1 l2 empty line below end
4 normal
b'id\tdescription\r\n1\tone line\r\n2\tline1 line2\r\n3\tl1 l2 empty line below end\r\n4\tnormal'
{'Content-Type': 'text/tab-separated-values', 'X-Frame-Options': 'ALLOW-FROM https://docs.google.com', ... , 'Transfer-Encoding': 'chunked'}
Why?
How can I change that behavior?
I know there is a library for dealing with csv files, but I cannot use it in the environment I am in.
CodePudding user response:
Thanks to @Lorena Gomez I found the answer.
The behavior is different whether you request a tsv / csv file.
In code:
url_base="https://docs.google.com/spreadsheets/d/e/2PA...be/pub?gid=0&single=true&output="
import io
import requests
# simple test
# file exported as csv
url=url_base "csv"
s=requests.get(url)
print("-------- CSV -----------")
print(s.text)
print(s.content)
url=url_base "tsv"
s=requests.get(url)
print("-------- TSV -----------")
print(s.text)
print(s.content)
Produces
-------- CSV -----------
,
1,one line
2,"two
lines"
3,"three
lines"
4,"empty below
end"
b',\r\n1,one line\r\n2,"two\nlines"\r\n3,"three\n\nlines"\r\n4,"empty below\n\nend"'
-------- TSV -----------
1 one line
2 two lines
3 three lines
4 empty below end
b'\t\r\n1\tone line\r\n2\ttwo lines\r\n3\tthree lines\r\n4\tempty below end'
Opening the 2 in a text editor.
CodePudding user response:
Adding to @Rub's answer, the issue is not the requests
library. Based on this information:
A tab-separated values (TSV) file is a simple text format for storing data in a tabular structure... Each record in the table is one line of the text file. Each field value of a record is separated from the next by a tab character.
So that means it is expected that line breaks will not be preserved once you export to a .tsv
file.