I am trying to read a CSV file from my personal Google Drive. The file is shared with the permission "Anyone with the link". Here is the link: https://drive.google.com/file/d/12txcYHcO8aiwO9f948_nsaIE3wBGAuJa/view?usp=sharing
Here is a sample of the file:
email first_name last_name
[email protected] Luca Rossi
[email protected] Daniel Bianchi
[email protected] Gabriel Domeneghetti
[email protected] Christian Bona
[email protected] Simone Marsango
I need to read this file so that my program can parse the data. I have tried many approaches, including every suggestion in this question: "Pandas: How to read CSV file from google drive public?". This is the code I have written so far:
import requests
import pandas as pd
from io import StringIO

csv_file_url = 'the file URL as copied in the drive UI'
# The file ID is the second-to-last path segment of the sharing link
file_id = csv_file_url.split('/')[-2]
dwn_url = 'https://drive.google.com/uc?export=download&id=' + file_id
url2 = requests.get(dwn_url).text
csv_raw = StringIO(url2)
df = pd.read_csv(csv_raw)
print(df.head())
I expected this to work, but it prints only this table:
ÿþe Unnamed: 1 Unnamed: 2
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
I think it is only a formatting issue, but I don't know how to fix it. Any help would be appreciated.
Answer:
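Your data is UTF-16 encoded: the ÿþ at the start of your output is the UTF-16 byte order mark (bytes FF FE) decoded with a single-byte encoding. You can read the file by specifying the encoding: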
pd.read_csv(dwn_url, encoding='utf16')
Result:
email first_name last_name
0 NaN NaN NaN
1 [email protected] Luca Rossi
2 [email protected] Daniel Bianchi
3 [email protected] Gabriel Domeneghetti
4 [email protected] Christian Bona
5 [email protected] Simone Marsango
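To see where the ÿþe in the question's output comes from, here is a small standard-library sketch. The sample text is hypothetical (it only mirrors the header row shown in the question, and the comma delimiter is an assumption); the point is what happens when UTF-16 bytes are decoded with the wrong codec.

```python
import codecs

# Hypothetical sample mirroring the header row from the question;
# the comma delimiter is an assumption.
text = "email,first_name,last_name\n"

# UTF-16 little-endian bytes preceded by the byte order mark FF FE.
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Decoding with a single-byte codec turns the BOM into 'ÿþ' and leaves
# a NUL character after every ASCII letter.
wrong = raw.decode("latin-1")

# Decoding as UTF-16 consumes the BOM and restores the original text.
right = raw.decode("utf-16")

print(repr(wrong[:3]))
print(repr(right))
```

The first three characters of the wrongly decoded string match the garbled column name in the question, which is a strong hint that the file needs a UTF-16 decoder.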
(read_csv can read directly from a URL, so there is no need for requests and StringIO.)
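Putting the pieces together, a minimal sketch might look like the following. The helper names drive_download_url and read_utf16_csv are made up for illustration, the sample row uses a placeholder address, and the offline check feeds in-memory bytes instead of hitting Google Drive, so it only demonstrates the decoding step.

```python
from io import BytesIO

import pandas as pd


def drive_download_url(share_url: str) -> str:
    """Turn a '.../file/d/<id>/view?...' sharing link into a direct-download URL."""
    # The file ID is the second-to-last path segment of the sharing link.
    file_id = share_url.split("/")[-2]
    return "https://drive.google.com/uc?export=download&id=" + file_id


def read_utf16_csv(source) -> pd.DataFrame:
    """Read a UTF-16 CSV from a URL, a path, or a binary file-like object."""
    return pd.read_csv(source, encoding="utf-16")


# Offline check with in-memory bytes (hypothetical data, comma-delimited).
sample = "email,first_name,last_name\nluca@example.com,Luca,Rossi\n".encode("utf-16")
df = read_utf16_csv(BytesIO(sample))
print(df)
```

With network access, the same helpers would be combined as read_utf16_csv(drive_download_url(csv_file_url)), assuming the file stays shared as "Anyone with the link" and Drive serves it directly rather than showing an interstitial page.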