I was trying to get some information about cancer from the GDC. I'm new to this, so please don't judge. Following a tutorial on the website, I managed to get a response from a request, but I don't know how to turn it into a DataFrame (so I can merge it with other data). Below I show how the data comes back as a string, and everything looks fine:
Then, when I read it into a DataFrame, everything is compressed into a single column with \t as the separator. I don't understand how to get a proper pandas DataFrame out of it. Here is how it looks; if I print the first column, all of this is printed out, meaning the columns shown in the string version above are not recognized:
This is the full Tutorial page code used:
```
from io import BytesIO
from io import StringIO
import ast
import pandas as pd
import requests
import json

fields = [
    "file_name",
    "cases.submitter_id",
    "cases.samples.sample_type",
    "cases.disease_type",
    "cases.project.project_id"
    ]

fields = ",".join(fields)

files_endpt = "https://api.gdc.cancer.gov/files"

# This set of filters is nested under an 'and' operator.
filters = {
    "op": "and",
    "content":[
        {
        "op": "in",
        "content":{
            "field": "cases.project.primary_site",
            "value": ["Breast"]
            }
        },
        {
        "op": "in",
        "content":{
            "field": "files.experimental_strategy",
            "value": ["RNA-Seq"]
            }
        }
    ]
}

# A POST is used, so the filter parameters can be passed directly as a Dict object.
params = {
    "filters": filters,
    "fields": fields,
    "format": "TSV", #TSV
    "size": "2000"
    }

# The parameters are passed to 'json' rather than 'params' in this case
response = requests.post(files_endpt, headers = {"Content-Type": "application/json"}, json = params)

string = response.content.decode("utf-8")
df = pd.read_csv(BytesIO(response.content),on_bad_lines='skip')
print(df)
```
CodePudding user response:
The data you are getting appears to be tab-delimited (the request asks for "format": "TSV"), while read_csv splits on commas by default. Telling it to split on tabs should work fine:
df = pd.read_csv(BytesIO(response.content), sep='\t', on_bad_lines='skip')
This gives you a dataframe that starts:
           cases.0.disease_type  ...                                    id
0  Ductal and Lobular Neoplasms  ...  37175dfe-e34e-4f97-88b1-c0ba4bd5d093
1  Ductal and Lobular Neoplasms  ...  319bc898-6d70-4c38-a177-37ed7824dd7a
2  Complex Epithelial Neoplasms  ...  42c461fe-31a4-4ee4-8d17-95a5da96a8eb
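
If you want to see the difference without hitting the API, here is a minimal offline sketch. The tsv_text string is a made-up stand-in for response.content.decode("utf-8"), and the clinical table with its age column is purely hypothetical; only the cases.0.submitter_id style of column name is modelled on the output above. It shows the default comma separator collapsing everything into one column, sep="\t" splitting it properly, and then a merge on a shared column, which is what you said you were after.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for response.content.decode("utf-8").
# Column names and values are invented for illustration only.
tsv_text = (
    "file_name\tcases.0.submitter_id\tid\n"
    "a.tsv\tTCGA-XX-0001\tfile-1\n"
    "b.tsv\tTCGA-XX-0002\tfile-2\n"
)

# Default separator is a comma, so each line ends up in a single column.
one_column = pd.read_csv(StringIO(tsv_text))

# With sep="\t" the header and rows are split on tabs.
df = pd.read_csv(StringIO(tsv_text), sep="\t")

# Hypothetical second table, merged on the shared submitter id column.
clinical = pd.DataFrame({
    "cases.0.submitter_id": ["TCGA-XX-0001", "TCGA-XX-0002"],
    "age": [54, 61],
})
merged = df.merge(clinical, on="cases.0.submitter_id", how="left")

print(one_column.shape)  # one column: the tabs were not treated as separators
print(df.shape)          # three columns once tabs are used as the separator
print(merged.columns.tolist())
```

Since the response is already text, StringIO on the decoded string works as well as BytesIO on the raw bytes. It is probably worth checking df.columns on the real response before merging, because the API's TSV headers (for example cases.0.disease_type) differ from the field names you requested.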