I have a Google Sheet with large amounts of rows that are colour coded. I was looking to create a new DataFrame based on coloured rows. Is it possible to select rows based on their colour? Or even if there would is a way select rows that are not color coded.
Can't really find anything on this, so really not sure if this is possible.
CodePudding user response:
I don’t know how to do it with gspread (most likely it’s impossible), but it’s easy with google-api-python-client
(which is a dependency of gspread)
You need to pass the includeGridData
parameter to spreadsheets().get()
method. Here is a slightly modified example from the doc:
data = (
service.spreadsheets()
.get(
spreadsheetId=spreadsheet_id,
ranges=ranges,
includeGridData=True # important,
fields=",".join([ # specify only required fields to reduce response size
"sheets.data.rowData.values.formattedValue",
"sheets.data.rowData.values.effectiveFormat.backgroundColor",
])
)
.execute()
)
# now you should parse returned JSON according your needs, e.g.:
def parse(data): # data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets#Spreadsheet
white_color = {"red": 1, "green": 1, "blue": 1}
for grid_data in data["sheets"][0]["data"]:
# grid_data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/sheets#GridData
for row_data in grid_data["rowData"]:
# row_data["values"] contains a list of cells (CellData), one per column
# cell_data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#CellData
row = []
for cell_data in row_data["values"]:
value = cell_data["formattedValue"] # or cell_data["userEnteredValue"]
color = cell_data["effectiveFormat"]["backgroundColor"]
if color != white_color:
row.append(value)
else:
row.append(None)
yield row
pd.DataFrame(list(parse(data)))
Upd: read multiple columns
CodePudding user response:
I had to slight modify your answer to get past NameError: name 'row' is not defined
data = (
service.spreadsheets()
.get(
spreadsheetId=sheet_id,
ranges=ranges,
includeGridData=True, # important,
fields=",".join([ # specify only required fields to reduce response size
"sheets.data.rowData.values.formattedValue",
#"sheets.data.rowData.values.effectiveFormat.textFormat.strikethrough",
"sheets.data.rowData.values.effectiveFormat.backgroundColor",
])
)
.execute()
)
def parse(data):
white_color = {"red": 1, "green": 1, "blue": 1}
for grid_data in data["sheets"][0]["data"]:
for row_data in grid_data["rowData"]:
cell_data = row_data["values"][0]
value = cell_data["formattedValue"]
color = cell_data["effectiveFormat"]["backgroundColor"]
if color == white_color:
yield value
print(pd.DataFrame({"column": list(parse(data))}))
This is nearly what I need. In this solution only first column is returned, how do you iterate remaining columns?