Home > OS >  Is there a way to create a DataFrame from specific colour coded rows that are in within a Google She
Is there a way to create a DataFrame from specific colour coded rows that are in within a Google She

Time:10-28

I have a Google Sheet with large amounts of rows that are colour coded. I was looking to create a new DataFrame based on coloured rows. Is it possible to select rows based on their colour? Or even if there would is a way select rows that are not color coded.

Can't really find anything on this, so really not sure if this is possible.

CodePudding user response:

I don’t know how to do it with gspread (most likely it’s impossible), but it’s easy with google-api-python-client (which is a dependency of gspread)

You need to pass the includeGridData parameter to spreadsheets().get() method. Here is a slightly modified example from the doc:

data = (
    service.spreadsheets()
    .get(
        spreadsheetId=spreadsheet_id, 
        ranges=ranges, 
        includeGridData=True  # important,
        fields=",".join([  # specify only required fields to reduce response size
            "sheets.data.rowData.values.formattedValue",
            "sheets.data.rowData.values.effectiveFormat.backgroundColor",
        ])
    )
    .execute()
)

# now you should parse returned JSON according your needs, e.g.:

def parse(data): # data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets#Spreadsheet
    white_color = {"red": 1, "green": 1, "blue": 1}

    for grid_data in data["sheets"][0]["data"]:
        # grid_data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/sheets#GridData
        for row_data in grid_data["rowData"]:
            # row_data["values"] contains a list of cells (CellData), one per column
            # cell_data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#CellData
            row = []
            for cell_data in row_data["values"]:
                value = cell_data["formattedValue"] # or cell_data["userEnteredValue"]
                color = cell_data["effectiveFormat"]["backgroundColor"]
                if color != white_color:
                    row.append(value)
                else:
                    row.append(None)
            yield row

pd.DataFrame(list(parse(data)))

Upd: read multiple columns

CodePudding user response:

I had to slight modify your answer to get past NameError: name 'row' is not defined

data = (
    service.spreadsheets()
    .get(
        spreadsheetId=sheet_id,
        ranges=ranges,
        includeGridData=True,  # important,
        fields=",".join([  # specify only required fields to reduce response size
            "sheets.data.rowData.values.formattedValue",
            #"sheets.data.rowData.values.effectiveFormat.textFormat.strikethrough",
            "sheets.data.rowData.values.effectiveFormat.backgroundColor",
        ])
    )
    .execute()
)

def parse(data):
    white_color = {"red": 1, "green": 1, "blue": 1}
    for grid_data in data["sheets"][0]["data"]:
      for row_data in grid_data["rowData"]:
        cell_data = row_data["values"][0]
        value = cell_data["formattedValue"]
        color = cell_data["effectiveFormat"]["backgroundColor"]
        if color == white_color:
          yield value

print(pd.DataFrame({"column": list(parse(data))}))

This is nearly what I need. In this solution only first column is returned, how do you iterate remaining columns?

  • Related