Home > Software engineering >  FAQ for cloud Spanner
FAQ for cloud Spanner

Time:07-26

A tiny list of simple FAQ like questions for cloud-spanner

  1. 1 trillion rows (extra large table size) good practise or is there a point at which you have to make another table?

  2. A trillion row table could have more trillion row Interleaved tables, Is that fine too?

  3. JSON vs Interleaved Tables, which to use when?

  4. Average query speed for an indexed read on a trillion row table

  5. Email IDs, reverse bits or hashes vs UID as keys, will the UIDS kill performance significantly?

CodePudding user response:

  1. 1 trillion rows (extra large table size) good practise or is there a point at which you have to make another table?

Cloud Spanner is designed to handle large scale automatically. Cloud Spanner dynamically splits the key range of your data into chunks that are load balanced across different serving nodes/processing units. Splitting is done automatically by Cloud Spanner based on data size and load. More information can be found here: https://cloud.google.com/spanner/docs/schema-and-data-model#database-splits

  1. A trillion row table could have more trillion row Interleaved tables, Is that fine too?

Yes this is fine.

Cloud Spanner automatically partitions splits when it deems necessary due to size or load. To preserve data locality, Cloud Spanner prefers adding split boundaries as close as to the root tables, so that any given row tree can be kept in a single split. This means that operations within a row tree tend to be more efficient because they are unlikely to require communication with other splits. However, Cloud Spanner will attempt to add split boundaries to interleaved tables in order to isolate hotspots or if the split is getting too large.

https://cloud.google.com/spanner/docs/whitepapers/optimizing-schema-design#table_layout

  1. JSON vs Interleaved Tables, which to use when?

Regarding using a JSON column, columns are limited to a max size of 10MB (https://cloud.google.com/spanner/quotas#tables), and the JSON column must be read and written in its entirety. JSON columns have limited scale and performance compared to interleaved tables.

Table interleaving is a mechanism that can improve the performance of certain join query patterns (e.g. queries that join parent rows with child rows). It can also improve write latency when both the parent and child rows are written in the same transaction. It is not recommended to use interleaved tables as a general data modeling concept.

https://medium.com/google-cloud/cloud-spanners-table-interleaving-a-query-optimization-feature-b8a87059da16

  1. Average query speed for an indexed read on a trillion row table

The query speed depends on the selectivity of the query predicate. E.g. if the predicate identifies row(s) in the index by specifying the value of the indexed column(s), then the query is fast regardless of the number of rows in the table. For more information, please see https://cloud.google.com/spanner/docs/sql-best-practices#secondary-indexes

  1. Email IDs, reverse bits or hashes vs UID as keys, will the UIDS kill performance significantly?

Hashes, bit-reversed sequential values, or UUIDs can all be used in primary key design to avoid hotspotting. UUIDs as the first part of primary keys, where the UUIDs are evenly distributed across the key space, will allow inserts to scale horizontally and not cause a moving hotspot constrained on a limited amount of serving resources/processing units. Please see the documentation here for more details https://cloud.google.com/spanner/docs/schema-design#uuid_primary_key

  • Related