Debugging Queries
There are a number of issues that can crop up when running BigQuery. They can be roughly grouped into a few categories:
- Insufficient Slots
- Inefficient Query Constructs
- Data Problems
Insufficient Slots
The query execution will request slots depending on how well it can parallelise its work. Bigquery will allocate slots to the query using several considerations:
- The slots available under the reservation for the project
- Idle slots available from other reservations
- Other queries running at the same time
- Autoscaling settings
In general, BigQuery tries to be fair and will allocate slots to the jobs in equal manner (subject to reservations). This allocation is dynamic, i.e. if slots become available as a query ends, the slots are redistributed within seconds.
How to detect insufficient Slots
There are several places to determine this:
- In the Tree display, stages may be highlighted in RED and the stage detail may have an entry
called
performancewith the value beingSlot Contention - Stage(s)
wait (ms)average value shoots up, with waiting times being a series portion of the overall job duuration - In the Progress Tab, Estimated Slot Usage Graph maxes out for long periods
- Same place, Estimated Runnable Units Graph maxes out for long periods
Inefficient Query constructs
There are many types of query constructs that can result in slow performance.
In the Gantt Tab, you will be able to see stages that dominate the total execution time.
The GCP BigQuery Documentation contains a number of suggestions
In addition to these, look for COUNT and QUANTILE calls that can be slow if the data has high cardinality.
In this case use the Approximate Aggregate Functions
Data Problems
A common data problem is where the data categories are uneven, i.e. 100 categories represent 1% of the total of data and one category represents the remaidner. When doing counts or aggregation of groupings, such a condition will result in one work unit taking a much longer time than the rest.
Looking at stage details, the compute (ms) entry will show a a max value that is substantially larger than the average value.