Google BigQuery Incomplete Query Replies On Odd Attempts
When querying BigQuery through the Python API using service.jobs().getQueryResults, we're finding that the first attempt works fine: all expected results are included in the response.
Solution 1:
It looks like the issue is that we return different default numbers of rows for query() and getQueryResults(). So depending on whether your query finished quickly (and so you didn't have to use getQueryResults()) you'd get either more or fewer rows.
I've filed a bug and we should have a fix soon.
The workaround (and a good idea overall) is to set the maxResults for both the query and the getQueryResults calls. And if you're going to want a lot of rows, you might want to page through results using the returned page token.
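A minimal sketch of that paging workaround. The real `service.jobs()` object comes from an authorized google-api-python-client service; here a stub `FakeJobs` class (invented for illustration, along with its canned pages) stands in for it so the loop itself is runnable:

```python
class FakeJobs:
    """Stand-in for service.jobs(): serves three fixed pages of results."""
    _pages = {None:   (['r1', 'r2'], 'tok1'),
              'tok1': (['r3', 'r4'], 'tok2'),
              'tok2': (['r5'],       None)}

    def getQueryResults(self, projectId, jobId, maxResults, pageToken=None):
        rows, next_token = self._pages[pageToken]
        result = {'jobComplete': True, 'totalRows': 5, 'rows': rows[:maxResults]}
        if next_token:
            result['pageToken'] = next_token

        class _Request:
            # Mimics the apiclient request object's execute() method.
            def execute(_self):
                return result
        return _Request()


def read_all_rows(jobs, project_id, job_id, max_results=2):
    """Page through getQueryResults, always passing maxResults explicitly."""
    rows, page_token = [], None
    while True:
        data = jobs.getQueryResults(
            projectId=project_id, jobId=job_id,
            maxResults=max_results, pageToken=page_token).execute()
        rows.extend(data.get('rows', []))
        page_token = data.get('pageToken')
        if not page_token:       # no token means this was the last page
            return rows
```

With the stub, `read_all_rows(FakeJobs(), 'my-project', 'job_id')` walks all three pages and returns the five rows; against the real API you would pass your authorized `service.jobs()` in its place.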
Below is an example that reads one page of data from a completed query job. It will be included in the next release of bq.py:
class _JobTableReader(_TableReader):
  """A TableReader that reads from a completed job."""

  def __init__(self, local_apiclient, project_id, job_id):
    self.job_id = job_id
    self.project_id = project_id
    self._apiclient = local_apiclient

  def ReadSchemaAndRows(self, max_rows=None):
    """Read at most max_rows rows from a table and the schema.

    Args:
      max_rows: maximum number of rows to return.

    Raises:
      BigqueryInterfaceError: when bigquery returns something unexpected.

    Returns:
      A tuple where the first item is the list of fields and the
      second item a list of rows.
    """
    page_token = None
    rows = []
    schema = {}
    max_rows = max_rows if max_rows is not None else sys.maxint
    while len(rows) < max_rows:
      (more_rows, page_token, total_rows, current_schema) = self._ReadOnePage(
          max_rows=max_rows - len(rows),
          page_token=page_token)
      if not schema and current_schema:
        schema = current_schema.get('fields', {})
      max_rows = min(max_rows, total_rows)
      for row in more_rows:
        rows.append([entry.get('v', '') for entry in row.get('f', [])])
      if not page_token and len(rows) != max_rows:
        raise BigqueryInterfaceError(
            'PageToken missing for %r' % (self,))
      if not more_rows and len(rows) != max_rows:
        raise BigqueryInterfaceError(
            'Not enough rows returned by server for %r' % (self,))
    return (schema, rows)

  def _ReadOnePage(self, max_rows, page_token=None):
    data = self._apiclient.jobs().getQueryResults(
        maxResults=max_rows,
        pageToken=page_token,
        # Sets the timeout to 0 because we assume the table is already ready.
        timeoutMs=0,
        projectId=self.project_id,
        jobId=self.job_id).execute()
    if not data['jobComplete']:
      raise BigqueryError('Job %s is not done' % (self,))
    page_token = data.get('pageToken', None)
    total_rows = int(data['totalRows'])
    schema = data.get('schema', None)
    rows = data.get('rows', [])
    return (rows, page_token, total_rows, schema)
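For reference, the rows that getQueryResults returns arrive in a nested `f`/`v` envelope, and the list comprehension in ReadSchemaAndRows flattens each one into a plain list of values. A small runnable illustration (the sample data is invented):

```python
# Each API row has the shape {'f': [{'v': value}, ...]}, one 'f' entry per column.
api_rows = [
    {'f': [{'v': 'alice'}, {'v': '30'}]},
    {'f': [{'v': 'bob'},   {'v': '25'}]},
]

# Same flattening as in ReadSchemaAndRows: pull out each cell's 'v' value,
# defaulting to '' when a cell is missing it.
flat = [[entry.get('v', '') for entry in row.get('f', [])] for row in api_rows]
```

After this, `flat` holds `[['alice', '30'], ['bob', '25']]`, i.e. one plain list of column values per row.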