
Google Bigquery Incomplete Query Replies On Odd Attempts

When querying BigQuery through the Python API using service.jobs().getQueryResults, we're finding that the first attempt works fine - all expected results are included in the resp

Solution 1:

It looks like the issue is that we return different default numbers of rows for query() and getQueryResults(). So depending on whether your query finished quickly (and so you didn't have to call getQueryResults()), you'd get either more or fewer rows.

I've filed a bug and we should have a fix soon.

The workaround (and a good idea overall) is to set maxResults for both the query and the getQueryResults calls. If you're going to want a lot of rows, you might want to page through the results using the returned page token.
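
As a rough illustration of that workaround, here is a minimal sketch that passes the same maxResults to both calls and follows the page token until it runs out. It assumes `service` is a BigQuery v2 client built with the Google API Python client; the function name `run_query` and the `page_size` parameter are made up for this example, and error handling is left out.

```python
def run_query(service, project_id, sql, page_size=1000):
    """Run sql and return all raw rows, using one page size on every call.

    `service` is assumed to be a BigQuery v2 API client object;
    `run_query` and `page_size` are illustrative names only.
    """
    # Set maxResults on the initial query() call...
    response = service.jobs().query(
        projectId=project_id,
        body={'query': sql, 'maxResults': page_size},
    ).execute()
    job_ref = response['jobReference']

    rows = []
    page_token = None
    while True:
        if response.get('jobComplete'):
            rows.extend(response.get('rows', []))
            page_token = response.get('pageToken')
            if not page_token:
                # No further pages: everything has been read.
                return rows
        # ...and the same maxResults on every getQueryResults() call.
        # If the job was not complete yet, page_token is still None and
        # this call simply asks for the first page again.
        response = service.jobs().getQueryResults(
            projectId=job_ref['projectId'],
            jobId=job_ref['jobId'],
            maxResults=page_size,
            pageToken=page_token,
        ).execute()
```

Because both calls use the same page size, the number of rows per response no longer depends on whether the query happened to finish inside the first call.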

Below is an example that reads one page of data from a completed query job. It will be included in the next release of bq.py:

class _JobTableReader(_TableReader):
  """A TableReader that reads from a completed job."""

  def __init__(self, local_apiclient, project_id, job_id):
    self.job_id = job_id
    self.project_id = project_id
    self._apiclient = local_apiclient

  def ReadSchemaAndRows(self, max_rows=None):
    """Read at most max_rows rows from a table and the schema.

    Args:
      max_rows: maximum number of rows to return.

    Raises:
      BigqueryInterfaceError: when bigquery returns something unexpected.

    Returns:
      A tuple where the first item is the list of fields and the
      second item a list of rows.
    """
    page_token = None
    rows = []
    schema = {}
    # sys.maxsize exists in Python 2.6+ and 3; sys.maxint is Python 2 only.
    max_rows = max_rows if max_rows is not None else sys.maxsize
    while len(rows) < max_rows:
      (more_rows, page_token, total_rows, current_schema) = self._ReadOnePage(
          max_rows=max_rows - len(rows),
          page_token=page_token)
      if not schema and current_schema:
        schema = current_schema.get('fields', {})

      max_rows = min(max_rows, total_rows)
      for row in more_rows:
        rows.append([entry.get('v', '') for entry in row.get('f', [])])
      if not page_token and len(rows) != max_rows:
        raise BigqueryInterfaceError(
            'PageToken missing for %r' % (self,))
      if not more_rows and len(rows) != max_rows:
        raise BigqueryInterfaceError(
            'Not enough rows returned by server for %r' % (self,))
    return (schema, rows)

  def _ReadOnePage(self, max_rows, page_token=None):
    data = self._apiclient.jobs().getQueryResults(
        maxResults=max_rows,
        pageToken=page_token,
        # Sets the timeout to 0 because we assume the table is already ready.
        timeoutMs=0,
        projectId=self.project_id,
        jobId=self.job_id).execute()
    if not data['jobComplete']:
      raise BigqueryError('Job %s is not done' % (self,))
    page_token = data.get('pageToken', None)
    total_rows = int(data['totalRows'])
    schema = data.get('schema', None)
    rows = data.get('rows', [])
    return (rows, page_token, total_rows, schema)
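
To make the row handling above a bit more concrete: each raw row in the response is a dict with an 'f' list of cells, and each cell carries its value under 'v'. The sketch below repeats the same flattening as the reader and then pairs the values with the field names from the schema. The helper names `flatten_rows` and `rows_as_dicts` are hypothetical and not part of bq.py.

```python
def flatten_rows(raw_rows):
    """Turn raw API rows ({'f': [{'v': ...}, ...]}) into lists of values.

    Missing cells or values become '' as in the reader above.
    """
    return [[entry.get('v', '') for entry in row.get('f', [])]
            for row in raw_rows]


def rows_as_dicts(schema_fields, rows):
    """Pair each flattened row with the field names from the schema.

    `schema_fields` is assumed to be the list found under schema['fields'],
    where every entry has a 'name' key.
    """
    names = [field['name'] for field in schema_fields]
    return [dict(zip(names, row)) for row in rows]


# Example with made-up data in the shape the API returns.
fields = [{'name': 'word', 'type': 'STRING'},
          {'name': 'n', 'type': 'INTEGER'}]
raw = [{'f': [{'v': 'a'}, {'v': '1'}]},
       {'f': [{'v': 'b'}, {}]}]
records = rows_as_dicts(fields, flatten_rows(raw))
```

Note that the values stay as strings here; converting them according to each field's 'type' would be a separate step.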
