
BigQuery Dataflow Error: Cannot Read and Write in Different Locations While Reading and Writing in EU

I have a simple Google Cloud Dataflow task. It reads from a BigQuery table and writes into another, like this:

(p | beam.io.Read( beam.io.BigQuerySource( query='select dia
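A complete version of that kind of pipeline might look roughly like the sketch below; the query and table names are placeholders rather than the ones from the original (truncated) snippet, and the target table is assumed to already exist:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions()  # placeholder; the real pipeline also sets project, temp_location, etc.
p = beam.Pipeline(options=options)
(p
 | 'ReadFromBQ' >> beam.io.Read(beam.io.BigQuerySource(query='SELECT dia FROM my_dataset.source_table'))  # hypothetical query
 | 'WriteToBQ' >> beam.io.WriteToBigQuery(
       'my-project:my_dataset.target_table',  # hypothetical target table, assumed to exist
       write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
p.run()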

Solution 1:

The error Cannot read and write in different locations is fairly self-explanatory, and it can happen when:

  • your BigQuery dataset is in the EU and you're running Dataflow in the US
  • your GCS buckets are in the EU and you're running Dataflow in the US

As specified in the question, your temporary GCS locations and your BigQuery dataset are both located in the EU, so you must run the Dataflow job in the EU as well.
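If you want to confirm where each resource actually lives, a minimal sketch using the BigQuery and Cloud Storage client libraries (the dataset name my_dataset is hypothetical; m-h and p_df come from the options used below):

from google.cloud import bigquery, storage

bq = bigquery.Client(project='m-h')
print(bq.get_dataset('my_dataset').location)  # expect 'EU'

gcs = storage.Client(project='m-h')
print(gcs.get_bucket('p_df').location)        # expect 'EU' or an EU region such as 'EUROPE-WEST1'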

To achieve this, you need to specify the zone parameter in PipelineOptions, like this:

import apache_beam as beam
from apache_beam.options.pipeline_options import (
    PipelineOptions, GoogleCloudOptions, StandardOptions, WorkerOptions)

options = PipelineOptions()

# Pin the workers to an EU zone so they run in the same location as the data.
wo = options.view_as(WorkerOptions)
wo.zone = 'europe-west1-b'

# Rest of your options:
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = 'm-h'
google_cloud_options.job_name = 'myjob3'
google_cloud_options.staging_location = r'gs://p_df/staging'  # EUROPE-WEST1
google_cloud_options.region = r'europe-west1'
google_cloud_options.temp_location = r'gs://p_df/temp'  # EUROPE-WEST1
options.view_as(StandardOptions).runner = 'DataflowRunner'

p = beam.Pipeline(options=options)
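With those options in place the rest of the pipeline is unchanged; running it should now submit the job to europe-west1, for example:

result = p.run()            # submits the job to Dataflow in europe-west1
result.wait_until_finish()  # optionally block until the job completes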

Solution 2:

The BigQuerySource transform used by the Python DirectRunner doesn't automatically determine the location of its temporary tables; see BEAM-1909 for the issue.

When using the DataflowRunner, this should work.
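Equivalently, the runner and region can be supplied as flag-style options; a minimal sketch, assuming the same project and bucket as in Solution 1:

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(flags=[
    '--runner=DataflowRunner',
    '--project=m-h',
    '--region=europe-west1',
    '--staging_location=gs://p_df/staging',
    '--temp_location=gs://p_df/temp',
])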
