BigQuery Dataflow Error: Cannot read and write in different locations while reading and writing in EU
I have a simple Google Dataflow task. It reads from one BigQuery table and writes the results into another, roughly like this: (p | beam.io.Read( beam.io.BigQuerySource( query='select dia
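For context, a complete pipeline of that shape typically looks something like the sketch below; the query, table names and project are placeholders rather than the values from the question, and the destination table is assumed to already exist.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical query and output table; substitute your own values.
QUERY = 'SELECT dia FROM `my-project.my_dataset.source_table`'
OUTPUT_TABLE = 'my-project:my_dataset.destination_table'

options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    (p
     | 'ReadFromBQ' >> beam.io.Read(
           beam.io.BigQuerySource(query=QUERY, use_standard_sql=True))
     | 'WriteToBQ' >> beam.io.WriteToBigQuery(
           OUTPUT_TABLE,
           # Destination table is assumed to exist, so no schema is needed.
           create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
           write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))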
Solution 1:
The error Cannot read and write in different locations is fairly self-explanatory, and it typically happens because:
- your BigQuery dataset is in the EU and you're running Dataflow in the US, or
- your GCS buckets are in the EU and you're running Dataflow in the US.
As you state in the question, your GCS temporary locations and your BigQuery dataset are both in the EU, so you must run the Dataflow job in the EU as well.
To achieve this, specify the zone parameter in your PipelineOptions, like this:
import apache_beam as beam
from apache_beam.options.pipeline_options import (
    PipelineOptions, GoogleCloudOptions, StandardOptions, WorkerOptions)

options = PipelineOptions()

# Run the workers in an EU zone so the job executes in the same
# location as the BigQuery dataset and the GCS buckets.
wo = options.view_as(WorkerOptions)  # type: WorkerOptions
wo.zone = 'europe-west1-b'

# Rest of the options:
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = 'm-h'
google_cloud_options.job_name = 'myjob3'
google_cloud_options.staging_location = 'gs://p_df/staging'  # EUROPE-WEST1
google_cloud_options.region = 'europe-west1'
google_cloud_options.temp_location = 'gs://p_df/temp'  # EUROPE-WEST1

options.view_as(StandardOptions).runner = 'DataflowRunner'

p = beam.Pipeline(options=options)
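Equivalently, the same settings can be passed to PipelineOptions as standard Beam flags, which is convenient when launching the job from the command line. A minimal sketch reusing the values above (note that newer SDKs prefer --worker_zone over the older --zone flag):

from apache_beam.options.pipeline_options import PipelineOptions

# Same configuration expressed as pipeline-option flags.
options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=m-h',
    '--job_name=myjob3',
    '--region=europe-west1',
    '--zone=europe-west1-b',
    '--staging_location=gs://p_df/staging',
    '--temp_location=gs://p_df/temp',
])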
Solution 2:
The BigQuerySource transform, as used by the Python DirectRunner, doesn't automatically determine the location for its temporary tables; see BEAM-1909 for the tracking issue. When using the DataflowRunner, this should work.
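In other words, the workaround is simply to make sure the pipeline runs on the Dataflow service rather than locally. A minimal sketch of the runner switch, reusing the options object from Solution 1:

from apache_beam.options.pipeline_options import StandardOptions

# Per BEAM-1909, the local DirectRunner does not work out the right
# location for BigQuery temp tables, so an EU query can fail locally.
# Submitting the same pipeline to the Dataflow service avoids this,
# as long as the job itself runs in the EU region:
options.view_as(StandardOptions).runner = 'DataflowRunner'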