Skip to main content

BigQuery Destination

Setting up the BigQuery destination connector involves setting up the data loading method (BigQuery Standard method and Google Cloud Storage bucket) and configuring the BigQuery destination connector using the Airbyte UI.

This page guides you through setting up the BigQuery destination connector.

Prerequisites

Setup guide

Step 1: Set up a data loading method

  1. Create a Cloud Storage bucket with the Protection Tools set to none or Object versioning. Make sure the bucket does not have a retention policy.
  2. Create an HMAC key and access ID.
  3. Grant the Storage Object Admin role to the Google Cloud Service Account. This must be the same service account as the one you configure for BigQuery access in the BigQuery connector setup step.
  4. Make sure your Cloud Storage bucket is accessible from the machine running Airbyte. The easiest way to verify if Airbyte is able to connect to your bucket is via the check connection tool in the UI.

Your bucket must be encrypted using a Google-managed encryption key (this is the default setting when creating a new bucket). We currently do not support buckets using customer-managed encryption keys (CMEK). You can view this setting under the "Configuration" tab of your GCS bucket, in the Encryption type row.

Step 2: Set up the BigQuery connector

  1. Log into your Airbyte Cloud or Airbyte Open Source account.

  2. Click Destinations and then click + New destination.

  3. On the Set up the destination page, select BigQuery from the Destination type dropdown.

  4. Enter the name for the BigQuery connector.

  5. For Project ID, enter your Google Cloud project ID.

  6. For Dataset Location, select the location of your BigQuery dataset.

    You cannot change the location later.
  7. For Default Dataset ID, enter the BigQuery Dataset ID.

  8. For Loading Method, select Standard Inserts or GCS Staging

    We recommend using the GCS Staging option.
  9. For Service Account Key JSON (Required for cloud, optional for open-source), enter the Google Cloud Service Account Key in JSON format.

  10. For Transformation Query Run Type (Optional), select interactive to have BigQuery run interactive query jobs or batch to have BigQuery run batch queries.

    note

    Interactive queries are executed as soon as possible and count towards daily concurrent quotas and limits, while batch queries are executed as soon as idle resources are available in the BigQuery shared resource pool. If BigQuery hasn't started the query within 24 hours, BigQuery changes the job priority to interactive. Batch queries don't count towards your concurrent rate limit, making it easier to start many queries at once.

  11. For Google BigQuery Client Chunk Size (Optional), use the default value of 15 MiB. Later, if you see networking or memory management problems with the sync (specifically on the destination), try decreasing the chunk size. In that case, the sync will be slower but more likely to succeed.

Supported sync modes

The BigQuery destination connector supports the following sync modes:

  • Full Refresh Sync
  • Incremental - Append Sync
  • Incremental - Append + Deduped

Troubleshooting permission issues

The service account does not have the proper permissions.

  • Make sure the BigQuery service account has BigQuery User and BigQuery Data Editor roles or equivalent permissions as those two roles.
  • If the GCS staging mode is selected, ensure the BigQuery service account has the right permissions to the GCS bucket and path or the Cloud Storage Admin role, which includes a superset of the required permissions.

The HMAC key is wrong.

  • Make sure the HMAC key is created for the BigQuery service account, and the service account has permission to access the GCS bucket and path. postgres.md