6 Easy Ways to Upload Data into Google Cloud BigQuery

Discover 6 powerful ways to move data into BigQuery: manual upload, Cloud Storage, Dataflow, Pub/Sub, Data Transfer Service, and APIs/SDKs.

Karolina Everlingová
7 June 2024

When it comes to exploring or fully utilizing data in BigQuery, the first step is to upload your data to the Google Cloud Platform (GCP). There are several data ingestion methods, ranging from manual imports to automated integrations, and from batch to streaming data processing. In this article, we will explore 6 different methods to upload data into BigQuery.

1. Manual Data Upload

Starting with the simplest and most straightforward method, BigQuery lets you upload data directly through its web interface. This method is ideal for quickly testing new datasets or importing one-off files. Supported formats include CSV, JSON, Avro, Parquet, and ORC. For small amounts of data that don’t need regular updates, manual import is a perfect choice.

Steps:

  1. In your selected dataset, choose “Create Table.”
  2. Select one of the available manual data import options.
  3. This operation will create a native BigQuery table with your data.

For more details, visit the BigQuery batch loading documentation.

Create Table via manual upload
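
If you later want to script the same one-off load instead of clicking through the console, the Python client library can do it in a few lines. This is a minimal sketch only; the project, dataset, table, and file names are placeholders you would replace with your own.

# Minimal sketch: load a local CSV into BigQuery with the Python client
# (pip install google-cloud-bigquery). Project, dataset, table, and file
# names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the header row
    autodetect=True,       # let BigQuery infer the schema
)

with open("your-file.csv", "rb") as source_file:
    load_job = client.load_table_from_file(
        source_file,
        "your-project-id.your_dataset.your_table",
        job_config=job_config,
    )

load_job.result()  # wait for the load job to finish
print(f"Loaded {load_job.output_rows} rows.")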

2. Google Cloud Storage (GCS)

Google Cloud Storage (GCS) is perfect for storing large data files that you want to load into BigQuery on a regular basis. Supported formats include CSV, JSON, Avro, Parquet, and ORC. With GCS, you can automate the import of logs from various systems or transfer data from other cloud services.

Example:

Assuming you have a GCS bucket with a CSV file that you want to upload to BigQuery as part of your ELT process, follow these steps:

Google Cloud Storage (GCS) file example
  1. Create a table in BigQuery.
  2. Use the GCS URI to load the data, for example:
LOAD DATA INTO `your-project-id.your_dataset.your_table`
FROM FILES (
  format = 'CSV',
  uris = ['gs://your-bucket-name/your-file.csv']);
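
If you prefer to drive the same load from code rather than SQL, here is a minimal sketch using the Python client's load_table_from_uri. The bucket, dataset, and table names are the same placeholders as above.

# Minimal sketch: load a CSV from GCS into BigQuery with the Python client.
# Bucket, dataset, and table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)

load_job = client.load_table_from_uri(
    "gs://your-bucket-name/your-file.csv",
    "your-project-id.your_dataset.your_table",
    job_config=job_config,
)
load_job.result()  # wait for completion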

3. Dataflow

Google Dataflow is a fully managed service for streaming and batch data processing that allows you to create and manage data pipelines with high availability and low latency.

Google Dataflow

Built on Apache Beam, Dataflow offers a unified programming interface that supports various data sources and computational resources, making it easier to develop, deploy, and monitor complex data pipelines. With automatic scaling and integration with other Google Cloud services like BigQuery, Cloud Storage, and Pub/Sub, Dataflow provides a robust and flexible solution for real-time data transformation and analysis.
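
As an illustration only, the sketch below shows the shape of a Beam pipeline in Python that reads newline-delimited JSON from GCS and writes it to BigQuery. The bucket, table, and schema are hypothetical placeholders, and running it on Dataflow additionally requires the DataflowRunner options (project, region, temp location).

# Minimal sketch of an Apache Beam pipeline (pip install apache-beam[gcp]).
# Bucket, table, and schema are hypothetical placeholders.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Add --runner=DataflowRunner, --project, --region, and --temp_location
# to these options to run the pipeline on Dataflow instead of locally.
options = PipelineOptions()

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read from GCS" >> beam.io.ReadFromText("gs://your-bucket-name/events/*.json")
        | "Parse JSON" >> beam.Map(json.loads)
        | "Write to BigQuery" >> beam.io.WriteToBigQuery(
            "your-project-id:your_dataset.your_table",
            schema="event_id:STRING,event_time:TIMESTAMP,payload:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )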

Learn more about Google Dataflow.

4. Pub/Sub

Google Cloud Pub/Sub is an excellent tool for real-time data ingestion. It lets you collect and distribute events from various sources and then deliver them to BigQuery. Pub/Sub is natively supported by many GCP services (Cloud Logging sinks, for example) and is well integrated into the entire GCP ecosystem. This method is ideal for event-driven applications and their logging.


Steps:

  1. Create a new subscription on an existing Pub/Sub topic, with the delivery type set to “Write to BigQuery.”
  2. Follow the necessary steps to assign specific roles to the service account, create the BigQuery table, and define its schema.

For a step-by-step setup, refer to the Pub/Sub to BigQuery integration documentation.
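
Step 1 can also be scripted. The sketch below is a minimal example with the Python Pub/Sub client, assuming the topic and the BigQuery table already exist and the Pub/Sub service account has the required BigQuery roles; all IDs are placeholders.

# Minimal sketch: create a Pub/Sub subscription that writes straight to BigQuery
# (pip install google-cloud-pubsub). IDs are placeholders; the topic and table
# must already exist and the Pub/Sub service account needs BigQuery access.
from google.cloud import pubsub_v1

project_id = "your-project-id"
publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

topic_path = publisher.topic_path(project_id, "your-topic")
subscription_path = subscriber.subscription_path(project_id, "your-bq-subscription")

bigquery_config = pubsub_v1.types.BigQueryConfig(
    table="your-project-id.your_dataset.your_table",
    write_metadata=True,  # also store message metadata columns
)

with subscriber:
    subscription = subscriber.create_subscription(
        request={
            "name": subscription_path,
            "topic": topic_path,
            "bigquery_config": bigquery_config,
        }
    )
print(f"Created BigQuery subscription: {subscription.name}")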

5. BigQuery Data Transfer Service (DTS)

BigQuery Data Transfer Service (DTS) is a tool for scheduled data transfers from various sources into BigQuery. DTS supports many popular connectors, such as Google Ads, Google Analytics, YouTube, Salesforce, Marketo, Facebook Ads, Stripe, and more.

Steps:

  1. In the BigQuery console menu, select “Data transfers.”
  2. Create a transfer using one of the over 238 ready-to-use data connectors.
Create BigQuery Data Transfer

For more information, visit the BigQuery Data Transfer Service documentation.
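
Transfers can also be created programmatically with the google-cloud-bigquery-datatransfer client. The sketch below outlines a Google Ads transfer as an example; the data_source_id and params differ per connector, and the dataset and customer ID shown are placeholders, so check the connector’s documentation for the exact parameters.

# Minimal sketch: create a scheduled transfer with the Data Transfer Service
# client (pip install google-cloud-bigquery-datatransfer). Dataset, connector
# parameters, and customer ID are placeholders.
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()
project_id = "your-project-id"

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="your_dataset",
    display_name="Daily Google Ads import",
    data_source_id="google_ads",             # connector-specific ID
    params={"customer_id": "123-456-7890"},  # connector-specific parameters
    schedule="every 24 hours",
)

transfer_config = client.create_transfer_config(
    parent=client.common_project_path(project_id),
    transfer_config=transfer_config,
)
print(f"Created transfer config: {transfer_config.name}")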

6. API and SDK

If you have specific needs or want greater control over data transfers, you can use the BigQuery Storage API or one of the supported SDKs, such as those for Python, Java, Node.js, and others. This approach allows you to implement both batch and streaming ingestion, giving you complete control over how and what data to upload to or read from BigQuery.
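
As a simple illustration of the streaming side, the sketch below uses the Python client’s insert_rows_json (the legacy streaming API) to push rows into an existing table; the table name and rows are placeholders, and for high-throughput pipelines the newer Storage Write API is the recommended path.

# Minimal sketch: stream rows into an existing table with the Python client's
# insert_rows_json. Table name and rows are placeholders; for high-throughput
# writes, prefer the BigQuery Storage Write API.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")

rows = [
    {"event_id": "abc-123", "event_time": "2024-06-07T12:00:00Z", "payload": "signup"},
    {"event_id": "abc-124", "event_time": "2024-06-07T12:00:05Z", "payload": "login"},
]

errors = client.insert_rows_json("your-project-id.your_dataset.your_table", rows)
if errors:
    print(f"Some rows failed to insert: {errors}")
else:
    print("All rows inserted.")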

Which API method to choose? You can use this decision tree:

BigQuery Storage API decision tree


Conclusion

BigQuery offers a wide range of data ingestion methods, from manual uploads to automated transfers, and from batch to real-time processing. The choice of method depends on the specific needs of your project and the technical requirements of your solution.

Whether you’re just playing with data or need enterprise-level deployment, feel free to reach out for a free consultation.

Happy data exploration!

—team Revolt BI
