Bigquery

Introduction

The BigQueryClient class is a custom extension of the Google Cloud BigQuery Client. This enhanced client provides additional functionality for managing BigQuery operations more efficiently. It includes built-in methods for managing permissions, handling datasets and tables, and automating common tasks such as loading data, exporting tables, and simulating queries.

Class Initialization

This constructor initializes the BigQueryClient class, which inherits from the google.cloud.bigquery.Client. It shares the attributes and behavior of the original BigQuery client while enabling customized extensions for easier BigQuery operations.

Parameters

  • project_id (str): The project ID associated with the BigQuery client.
  • credentials (Optional[Credentials]): The credentials used to authenticate the client.
  • _http (Optional[requests.Session]): The HTTP session to use for requests.
  • location (Optional[str]): Default geographic location for queries and jobs.
  • default_query_job_config (Optional[QueryJobConfig]): Default configuration for query jobs.
  • default_load_job_config (Optional[LoadJobConfig]): Default configuration for load jobs.
  • client_info (Optional[ClientInfo]): Information about the client.
  • client_options (Optional[Union[ClientOptions, Dict]]): Additional client options.

Example

BigQueryClient(
    project_id: str,
    credentials: Optional[Credentials] = None,
    _http: Optional[requests.Session] = None,
    location: Optional[str] = None,
    default_query_job_config: Optional[QueryJobConfig] = None,
    default_load_job_config: Optional[LoadJobConfig] = None,
    client_info: Optional[ClientInfo] = None,
    client_options: Optional[Union[ClientOptions, Dict]] = None,
)

Methods Overview

check_table_existence

Description

Checks whether a specific table exists within a given dataset.

Parameters

  • dataset_id (str): The ID of the dataset to check.
  • table_id (str): The ID of the table to check.

Example

exists = client.check_table_existence("my_dataset", "my_table")
print(f"Table exists: {exists}")

load_data_from_csv

Description

Loads data from a CSV file into a specified BigQuery table. Optionally applies validation tests (custom or not) to the data.

Parameters

  • dataset_id (str): The destination dataset ID.
  • table_id (str): The destination table ID.
  • csv_file_path (str): Path to the CSV file.
  • test_functions (Optional[list]): A list of validation functions applied to the data. For the full list of predefined test (See the Custom checks class page for more details.)
  • encoding (str): The file encoding (default is UTF-8).

Example

from functools import partial

def custom_test(index, row, header, column_sums):
    if len(row) != len(header):
        raise ValueError(f"Row {index} does not match header length.")

data_checks = CustomDataChecks()
client.load_data_from_csv(
    dataset_id="my_dataset",
    table_id="my_table",
    csv_file_path="data.csv",
    test_functions=[
      custom_test,
      partial(data_checks.check_columns),
      partial(
        data_checks.check_unique, columns_to_test=["col1"],
      ),
      ...
    ]
)

simulate_query

Description

Simulates a query execution to estimate resource usage and provides metadata such as schema and referenced tables.

Parameters

  • query (str): The SQL query to simulate.

Returns

  • dict: Metadata about the query execution, including schema, referenced tables, and bytes processed.

Example

simulation = client.simulate_query("SELECT * FROM `my_project.my_dataset.my_table` LIMIT 10")
print(simulation)

grant_permission

Description

Manages permissions (add, remove, or update) for users on a BigQuery table or dataset.

Parameters

  • resource_id (str): The resource ID in the format project_id.dataset_id (if dataset) or project_id.dataset_id.table_id (if table).
  • user_permissions (List[Dict[str, str]]): A list of dictionaries containing user_email and role.
  • action (str): The action to perform (add, remove, or update).

Example

permissions = [
    {"user_email": "user@example.com", "role": "READER"},
    {"user_email": "admin@example.com", "role": "OWNER"}
]

client.grant_permission(
    resource_id="my_project.my_dataset",
    user_permissions=permissions,
    action="add"
)

export_table

Description

Exports a BigQuery table to a specified format (e.g., CSV, JSON) to Cloud Storage.

Parameters

  • dataset_id (str): The ID of the dataset.
  • table_id (str): The ID of the table.
  • destination (str): The destination path (e.g., gs://bucket_name/file).
  • output_file_format (OutputFileFormat): The format of the exported file (CSV, JSON, or AVRO).
  • compression (Literal): Compression type (GZIP, DEFLATE, SNAPPY, or NONE).

Example

client.export_table(
    dataset_id="my_dataset",
    table_id="my_table",
    destination="gs://my_bucket/my_table.csv",
    output_file_format="CSV",
    compression="GZIP"
)