BigQuery
Introduction
The BigQueryClient class is a custom extension of the Google Cloud BigQuery Client. It extends the standard client with built-in methods for managing permissions, handling datasets and tables, and automating common tasks such as loading data, exporting tables, and simulating queries.
Class Initialization
This constructor initializes the BigQueryClient class, which inherits from google.cloud.bigquery.Client. It retains the attributes and behavior of the standard BigQuery client while adding convenience methods for common BigQuery operations.
Parameters
project_id (str): The project ID associated with the BigQuery client.
credentials (Optional[Credentials]): The credentials used to authenticate the client.
_http (Optional[requests.Session]): The HTTP session to use for requests.
location (Optional[str]): Default geographic location for queries and jobs.
default_query_job_config (Optional[QueryJobConfig]): Default configuration for query jobs.
default_load_job_config (Optional[LoadJobConfig]): Default configuration for load jobs.
client_info (Optional[ClientInfo]): Information about the client.
client_options (Optional[Union[ClientOptions, Dict]]): Additional client options.
Example
BigQueryClient(
    project_id: str,
    credentials: Optional[Credentials] = None,
    _http: Optional[requests.Session] = None,
    location: Optional[str] = None,
    default_query_job_config: Optional[QueryJobConfig] = None,
    default_load_job_config: Optional[LoadJobConfig] = None,
    client_info: Optional[ClientInfo] = None,
    client_options: Optional[Union[ClientOptions, Dict]] = None,
)
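A minimal instantiation might look like the following sketch (the project ID and location are illustrative, and Application Default Credentials are assumed to be available in the environment):
# Only project_id is required; the remaining parameters fall back to defaults.
client = BigQueryClient(project_id="my_project", location="EU")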
Methods Overview
check_table_existence
Description
Checks whether a specific table exists within a given dataset.
Parameters
dataset_id (str): The ID of the dataset to check.
table_id (str): The ID of the table to check.
Example
exists = client.check_table_existence("my_dataset", "my_table")
print(f"Table exists: {exists}")load_data_from_csv
load_data_from_csv
Description
Loads data from a CSV file into a specified BigQuery table. Optionally applies validation tests (predefined or custom) to the data.
Parameters
dataset_id (str): The destination dataset ID.
table_id (str): The destination table ID.
csv_file_path (str): Path to the CSV file.
test_functions (Optional[list]): A list of validation functions applied to the data. For the full list of predefined tests, see the Custom checks class page.
encoding (str): The file encoding (default is UTF-8).
Example
from functools import partial
# Custom check: every row must have the same number of fields as the header.
def custom_test(index, row, header, column_sums):
    if len(row) != len(header):
        raise ValueError(f"Row {index} does not match header length.")

# Predefined checks (see the Custom checks class page).
data_checks = CustomDataChecks()
client.load_data_from_csv(
    dataset_id="my_dataset",
    table_id="my_table",
    csv_file_path="data.csv",
    test_functions=[
        custom_test,
        partial(data_checks.check_columns),
        partial(
            data_checks.check_unique, columns_to_test=["col1"],
        ),
        ...
    ],
)
simulate_query
Description
Simulates a query execution to estimate resource usage and provides metadata such as schema and referenced tables.
Parameters
query (str): The SQL query to simulate.
Returns
dict: Metadata about the query execution, including schema, referenced tables, and bytes processed.
Example
simulation = client.simulate_query("SELECT * FROM `my_project.my_dataset.my_table` LIMIT 10")
print(simulation)
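Since the method estimates resource usage without running the query, one common use is rough cost estimation. The sketch below assumes the returned dict exposes the byte count under a total_bytes_processed key (the key name is hypothetical; check the actual return value) and uses the on-demand rate of roughly $6.25 per TiB (verify current pricing):
simulation = client.simulate_query(
    "SELECT col1 FROM `my_project.my_dataset.my_table`"
)
# "total_bytes_processed" is a hypothetical key; adapt it to the actual dict.
tib_scanned = simulation["total_bytes_processed"] / 1024**4
print(f"Estimated on-demand cost: ${tib_scanned * 6.25:.4f}")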
grant_permission
Description
Manages permissions (add, remove, or update) for users on a BigQuery table or dataset.
Parameters
resource_id (str): The resource ID, in the format project_id.dataset_id (for a dataset) or project_id.dataset_id.table_id (for a table).
user_permissions (List[Dict[str, str]]): A list of dictionaries containing user_email and role.
action (str): The action to perform (add, remove, or update).
Example
permissions = [
    {"user_email": "user@example.com", "role": "READER"},
    {"user_email": "admin@example.com", "role": "OWNER"},
]
client.grant_permission(
    resource_id="my_project.my_dataset",
    user_permissions=permissions,
    action="add",
)
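The same call handles revocation via the action parameter; for example, to remove a previously granted role (email and resource ID are illustrative):
client.grant_permission(
    resource_id="my_project.my_dataset",
    user_permissions=[{"user_email": "user@example.com", "role": "READER"}],
    action="remove",
)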
export_table
Description
Exports a BigQuery table to Cloud Storage in a specified format (e.g., CSV, JSON).
Parameters
dataset_id (str): The ID of the dataset.
table_id (str): The ID of the table.
destination (str): The destination path (e.g., gs://bucket_name/file).
output_file_format (OutputFileFormat): The format of the exported file (CSV, JSON, or AVRO).
compression (Literal): Compression type (GZIP, DEFLATE, SNAPPY, or NONE).
Example
client.export_table(
    dataset_id="my_dataset",
    table_id="my_table",
    destination="gs://my_bucket/my_table.csv",
    output_file_format="CSV",
    compression="GZIP",
)
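The same table can also be exported as Avro; note that in BigQuery extracts, SNAPPY and DEFLATE compression apply to Avro output, while GZIP is used for CSV and JSON (the bucket path is illustrative):
# Avro export with block-level Snappy compression.
client.export_table(
    dataset_id="my_dataset",
    table_id="my_table",
    destination="gs://my_bucket/my_table.avro",
    output_file_format="AVRO",
    compression="SNAPPY",
)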