connect

seastersdb.connect

connect(**read_parquet_kws)

Establish a DuckDB connection and register macros and utility functions for accessing the SEASTERS database.

The connection is configured so that every read_parquet call uses union_by_name=True, which aligns columns by name rather than by position and keeps the schema consistent across heterogeneous files. The function creates a set of SQL macros that provide access to data and metadata for each supported network.

The created macros include:
  • ghcnd(), ghcnh(), gsdr(), bsrn() for bulk data access.

  • <network>_stations() and <network>_inv() for metadata.

  • <network>_var() for variable definitions (when available).

The bsrn macro takes a dataset argument, bsrn(dataset), and provides dataset-wise access to BSRN files. A custom overlaps function is also registered for evaluating temporal coverage in SQL queries.
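The signature and boundary semantics of the registered overlaps function are not documented on this page. As an illustration of the kind of test such a function typically performs, here is a minimal sketch that assumes two closed time ranges, each given by a start and an end. The argument names and the inclusive boundaries are assumptions, not the library's actual definition.

```python
from datetime import date


def overlaps(start_a, end_a, start_b, end_b):
    """Return True if the closed ranges [start_a, end_a] and
    [start_b, end_b] share at least one point in time.

    Hypothetical stand-in: the real SEASTERS function may use a
    different signature or treat the boundaries differently.
    """
    # Two closed ranges overlap exactly when each one starts
    # no later than the other one ends.
    return start_a <= end_b and start_b <= end_a


# A station record covering 2000-2005 against a query window of 2005-2010.
station_covers_window = overlaps(
    date(2000, 1, 1), date(2005, 12, 31),
    date(2005, 1, 1), date(2010, 12, 31),
)
```

A Python function of this shape could in principle be exposed to SQL through DuckDB's create_function, though whether connect registers it that way is not stated here.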

Parameters:

**read_parquet_kws (Dict[str, Any]) – Additional keyword arguments forwarded to DuckDB’s read_parquet. These are applied to all automatically created macros. The option union_by_name=True is always enforced and cannot be overridden.
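One way the "always enforced" behaviour could be obtained is by merging the caller's keyword arguments with union_by_name=True last, so that a user-supplied value is overwritten. The sketch below illustrates that idea only; the helper names build_read_parquet_options and render_options are hypothetical and are not part of seastersdb.

```python
def build_read_parquet_options(**read_parquet_kws):
    """Merge user options with the enforced union_by_name=True.

    Hypothetical helper: placing the enforced key last means any
    user-supplied union_by_name value is replaced.
    """
    return {**read_parquet_kws, "union_by_name": True}


def render_options(options):
    """Render options as named arguments for a read_parquet call in SQL."""
    parts = []
    for key, value in options.items():
        if isinstance(value, bool):
            literal = "true" if value else "false"
        elif isinstance(value, (int, float)):
            literal = str(value)
        else:
            # Quote strings, doubling embedded single quotes.
            literal = "'" + str(value).replace("'", "''") + "'"
        parts.append(f"{key}={literal}")
    return ", ".join(parts)


# The attempt to switch union_by_name off has no effect.
merged = build_read_parquet_options(union_by_name=False, filename=True)
rendered = render_options(build_read_parquet_options(filename=True))
```

Under this assumption, a call such as connect(filename=True) would add the filename option to every generated macro while union_by_name stays enabled.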

Returns:

con – A live DuckDB connection with all SEASTERS macros and custom functions registered.

Return type:

duckdb.DuckDBPyConnection
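The following sketch shows how the returned connection might be used to preview station metadata. It assumes that connect is importable from the seastersdb.connect module named above, that the <network>_stations() macros are table macros callable without arguments, and that the SEASTERS data files are reachable from the current environment. The helper names stations_query and preview_stations are hypothetical.

```python
def stations_query(network, limit=5):
    """Build a query against the <network>_stations() macro."""
    if not network.isidentifier():
        raise ValueError(f"invalid network name: {network!r}")
    return f"SELECT * FROM {network}_stations() LIMIT {int(limit)}"


def preview_stations(network, limit=5):
    """Open a connection, fetch a few station rows, and close it again."""
    # Imported here so the query builder works without seastersdb installed.
    from seastersdb.connect import connect

    con = connect()
    try:
        # execute() and fetchall() are standard DuckDB connection methods.
        return con.execute(stations_query(network, limit)).fetchall()
    finally:
        con.close()


query = stations_query("ghcnd", limit=3)
```

Calling preview_stations("ghcnd") would then return up to five rows as a list of tuples, provided the assumptions above hold.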