Snowflake and Apache Iceberg

dbt supports materializing tables in the Iceberg table format in two ways:

  • The model configuration field table_format = 'iceberg' (legacy)
  • Catalog integration in the model, resource, or dbt_project.yml configuration

We recommend using the Iceberg catalog configuration and applying the catalog in the model config; it is easier to use and future-proofs your code. Setting table_format = 'iceberg' directly on the model configuration is a legacy approach.
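
To make the contrast concrete, here is a minimal sketch of both styles in a model config. The catalog name catalog_horizon and volume name dbt_external_volume are placeholders; the catalog itself is defined later in this guide.

-- Legacy: table format and external volume set directly on the model
{{ config(materialized='table', table_format='iceberg', external_volume='dbt_external_volume') }}

-- Recommended: point the model at a catalog integration
{{ config(materialized='table', catalog='catalog_horizon') }}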

Creating Iceberg Tables

dbt supports creating Iceberg tables for three of the Snowflake materializations:

  • Table
  • Incremental
  • Dynamic table
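
For example, an incremental model becomes an Iceberg table just by adding the catalog config. This is a minimal sketch; the model, column, and catalog names (stg_orders, updated_at, catalog_horizon) are assumptions:

{{
  config(
    materialized='incremental',
    unique_key='id',
    catalog='catalog_horizon'
  )
}}

select * from {{ ref('stg_orders') }}
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}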

Iceberg catalogs

Snowflake supports Iceberg tables via built-in and external catalogs, including:

  • Snowflake built-in catalog (metadata managed by Snowflake’s built-in information schema)
  • Polaris/Open Catalog (managed Polaris)*
  • Glue Data Catalog*
  • Iceberg REST Compatible*

*dbt catalog support coming soon.

To use an externally managed catalog (anything other than the built-in catalog), you must set up a catalog integration by running a SQL command similar to the example in the next section.

External catalogs

Example configurations for external catalogs.

You must set up a catalog integration to use Polaris/Open Catalog (managed Polaris).

Example code:


-- Register an external Polaris/Open Catalog with Snowflake over its REST API
CREATE CATALOG INTEGRATION my_polaris_catalog_int
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (
    CATALOG_URI = 'https://<org>-<account>.snowflakecomputing.com/polaris/api/catalog'
    CATALOG_NAME = '<open_catalog_name>'
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '<client_id>'
    OAUTH_CLIENT_SECRET = '<client_secret>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;

Executing this command registers the external Polaris catalog with Snowflake. Once configured, dbt can create Iceberg tables in Snowflake that register themselves with the catalog, and it can query Polaris-managed tables.

After you have created the external catalog integration, you will be able to do two things:

  • Query an externally managed table: Snowflake can query Iceberg tables whose metadata lives in the external catalog. In this scenario, Snowflake is a "reader" of the external catalog. The table’s data remains in external cloud storage (such as AWS S3 or Google Cloud Storage) as defined in the catalog storage configuration. Snowflake uses the catalog integration to fetch metadata via the REST API, then reads the data files from cloud storage.

  • Sync Snowflake-managed tables to an external catalog: You can create a Snowflake Iceberg table that Snowflake manages in a cloud storage location and then register/sync that table to the external catalog, so other engines can discover it. (A SQL sketch of both operations follows this list.)
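
As a rough sketch of both directions in Snowflake SQL — the table names, BASE_LOCATION, and the source table raw.customers are hypothetical, and CATALOG_TABLE_NAME must match the table's name in the external catalog:

-- Read direction: expose a Polaris-managed table to Snowflake (read-only)
CREATE ICEBERG TABLE customers_ext
  EXTERNAL_VOLUME = 'dbt_external_volume'
  CATALOG = 'my_polaris_catalog_int'
  CATALOG_TABLE_NAME = 'customers';

-- Write direction: a Snowflake-managed table whose metadata syncs to the external catalog
CREATE ICEBERG TABLE customers_managed
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'dbt_external_volume'
  BASE_LOCATION = 'customers_managed/'
  CATALOG_SYNC = 'my_polaris_catalog_int'
  AS SELECT * FROM raw.customers;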

dbt Catalog Integration Configurations for Snowflake

The following table outlines the configuration fields required to set up a catalog integration for Iceberg tables in Snowflake.

Field               | Required | Accepted values
name                | yes      | Name of the catalog integration
catalog_name        | yes      | The name of the catalog integration in Snowflake (for example, my_dbt_iceberg_catalog)
external_volume     | yes      | <external_volume_name>
table_format        | yes      | iceberg
catalog_type        | yes      | built_in, iceberg_rest*
adapter_properties  | optional | See Adapter Properties below

*Coming soon! Stay tuned for updates.

Adapter Properties

These optional, Snowflake-specific configurations can be nested under adapter_properties for additional control. A catalogs.yml sketch showing them in place follows the parameter descriptions below.

Field                           | Accepted values
storage_serialization_policy    | COMPATIBLE or OPTIMIZED
max_data_extension_time_in_days | 0 to 90, with a default of 14
data_retention_time_in_days     | Standard account: 1; Enterprise or higher: 0 to 90, with a default of 1
change_tracking                 | True or False

  • storage_serialization_policy

The serialization policy tells Snowflake what kind of encoding and compression to perform on the table data files. If not specified at table creation, the table inherits the value set at the schema, database, or account level. If the value isn’t specified at any level, the table uses the default value. You can’t change the value of this parameter after table creation. Accepted values: COMPATIBLE or OPTIMIZED.

  • max_data_extension_time_in_days

The maximum number of days Snowflake can extend the data retention period for tables to prevent streams on the tables from becoming stale. The MAX_DATA_EXTENSION_TIME_IN_DAYS parameter enables you to limit this automatic extension period to control storage costs for data retention, or for compliance reasons.

  • data_retention_time_in_days

For managed Iceberg tables, you can set a retention period for Snowflake Time Travel and for undropping the table that overrides the account default. For tables that use an external catalog, Snowflake uses the value of the DATA_RETENTION_TIME_IN_DAYS parameter to set a retention period for Snowflake Time Travel and undropping the table. When the retention period expires, Snowflake does not delete the Iceberg metadata or snapshots from your external cloud storage.

  • change_tracking

Specifies whether to enable change tracking on the table.
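
As a sketch of how these properties nest in catalogs.yml (the values shown are illustrative, not recommendations; the integration names match the example in the next section):

catalogs:
  - name: catalog_horizon
    active_write_integration: snowflake_write_integration
    write_integrations:
      - name: snowflake_write_integration
        external_volume: dbt_external_volume
        table_format: iceberg
        catalog_type: built_in
        adapter_properties:
          storage_serialization_policy: OPTIMIZED
          max_data_extension_time_in_days: 14
          data_retention_time_in_days: 1
          change_tracking: true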

Configure catalog integration for managed Iceberg tables

  1. Create a catalogs.yml at the top level of your dbt project.

    An example of Snowflake Horizon as the catalog:

catalogs:
  - name: catalog_horizon
    active_write_integration: snowflake_write_integration
    write_integrations:
      - name: snowflake_write_integration
        external_volume: dbt_external_volume
        table_format: iceberg
        catalog_type: built_in

  2. Apply the catalog configuration at the model, folder, or project level. (A folder-level dbt_project.yml sketch follows these steps.)

    An example of iceberg_model.sql:

{{
  config(
    materialized='table',
    catalog='catalog_horizon'
  )
}}

select * from {{ ref('jaffle_shop_customers') }}

  3. Execute the dbt model with dbt run -s iceberg_model.

For more information, refer to our documentation on Snowflake configurations.
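
For the folder- or project-level option in step 2, a minimal dbt_project.yml sketch might look like the following; the project name my_project and folder marts are hypothetical:

models:
  my_project:
    marts:
      +catalog: catalog_horizon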

Limitations

For external catalogs, Snowflake only supports reads: it can query the table but cannot insert or modify data.

The syncing experience differs depending on the catalog you choose. Some catalogs refresh automatically, and you can set parameters on your catalog integration to control this; other catalogs might require a separate job to manage the metadata sync.
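
For catalogs that don't refresh automatically, one option is a manual refresh from Snowflake. A minimal sketch, assuming an externally managed table named customers_ext:

-- Pull the latest metadata from the external catalog for this table
ALTER ICEBERG TABLE customers_ext REFRESH;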
