Securing DuckDB, Improving Startup Time, and Working Offline

DuckDB is an analytical database system that can be embedded in-process. It is fast, efficient, and easy to use. It has a number of extensions, including ones for querying popular file formats, like CSV, JSON, or Parquet; querying other databases, like SQLite, Postgres, or MySQL; integrating with open table formats, like Apache Iceberg and Delta Lake; or making HTTP requests.
Most DuckDB extensions are not installed or loaded by default because DuckDB targets embedded environments, like IoT, where it is critically important to keep the binary size and memory used by the application to a minimum. Instead, extensions are downloaded from the Internet dynamically, at run-time, and cached on the file system. Extensions are downloaded using HTTP, not HTTPS, but they are signed, and DuckDB verifies their signature before loading them.
Dynamically loading extensions presents a few challenges. First, if the run-time environment has no access to the Internet, extensions cannot be downloaded, and the application will fail at run-time. Many industrial environments for critical infrastructure, like manufacturing, or power generation and storage, do not allow access to the Internet due to cybersecurity concerns or regulations. Second, given the sensitive nature of these environments, dynamically loading software that has powerful capabilities, like reading or writing data on the filesystem, or in other databases or computers on the network, could be used as a means of cyberattack. In these industries, attention is increasingly given to securing the software supply chain, and the dynamic loading of extensions is antithetical to this. Finally, downloading extensions can take a few seconds, so in dynamic environments where applications may start and stop all the time, the penalty to download the extension can negatively impact query performance. For example, if DuckDB is used in an application on Kubernetes where pods regularly restart due to dynamic scaling or spot-instance termination, the first query to the application could take seconds longer than subsequent queries.[1]
In this article, I will review how DuckDB extension installation and loading works. Then I will demonstrate how to compile DuckDB with extensions statically linked to avoid the need to download them at run-time. I will also demonstrate how to explicitly disable extension installation to secure DuckDB for sensitive environments. The examples use the DuckDB CLI to make the concepts easy to demonstrate, but the techniques apply equally when DuckDB is embedded in-process.
Dynamic Extension Loading
Information about DuckDB extensions can be queried using SQL:
SELECT * FROM duckdb_extensions();
By default, the DuckDB CLI comes with a number of extensions statically linked:
D SELECT extension_name, loaded, installed, install_path, install_mode, installed_from FROM duckdb_extensions();
┌──────────────────┬─────────┬───────────┬──────────────┬───────────────────┬────────────────┐
│ extension_name │ loaded │ installed │ install_path │ install_mode │ installed_from │
│ varchar │ boolean │ boolean │ varchar │ varchar │ varchar │
├──────────────────┼─────────┼───────────┼──────────────┼───────────────────┼────────────────┤
│ arrow │ false │ false │ │ NULL │ │
│ autocomplete │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ aws │ false │ false │ │ NULL │ │
│ azure │ false │ false │ │ NULL │ │
│ core_functions │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ delta │ false │ false │ │ NULL │ │
│ excel │ false │ false │ │ NULL │ │
│ fts │ false │ false │ │ NULL │ │
│ httpfs │ false │ false │ │ NULL │ │
│ iceberg │ false │ false │ │ NULL │ │
│ icu │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ inet │ false │ false │ │ NULL │ │
│ jemalloc │ false │ false │ │ NULL │ │
│ json │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ motherduck │ false │ false │ │ NULL │ │
│ mysql_scanner │ false │ false │ │ NULL │ │
│ parquet │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ postgres_scanner │ false │ false │ │ NULL │ │
│ shell │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ spatial │ false │ false │ │ NULL │ │
│ sqlite_scanner │ false │ false │ │ NULL │ │
│ tpcds │ false │ false │ │ NULL │ │
│ tpch │ false │ false │ │ NULL │ │
│ ui │ false │ false │ │ NULL │ │
│ vss │ false │ false │ │ NULL │ │
├──────────────────┴─────────┴───────────┴──────────────┴───────────────────┴────────────────┤
│ 25 rows 6 columns │
└────────────────────────────────────────────────────────────────────────────────────────────┘
D
Note, the statically linked extensions are not only installed but they are also loaded, so there is no startup penalty when using them. The core_functions
extension is always included. The autotocomplete
and shell
extensions are integral to the CLI. The icu
extension is used for collations, time-zones, and time functions, and the parquet
and json
extension are used for reading and writing files in those formats.
An extension can be dynamically installed and loaded as follows:
D .timer on
D INSTALL sqlite_scanner;
Run Time (s): real 0.938 user 0.201725 sys 0.106720
D LOAD sqlite_scanner;
Run Time (s): real 0.901 user 1.329752 sys 0.031063
D SELECT extension_name, loaded, installed, install_path, install_mode, installed_from FROM duckdb_extensions();
┌──────────────────┬─────────┬───────────┬───────────────────────────────────────────────────────────────────────────────────┬───────────────────┬────────────────┐
│ extension_name │ loaded │ installed │ install_path │ install_mode │ installed_from │
│ varchar │ boolean │ boolean │ varchar │ varchar │ varchar │
├──────────────────┼─────────┼───────────┼───────────────────────────────────────────────────────────────────────────────────┼───────────────────┼────────────────┤
│ arrow │ false │ false │ │ NULL │ │
│ autocomplete │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ aws │ false │ false │ │ NULL │ │
│ azure │ false │ false │ │ NULL │ │
│ core_functions │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ delta │ false │ false │ │ NULL │ │
│ excel │ false │ false │ │ NULL │ │
│ fts │ false │ false │ │ NULL │ │
│ httpfs │ false │ false │ │ NULL │ │
│ iceberg │ false │ false │ │ NULL │ │
│ icu │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ inet │ false │ false │ │ NULL │ │
│ jemalloc │ false │ false │ │ NULL │ │
│ json │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ motherduck │ false │ false │ │ NULL │ │
│ mysql_scanner │ false │ false │ │ NULL │ │
│ parquet │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ postgres_scanner │ false │ false │ │ NULL │ │
│ shell │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ spatial │ false │ false │ │ NULL │ │
│ sqlite_scanner │ true │ true │ /Users/cbreck/.duckdb/extensions/v1.2.1/osx_amd64/sqlite_scanner.duckdb_extension │ REPOSITORY │ core │
│ tpcds │ false │ false │ │ NULL │ │
│ tpch │ false │ false │ │ NULL │ │
│ ui │ false │ false │ │ NULL │ │
│ vss │ false │ false │ │ NULL │ │
├──────────────────┴─────────┴───────────┴───────────────────────────────────────────────────────────────────────────────────┴───────────────────┴────────────────┤
│ 25 rows 6 columns │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Run Time (s): real 0.004 user 0.002501 sys 0.001084
D
With the sqlite_scanner
extension loaded, DuckDB can now make queries against SQLite databases. Installing the extension took almost a second, and loading the extension took almost an additional second. For an application that is sensitive to response time, this adds almost two seconds of startup delay. It could be even worse when loading multiple extensions, since they are installed and loaded serially. Note that the installation path for the sqlite_scanner
is the default DuckDB extension cache and the installation mode is REPOSITORY
rather than being statically linked.
If I restart DuckDB, note the sqlite_scanner
extension is already installed, since it is in the extensions cache, it is just not loaded. Furthermore, installing and loading this extension is now fast, since it has already been downloaded to the cache.
D SELECT extension_name, loaded, installed, install_path, install_mode, installed_from FROM duckdb_extensions();
┌──────────────────┬─────────┬───────────┬───────────────────────────────────────────────────────────────────────────────────┬───────────────────┬────────────────┐
│ extension_name │ loaded │ installed │ install_path │ install_mode │ installed_from │
│ varchar │ boolean │ boolean │ varchar │ varchar │ varchar │
├──────────────────┼─────────┼───────────┼───────────────────────────────────────────────────────────────────────────────────┼───────────────────┼────────────────┤
│ arrow │ false │ false │ │ NULL │ │
│ autocomplete │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ aws │ false │ false │ │ NULL │ │
│ azure │ false │ false │ │ NULL │ │
│ core_functions │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ delta │ false │ false │ │ NULL │ │
│ excel │ false │ false │ │ NULL │ │
│ fts │ false │ false │ │ NULL │ │
│ httpfs │ false │ false │ │ NULL │ │
│ iceberg │ false │ false │ │ NULL │ │
│ icu │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ inet │ false │ false │ │ NULL │ │
│ jemalloc │ false │ false │ │ NULL │ │
│ json │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ motherduck │ false │ false │ │ NULL │ │
│ mysql_scanner │ false │ false │ │ NULL │ │
│ parquet │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ postgres_scanner │ false │ false │ │ NULL │ │
│ shell │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ spatial │ false │ false │ │ NULL │ │
│ sqlite_scanner │ false │ true │ /Users/cbreck/.duckdb/extensions/v1.2.1/osx_amd64/sqlite_scanner.duckdb_extension │ REPOSITORY │ core │
│ tpcds │ false │ false │ │ NULL │ │
│ tpch │ false │ false │ │ NULL │ │
│ ui │ false │ false │ │ NULL │ │
│ vss │ false │ false │ │ NULL │ │
├──────────────────┴─────────┴───────────┴───────────────────────────────────────────────────────────────────────────────────┴───────────────────┴────────────────┤
│ 25 rows 6 columns │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
D .timer on
D INSTALL sqlite_scanner;
Run Time (s): real 0.001 user 0.000581 sys 0.000386
D LOAD sqlite_scanner;
Run Time (s): real 0.090 user 0.486691 sys 0.031426
Local Extension Repository
It is possible to use a local extension repository rather than download extensions from the Internet. First, I will move the downloaded extensions to a local folder and then clear the DuckDB extension cache:
mkdir ~/duckdb_extension_repository
mv ~/.duckdb/extensions/v1.2.1 ~/duckdb_extension_repository
rm -rf ~/.duckdb/extensions
After starting DuckDB, the extension repository can be set to the local repository and the extensions will be installed into the extension cache from the local repository (look at the installed_from
column):[2]
D set custom_extension_repository = '/Users/cbreck/duckdb_extension_repository';
D .timer on
D LOAD sqlite_scanner;
Run Time (s): real 0.965 user 0.345250 sys 0.062051
D SELECT extension_name, loaded, installed, install_path, install_mode, installed_from FROM duckdb_extensions();
┌──────────────────┬─────────┬───────────┬───────────────────────────────────────────────────────────────────────────────────┬───────────────────┬───────────────────────────────────────────┐
│ extension_name │ loaded │ installed │ install_path │ install_mode │ installed_from │
│ varchar │ boolean │ boolean │ varchar │ varchar │ varchar │
├──────────────────┼─────────┼───────────┼───────────────────────────────────────────────────────────────────────────────────┼───────────────────┼───────────────────────────────────────────┤
│ arrow │ false │ false │ │ NULL │ │
│ autocomplete │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ aws │ false │ false │ │ NULL │ │
│ azure │ false │ false │ │ NULL │ │
│ core_functions │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ delta │ false │ false │ │ NULL │ │
│ excel │ false │ false │ │ NULL │ │
│ fts │ false │ false │ │ NULL │ │
│ httpfs │ false │ false │ │ NULL │ │
│ iceberg │ false │ false │ │ NULL │ │
│ icu │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ inet │ false │ false │ │ NULL │ │
│ jemalloc │ false │ false │ │ NULL │ │
│ json │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ motherduck │ false │ false │ │ NULL │ │
│ mysql_scanner │ false │ false │ │ NULL │ │
│ parquet │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ postgres_scanner │ false │ false │ │ NULL │ │
│ shell │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ spatial │ false │ false │ │ NULL │ │
│ sqlite_scanner │ true │ true │ /Users/cbreck/.duckdb/extensions/v1.2.1/osx_amd64/sqlite_scanner.duckdb_extension │ REPOSITORY │ /Users/cbreck/duckdb_extension_repository │
│ tpcds │ false │ false │ │ NULL │ │
│ tpch │ false │ false │ │ NULL │ │
│ ui │ false │ false │ │ NULL │ │
│ vss │ false │ false │ │ NULL │ │
├──────────────────┴─────────┴───────────┴───────────────────────────────────────────────────────────────────────────────────┴───────────────────┴───────────────────────────────────────────┤
│ 25 rows 6 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Run Time (s): real 0.012 user 0.003023 sys 0.001650
Furthermore, DuckDB will not load or install extensions that are not in the local repository:
D LOAD httpfs;
IO Error:
Extension "/Users/cbreck/.duckdb/extensions/v1.2.1/osx_amd64/httpfs.duckdb_extension" not found.
Extension "httpfs" is an existing extension.
Install it first using "INSTALL httpfs".
D INSTALL httpfs;
IO Error:
Failed to copy local extension "httpfs" at PATH "/Users/cbreck/duckdb_extension_repository/v1.2.1/osx_amd64/httpfs.duckdb_extension"
Note, it still takes about a second to install the sqlite_scanner
extension into the DuckDB extension cache, so this does not solve the problem of cache warm-up on startup. To control which extensions are loaded and improve startup time, the local repository can be used and the DuckDB extension cache can be prepopulated. For example, this could be done in a Dockerfile. This setup will also work in environments that do not have access to the Internet at run-time.
Static Linking
A better way to control which extensions are loaded and improve startup time is to statically link the extensions by building DuckDB yourself. Compiling DuckDB is straightforward. Assuming you have a C++ compiler and the Ninja build system installed, clone the DuckDB repository, checkout the version you want, then build DuckDB, including the extensions you want statically linked:
git clone https://github.com/duckdb/duckdb.git
cd duckdb
git checkout v1.2.1
DISABLE_EXTENSION_LOAD=1 CORE_EXTENSIONS='icu;parquet;sqlite_scanner' GEN=ninja make
This will compile the libraries for embedding DuckDB into your application, along with the necessary header files. It will also compile the DuckDB CLI. Running the CLI and listing the extensions, unlike before, the sqlite_scanner
extension is now statically linked and already loaded:
./build/release/duckdb
v1.2.1 8e52ec4395
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D SELECT extension_name, loaded, installed, install_path, install_mode, installed_from FROM duckdb_extensions();
┌──────────────────┬─────────┬───────────┬──────────────┬───────────────────┬────────────────┐
│ extension_name │ loaded │ installed │ install_path │ install_mode │ installed_from │
│ varchar │ boolean │ boolean │ varchar │ varchar │ varchar │
├──────────────────┼─────────┼───────────┼──────────────┼───────────────────┼────────────────┤
│ arrow │ false │ false │ │ NULL │ │
│ autocomplete │ false │ false │ │ NULL │ │
│ aws │ false │ false │ │ NULL │ │
│ azure │ false │ false │ │ NULL │ │
│ core_functions │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ delta │ false │ false │ │ NULL │ │
│ excel │ false │ false │ │ NULL │ │
│ fts │ false │ false │ │ NULL │ │
│ httpfs │ false │ false │ │ NULL │ │
│ iceberg │ false │ false │ │ NULL │ │
│ icu │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ inet │ false │ false │ │ NULL │ │
│ jemalloc │ false │ false │ │ NULL │ │
│ json │ false │ false │ │ NULL │ │
│ motherduck │ false │ false │ │ NULL │ │
│ mysql_scanner │ false │ false │ │ NULL │ │
│ parquet │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ postgres_scanner │ false │ false │ │ NULL │ │
│ shell │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ spatial │ false │ false │ │ NULL │ │
│ sqlite_scanner │ true │ true │ (BUILT-IN) │ STATICALLY_LINKED │ │
│ tpcds │ false │ false │ │ NULL │ │
│ tpch │ false │ false │ │ NULL │ │
│ ui │ false │ false │ │ NULL │ │
│ vss │ false │ false │ │ NULL │ │
├──────────────────┴─────────┴───────────┴──────────────┴───────────────────┴────────────────┤
│ 25 rows 6 columns │
└────────────────────────────────────────────────────────────────────────────────────────────┘
D .timer on
D LOAD sqlite_scanner;
Run Time (s): real 0.001 user 0.000415 sys 0.000116
There is no need to load the sqlite_scanner
since it already loaded, but calling LOAD
demonstrates that with static linking, loading is instantaneous. Furthermore, because DuckDB was compiled with the DISABLE_EXTENSION_LOAD
flag, additional extensions cannot be installed or loaded:
D INSTALL httpfs;
Run Time (s): real 0.002 user 0.000751 sys 0.000843
Permission Error:
Installing external extensions is disabled through a compile time flag
D LOAD httpfs;
Run Time (s): real 0.001 user 0.000683 sys 0.000114
Permission Error:
Loading external extensions is disabled through a compile time flag
Conclusion
Statically compiling DuckDB improves security by preventing the dynamic loading of extensions. This is important for minimizing attack surface and securing the software supply chain. In addition, statically compiling DuckDB with the necessary extensions means DuckDB can be used in sensitive environments that do not have access to the Internet, or even just in environments that have poor or intermittent Internet connectivity. Lastly, statically compiling DuckDB means the extensions are already loaded, which can greatly improve the startup time for applications that are sensitive to it.[3][4]
In some circumstances, this performance issue can be mitigated using cache-priming techniques, like using a Kubernetes startup probe, but this adds complexity. ↩︎
While I ran these commands, I disabled the Wi-Fi on my laptop to prove the extension was loaded without Internet access. ↩︎
One caveat is that statically compiling DuckDB with the extensions you want also needs to be available through the client libraries in your programming language. See #461 for including this functionality in the Rust library for DuckDB. ↩︎
Statically compiling DuckDB will get all your ducks in a row and prevent waddling around at startup time. Static compilation solves for offline use cases and improves startup time, so you could say it kills two birds with one stone. It also improves security, and you never want to flock around with security. ↩︎