As described in Quickstart, it is possible to register an already
created dask dataframe with a call to
However, it is also possible to load data directly from disk (or s3, hdfs, URL, hive, …)
and register it as a table in
Behind the scenes, a call to one of the
read_<format> of the
will be executed.
Additionally, queries can be materialized into new tables for caching or faster access.
CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ] <table-name> WITH ( <key> = <value> [ , ... ] ) CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ] <table-name> AS ( SELECT ... ) CREATE [ OR REPLACE ] VIEW [ IF NOT EXISTS ] <table-name> AS ( SELECT ... ) DROP TABLE | VIEW [ IF EXISTS ] <table-name>
See SQL Syntax for information on how to reference tables correctly. Please note, that there can only ever exist a single view or table with the same name.
As there is only a single schema “schema” in
table names should not include a separator “.” in
By default, if a table with the same name does already exist,
dask-sql will raise an exception
(and in turn will raise an exception if you try to delete a table which is not present).
With the flags
IF [NOT] EXISTS and
OR REPLACE, this behavior can be controlled:
CREATE OR REPLACE TABLE | VIEWwill override an already present table/view with the same name without raising an exception.
CREATE TABLE IF NOT EXISTSwill not create the table/view if it already exists (and will also not raise an exception).
DROP TABLE | VIEW IF EXISTSwill only drop the table/view if it exists and will not do anything otherwise.
CREATE TABLE WITH¶
This will create and register a new table “df” with the data under the specified location
For information on how to specify key-value arguments properly, see SQL Syntax.
persist parameter, it can be controlled if the data should be cached
or re-read for every SQL query.
The additional parameters are passed to the particular data loading functions.
If you omit the format argument, it will be deduced from the file name extension.
More ways to load data can be found in Data Loading and Input.
CREATE TABLE df WITH ( location = "/some/file/path", format = "csv/parquet/json/...", persist = True, additional_parameter = value, ... )
CREATE TABLE AS¶
Using a similar syntax, it is also possible to create a (materialized) view of a (maybe complicated) SQL query.
With the command, you give the result of the
SELECT query a name, that you can use
in subsequent calls.
SELECT can also contain a call to
PREDICT, see Machine Learning in SQL.
CREATE TABLE my_table AS ( SELECT a, b, SUM(c) FROM data GROUP BY a, b ... ) SELECT * FROM my_table
CREATE VIEW AS¶
Instead of using
CREATE TABLE it is also possible to use
The result is very similar, the only difference is, when the result will be computed: a view is recomputed on every usage,
whereas a table is only calculated once on creation (also known as a materialized view).
This means, if you e.g. read data from a remote file and the file changes, a query containing a view will
be updated whereas a query with a table will stay as it is.
To update a table, you need to recreate it.
Use views to simplify complicated queries (like a “shortcut”) and tables for caching.
The update of the view only works, if your primary data source (the files you were reading in), are not persisted during reading.
CREATE VIEW my_table AS ( SELECT a, b, SUM(c) FROM data GROUP BY a, b ... ) SELECT * FROM my_table
DROP TABLE | VIEW¶
Remove a table or view with the given name.
Please note again, that views and tables are treated equally, so
will also delete the view with the given name and vise versa.