Table Creation
Contents
Table Creation¶
As described in Quickstart, it is possible to register an already
created dask dataframe with a call to c.create_table
.
However, it is also possible to load data directly from disk (or s3, hdfs, URL, hive, …)
and register it as a table in dask_sql
.
Behind the scenes, a call to one of the read_<format>
of the dask.dataframe
will be executed.
Additionally, queries can be materialized into new tables for caching or faster access.
CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ] <table-name>
WITH ( <key> = <value> [ , ... ] )
CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ] <table-name>
AS ( SELECT ... )
CREATE [ OR REPLACE ] VIEW [ IF NOT EXISTS ] <table-name>
AS ( SELECT ... )
DROP TABLE | VIEW [ IF EXISTS ] <table-name>
See SQL Syntax for information on how to reference tables correctly. Please note, that there can only ever exist a single view or table with the same name.
Note
As there is only a single schema “schema” in dask-sql
,
table names should not include a separator “.” in CREATE
calls.
By default, if a table with the same name does already exist, dask-sql
will raise an exception
(and in turn will raise an exception if you try to delete a table which is not present).
With the flags IF [NOT] EXISTS
and OR REPLACE
, this behavior can be controlled:
CREATE OR REPLACE TABLE | VIEW
will override an already present table/view with the same name without raising an exception.CREATE TABLE IF NOT EXISTS
will not create the table/view if it already exists (and will also not raise an exception).DROP TABLE | VIEW IF EXISTS
will only drop the table/view if it exists and will not do anything otherwise.
CREATE TABLE WITH
¶
This will create and register a new table “df” with the data under the specified location
and format.
For information on how to specify key-value arguments properly, see SQL Syntax.
With the persist
parameter, it can be controlled if the data should be cached
or re-read for every SQL query.
The additional parameters are passed to the particular data loading functions.
If you omit the format argument, it will be deduced from the file name extension.
More ways to load data can be found in Data Loading and Input.
Example:
CREATE TABLE df WITH (
location = "/some/file/path",
format = "csv/parquet/json/...",
persist = True,
additional_parameter = value,
...
)
CREATE TABLE AS
¶
Using a similar syntax, it is also possible to create a (materialized) view of a (maybe complicated) SQL query.
With the command, you give the result of the SELECT
query a name, that you can use
in subsequent calls.
The SELECT
can also contain a call to PREDICT
, see Machine Learning in SQL.
Example:
CREATE TABLE my_table AS (
SELECT
a, b, SUM(c)
FROM data
GROUP BY a, b
...
)
SELECT * FROM my_table
CREATE VIEW AS
¶
Instead of using CREATE TABLE
it is also possible to use CREATE VIEW
.
The result is very similar, the only difference is, when the result will be computed: a view is recomputed on every usage,
whereas a table is only calculated once on creation (also known as a materialized view).
This means, if you e.g. read data from a remote file and the file changes, a query containing a view will
be updated whereas a query with a table will stay as it is.
To update a table, you need to recreate it.
Hint
Use views to simplify complicated queries (like a “shortcut”) and tables for caching.
Note
The update of the view only works, if your primary data source (the files you were reading in), are not persisted during reading.
Example:
CREATE VIEW my_table AS (
SELECT
a, b, SUM(c)
FROM data
GROUP BY a, b
...
)
SELECT * FROM my_table
DROP TABLE | VIEW
¶
Remove a table or view with the given name.
Please note again, that views and tables are treated equally, so CREATE TABLE
will also delete the view with the given name and vise versa.