FugueSQL is a related project that aims to provide a unified SQL interface for a variety of different computing frameworks, including Dask. While it offers a SQL engine with a larger set of supported commands, this comes at the cost of slower performance when using Dask in comparison to dask-sql. In order to offer a “best of both worlds” solution, dask-sql includes several options to integrate with FugueSQL, using its faster implementation of SQL commands when possible and falling back on FugueSQL when necessary.
dask-sql as a FugueSQL engine¶
FugueSQL users unfamiliar with dask-sql can take advantage of its functionality by installing it in an environment alongside Fugue; this will automatically register
dask_sql.integrations.fugue.DaskSQLExecutionEngine as the default Dask execution engine for FugueSQL queries.
For more information and sample usage, see Fugue — dask-sql as a FugueSQL engine.
Using FugueSQL on an existing
dask-sql users attempting to expand their SQL querying options for an existing
Context can use
dask_sql.integrations.fugue.fsql_dask(), which executes the provided query using FugueSQL, using the tables within the provided context as input.
The results of this query can then optionally be registered to the context:
# define a custom prepartition function for FugueSQL def median(df: pd.DataFrame) -> pd.DataFrame: df["y"] = df["y"].median() return df.head(1) # create a context with some tables c = Context() ... # run a FugueSQL query using the context as input query = """ j = SELECT df1.*, df2.x FROM df1 INNER JOIN df2 ON df1.key = df2.key PERSIST TAKE 5 ROWS PREPARTITION BY x PRESORT key PRINT TRANSFORM j PREPARTITION BY x USING median PRINT """ result = fsql_dask(query, c, register=True) # results aren't registered by default assert "j" in result # returns a dict of resulting tables assert "j" in c.tables # results are also registered to the context