Configuration in Dask-SQL
dask-sql supports a set of configuration options that control the behavior of certain operations.
dask-sql uses Dask’s config module, so configuration options can be specified with YAML files,
via environment variables, or directly, either through the dask.config.set method
or through the config_options argument.
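As a minimal sketch of the direct route, the snippet below sets one option with dask.config.set. The key name sql.identifier.case_sensitive is an assumption made for illustration, since the option names are not listed in this text; check the keys defined by your installed dask-sql version before relying on it.

```python
import dask

# Assumed key name for the identifier case-sensitivity option;
# verify it against your installed dask-sql version.
KEY = "sql.identifier.case_sensitive"

# Used as a context manager, dask.config.set applies the value only
# inside the block and undoes the change on exit.
with dask.config.set({KEY: False}):
    inside = dask.config.get(KEY)

print(inside)
```

Dask maps environment variables that start with DASK_ onto nested keys, with a double underscore separating levels, so the same assumed key would likely be spelled DASK_SQL__IDENTIFIER__CASE_SENSITIVE in the environment.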
Number of output partitions produced by an aggregation operation.
Number of branches per reduction step from an aggregation operation.
Whether SQL identifiers are considered case sensitive while parsing.
If a boolean, it determines whether all joins should use the broadcast join algorithm. If a float, it denotes Dask's likelihood of selecting a broadcast-based code path over a shuffle-based join. Concretely, Dask will select a broadcast-based join algorithm if small_table.npartitions < log2(big_table.npartitions) * broadcast_bias.
Note: Forcing a broadcast join might lead to performance issues or out-of-memory (OOM) errors in cases where the broadcasted table is too large to fit on a single worker.
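The selection rule above can be checked by hand. The following sketch only evaluates the inequality as stated; the function name prefers_broadcast and the sample values are hypothetical, and it is not Dask's actual implementation.

```python
import math


def prefers_broadcast(small_npartitions: int,
                      big_npartitions: int,
                      broadcast_bias: float) -> bool:
    """Evaluate small.npartitions < log2(big.npartitions) * broadcast_bias."""
    return small_npartitions < math.log2(big_npartitions) * broadcast_bias


# With 64 partitions on the big side, log2(64) = 6, so a bias of 0.5
# gives a threshold of 3 partitions for the small side.
print(prefers_broadcast(2, 64, 0.5))  # True: 2 < 3
print(prefers_broadcast(4, 64, 0.5))  # False: 4 is not below 3
```

Raising the bias widens the range of small-table sizes for which the broadcast path is chosen: with a bias of 1.0 the threshold in this example becomes 6 partitions.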
Whether to check the first partition's length when computing a LIMIT without an OFFSET on a table with a relatively simple Dask graph (i.e. only IO and/or partition-wise layers). Checking the partition length triggers a Dask graph computation, which can be slow for complex queries but can significantly reduce memory usage when querying a small subset of a large table. Default is ``true``.
Whether the first generated logical plan should be further optimized or used as is.
Whether to try pushing down filter predicates into IO (when possible).
Total number of elements below which dask-sql should attempt to apply the top-k optimization (when possible). ``nelem`` is defined as the limit (or ``k``) value multiplied by the number of columns. Default is 1000000, corresponding to a LIMIT clause of 1 million on a single-column table.
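To make the ``nelem`` arithmetic concrete, here is a small sketch of the threshold check. The function name within_topk_limit is hypothetical, and the strict comparison follows the "below which" wording above; how dask-sql treats a value exactly equal to the limit is not stated here.

```python
def within_topk_limit(k: int, n_columns: int,
                      nelem_limit: int = 1_000_000) -> bool:
    """Return True when k * n_columns falls below the element limit."""
    nelem = k * n_columns  # limit (k) value times the number of columns
    return nelem < nelem_limit


# LIMIT 1000 on a 10-column table: nelem = 10,000, well below the limit.
print(within_topk_limit(1_000, 10))    # True
# LIMIT 200000 on a 10-column table: nelem = 2,000,000, above the limit.
print(within_topk_limit(200_000, 10))  # False
```

Because the limit counts elements rather than rows, wide tables reach it at a much smaller LIMIT value than narrow ones.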