Documentation
  • The Fundamental
  • ACTIVE SYNC
    • Data Ingestion
      • Data Tracking
        • API Key Management
        • Generate Tracking ID
        • Install tracking with Tag Manager
        • Install Tracking from the Console
        • Tracking Method on Website
      • Datasource
        • MySQL
        • PostgreSQL
        • MongoDB
        • Microsoft SQL Server
        • Shopify
        • CSV
        • Google Sheets
    • Data Ingestion API
      • Data Lake
        • File upload
        • Tracking API
      • Data Warehouse
        • Batch upload
        • CSV upload
        • Tracking API
      • Data Schema Warehouse API
    • Data Integrations
      • Manage your API Key
      • Get Data using API
  • ROCKET.BI
    • Introduction
    • Data Warehouse
      • Data Management
      • Ad-hoc Query
        • Measure Schema
        • Calculated Field
      • Query Analysis
      • Relationship
    • Row-level Security
    • Dashboard
      • Dashboard Filter
      • Chart Control
        • Tab Control
        • Single Choice
        • Multiple Choice
        • Dropdown Control
        • Slicer Control
        • Date Control
        • Input Control
      • Manage Dashboard
        • Relationship
        • View and Share
        • Select Main Date Filter
        • Boost
        • Settings
        • Add Chart
        • Add Tab
        • Add Text
    • Chart Builder
      • Chart Types
        • Pie Chart
        • Column Chart
        • Bar Chart
        • Line Chart
        • Line Stock Chart
        • Pareto Chart
        • Bubble Chart
        • Scatter Chart
        • Map Chart
        • Area Chart
        • KPI Chart
        • Lollipop Chart
        • Parliament Chart
        • Funnel Chart
        • Pyramid Chart
        • Gauge Chart
        • Bullet Graph Chart
        • Heat Map Chart
        • Word Cloud Chart
        • Tree Map Chart
        • Stacked Column Chart
        • Stacked Bar Chart
        • Sankey Chart
        • Spider Web Chart
        • Wind Rose Chart
        • Histogram Chart
        • Bell Curve Chart
        • Table Chart
        • Pivot Table Chart
      • Chart Settings
        • Zoom
        • Inner chart filter
      • Chart Filters
        • Tab Filter
        • Single Choice
        • Multiple Choice
        • Dropdown Filter
        • Slicer Filter
        • Date Filter
        • Input Filter
      • Right-click Settings
        • Change date function
        • Drill down
        • Drill through
        • Use as a filter
    • SQL Query
      • Syntax
      • Functions
      • Aggregate Functions
      • Data Types
  • UNLOCK.CI
    • Unlock.CI
Powered by GitBook
On this page
  • count
  • min
  • max
  • sum
  • avg
  • any
  • stddevPop
  • stddevSamp
  • varPop(x)
  • varSamp
  • covarPop
  • covarSamp
  • anyHeavy
  • anyLast
  • argMin
  • avgWeighted
  • topK
  • topKWeighted
  • groupArray
  • groupUniqArray
  • groupArrayInsertAt
  • groupArrayMovingAvg
  • sumWithOverflow
  • sumMap
  • minMap
  • maxMap
  • skewSamp
  • skewPop
  • kurtSamp
  • kurtPop
  • uniq
  • uniqExact
  • uniqCombined
  • uniqCombined64
  • uniqHLL12
  • quantile
  • quantiles​
  • quantilesExactExclusive​
  • quantilesExactInclusive​
  • quantileExactWeighted
  • quantileTiming
  • quantileTimingWeighted
  • quantilesTimingWeighted
  1. ROCKET.BI
  2. SQL Query

Aggregate Functions

PreviousFunctionsNextData Types

Last updated 2 years ago

count

Counts the number of rows or not-NULL values.

ClickHouse supports the following syntaxes for count:

  • count(expr) or COUNT(DISTINCT expr).

  • count() or COUNT(*). The count() syntax is ClickHouse-specific.

Arguments

The function can take:

  • Zero parameters.

  • One .

Returned value

  • If the function is called without parameters it counts the number of rows.

  • If the is passed, then the function counts how many times this expression returned not null. If the expression returns a -type value, then the result of count stays not Nullable. The function returns 0 if the expression returned NULL for all the rows.

In both cases the type of the returned value is .

Details

ClickHouse supports the COUNT(DISTINCT ...) syntax. The behavior of this construction depends on the setting. It defines which of the functions is used to perform the operation. The default is the function.

The SELECT count() FROM table query is optimized by default using metadata from MergeTree. If you need to use row-level security, disable optimization using the setting.

Examples

Example 1:

SELECT count() FROM t
┌─count()─┐
│       5 │
└─────────┘

Example 2:

SELECT name, value FROM system.settings WHERE name = 'count_distinct_implementation'
┌─name──────────────────────────┬─value─────┐
│ count_distinct_implementation │ uniqExact │
└───────────────────────────────┴───────────┘
SELECT count(DISTINCT num) FROM t
┌─uniqExact(num)─┐
│              3 │
└────────────────┘

This example shows that count(DISTINCT num) is performed by the uniqExact function according to the count_distinct_implementation setting value.

min

Aggregate function that calculates the minimum across a group of values.

Example:

SELECT min(salary) FROM employees;
SELECT department, min(salary) FROM employees GROUP BY department;

If you need non-aggregate function to choose a minimum of two values, see least:

SELECT least(a, b) FROM table;

max

Aggregate function that calculates the maximum across a group of values.

Example:

SELECT max(salary) FROM employees;
SELECT department, max(salary) FROM employees GROUP BY department;

If you need non-aggregate function to choose a maximum of two values, see greatest:

SELECT greatest(a, b) FROM table;

sum

Calculates the sum. Only works for numbers.

avg

Calculates the arithmetic mean.

Syntax

avg(x)

Arguments

Returned value

  • NaN if the input parameter x is empty.

Example

Query:

SELECT avg(x) FROM values('x Int8', 0, 1, 2, 3, 4, 5);

Result:

┌─avg(x)─┐
│    2.5 │
└────────┘

Example

Create a temp table:

Query:

CREATE table test (t UInt8) ENGINE = Memory;

Get the arithmetic mean:

Query:

SELECT avg(t) FROM test;

Result:

┌─avg(x)─┐
│    nan │
└────────┘

any

Selects the first encountered (non-NULL) value, unless all rows have NULL values in that column. The query can be executed in any order and even in a different order each time, so the result of this function is indeterminate. To get a determinate result, you can use the ‘min’ or ‘max’ function instead of ‘any’.

In some cases, you can rely on the order of execution. This applies to cases when SELECT comes from a subquery that uses ORDER BY.

When a SELECT query has the GROUP BY clause or at least one aggregate function, ClickHouse (in contrast to MySQL) requires that all expressions in the SELECT, HAVING, and ORDER BY clauses be calculated from keys or from aggregate functions. In other words, each column selected from the table must be used either in keys or inside aggregate functions. To get behavior like in MySQL, you can put the other columns in the any aggregate function.

stddevPop

NOTE

stddevSamp

NOTE

varPop(x)

Calculates the amount Σ((x - x̅)^2) / n, where n is the sample size and x̅is the average value of x.

In other words, dispersion for a set of values. Returns Float64.

NOTE

varSamp

Calculates the amount Σ((x - x̅)^2) / (n - 1), where n is the sample size and x̅is the average value of x.

It represents an unbiased estimate of the variance of a random variable if passed values from its sample.

Returns Float64. When n <= 1, returns +∞.

NOTE

covarPop

Syntax: covarPop(x, y)

Calculates the value of Σ((x - x̅)(y - y̅)) / n.

NOTE

covarSamp

Calculates the value of Σ((x - x̅)(y - y̅)) / (n - 1).

Returns Float64. When n <= 1, returns +∞.

NOTE

anyHeavy

anyHeavy(column)

Arguments

  • column – The column name.

Example

SELECT anyHeavy(AirlineID) AS res
FROM ontime
┌───res─┐
│ 19690 │
└───────┘

anyLast

argMin

Calculates the arg value for a minimum val value. If there are several different values of arg for minimum values of val, returns the first of these values encountered.

Syntax

argMin(arg, val)

Arguments

  • arg — Argument.

  • val — Value.

Returned value

  • arg value that corresponds to minimum val value.

Type: matches arg type.

Example

Input table:

┌─user─────┬─salary─┐
│ director │   5000 │
│ manager  │   3000 │
│ worker   │   1000 │
└──────────┴────────┘

Query:

SELECT argMin(user, salary) FROM salary

Result:

┌─argMin(user, salary)─┐
│ worker               │
└──────────────────────┘

avgWeighted

Syntax

avgWeighted(x, weight)

Arguments

  • x — Values.

  • weight — Weights of the values.

Returned value

  • NaN if all the weights are equal to 0 or the supplied weights parameter is empty.

  • Weighted mean otherwise.

Example

Query:

SELECT avgWeighted(x, w)
FROM values('x Int8, w Int8', (4, 1), (1, 0), (10, 2))

Result:

┌─avgWeighted(x, weight)─┐
│                      8 │
└────────────────────────┘

Example

Query:

SELECT avgWeighted(x, w)
FROM values('x Int8, w Float64', (4, 1), (1, 0), (10, 2))

Result:

┌─avgWeighted(x, weight)─┐
│                      8 │
└────────────────────────┘

Example

Query:

SELECT avgWeighted(x, w)
FROM values('x Int8, w Int8', (0, 0), (1, 0), (10, 0))

Result:

┌─avgWeighted(x, weight)─┐
│                    nan │
└────────────────────────┘

Example

Query:

CREATE table test (t UInt8) ENGINE = Memory;
SELECT avgWeighted(t) FROM test

Result:

┌─avgWeighted(x, weight)─┐
│                    nan │
└────────────────────────┘

topK

Returns an array of the approximately most frequent values in the specified column. The resulting array is sorted in descending order of approximate frequency of values (not by the values themselves).

topK(N)(column)

This function does not provide a guaranteed result. In certain situations, errors might occur and it might return frequent values that aren’t the most frequent values.

We recommend using the N < 10 value; performance is reduced with large N values. Maximum value of N = 65536.

Arguments

  • N – The number of elements to return.

If the parameter is omitted, default value 10 is used.

Arguments

  • x – The value to calculate frequency.

Example

SELECT topK(3)(AirlineID) AS res
FROM ontime
┌─res─────────────────┐
│ [19393,19790,19805] │
└─────────────────────┘

topKWeighted

Returns an array of the approximately most frequent values in the specified column. The resulting array is sorted in descending order of approximate frequency of values (not by the values themselves). Additionally, the weight of the value is taken into account.

Syntax

topKWeighted(N)(x, weight)

Arguments

  • N — The number of elements to return.

  • x — The value.

Returned value

Returns an array of the values with maximum approximate sum of weights.

Example

Query:

SELECT topKWeighted(10)(number, number) FROM numbers(1000)

Result:

┌─topKWeighted(10)(number, number)──────────┐
│ [999,998,997,996,995,994,993,992,991,990] │
└───────────────────────────────────────────┘

See Also

groupArray

Syntax: groupArray(x) or groupArray(max_size)(x)

Creates an array of argument values. Values can be added to the array in any (indeterminate) order.

The second version (with the max_size parameter) limits the size of the resulting array to max_size elements. For example, groupArray(1)(x) is equivalent to [any (x)].

In some cases, you can still rely on the order of execution. This applies to cases when SELECT comes from a subquery that uses ORDER BY.

Example

SELECT * FROM default.ck;

┌─id─┬─name─────┐
│  1 │ zhangsan │
│  1 │ ᴺᵁᴸᴸ     │
│  1 │ lisi     │
│  2 │ wangwu   │
└────┴──────────┘

Query:

select id, groupArray(10)(name) from default.ck group by id;

Result:

┌─id─┬─groupArray(10)(name)─┐
│  1 │ ['zhangsan','lisi']  │
│  2 │ ['wangwu']           │
└────┴──────────────────────┘

The groupArray function will remove ᴺᵁᴸᴸ value based on the above results.

groupUniqArray

Syntax: groupUniqArray(x) or groupUniqArray(max_size)(x)

The second version (with the max_size parameter) limits the size of the resulting array to max_size elements. For example, groupUniqArray(1)(x) is equivalent to [any(x)].

groupArrayInsertAt

Inserts a value into the array at the specified position.

Syntax

groupArrayInsertAt(default_x, size)(x, pos)

If in one query several values are inserted into the same position, the function behaves in the following ways:

  • If a query is executed in a single thread, the first one of the inserted values is used.

  • If a query is executed in multiple threads, the resulting value is an undetermined one of the inserted values.

Arguments

Returned value

  • Array with inserted values.

Example

Query:

SELECT groupArrayInsertAt(toString(number), number * 2) FROM numbers(5);

Result:

┌─groupArrayInsertAt(toString(number), multiply(number, 2))─┐
│ ['0','','1','','2','','3','','4']                         │
└───────────────────────────────────────────────────────────┘

Query:

SELECT groupArrayInsertAt('-')(toString(number), number * 2) FROM numbers(5);

Result:

┌─groupArrayInsertAt('-')(toString(number), multiply(number, 2))─┐
│ ['0','-','1','-','2','-','3','-','4']                          │
└────────────────────────────────────────────────────────────────┘

Query:

SELECT groupArrayInsertAt('-', 5)(toString(number), number * 2) FROM numbers(5);

Result:

┌─groupArrayInsertAt('-', 5)(toString(number), multiply(number, 2))─┐
│ ['0','-','1','-','2']                                             │
└───────────────────────────────────────────────────────────────────┘

Multi-threaded insertion of elements into one position.

Query:

SELECT groupArrayInsertAt(number, 0) FROM numbers_mt(10) SETTINGS max_block_size = 1;

As a result of this query you get random integer in the [0,9] range. For example:

┌─groupArrayInsertAt(number, 0)─┐
│ [7]                           │
└───────────────────────────────┘

groupArrayMovingAvg

Calculates the moving average of input values.

groupArrayMovingAvg(numbers_for_summing)
groupArrayMovingAvg(window_size)(numbers_for_summing)

The function can take the window size as a parameter. If left unspecified, the function takes the window size equal to the number of rows in the column.

Arguments

  • window_size — Size of the calculation window.

Returned values

  • Array of the same size and type as the input data.

Example

The sample table b:

CREATE TABLE t
(
    `int` UInt8,
    `float` Float32,
    `dec` Decimal32(2)
)
ENGINE = TinyLog
┌─int─┬─float─┬──dec─┐
│   1 │   1.1 │ 1.10 │
│   2 │   2.2 │ 2.20 │
│   4 │   4.4 │ 4.40 │
│   7 │  7.77 │ 7.77 │
└─────┴───────┴──────┘

The queries:

SELECT
    groupArrayMovingAvg(int) AS I,
    groupArrayMovingAvg(float) AS F,
    groupArrayMovingAvg(dec) AS D
FROM t
┌─I─────────┬─F───────────────────────────────────┬─D─────────────────────┐
│ [0,0,1,3] │ [0.275,0.82500005,1.9250001,3.8675] │ [0.27,0.82,1.92,3.86] │
└───────────┴─────────────────────────────────────┴───────────────────────┘
SELECT
    groupArrayMovingAvg(2)(int) AS I,
    groupArrayMovingAvg(2)(float) AS F,
    groupArrayMovingAvg(2)(dec) AS D
FROM t
┌─I─────────┬─F────────────────────────────────┬─D─────────────────────┐
│ [0,1,3,5] │ [0.55,1.6500001,3.3000002,6.085] │ [0.55,1.65,3.30,6.08] │
└───────────┴──────────────────────────────────┴───────────────────────┘

sumWithOverflow

Computes the sum of the numbers, using the same data type for the result as for the input parameters. If the sum exceeds the maximum value for this data type, it is calculated with overflow.

Only works for numbers.

sumMap

Syntax: sumMap(key, value) or sumMap(Tuple(key, value))

Totals the value array according to the keys specified in the key array.

Passing tuple of keys and values arrays is a synonym to passing two arrays of keys and values.

The number of elements in key and value must be the same for each row that is totaled.

Returns a tuple of two arrays: keys in sorted order, and values ​​summed for the corresponding keys.

Example:

CREATE TABLE sum_map(
    date Date,
    timeslot DateTime,
    statusMap Nested(
        status UInt16,
        requests UInt64
    ),
    statusMapTuple Tuple(Array(Int32), Array(Int32))
) ENGINE = Log;
INSERT INTO sum_map VALUES
    ('2000-01-01', '2000-01-01 00:00:00', [1, 2, 3], [10, 10, 10], ([1, 2, 3], [10, 10, 10])),
    ('2000-01-01', '2000-01-01 00:00:00', [3, 4, 5], [10, 10, 10], ([3, 4, 5], [10, 10, 10])),
    ('2000-01-01', '2000-01-01 00:01:00', [4, 5, 6], [10, 10, 10], ([4, 5, 6], [10, 10, 10])),
    ('2000-01-01', '2000-01-01 00:01:00', [6, 7, 8], [10, 10, 10], ([6, 7, 8], [10, 10, 10]));

SELECT
    timeslot,
    sumMap(statusMap.status, statusMap.requests),
    sumMap(statusMapTuple)
FROM sum_map
GROUP BY timeslot
┌────────────timeslot─┬─sumMap(statusMap.status, statusMap.requests)─┬─sumMap(statusMapTuple)─────────┐
│ 2000-01-01 00:00:00 │ ([1,2,3,4,5],[10,10,20,10,10])               │ ([1,2,3,4,5],[10,10,20,10,10]) │
│ 2000-01-01 00:01:00 │ ([4,5,6,7,8],[10,10,20,10,10])               │ ([4,5,6,7,8],[10,10,20,10,10]) │
└─────────────────────┴──────────────────────────────────────────────┴────────────────────────────────┘

minMap

Syntax: minMap(key, value) or minMap(Tuple(key, value))

Calculates the minimum from value array according to the keys specified in the key array.

Passing a tuple of keys and value ​​arrays is identical to passing two arrays of keys and values.

The number of elements in key and value must be the same for each row that is totaled.

Returns a tuple of two arrays: keys in sorted order, and values calculated for the corresponding keys.

Example:

SELECT minMap(a, b)
FROM values('a Array(Int32), b Array(Int64)', ([1, 2], [2, 2]), ([2, 3], [1, 1]))
┌─minMap(a, b)──────┐
│ ([1,2,3],[2,1,1]) │
└───────────────────┘

maxMap

Syntax: maxMap(key, value) or maxMap(Tuple(key, value))

Calculates the maximum from value array according to the keys specified in the key array.

Passing a tuple of keys and value arrays is identical to passing two arrays of keys and values.

The number of elements in key and value must be the same for each row that is totaled.

Returns a tuple of two arrays: keys and values calculated for the corresponding keys.

Example:

SELECT maxMap(a, b)
FROM values('a Array(Int32), b Array(Int64)', ([1, 2], [2, 2]), ([2, 3], [1, 1]))
┌─maxMap(a, b)──────┐
│ ([1,2,3],[2,2,1]) │
└───────────────────┘

skewSamp

It represents an unbiased estimate of the skewness of a random variable if passed values form its sample.

skewSamp(expr)

Arguments

Returned value

Example

SELECT skewSamp(value) FROM series_with_value_column;

skewPop

skewPop(expr)

Arguments

Returned value

Example

SELECT skewPop(value) FROM series_with_value_column;

kurtSamp

It represents an unbiased estimate of the kurtosis of a random variable if passed values form its sample.

kurtSamp(expr)

Arguments

Returned value

Example

SELECT kurtSamp(value) FROM series_with_value_column;

kurtPop

kurtPop(expr)

Arguments

Returned value

Example

SELECT kurtPop(value) FROM series_with_value_column;

uniq

Calculates the approximate number of different values of the argument.

uniq(x[, ...])

Arguments

The function takes a variable number of parameters. Parameters can be Tuple, Array, Date, DateTime, String, or numeric types.

Returned value

Implementation details

Function:

  • Calculates a hash for all parameters in the aggregate, then uses it in calculations.

  • Uses an adaptive sampling algorithm. For the calculation state, the function uses a sample of element hash values up to 65536. This algorithm is very accurate and very efficient on the CPU. When the query contains several of these functions, using uniq is almost as fast as using other aggregate functions.

  • Provides the result deterministically (it does not depend on the query processing order).

We recommend using this function in almost all scenarios.

uniqExact

Calculates the exact number of different argument values.

uniqExact(x[, ...])

The uniqExact function uses more memory than uniq, because the size of the state has unbounded growth as the number of different values increases.

Arguments

The function takes a variable number of parameters. Parameters can be Tuple, Array, Date, DateTime, String, or numeric types.

uniqCombined

Calculates the approximate number of different argument values.

uniqCombined(HLL_precision)(x[, ...])

The uniqCombined function is a good choice for calculating the number of different values.

Arguments

The function takes a variable number of parameters. Parameters can be Tuple, Array, Date, DateTime, String, or numeric types.

Returned value

Implementation details

Function:

  • Calculates a hash (64-bit hash for String and 32-bit otherwise) for all parameters in the aggregate, then uses it in calculations.

  • Uses a combination of three algorithms: array, hash table, and HyperLogLog with an error correction table.

    For a small number of distinct elements, an array is used. When the set size is larger, a hash table is used. For a larger number of elements, HyperLogLog is used, which will occupy a fixed amount of memory.
  • Provides the result deterministically (it does not depend on the query processing order).

NOTE

  • Consumes several times less memory.

  • Calculates with several times higher accuracy.

  • Usually has slightly lower performance. In some scenarios, uniqCombined can perform better than uniq, for example, with distributed queries that transmit a large number of aggregation states over the network.

uniqCombined64

uniqHLL12

uniqHLL12(x[, ...])

Arguments

The function takes a variable number of parameters. Parameters can be Tuple, Array, Date, DateTime, String, or numeric types.

Returned value

Implementation details

Function:

  • Calculates a hash for all parameters in the aggregate, then uses it in calculations.

  • Uses the HyperLogLog algorithm to approximate the number of different argument values.

    2^12 5-bit cells are used. The size of the state is slightly more than 2.5 KB. The result is not very accurate (up to ~10% error) for small data sets (<10K elements). However, the result is fairly accurate for high-cardinality data sets (10K-100M), with a maximum error of ~1.6%. Starting from 100M, the estimation error increases, and the function will return very inaccurate results for data sets with extremely high cardinality (1B+ elements).
  • Provides the determinate result (it does not depend on the query processing order).

quantile

Note that for an empty numeric sequence, quantile will return NaN, but its quantile* variants will return either NaN or a default value for the sequence type, depending on the variant.

Syntax

quantile(level)(expr)

Alias: median.

Arguments

Returned value

  • Approximate quantile of the specified level.

Type:

Example

Input table:

┌─val─┐
│   1 │
│   1 │
│   2 │
│   3 │
└─────┘

Query:

SELECT quantile(val) FROM t

Result:

┌─quantile(val)─┐
│           1.5 │
└───────────────┘

Syntax: quantiles(level1, level2, …)(x)

All the quantile functions also have corresponding quantiles functions: quantiles, quantilesDeterministic, quantilesTiming, quantilesTimingWeighted, quantilesExact, quantilesExactWeighted, quantilesTDigest, quantilesBFloat16. These functions calculate all the quantiles of the listed levels in one pass, and return an array of the resulting values.

To get exact value, all the passed values ​​are combined into an array, which is then partially sorted. Therefore, the function consumes O(n) memory, where n is a number of values that were passed. However, for a small number of values, the function is very effective.

Syntax

quantilesExactExclusive(level1, level2, ...)(expr)

Arguments

Parameters

Returned value

Type of array values:

Example

Query:

CREATE TABLE num AS numbers(1000);

SELECT quantilesExactExclusive(0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.999)(x) FROM (SELECT number AS x FROM num);

Result:

┌─quantilesExactExclusive(0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.999)(x)─┐
│ [249.25,499.5,749.75,899.9,949.9499999999999,989.99,998.999]        │
└─────────────────────────────────────────────────────────────────────┘

To get exact value, all the passed values ​​are combined into an array, which is then partially sorted. Therefore, the function consumes O(n) memory, where n is a number of values that were passed. However, for a small number of values, the function is very effective.

Syntax

quantilesExactInclusive(level1, level2, ...)(expr)

Arguments

Parameters

Returned value

Type of array values:

Example

Query:

CREATE TABLE num AS numbers(1000);

SELECT quantilesExactInclusive(0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.999)(x) FROM (SELECT number AS x FROM num);

Result:

┌─quantilesExactInclusive(0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.999)(x)─┐
│ [249.75,499.5,749.25,899.1,949.05,989.01,998.001]                   │

quantileExactWeighted

Syntax

quantileExactWeighted(level)(expr, weight)

Alias: medianExactWeighted.

Arguments

  • weight — Column with weights of sequence members. Weight is a number of value occurrences.

Returned value

  • Quantile of the specified level.

Type:

Example

Input table:

┌─n─┬─val─┐
│ 0 │   3 │
│ 1 │   2 │
│ 2 │   1 │
│ 5 │   4 │
└───┴─────┘

Query:

SELECT quantileExactWeighted(n, val) FROM t

Result:

┌─quantileExactWeighted(n, val)─┐
│                             1 │
└───────────────────────────────┘

quantileTiming

The result is deterministic (it does not depend on the query processing order). The function is optimized for working with sequences which describe distributions like loading web pages times or backend response times.

Syntax

quantileTiming(level)(expr)

Alias: medianTiming.

Arguments

    • If negative values are passed to the function, the behavior is undefined.

    • If the value is greater than 30,000 (a page loading time of more than 30 seconds), it is assumed to be 30,000.

Accuracy

The calculation is accurate if:

  • Total number of values does not exceed 5670.

  • Total number of values exceeds 5670, but the page loading time is less than 1024ms.

Otherwise, the result of the calculation is rounded to the nearest multiple of 16 ms.

NOTE

Returned value

  • Quantile of the specified level.

Type: Float32.

NOTE

Example

Input table:

┌─response_time─┐
│            72 │
│           112 │
│           126 │
│           145 │
│           104 │
│           242 │
│           313 │
│           168 │
│           108 │
└───────────────┘

Query:

SELECT quantileTiming(response_time) FROM t

Result:

┌─quantileTiming(response_time)─┐
│                           126 │
└───────────────────────────────┘

quantileTimingWeighted

The result is deterministic (it does not depend on the query processing order). The function is optimized for working with sequences which describe distributions like loading web pages times or backend response times.

Syntax

quantileTimingWeighted(level)(expr, weight)

Alias: medianTimingWeighted.

Arguments

  • - If negative values are passed to the function, the behavior is undefined.
    - If the value is greater than 30,000 (a page loading time of more than 30 seconds), it is assumed to be 30,000.
  • weight — Column with weights of sequence elements. Weight is a number of value occurrences.

Accuracy

The calculation is accurate if:

  • Total number of values does not exceed 5670.

  • Total number of values exceeds 5670, but the page loading time is less than 1024ms.

Otherwise, the result of the calculation is rounded to the nearest multiple of 16 ms.

NOTE

Returned value

  • Quantile of the specified level.

Type: Float32.

NOTE

Example

Input table:

┌─response_time─┬─weight─┐
│            68 │      1 │
│           104 │      2 │
│           112 │      3 │
│           126 │      2 │
│           138 │      1 │
│           162 │      1 │
└───────────────┴────────┘

Query:

SELECT quantileTimingWeighted(response_time, weight) FROM t

Result:

┌─quantileTimingWeighted(response_time, weight)─┐
│                                           112 │
└───────────────────────────────────────────────┘

quantilesTimingWeighted

Same as quantileTimingWeighted, but accept multiple parameters with quantile levels and return an Array filled with many values of that quantiles.

Example

Input table:

┌─response_time─┬─weight─┐
│            68 │      1 │
│           104 │      2 │
│           112 │      3 │
│           126 │      2 │
│           138 │      1 │
│           162 │      1 │
└───────────────┴────────┘

Query:

SELECT quantilesTimingWeighted(0,5, 0.99)(response_time, weight) FROM t

Result:

┌─quantilesTimingWeighted(0.5, 0.99)(response_time, weight)─┐
│ [112,162]                                                 │
└───────────────────────────────────────────────────────────┘

However SELECT count(nullable_column) FROM table query can be optimized by enabling the setting. With optimize_functions_to_subcolumns = 1 the function reads only subcolumn instead of reading and processing the whole column data. The query SELECT count(n) FROM table transforms to SELECT sum(NOT n.null) FROM table.

x — input values, must be , , or .

The arithmetic mean, always as .

The result is equal to the square root of .

This function uses a numerically unstable algorithm. If you need in calculations, use the stddevPopStable function. It works slower but provides a lower computational error.

The result is equal to the square root of .

This function uses a numerically unstable algorithm. If you need in calculations, use the stddevSampStable function. It works slower but provides a lower computational error.

This function uses a numerically unstable algorithm. If you need in calculations, use the varPopStable function. It works slower but provides a lower computational error.

This function uses a numerically unstable algorithm. If you need in calculations, use the varSampStable function. It works slower but provides a lower computational error.

This function uses a numerically unstable algorithm. If you need in calculations, use the covarPopStable function. It works slower but provides a lower computational error.

This function uses a numerically unstable algorithm. If you need in calculations, use the covarSampStable function. It works slower but provides a lower computational error.

Selects a frequently occurring value using the algorithm. If there is a value that occurs more than in half the cases in each of the query’s execution threads, this value is returned. Normally, the result is nondeterministic.

Take the data set and select any frequently occurring value in the AirlineID column.

Selects the last value encountered. The result is just as indeterminate as for the function.

Calculates the .

x and weight must both be , , or , but may have different types.

Return type is always .

Implements the algorithm for analyzing TopK, based on the reduce-and-combine algorithm from .

Take the data set and select the three most frequently occurring values in the AirlineID column.

weight — The weight. Every value is accounted weight times for frequency calculation. .

Creates an array from different argument values. Memory consumption is the same as for the function.

x — Value to be inserted. resulting in one of the .

pos — Position at which the specified element x is to be inserted. Index numbering in the array starts from zero. .

default_x — Default value for substituting in empty positions. Optional parameter. resulting in the data type configured for the x parameter. If default_x is not defined, the are used.

size — Length of the resulting array. Optional parameter. When using this parameter, the default value default_x must be specified. .

Type: .

numbers_for_summing — resulting in a numeric data type value.

The function uses . It truncates the decimal places insignificant for the resulting data type.

Computes the of a sequence.

expr — returning a number.

The skewness of the given distribution. Type — . If n <= 1 (n is the size of the sample), then the function returns nan.

Computes the of a sequence.

expr — returning a number.

The skewness of the given distribution. Type —

Computes the of a sequence.

expr — returning a number.

The kurtosis of the given distribution. Type — . If n <= 1 (n is a size of the sample), then the function returns nan.

Computes the of a sequence.

expr — returning a number.

The kurtosis of the given distribution. Type —

A -type number.

Use the uniqExact function if you absolutely need an exact result. Otherwise use the function.

HLL_precision is the base-2 logarithm of the number of cells in . Optional, you can use the function as uniqCombined(x[, ...]). The default value for HLL_precision is 17, which is effectively 96 KiB of space (2^17 cells, 6 bits each).

A number -type number.

Since it uses 32-bit hash for non-String type, the result will have very high error for cardinalities significantly larger than UINT_MAX (error will raise quickly after a few tens of billions of distinct values), hence in this case you should use

Compared to the function, the uniqCombined:

Same as , but uses 64-bit hash for all data types.

Calculates the approximate number of different argument values, using the algorithm.

A -type number.

We do not recommend using this function. In most cases, use the or function.

Computes an approximate of a numeric data sequence.

This function applies with a reservoir size up to 8192 and a random number generator for sampling. The result is non-deterministic. To get an exact quantile, use the function.

When using multiple quantile* functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the function.

level — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99]. Default value: 0.5. At level=0.5 the function calculates .

expr — Expression over the column values resulting in numeric , or .

for numeric data type input.

if input values have the Date type.

if input values have the DateTime type.

quantiles

quantilesExactExclusive

Exactly computes the of a numeric data sequence.

This function is equivalent to Excel function, ().

Works more efficiently with sets of levels than .

expr — Expression over the column values resulting in numeric , or .

level — Levels of quantiles. Possible values: (0, 1) — bounds not included. .

of quantiles of the specified levels.

for numeric data type input.

if input values have the Date type.

if input values have the DateTime type.

quantilesExactInclusive

Exactly computes the of a numeric data sequence.

This function is equivalent to Excel function, ().

Works more efficiently with sets of levels than .

expr — Expression over the column values resulting in numeric , or .

level — Levels of quantiles. Possible values: [0, 1] — bounds included. .

of quantiles of the specified levels.

for numeric data type input.

if input values have the Date type.

if input values have the DateTime type.

Exactly computes the of a numeric data sequence, taking into account the weight of each element.

To get exact value, all the passed values ​​are combined into an array, which is then partially sorted. Each value is counted with its weight, as if it is present weight times. A hash table is used in the algorithm. Because of this, if the passed values ​​are frequently repeated, the function consumes less RAM than . You can use this function instead of quantileExact and specify the weight 1.

When using multiple quantile* functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the function.

level — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99]. Default value: 0.5. At level=0.5 the function calculates .

expr — Expression over the column values resulting in numeric , or .

for numeric data type input.

if input values have the Date type.

if input values have the DateTime type.

With the determined precision computes the of a numeric data sequence.

When using multiple quantile* functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the function.

level — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99]. Default value: 0.5. At level=0.5 the function calculates .

expr — over a column values returning a -type number.

For calculating page loading time quantiles, this function is more effective and accurate than .

If no values are passed to the function (when using quantileTimingIf), is returned. The purpose of this is to differentiate these cases from cases that result in zero. See for notes on sorting NaN values.

With the determined precision computes the of a numeric data sequence according to the weight of each sequence member.

When using multiple quantile* functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the function.

level — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99]. Default value: 0.5. At level=0.5 the function calculates .

expr — over a column values returning a -type number.

For calculating page loading time quantiles, this function is more effective and accurate than .

If no values are passed to the function (when using quantileTimingIf), is returned. The purpose of this is to differentiate these cases from cases that result in zero. See for notes on sorting NaN values.

expression
expression
Nullable
UInt64
count_distinct_implementation
uniq*
uniqExact
optimize_trivial_count_query
optimize_functions_to_subcolumns
null
Integer
Float
Decimal
Float64
varPop
numerical stability
varSamp
numerical stability
numerical stability
numerical stability
numerical stability
numerical stability
heavy hitters
OnTime
any
weighted arithmetic mean
Integer
floating-point
Decimal
Float64
Filtered Space-Saving
Parallel Space Saving
OnTime
UInt64
topK
uniqExact
Expression
supported data types
UInt32
Expression
default values
UInt32
Array
Expression
rounding towards zero
sample skewness
Expression
Float64
skewness
Expression
Float64
sample kurtosis
Expression
Float64
kurtosis
Expression
Float64
UInt64
uniq
HyperLogLog
UInt64
uniqCombined64
uniq
uniqCombined
HyperLogLog
UInt64
uniq
uniqCombined
quantile
reservoir sampling
quantileExact
quantiles
median
data types
Date
DateTime
Float64
Date
DateTime
​
​
quantiles
PERCENTILE.EXC
type R6
quantileExactExclusive
data types
Date
DateTime
Float
Array
Float64
Date
DateTime
​
quantiles
PERCENTILE.INC
type R7
quantileExactInclusive
data types
Date
DateTime
Float
Array
Float64
Date
DateTime
quantile
quantileExact
quantiles
median
data types
Date
DateTime
Float64
Date
DateTime
quantile
quantiles
median
Expression
Float*
quantile
NaN
ORDER BY clause
quantile
quantiles
median
Expression
Float*
quantile
NaN
ORDER BY clause