Functions

There are at least* two types of functions - regular functions (they are just called “functions”) and aggregate functions. These are completely different concepts. Regular functions work as if they are applied to each row separately (for each row, the result of the function does not depend on the other rows). Aggregate functions accumulate a set of values from various rows (i.e. they depend on the entire set of rows).

In this section we discuss regular functions. For aggregate functions, see the section “Aggregate functions”.

* - There is a third type of function that the ‘arrayJoin’ function belongs to; table functions can also be mentioned separately.*

ARITHMETIC

For all arithmetic functions, the result type is calculated as the smallest number type that the result fits in, if there is such a type. The minimum is taken simultaneously based on the number of bits, whether it is signed, and whether it floats. If there are not enough bits, the highest bit type is taken.

Example

SELECT toTypeName(0), toTypeName(0 + 0), toTypeName(0 + 0 + 0), toTypeName(0 + 0 + 0 + 0)
┌─toTypeName(0)─┬─toTypeName(plus(0, 0))─┬─toTypeName(plus(plus(0, 0), 0))─┬─toTypeName(plus(plus(plus(0, 0), 0), 0))─┐
│ UInt8         │ UInt16                 │ UInt32                          │ UInt64                                   │
└───────────────┴────────────────────────┴─────────────────────────────────┴──────────────────────────────────────────┘

Arithmetic functions work for any pair of types from UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, or Float64.

Overflow is produced the same way as in C++.

plus(a, b), a + b operator

Calculates the sum of the numbers. You can also add integer numbers with a date or date and time. In the case of a date, adding an integer means adding the corresponding number of days. For a date with time, it means adding the corresponding number of seconds.

Example

"plus(1,2) = 3"

minus(a, b), a - b operator

Calculates the difference. The result is always signed.

You can also calculate integer numbers from a date or date with time. The idea is the same – see above for ‘plus’.

Example

"minus(5,2) = 3"

multiply(a, b), a * b operator

Calculates the product of the numbers.

Example

"divide(50,2) = 2.5e+01"

divide(a, b), a / b operator

Calculates the quotient of the numbers. The result type is always a floating-point type. It is not integer division. For integer division, use the ‘intDiv’ function. When dividing by zero you get ‘inf’, ‘-inf’, or ‘nan’.

Example

"divide(50,2) = 2.5e+01"

intDiv(a, b)

Calculates the quotient of the numbers. Divides into integers, rounding down (by the absolute value). An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.

Example

"intDiv(10, -2) = -5"

intDivOrZero(a, b)

Differs from ‘intDiv’ in that it returns zero when dividing by zero or when dividing a minimal negative number by minus one.

Example

"intDivOrZero(10, -2) = -5"

modulo(a, b), a % b operator

Calculates the remainder when dividing a by b. The result type is an integer if both inputs are integers. If one of the inputs is a floating-point number, the result is a floating-point number. The remainder is computed like in C++. Truncated division is used for negative numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.

Example

"modulo(10, 3) = 1"

moduloOrZero(a, b)

Differs from modulo in that it returns zero when the divisor is zero.

Example

"moduloOrZero(10, 5) = 0"

negate(a), -a operator

Calculates a number with the reverse sign. The result is always signed.

Example

"negate(20) = -20"

abs(a)

Calculates the absolute value of the number (a). That is, if a \< 0, it returns -a. For unsigned types it does not do anything. For signed integer types, it returns an unsigned number.

Example

"abs(-2) = 2"

gcd(a, b)

Returns the greatest common divisor of the numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.

Example

"gcd(27,18) = 9"

lcm(a, b)

Returns the least common multiple of the numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.

Example

"lcm(27,18) = 54"

ARRAY

empty

Checks whether the input array is empty.

Syntax

empty([x])

An array is considered empty if it does not contain any elements.

NOTE

Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1 the function reads only size0 subcolumn instead of reading and processing the whole array column. The query SELECT empty(arr) FROM TABLE; transforms to SELECT arr.size0 = 0 FROM TABLE;.

The function also works for strings or UUID.

Arguments

  • [x] — Input array. Array.

Returned value

  • Returns 1 for an empty array or 0 for a non-empty array.

Type: UInt8.

Example

Query:

SELECT empty([]);

Result:

┌─empty(array())─┐
│              1 │
└────────────────┘

notEmpty

Checks whether the input array is non-empty.

Syntax

notEmpty([x])

An array is considered non-empty if it contains at least one element.

NOTE

Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1 the function reads only size0 subcolumn instead of reading and processing the whole array column. The query SELECT notEmpty(arr) FROM table transforms to SELECT arr.size0 != 0 FROM TABLE.

The function also works for strings or UUID.

Arguments

  • [x] — Input array. Array.

Returned value

  • Returns 1 for a non-empty array or 0 for an empty array.

Type: UInt8.

Example

Query:

SELECT notEmpty([1,2]);

Result:

┌─notEmpty([1, 2])─┐
│                1 │
└──────────────────┘

length

Returns the number of items in the array. The result type is UInt64. The function also works for strings.

Syntax

length(string)

Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1 the function reads only size0 subcolumn instead of reading and processing the whole array column. The query SELECT length(arr) FROM table transforms to SELECT arr.size0 FROM TABLE.

Example

length(\"ABC Corporation\")

emptyArrayUInt8, emptyArrayUInt16, emptyArrayUInt32, emptyArrayUInt64

emptyArrayInt8, emptyArrayInt16, emptyArrayInt32, emptyArrayInt64

emptyArrayFloat32, emptyArrayFloat64

emptyArrayDate, emptyArrayDateTime

emptyArrayString

Accepts zero arguments and returns an empty array of the appropriate type.

emptyArrayToSingle

Accepts an empty array and returns a one-element array that is equal to the default value.

range(end), range([start, ] end [, step])

Returns an array of UInt numbers from start to end - 1 by step.

Syntax

range([start, ] end [, step])

Arguments

  • start — The first element of the array. Optional, required if step is used. Default value: 0. UInt

  • end — The number before which the array is constructed. Required. UInt

  • step — Determines the incremental step between each element in the array. Optional. Default value: 1. UInt

Returned value

  • Array of UInt numbers from start to end - 1 by step.

Implementation details

  • All arguments must be positive values: start, end, step are UInt data types, as well as elements of the returned array.

  • An exception is thrown if query results in arrays with a total length of more than number of elements specified by the function_range_max_elements_in_block setting.

Examples

Query:

SELECT range(5), range(1, 5), range(1, 5, 2);

Result:

┌─range(5)────┬─range(1, 5)─┬─range(1, 5, 2)─┐
│ [0,1,2,3,4] │ [1,2,3,4]   │ [1,3]          │
└─────────────┴─────────────┴────────────────┘

array(x1, …), operator [x1, …]

Creates an array from the function arguments. The arguments must be constants and have types that have the smallest common type. At least one argument must be passed, because otherwise it isn’t clear which type of array to create. That is, you can’t use this function to create an empty array (to do that, use the ‘emptyArray*’ function described above). Returns an ‘Array(T)’ type result, where ‘T’ is the smallest common type out of the passed arguments.

Example

SELECT array(1,2,3);

arrayConcat

Combines arrays passed as arguments.

arrayConcat(arrays)

Arguments

  • arrays – Arbitrary number of arguments of Array type. Example

SELECT arrayConcat([1, 2], [3, 4], [5, 6]) AS res
┌─res───────────┐
│ [1,2,3,4,5,6] │
└───────────────┘

has(arr, elem)

Checks whether the ‘arr’ array has the ‘elem’ element. Returns 0 if the element is not in the array, or 1 if it is.

NULL is processed as a value.

SELECT has([1, 2, NULL], NULL)
┌─has([1, 2, NULL], NULL)─┐
│                       1 │
└─────────────────────────┘

hasAll

Checks whether one array is a subset of another.

hasAll(set, subset)

Arguments

  • set – Array of any type with a set of elements.

  • subset – Array of any type with elements that should be tested to be a subset of set.

Return values

  • 1, if set contains all of the elements from subset.

  • 0, otherwise.

Peculiar properties

  • An empty array is a subset of any array.

  • Null processed as a value.

  • Order of values in both of arrays does not matter.

Examples

SELECT hasAll([], []) returns 1.

SELECT hasAll([1, Null], [Null]) returns 1.

SELECT hasAll([1.0, 2, 3, 4], [1, 3]) returns 1.

SELECT hasAll(['a', 'b'], ['a']) returns 1.

SELECT hasAll([1], ['a']) returns 0.

SELECT hasAll([[1, 2], [3, 4]], [[1, 2], [3, 5]]) returns 0.

hasAny

Checks whether two arrays have intersection by some elements.

hasAny(array1, array2)

Arguments

  • array1 – Array of any type with a set of elements.

  • array2 – Array of any type with a set of elements.

Return values

  • 1, if array1 and array2 have one similar element at least.

  • 0, otherwise.

Peculiar properties

  • Null processed as a value.

  • Order of values in both of arrays does not matter.

Examples

SELECT hasAny([1], []) returns 0.

SELECT hasAny([Null], [Null, 1]) returns 1.

SELECT hasAny([-128, 1., 512], [1]) returns 1.

SELECT hasAny([[1, 2], [3, 4]], ['a', 'c']) returns 0.

SELECT hasAll([[1, 2], [3, 4]], [[1, 2], [1, 2]]) returns 1.

hasSubstr

Checks whether all the elements of array2 appear in array1 in the same exact order. Therefore, the function will return 1, if and only if array1 = prefix + array2 + suffix.

hasSubstr(array1, array2)

In other words, the functions will check whether all the elements of array2 are contained in array1 like the hasAll function. In addition, it will check that the elements are observed in the same order in both array1 and array2.

For Example:

  • hasSubstr([1,2,3,4], [2,3]) returns 1. However, hasSubstr([1,2,3,4], [3,2]) will return 0.

  • hasSubstr([1,2,3,4], [1,2,3]) returns 1. However, hasSubstr([1,2,3,4], [1,2,4]) will return 0.

Arguments

  • array1 – Array of any type with a set of elements.

  • array2 – Array of any type with a set of elements.

Return values

  • 1, if array1 contains array2.

  • 0, otherwise.

Peculiar properties

  • The function will return 1 if array2 is empty.

  • Null processed as a value. In other words hasSubstr([1, 2, NULL, 3, 4], [2,3]) will return 0. However, hasSubstr([1, 2, NULL, 3, 4], [2,NULL,3]) will return 1

  • Order of values in both of arrays does matter.

Examples

SELECT hasSubstr([], []) returns 1.

SELECT hasSubstr([1, Null], [Null]) returns 1.

SELECT hasSubstr([1.0, 2, 3, 4], [1, 3]) returns 0.

SELECT hasSubstr(['a', 'b'], ['a']) returns 1.

SELECT hasSubstr(['a', 'b' , 'c'], ['a', 'b']) returns 1.

SELECT hasSubstr(['a', 'b' , 'c'], ['a', 'c']) returns 0.

SELECT hasSubstr([[1, 2], [3, 4], [5, 6]], [[1, 2], [3, 4]]) returns 1.

indexOf(arr, x)

Returns the index of the first ‘x’ element (starting from 1) if it is in the array, or 0 if it is not.

Example:

SELECT indexOf([1, 3, NULL, NULL], NULL)
┌─indexOf([1, 3, NULL, NULL], NULL)─┐
│                                 3 │
└───────────────────────────────────┘

Elements set to NULL are handled as normal values.

arrayCount([func,] arr1, …)

Returns the number of elements for which func(arr1[i], …, arrN[i]) returns something other than 0. If func is not specified, it returns the number of non-zero elements in the array.

Note that the arrayCount is a higher-order function. You can pass a lambda function to it as the first argument.

Example

arrayCount(lambda(tuple(x, y), equals(x, y)), [1, 2, 3], [1, 5, 3]) = 2

countEqual(arr, x)

Returns the number of elements in the array equal to x. Equivalent to arrayCount (elem -> elem = x, arr).

NULL elements are handled as separate values.

Example

SELECT countEqual([1, 2, NULL, NULL], NULL)
┌─countEqual([1, 2, NULL, NULL], NULL)─┐
│                                    2 │
└──────────────────────────────────────┘

arrayEnumerate(arr)

Returns the array [1, 2, 3, …, length (arr) ]

This function is normally used with ARRAY JOIN. It allows counting something just once for each array after applying ARRAY JOIN. Example:

SELECT
    count() AS Reaches,
    countIf(num = 1) AS Hits
FROM test.hits
ARRAY JOIN
    GoalsReached,
    arrayEnumerate(GoalsReached) AS num
WHERE CounterID = 160656
LIMIT 10
┌─Reaches─┬──Hits─┐
│   95606 │ 31406 │
└─────────┴───────┘

In this example, Reaches is the number of conversions (the strings received after applying ARRAY JOIN), and Hits is the number of pageviews (strings before ARRAY JOIN). In this particular case, you can get the same result in an easier way:

SELECT
    sum(length(GoalsReached)) AS Reaches,
    count() AS Hits
FROM test.hits
WHERE (CounterID = 160656) AND notEmpty(GoalsReached)
┌─Reaches─┬──Hits─┐
│   95606 │ 31406 │
└─────────┴───────┘

This function can also be used in higher-order functions. For example, you can use it to get array indexes for elements that match a condition.

arrayEnumerateUniq(arr, …)

Returns an array the same size as the source array, indicating for each element what its position is among elements with the same value. For example: arrayEnumerateUniq([10, 20, 10, 30]) = [1, 1, 2, 1].

This function is useful when using ARRAY JOIN and aggregation of array elements. Example:

SELECT
    Goals.ID AS GoalID,
    sum(Sign) AS Reaches,
    sumIf(Sign, num = 1) AS Visits
FROM test.visits
ARRAY JOIN
    Goals,
    arrayEnumerateUniq(Goals.ID) AS num
WHERE CounterID = 160656
GROUP BY GoalID
ORDER BY Reaches DESC
LIMIT 10
┌──GoalID─┬─Reaches─┬─Visits─┐
│   53225 │    3214 │   1097 │
│ 2825062 │    3188 │   1097 │
│   56600 │    2803 │    488 │
│ 1989037 │    2401 │    365 │
│ 2830064 │    2396 │    910 │
│ 1113562 │    2372 │    373 │
│ 3270895 │    2262 │    812 │
│ 1084657 │    2262 │    345 │
│   56599 │    2260 │    799 │
│ 3271094 │    2256 │    812 │
└─────────┴─────────┴────────┘

In this example, each goal ID has a calculation of the number of conversions (each element in the Goals nested data structure is a goal that was reached, which we refer to as a conversion) and the number of sessions. Without ARRAY JOIN, we would have counted the number of sessions as sum(Sign). But in this particular case, the rows were multiplied by the nested Goals structure, so in order to count each session one time after this, we apply a condition to the value of the arrayEnumerateUniq(Goals.ID) function.

The arrayEnumerateUniq function can take multiple arrays of the same size as arguments. In this case, uniqueness is considered for tuples of elements in the same positions in all the arrays.

SELECT arrayEnumerateUniq([1, 1, 1, 2, 2, 2], [1, 1, 2, 1, 1, 2]) AS res
┌─res───────────┐
│ [1,2,1,1,2,1] │
└───────────────┘

This is necessary when using ARRAY JOIN with a nested data structure and further aggregation across multiple elements in this structure.

arrayPopBack

Removes the last item from the array.

arrayPopBack(array)

Arguments

  • array – Array.

Example

SELECT arrayPopBack([1, 2, 3]) AS res;
┌─res───┐
│ [1,2] │
└───────┘

arrayPopFront

Removes the first item from the array.

arrayPopFront(array)

Arguments

  • array – Array.

Example

SELECT arrayPopFront([1, 2, 3]) AS res;
┌─res───┐
│ [2,3] │
└───────┘

arrayPushBack

Adds one item to the end of the array.

arrayPushBack(array, single_value)

Arguments

  • array – Array.

  • single_value – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the single_value type for the data type of the array. For more information about the types of data in ClickHouse, see “Data types”. Can be NULL. The function adds a NULL element to an array, and the type of array elements converts to Nullable.

Example

SELECT arrayPushBack(['a'], 'b') AS res;
┌─res───────┐
│ ['a','b'] │
└───────────┘

arrayPushFront

Adds one element to the beginning of the array.

arrayPushFront(array, single_value)

Arguments

  • array – Array.

  • single_value – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the single_value type for the data type of the array. For more information about the types of data in ClickHouse, see “Data types”. Can be NULL. The function adds a NULL element to an array, and the type of array elements converts to Nullable.

Example

SELECT arrayPushFront(['b'], 'a') AS res;
┌─res───────┐
│ ['a','b'] │
└───────────┘

arrayResize

Changes the length of the array.

arrayResize(array, size[, extender])

Arguments:

  • array — Array.

  • size — Required length of the array.

    • If size is less than the original size of the array, the array is truncated from the right.

  • If size is larger than the initial size of the array, the array is extended to the right with extender values or default values for the data type of the array items.

  • extender — Value for extending an array. Can be NULL.

Returned value:

An array of length size.

Examples of calls

SELECT arrayResize([1], 3);
┌─arrayResize([1], 3)─┐
│ [1,0,0]             │
└─────────────────────┘
SELECT arrayResize([1], 3, NULL);
┌─arrayResize([1], 3, NULL)─┐
│ [1,NULL,NULL]             │
└───────────────────────────┘

arraySlice

Returns a slice of the array.

arraySlice(array, offset[, length])

Arguments

  • array – Array of data.

  • offset – Indent from the edge of the array. A positive value indicates an offset on the left, and a negative value is an indent on the right. Numbering of the array items begins with 1.

  • length – The length of the required slice. If you specify a negative value, the function returns an open slice [offset, array_length - length]. If you omit the value, the function returns the slice [offset, the_end_of_array].

Example

SELECT arraySlice([1, 2, NULL, 4, 5], 2, 3) AS res;
┌─res────────┐
│ [2,NULL,4] │
└────────────┘

Array elements set to NULL are handled as normal values.

arraySort([func,] arr, …)

Sorts the elements of the arr array in ascending order. If the func function is specified, sorting order is determined by the result of the func function applied to the elements of the array. If func accepts multiple arguments, the arraySort function is passed several arrays that the arguments of func will correspond to. Detailed examples are shown at the end of arraySort description.

Example of integer values sorting:

SELECT arraySort([1, 3, 3, 0]);
┌─arraySort([1, 3, 3, 0])─┐
│ [0,1,3,3]               │
└─────────────────────────┘

Example of string values sorting:

SELECT arraySort(['hello', 'world', '!']);
┌─arraySort(['hello', 'world', '!'])─┐
│ ['!','hello','world']              │
└────────────────────────────────────┘

Consider the following sorting order for the NULL, NaN and Inf values:

SELECT arraySort([1, nan, 2, NULL, 3, nan, -4, NULL, inf, -inf]);
┌─arraySort([1, nan, 2, NULL, 3, nan, -4, NULL, inf, -inf])─┐
│ [-inf,-4,1,2,3,inf,nan,nan,NULL,NULL]                     │
└───────────────────────────────────────────────────────────┘
  • -Inf values are first in the array.

  • NULL values are last in the array.

  • NaN values are right before NULL.

  • Inf values are right before NaN.

Note that arraySort is a higher-order function. You can pass a lambda function to it as the first argument. In this case, sorting order is determined by the result of the lambda function applied to the elements of the array.

Let’s consider the following example:

SELECT arraySort((x) -> -x, [1, 2, 3]) as res;
┌─res─────┐
│ [3,2,1] │
└─────────┘

For each element of the source array, the lambda function returns the sorting key, that is, [1 –> -1, 2 –> -2, 3 –> -3]. Since the arraySort function sorts the keys in ascending order, the result is [3, 2, 1]. Thus, the (x) –> -x lambda function sets the descending order in a sorting.

The lambda function can accept multiple arguments. In this case, you need to pass the arraySort function several arrays of identical length that the arguments of lambda function will correspond to. The resulting array will consist of elements from the first input array; elements from the next input array(s) specify the sorting keys. For example:

SELECT arraySort((x, y) -> y, ['hello', 'world'], [2, 1]) as res;
┌─res────────────────┐
│ ['world', 'hello'] │
└────────────────────┘

Here, the elements that are passed in the second array ([2, 1]) define a sorting key for the corresponding element from the source array ([‘hello’, ‘world’]), that is, [‘hello’ –> 2, ‘world’ –> 1]. Since the lambda function does not use x, actual values of the source array do not affect the order in the result. So, ‘hello’ will be the second element in the result, and ‘world’ will be the first.

Other examples are shown below.

SELECT arraySort((x, y) -> y, [0, 1, 2], ['c', 'b', 'a']) as res;
┌─res─────┐
│ [2,1,0] │
└─────────┘
SELECT arraySort((x, y) -> -y, [0, 1, 2], [1, 2, 3]) as res;
┌─res─────┐
│ [2,1,0] │
└─────────┘

NOTE

To improve sorting efficiency, the Schwartzian transform is used.

arrayReverseSort([func,] arr, …)

Sorts the elements of the arr array in descending order. If the func function is specified, arr is sorted according to the result of the func function applied to the elements of the array, and then the sorted array is reversed. If func accepts multiple arguments, the arrayReverseSort function is passed several arrays that the arguments of func will correspond to. Detailed examples are shown at the end of arrayReverseSort description.

Example of integer values sorting:

SELECT arrayReverseSort([1, 3, 3, 0]);
┌─arrayReverseSort([1, 3, 3, 0])─┐
│ [3,3,1,0]                      │
└────────────────────────────────┘

Example of string values sorting:

SELECT arrayReverseSort(['hello', 'world', '!']);
┌─arrayReverseSort(['hello', 'world', '!'])─┐
│ ['world','hello','!']                     │
└───────────────────────────────────────────┘

Consider the following sorting order for the NULL, NaN and Inf values:

SELECT arrayReverseSort([1, nan, 2, NULL, 3, nan, -4, NULL, inf, -inf]) as res;
┌─res───────────────────────────────────┐
│ [inf,3,2,1,-4,-inf,nan,nan,NULL,NULL] │
└───────────────────────────────────────┘
  • Inf values are first in the array.

  • NULL values are last in the array.

  • NaN values are right before NULL.

  • -Inf values are right before NaN.

Note that the arrayReverseSort is a higher-order function. You can pass a lambda function to it as the first argument. Example is shown below.

SELECT arrayReverseSort((x) -> -x, [1, 2, 3]) as res;
┌─res─────┐
│ [1,2,3] │
└─────────┘

The array is sorted in the following way:

  1. At first, the source array ([1, 2, 3]) is sorted according to the result of the lambda function applied to the elements of the array. The result is an array [3, 2, 1].

  2. Array that is obtained on the previous step, is reversed. So, the final result is [1, 2, 3].

The lambda function can accept multiple arguments. In this case, you need to pass the arrayReverseSort function several arrays of identical length that the arguments of lambda function will correspond to. The resulting array will consist of elements from the first input array; elements from the next input array(s) specify the sorting keys. For example:

SELECT arrayReverseSort((x, y) -> y, ['hello', 'world'], [2, 1]) as res;
┌─res───────────────┐
│ ['hello','world'] │
└───────────────────┘

In this example, the array is sorted in the following way:

  1. At first, the source array ([‘hello’, ‘world’]) is sorted according to the result of the lambda function applied to the elements of the arrays. The elements that are passed in the second array ([2, 1]), define the sorting keys for corresponding elements from the source array. The result is an array [‘world’, ‘hello’].

  2. Array that was sorted on the previous step, is reversed. So, the final result is [‘hello’, ‘world’].

Other examples are shown below.

SELECT arrayReverseSort((x, y) -> y, [4, 3, 5], ['a', 'b', 'c']) AS res;
┌─res─────┐
│ [5,3,4] │
└─────────┘
SELECT arrayReverseSort((x, y) -> -y, [4, 3, 5], [1, 2, 3]) AS res;
┌─res─────┐
│ [4,3,5] │
└─────────┘

arrayUniq(arr, …)

If one argument is passed, it counts the number of different elements in the array. If multiple arguments are passed, it counts the number of different tuples of elements at corresponding positions in multiple arrays.

If you want to get a list of unique items in an array, you can use arrayReduce(‘groupUniqArray’, arr).

Example

SELECT arrayUniq([2, 3]) AS res;

arrayJoin(arr)

A special function. See the section “ArrayJoin function”.

arrayDifference

Calculates the difference between adjacent array elements. Returns an array where the first element will be 0, the second is the difference between a[1] - a[0], etc. The type of elements in the resulting array is determined by the type inference rules for subtraction (e.g. UInt8 - UInt8 = Int16).

Syntax

arrayDifference(array)

Arguments

Returned values

Returns an array of differences between adjacent elements.

Type: UInt*, Int*, Float*.

Example

Query:

SELECT arrayDifference([1, 2, 3, 4]);

Result:

┌─arrayDifference([1, 2, 3, 4])─┐
│ [0,1,1,1]                     │
└───────────────────────────────┘

Example of the overflow due to result type Int64:

Query:

SELECT arrayDifference([0, 10000000000000000000]);

Result:

┌─arrayDifference([0, 10000000000000000000])─┐
│ [0,-8446744073709551616]                   │
└────────────────────────────────────────────┘

arrayDistinct

Takes an array, returns an array containing the distinct elements only.

Syntax

arrayDistinct(array)

Arguments

Returned values

Returns an array containing the distinct elements.

Example

Query:

SELECT arrayDistinct([1, 2, 2, 3, 1]);

Result:

┌─arrayDistinct([1, 2, 2, 3, 1])─┐
│ [1,2,3]                        │
└────────────────────────────────┘

arrayEnumerateDense(arr)

Returns an array of the same size as the source array, indicating where each element first appears in the source array.

Example

SELECT arrayEnumerateDense([10, 20, 10, 30])
┌─arrayEnumerateDense([10, 20, 10, 30])─┐
│ [1,2,1,3]                             │
└───────────────────────────────────────┘

arrayIntersect(arr)

Takes multiple arrays, returns an array with elements that are present in all source arrays.

Example

SELECT
    arrayIntersect([1, 2], [1, 3], [2, 3]) AS no_intersect,
    arrayIntersect([1, 2], [1, 3], [1, 4]) AS intersect
┌─no_intersect─┬─intersect─┐
│ []           │ [1]       │
└──────────────┴───────────┘

arrayReduce

Applies an aggregate function to array elements and returns its result. The name of the aggregation function is passed as a string in single quotes 'max', 'sum'. When using parametric aggregate functions, the parameter is indicated after the function name in parentheses 'uniqUpTo(6)'.

Syntax

arrayReduce(agg_func, arr1, arr2, ..., arrN)

Arguments

  • agg_func — The name of an aggregate function which should be a constant string.

  • arr — Any number of array type columns as the parameters of the aggregation function.

Returned value

Example

Query:

SELECT arrayReduce('max', [1, 2, 3]);

Result:

┌─arrayReduce('max', [1, 2, 3])─┐
│                             3 │
└───────────────────────────────┘

If an aggregate function takes multiple arguments, then this function must be applied to multiple arrays of the same size.

Query:

SELECT arrayReduce('maxIf', [3, 5], [1, 0]);

Result:

┌─arrayReduce('maxIf', [3, 5], [1, 0])─┐
│                                    3 │
└──────────────────────────────────────┘

Example with a parametric aggregate function:

Query:

SELECT arrayReduce('uniqUpTo(3)', [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);

Result:

┌─arrayReduce('uniqUpTo(3)', [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])─┐
│                                                           4 │
└─────────────────────────────────────────────────────────────┘

arrayReduceInRanges

Applies an aggregate function to array elements in given ranges and returns an array containing the result corresponding to each range. The function will return the same result as multiple arrayReduce(agg_func, arraySlice(arr1, index, length), ...).

Syntax

arrayReduceInRanges(agg_func, ranges, arr1, arr2, ..., arrN)

Arguments

  • agg_func — The name of an aggregate function which should be a constant string.

  • ranges — The ranges to aggretate which should be an array of tuples which containing the index and the length of each range.

  • arr — Any number of Array type columns as the parameters of the aggregation function.

Returned value

  • Array containing results of the aggregate function over specified ranges.

Type: Array.

Example

Query:

SELECT arrayReduceInRanges(
    'sum',
    [(1, 5), (2, 3), (3, 4), (4, 4)],
    [1000000, 200000, 30000, 4000, 500, 60, 7]
) AS res

Result:

┌─res─────────────────────────┐
│ [1234500,234000,34560,4567] │
└─────────────────────────────┘

arrayReverse(arr)

Returns an array of the same size as the original array containing the elements in reverse order.

Example:

SELECT arrayReverse([1, 2, 3])
┌─arrayReverse([1, 2, 3])─┐
│ [3,2,1]                 │
└─────────────────────────┘

reverse(arr)

Synonym for “arrayReverse”

arrayFlatten

Converts an array of arrays to a flat array.

Function:

  • Applies to any depth of nested arrays.

  • Does not change arrays that are already flat.

The flattened array contains all the elements from all source arrays.

Syntax

flatten(array_of_arrays)

Alias: flatten.

Arguments

  • array_of_arraysArray of arrays. For example, [[1,2,3], [4,5]].

Examples

SELECT flatten([[[1]], [[2], [3]]]);
┌─flatten(array(array([1]), array([2], [3])))─┐
│ [1,2,3]                                     │
└─────────────────────────────────────────────┘

arrayCompact

Removes consecutive duplicate elements from an array. The order of result values is determined by the order in the source array.

Syntax

arrayCompact(arr)

Arguments

arr — The array to inspect.

Returned value

The array without duplicate.

Type: Array.

Example

Query:

SELECT arrayCompact([1, 1, nan, nan, 2, 3, 3, 3]);

Result:

┌─arrayCompact([1, 1, nan, nan, 2, 3, 3, 3])─┐
│ [1,nan,nan,2,3]                            │
└────────────────────────────────────────────┘

arrayZip

Combines multiple arrays into a single array. The resulting array contains the corresponding elements of the source arrays grouped into tuples in the listed order of arguments.

Syntax

arrayZip(arr1, arr2, ..., arrN)

Arguments

The function can take any number of arrays of different types. All the input arrays must be of equal size.

Returned value

  • Array with elements from the source arrays grouped into tuples. Data types in the tuple are the same as types of the input arrays and in the same order as arrays are passed.

Type: Array.

Example

Query:

SELECT arrayZip(['a', 'b', 'c'], [5, 2, 1]);

Result:

┌─arrayZip(['a', 'b', 'c'], [5, 2, 1])─┐
│ [('a',5),('b',2),('c',1)]            │
└──────────────────────────────────────┘

arrayAUC

Calculate AUC (Area Under the Curve, which is a concept in machine learning, see more details: https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve).

Syntax

arrayAUC(arr_scores, arr_labels)

Arguments

  • arr_scores — scores prediction model gives.

  • arr_labels — labels of samples, usually 1 for positive sample and 0 for negtive sample.

Returned value

Returns AUC value with type Float64.

Example

Query:

select arrayAUC([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]);

Result:

┌─arrayAUC([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])─┐
│                                          0.75 │
└───────────────────────────────────────────────┘

arrayMap(func, arr1, …)

Returns an array obtained from the original arrays by application of func(arr1[i], …, arrN[i]) for each element. Arrays arr1arrN must have the same number of elements.

Examples

SELECT arrayMap(x -> (x + 2), [1, 2, 3]) as res;
┌─res─────┐
│ [3,4,5] │
└─────────┘

The following example shows how to create a tuple of elements from different arrays:

SELECT arrayMap((x, y) -> (x, y), [1, 2, 3], [4, 5, 6]) AS res
┌─res─────────────────┐
│ [(1,4),(2,5),(3,6)] │
└─────────────────────┘

Note that the arrayMap is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.

arrayFilter(func, arr1, …)

Returns an array containing only the elements in arr1 for which func(arr1[i], …, arrN[i]) returns something other than 0.

Examples

SELECT arrayFilter(x -> x LIKE '%World%', ['Hello', 'abc World']) AS res
┌─res───────────┐
│ ['abc World'] │
└───────────────┘
SELECT
    arrayFilter(
        (i, x) -> x LIKE '%World%',
        arrayEnumerate(arr),
        ['Hello', 'abc World'] AS arr)
    AS res
┌─res─┐
│ [2] │
└─────┘

Note that the arrayFilter is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.

arrayFill(func, arr1, …)

Scan through arr1 from the first element to the last element and replace arr1[i] by arr1[i - 1] if func(arr1[i], …, arrN[i]) returns 0. The first element of arr1 will not be replaced.

Examples

SELECT arrayFill(x -> not isNull(x), [1, null, 3, 11, 12, null, null, 5, 6, 14, null, null]) AS res
┌─res──────────────────────────────┐
│ [1,1,3,11,12,12,12,5,6,14,14,14] │
└──────────────────────────────────┘

Note that the arrayFill is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.

arrayReverseFill(func, arr1, …)

Scan through arr1 from the last element to the first element and replace arr1[i] by arr1[i + 1] if func(arr1[i], …, arrN[i]) returns 0. The last element of arr1 will not be replaced.

Examples:

SELECT arrayReverseFill(x -> not isNull(x), [1, null, 3, 11, 12, null, null, 5, 6, 14, null, null]) AS res
┌─res────────────────────────────────┐
│ [1,3,3,11,12,5,5,5,6,14,NULL,NULL] │
└────────────────────────────────────┘

Note that the arrayReverseFill is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.

arraySplit(func, arr1, …)

Split arr1 into multiple arrays. When func(arr1[i], …, arrN[i]) returns something other than 0, the array will be split on the left hand side of the element. The array will not be split before the first element.

Examples:

SELECT arraySplit((x, y) -> y, [1, 2, 3, 4, 5], [1, 0, 0, 1, 0]) AS res
┌─res─────────────┐
│ [[1,2,3],[4,5]] │
└─────────────────┘

Note that the arraySplit is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.

arrayReverseSplit(func, arr1, …)

Split arr1 into multiple arrays. When func(arr1[i], …, arrN[i]) returns something other than 0, the array will be split on the right hand side of the element. The array will not be split after the last element.

Examples:

SELECT arrayReverseSplit((x, y) -> y, [1, 2, 3, 4, 5], [1, 0, 0, 1, 0]) AS res
┌─res───────────────┐
│ [[1],[2,3,4],[5]] │
└───────────────────┘

Note that the arrayReverseSplit is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.

arrayExists([func,] arr1, …)

Returns 1 if there is at least one element in arr for which func(arr1[i], …, arrN[i]) returns something other than 0. Otherwise, it returns 0.

Note that the arrayExists is a higher-order function. You can pass a lambda function to it as the first argument.

Example

SELECT arrayAll((x,y)->x==y,[1,2,3],[4,5,6]);

arrayAll([func,] arr1, …)

Returns 1 if func(arr1[i], …, arrN[i]) returns something other than 0 for all the elements in arrays. Otherwise, it returns 0.

Note that the arrayAll is a higher-order function. You can pass a lambda function to it as the first argument.

Example

SELECT arrayAll((x,y)->x==y,[1,2,3],[4,5,6]);

arrayFirst(func, arr1, …)

Returns the first element in the arr1 array for which func(arr1[i], …, arrN[i]) returns something other than 0.

Note that the arrayFirst is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.

Example

SELECT arrayFirst(x -> x LIKE '%World%', ['Hello World', 'abc World']) AS res

arrayFirstIndex(func, arr1, …)

Returns the index of the first element in the arr1 array for which func(arr1[i], …, arrN[i]) returns something other than 0.

Note that the arrayFirstIndex is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.

Example

SELECT arrayFirstIndex(x -> x LIKE '%World%', ['Hello World', 'abc World']) AS res
```json
Returns a bitwise 'AND' of two numbers.
```

arrayMin

Returns the minimum of elements in the source array.

If the func function is specified, returns the mininum of elements converted by this function.

Note that the arrayMin is a higher-order function. You can pass a lambda function to it as the first argument.

Syntax

arrayMin([func,] arr)

Arguments

Returned value

  • The minimum of function values (or the array minimum).

Type: if func is specified, matches func return value type, else matches the array elements type.

Examples

Query:

SELECT arrayMin([1, 2, 4]) AS res;

Result:

┌─res─┐
│   1 │
└─────┘

Query:

SELECT arrayMin(x -> (-x), [1, 2, 4]) AS res;

Result:

┌─res─┐
│  -4 │
└─────┘

arrayMax

Returns the maximum of elements in the source array.

If the func function is specified, returns the maximum of elements converted by this function.

Note that the arrayMax is a higher-order function. You can pass a lambda function to it as the first argument.

Syntax

arrayMax([func,] arr)

Arguments

Returned value

  • The maximum of function values (or the array maximum).

Type: if func is specified, matches func return value type, else matches the array elements type.

Examples

Query:

SELECT arrayMax([1, 2, 4]) AS res;

Result:

┌─res─┐
│   4 │
└─────┘

Query:

SELECT arrayMax(x -> (-x), [1, 2, 4]) AS res;

Result:

┌─res─┐
│  -1 │
└─────┘

arraySum

Returns the sum of elements in the source array.

If the func function is specified, returns the sum of elements converted by this function.

Note that the arraySum is a higher-order function. You can pass a lambda function to it as the first argument.

Syntax

arraySum([func,] arr)

Arguments

Returned value

  • The sum of the function values (or the array sum).

Type: for decimal numbers in source array (or for converted values, if func is specified) — Decimal128, for floating point numbers — Float64, for numeric unsigned — UInt64, and for numeric signed — Int64.

Examples

Query:

SELECT arraySum([2, 3]) AS res;

Result:

┌─res─┐
│   5 │
└─────┘

Query:

SELECT arraySum(x -> x*x, [2, 3]) AS res;

Result:

┌─res─┐
│  13 │
└─────┘

arrayAvg

Returns the average of elements in the source array.

If the func function is specified, returns the average of elements converted by this function.

Note that the arrayAvg is a higher-order function. You can pass a lambda function to it as the first argument.

Syntax

arrayAvg([func,] arr)

Arguments

Returned value

  • The average of function values (or the array average).

Type: Float64.

Examples

Query:

SELECT arrayAvg([1, 2, 4]) AS res;

Result:

┌────────────────res─┐
│ 2.3333333333333335 │
└────────────────────┘

Query:

SELECT arrayAvg(x -> (x * x), [2, 4]) AS res;

Result:

┌─res─┐
│  10 │
└─────┘

arrayCumSum([func,] arr1, …)

Returns an array of partial sums of elements in the source array (a running sum). If the func function is specified, then the values of the array elements are converted by func(arr1[i], …, arrN[i]) before summing.

Example:

SELECT arrayCumSum([1, 1, 1, 1]) AS res
┌─res──────────┐
│ [1, 2, 3, 4] │
└──────────────┘

Note that the arrayCumSum is a higher-order function. You can pass a lambda function to it as the first argument.

arrayCumSumNonNegative(arr)

Same as arrayCumSum, returns an array of partial sums of elements in the source array (a running sum). Different arrayCumSum, when then returned value contains a value less than zero, the value is replace with zero and the subsequent calculation is performed with zero parameters. For example:

SELECT arrayCumSumNonNegative([1, 1, -4, 1]) AS res
┌─res───────┐
│ [1,2,0,1] │
└───────────┘

Note that the arraySumNonNegative is a higher-order function. You can pass a lambda function to it as the first argument.

arrayProduct

Multiplies elements of an array.

Syntax

arrayProduct(arr)

Arguments

  • arrArray of numeric values.

Returned value

  • A product of array's elements.

Type: Float64.

Examples

Query:

SELECT arrayProduct([1,2,3,4,5,6]) as res;

Result:

┌─res───┐
│ 720   │
└───────┘

Query:

SELECT arrayProduct([toDecimal64(1,8), toDecimal64(2,8), toDecimal64(3,8)]) as res, toTypeName(res);

Return value type is always Float64. Result:

┌─res─┬─toTypeName(arrayProduct(array(toDecimal64(1, 8), toDecimal64(2, 8), toDecimal64(3, 8))))─┐
│ 6   │ Float64                                                                                  │
└─────┴──────────────────────────────────────────────────────────────────────────────────────────┘

BIT

Bit functions work for any pair of types from UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, or Float64. Some functions support String and FixedString types.

The result type is an integer with bits equal to the maximum bits of its arguments. If at least one of the arguments is signed, the result is a signed number. If an argument is a floating-point number, it is cast to Int64.

bitAnd(a, b)

Returns a bitwise 'AND' of two numbers.

bitOr(a, b)

bitXor(a, b)

bitNot(a)

bitShiftLeft(a, b)

Shifts the binary representation of a value to the left by a specified number of bit positions.

A FixedString or a String is treated as a single multibyte value.

Bits of a FixedString value are lost as they are shifted out. On the contrary, a String value is extended with additional bytes, so no bits are lost.

Syntax

bitShiftLeft(a, b)

Arguments

Returned value

  • Shifted value.

The type of the returned value is the same as the type of the input value.

Example

In the following queries bin and hex functions are used to show bits of shifted values.

SELECT 99 AS a, bin(a), bitShiftLeft(a, 2) AS a_shifted, bin(a_shifted);
SELECT 'abc' AS a, hex(a), bitShiftLeft(a, 4) AS a_shifted, hex(a_shifted);
SELECT toFixedString('abc', 3) AS a, hex(a), bitShiftLeft(a, 4) AS a_shifted, hex(a_shifted);

Result:

┌──a─┬─bin(99)──┬─a_shifted─┬─bin(bitShiftLeft(99, 2))─┐
│ 99 │ 01100011 │       140 │ 10001100                 │
└────┴──────────┴───────────┴──────────────────────────┘
┌─a───┬─hex('abc')─┬─a_shifted─┬─hex(bitShiftLeft('abc', 4))─┐
│ abc │ 616263     │ &0        │ 06162630                    │
└─────┴────────────┴───────────┴─────────────────────────────┘
┌─a───┬─hex(toFixedString('abc', 3))─┬─a_shifted─┬─hex(bitShiftLeft(toFixedString('abc', 3), 4))─┐
│ abc │ 616263                       │ &0        │ 162630                                        │
└─────┴──────────────────────────────┴───────────┴───────────────────────────────────────────────┘

bitShiftRight(a, b)

Shifts the binary representation of a value to the right by a specified number of bit positions.

A FixedString or a String is treated as a single multibyte value. Note that the length of a String value is reduced as bits are shifted out.

Syntax

bitShiftRight(a, b)

Arguments

Returned value

  • Shifted value.

The type of the returned value is the same as the type of the input value.

Example

Query:

SELECT 101 AS a, bin(a), bitShiftRight(a, 2) AS a_shifted, bin(a_shifted);
SELECT 'abc' AS a, hex(a), bitShiftRight(a, 12) AS a_shifted, hex(a_shifted);
SELECT toFixedString('abc', 3) AS a, hex(a), bitShiftRight(a, 12) AS a_shifted, hex(a_shifted);

Result:

┌───a─┬─bin(101)─┬─a_shifted─┬─bin(bitShiftRight(101, 2))─┐
│ 101 │ 01100101 │        25 │ 00011001                   │
└─────┴──────────┴───────────┴────────────────────────────┘
┌─a───┬─hex('abc')─┬─a_shifted─┬─hex(bitShiftRight('abc', 12))─┐
│ abc │ 616263     │           │ 0616                          │
└─────┴────────────┴───────────┴───────────────────────────────┘
┌─a───┬─hex(toFixedString('abc', 3))─┬─a_shifted─┬─hex(bitShiftRight(toFixedString('abc', 3), 12))─┐
│ abc │ 616263                       │           │ 000616                                          │
└─────┴──────────────────────────────┴───────────┴─────────────────────────────────────────────────┘

bitRotateLeft(a, b)

bitRotateRight(a, b)

bitTest

Takes any integer and converts it into binary form, returns the value of a bit at specified position. The countdown starts from 0 from the right to the left.

Syntax

SELECT bitTest(number, index)

Arguments

  • number – Integer number.

  • index – Position of bit.

Returned values

Returns a value of bit at specified position.

Type: UInt8.

Example

For example, the number 43 in base-2 (binary) numeral system is 101011.

Query:

SELECT bitTest(43, 1);

Result:

┌─bitTest(43, 1)─┐
│              1 │
└────────────────┘

Another example:

Query:

SELECT bitTest(43, 2);

Result:

┌─bitTest(43, 2)─┐
│              0 │
└────────────────┘

bitTestAll

Returns result of logical conjuction (AND operator) of all bits at given positions. The countdown starts from 0 from the right to the left.

The conjuction for bitwise operations:

0 AND 0 = 0

0 AND 1 = 0

1 AND 0 = 0

1 AND 1 = 1

Syntax

SELECT bitTestAll(number, index1, index2, index3, index4, ...)

Arguments

  • number – Integer number.

  • index1, index2, index3, index4 – Positions of bit. For example, for set of positions (index1, index2, index3, index4) is true if and only if all of its positions are true (index1index2, ⋀ index3index4).

Returned values

Returns result of logical conjuction.

Type: UInt8.

Example

For example, the number 43 in base-2 (binary) numeral system is 101011.

Query:

SELECT bitTestAll(43, 0, 1, 3, 5);

Result:

┌─bitTestAll(43, 0, 1, 3, 5)─┐
│                          1 │
└────────────────────────────┘

Another example:

Query:

SELECT bitTestAll(43, 0, 1, 3, 5, 2);

Result:

┌─bitTestAll(43, 0, 1, 3, 5, 2)─┐
│                             0 │
└───────────────────────────────┘

bitTestAny

Returns result of logical disjunction (OR operator) of all bits at given positions. The countdown starts from 0 from the right to the left.

The disjunction for bitwise operations:

0 OR 0 = 0

0 OR 1 = 1

1 OR 0 = 1

1 OR 1 = 1

Syntax

SELECT bitTestAny(number, index1, index2, index3, index4, ...)

Arguments

  • number – Integer number.

  • index1, index2, index3, index4 – Positions of bit.

Returned values

Returns result of logical disjuction.

Type: UInt8.

Example

For example, the number 43 in base-2 (binary) numeral system is 101011.

Query:

SELECT bitTestAny(43, 0, 2);

Result:

┌─bitTestAny(43, 0, 2)─┐
│                    1 │
└──────────────────────┘

Another example:

Query:

SELECT bitTestAny(43, 4, 2);

Result:

┌─bitTestAny(43, 4, 2)─┐
│                    0 │
└──────────────────────┘

bitCount

Calculates the number of bits set to one in the binary representation of a number.

Syntax

bitCount(x)

Arguments

  • xInteger or floating-point number. The function uses the value representation in memory. It allows supporting floating-point numbers.

Returned value

  • Number of bits set to one in the input number.

The function does not convert input value to a larger type (sign extension). So, for example, bitCount(toUInt8(-1)) = 8.

Type: UInt8.

Example

Take for example the number 333. Its binary representation: 0000000101001101.

Query:

SELECT bitCount(333);

Result:

┌─bitCount(333)─┐
│             5 │
└───────────────┘

bitHammingDistance

Returns the Hamming Distance between the bit representations of two integer values. Can be used with SimHash functions for detection of semi-duplicate strings. The smaller is the distance, the more likely those strings are the same.

Syntax

bitHammingDistance(int1, int2)

Arguments

  • int1 — First integer value. Int64.

  • int2 — Second integer value. Int64.

Returned value

  • The Hamming distance.

Type: UInt8.

Examples

Query:

SELECT bitHammingDistance(111, 121);

Result:

┌─bitHammingDistance(111, 121)─┐
│                            3 │
└──────────────────────────────┘

With SimHash:

SELECT bitHammingDistance(ngramSimHash('cat ate rat'), ngramSimHash('rat ate cat'));

Result:

┌─bitHammingDistance(ngramSimHash('cat ate rat'), ngramSimHash('rat ate cat'))─┐
│                                                                            5 │
└──────────────────────────────────────────────────────────────────────────────┘

BITMAP

Bitmap functions work for two bitmaps Object value calculation, it is to return new bitmap or cardinality while using formula calculation, such as and, or, xor, and not, etc.

There are 2 kinds of construction methods for Bitmap Object. One is to be constructed by aggregation function groupBitmap with -State, the other is to be constructed by Array Object. It is also to convert Bitmap Object to Array Object.

RoaringBitmap is wrapped into a data structure while actual storage of Bitmap objects. When the cardinality is less than or equal to 32, it uses Set objet. When the cardinality is greater than 32, it uses RoaringBitmap object. That is why storage of low cardinality set is faster.

For more information on RoaringBitmap, see: CRoaring.

bitmapBuild

Build a bitmap from unsigned integer array.

bitmapBuild(array)

Arguments

  • array – Unsigned integer array.

Example

SELECT bitmapBuild([1, 2, 3, 4, 5]) AS res, toTypeName(res);
┌─res─┬─toTypeName(bitmapBuild([1, 2, 3, 4, 5]))─────┐
│     │ AggregateFunction(groupBitmap, UInt8)        │
└─────┴──────────────────────────────────────────────┘

bitmapToArray

Convert bitmap to integer array.

bitmapToArray(bitmap)

Arguments

  • bitmap – Bitmap object.

Example

SELECT bitmapToArray(bitmapBuild([1, 2, 3, 4, 5])) AS res;
┌─res─────────┐
│ [1,2,3,4,5] │
└─────────────┘

bitmapSubsetInRange

Return subset in specified range (not include the range_end).

bitmapSubsetInRange(bitmap, range_start, range_end)

Arguments

  • bitmapBitmap object.

  • range_start – Range start point. Type: UInt32.

  • range_end – Range end point (excluded). Type: UInt32.

Example

SELECT bitmapToArray(bitmapSubsetInRange(bitmapBuild([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,100,200,500]), toUInt32(30), toUInt32(200))) AS res;
┌─res───────────────┐
│ [30,31,32,33,100] │
└───────────────────┘

bitmapSubsetLimit

Creates a subset of bitmap with n elements taken between range_start and cardinality_limit.

Syntax

bitmapSubsetLimit(bitmap, range_start, cardinality_limit)

Arguments

  • bitmapBitmap object.

  • range_start – The subset starting point. Type: UInt32.

  • cardinality_limit – The subset cardinality upper limit. Type: UInt32.

Returned value

The subset.

Type: Bitmap object.

Example

Query:

SELECT bitmapToArray(bitmapSubsetLimit(bitmapBuild([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,100,200,500]), toUInt32(30), toUInt32(200))) AS res;

Result:

┌─res───────────────────────┐
│ [30,31,32,33,100,200,500] │
└───────────────────────────┘

subBitmap

Returns the bitmap elements, starting from the offset position. The number of returned elements is limited by the cardinality_limit parameter. Analog of the substring) string function, but for bitmap.

Syntax

subBitmap(bitmap, offset, cardinality_limit)

Arguments

  • bitmap – The bitmap. Type: Bitmap object.

  • offset – The position of the first element of the subset. Type: UInt32.

  • cardinality_limit – The maximum number of elements in the subset. Type: UInt32.

Returned value

The subset.

Type: Bitmap object.

Example

Query:

SELECT bitmapToArray(subBitmap(bitmapBuild([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,100,200,500]), toUInt32(10), toUInt32(10))) AS res;

Result:

┌─res─────────────────────────────┐
│ [10,11,12,13,14,15,16,17,18,19] │
└─────────────────────────────────┘

bitmapContains

Checks whether the bitmap contains an element.

bitmapContains(haystack, needle)

Arguments

  • haystackBitmap object, where the function searches.

  • needle – Value that the function searches. Type: UInt32.

Returned values

  • 0 — If haystack does not contain needle.

  • 1 — If haystack contains needle.

Type: UInt8.

Example

SELECT bitmapContains(bitmapBuild([1,5,7,9]), toUInt32(9)) AS res;
┌─res─┐
│  1  │
└─────┘

bitmapHasAny

Checks whether two bitmaps have intersection by some elements.

bitmapHasAny(bitmap1, bitmap2)

If you are sure that bitmap2 contains strictly one element, consider using the bitmapContains function. It works more efficiently.

Arguments

  • bitmap* – Bitmap object.

Return values

  • 1, if bitmap1 and bitmap2 have one similar element at least.

  • 0, otherwise.

Example

SELECT bitmapHasAny(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
┌─res─┐
│  1  │
└─────┘

bitmapHasAll

Analogous to hasAll(array, array) returns 1 if the first bitmap contains all the elements of the second one, 0 otherwise. If the second argument is an empty bitmap then returns 1.

bitmapHasAll(bitmap,bitmap)

Arguments

  • bitmap – Bitmap object.

Example

SELECT bitmapHasAll(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
┌─res─┐
│  0  │
└─────┘

bitmapCardinality

Retrun bitmap cardinality of type UInt64.

bitmapCardinality(bitmap)

Arguments

  • bitmap – Bitmap object.

Example

SELECT bitmapCardinality(bitmapBuild([1, 2, 3, 4, 5])) AS res;
┌─res─┐
│   5 │
└─────┘

bitmapMin

Retrun the smallest value of type UInt64 in the set, UINT32_MAX if the set is empty.

bitmapMin(bitmap)

Arguments

  • bitmap – Bitmap object.

Example

SELECT bitmapMin(bitmapBuild([1, 2, 3, 4, 5])) AS res;
 ┌─res─┐
 │   1 │
 └─────┘

bitmapMax

Retrun the greatest value of type UInt64 in the set, 0 if the set is empty.

bitmapMax(bitmap)

Arguments

  • bitmap – Bitmap object.

Example

SELECT bitmapMax(bitmapBuild([1, 2, 3, 4, 5])) AS res;
 ┌─res─┐
 │   5 │
 └─────┘

bitmapTransform

Transform an array of values in a bitmap to another array of values, the result is a new bitmap.

bitmapTransform(bitmap, from_array, to_array)

Arguments

  • bitmap – Bitmap object.

  • from_array – UInt32 array. For idx in range [0, from_array.size()), if bitmap contains from_array[idx], then replace it with to_array[idx]. Note that the result depends on array ordering if there are common elements between from_array and to_array.

  • to_array – UInt32 array, its size shall be the same to from_array.

Example

SELECT bitmapToArray(bitmapTransform(bitmapBuild([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), cast([5,999,2] as Array(UInt32)), cast([2,888,20] as Array(UInt32)))) AS res;
 ┌─res───────────────────┐
 │ [1,3,4,6,7,8,9,10,20] │
 └───────────────────────┘

bitmapAnd

Two bitmap and calculation, the result is a new bitmap.

bitmapAnd(bitmap,bitmap)

Arguments

  • bitmap – Bitmap object.

Example

SELECT bitmapToArray(bitmapAnd(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]))) AS res;
┌─res─┐
│ [3] │
└─────┘

bitmapOr

Two bitmap or calculation, the result is a new bitmap.

bitmapOr(bitmap,bitmap)

Arguments

  • bitmap – Bitmap object.

Example

SELECT bitmapToArray(bitmapOr(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]))) AS res;
┌─res─────────┐
│ [1,2,3,4,5] │
└─────────────┘

bitmapXor

Two bitmap xor calculation, the result is a new bitmap.

bitmapXor(bitmap,bitmap)

Arguments

  • bitmap – Bitmap object.

Example

SELECT bitmapToArray(bitmapXor(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]))) AS res;
┌─res───────┐
│ [1,2,4,5] │
└───────────┘

bitmapAndnot

Two bitmap andnot calculation, the result is a new bitmap.

bitmapAndnot(bitmap,bitmap)

Arguments

  • bitmap – Bitmap object.

Example

SELECT bitmapToArray(bitmapAndnot(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]))) AS res;
┌─res───┐
│ [1,2] │
└───────┘

bitmapAndCardinality

Two bitmap and calculation, return cardinality of type UInt64.

bitmapAndCardinality(bitmap,bitmap)

Arguments

  • bitmap – Bitmap object.

Example

SELECT bitmapAndCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
┌─res─┐
│   1 │
└─────┘

bitmapOrCardinality

Two bitmap or calculation, return cardinality of type UInt64.

bitmapOrCardinality(bitmap,bitmap)

Arguments

  • bitmap – Bitmap object.

Example

SELECT bitmapOrCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
┌─res─┐
│   5 │
└─────┘

bitmapXorCardinality

Two bitmap xor calculation, return cardinality of type UInt64.

bitmapXorCardinality(bitmap,bitmap)

Arguments

  • bitmap – Bitmap object.

Example

SELECT bitmapXorCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
┌─res─┐
│   4 │
└─────┘

bitmapAndnotCardinality

Two bitmap andnot calculation, return cardinality of type UInt64.

bitmapAndnotCardinality(bitmap,bitmap)

Arguments

  • bitmap – Bitmap object.

Example

SELECT bitmapAndnotCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
┌─res─┐
│   2 │
└─────┘

CONDITIONAL

if

Controls conditional branching. Unlike most systems, ClickHouse always evaluates both expressions then and else.

Syntax

if(cond, then, else)

If the condition cond evaluates to a non-zero value, returns the result of the expression then, and the result of the expression else, if present, is skipped. If the cond is zero or NULL, then the result of the then expression is skipped and the result of the else expression, if present, is returned.

You can use the short_circuit_function_evaluation setting to calculate the if function according to a short scheme. If this setting is enabled, then expression is evaluated only on rows where cond is true, else expression – where cond is false. For example, an exception about division by zero is not thrown when executing the query SELECT if(number = 0, 0, intDiv(42, number)) FROM numbers(10), because intDiv(42, number) will be evaluated only for numbers that doesn't satisfy condition number = 0.

Arguments

  • cond – The condition for evaluation that can be zero or not. The type is UInt8, Nullable(UInt8) or NULL.

  • then – The expression to return if condition is met.

  • else – The expression to return if condition is not met.

Returned values

The function executes then and else expressions and returns its result, depending on whether the condition cond ended up being zero or not.

Example

Query:

SELECT if(1, plus(2, 2), plus(2, 6));

Result:

┌─plus(2, 2)─┐
│          4 │
└────────────┘

Query:

SELECT if(0, plus(2, 2), plus(2, 6));

Result:

┌─plus(2, 6)─┐
│          8 │
└────────────┘
  • then and else must have the lowest common type.

Example:

Take this LEFT_RIGHT table:

SELECT *
FROM LEFT_RIGHT

┌─left─┬─right─┐
│ ᴺᵁᴸᴸ │     4 │
│    1 │     3 │
│    2 │     2 │
│    3 │     1 │
│    4 │  ᴺᵁᴸᴸ │
└──────┴───────┘

The following query compares left and right values:

SELECT
    left,
    right,
    if(left < right, 'left is smaller than right', 'right is greater or equal than left') AS is_smaller
FROM LEFT_RIGHT
WHERE isNotNull(left) AND isNotNull(right)

┌─left─┬─right─┬─is_smaller──────────────────────────┐
│    1 │     3 │ left is smaller than right          │
│    2 │     2 │ right is greater or equal than left │
│    3 │     1 │ right is greater or equal than left │
└──────┴───────┴─────────────────────────────────────┘

Note: NULL values are not used in this example, check NULL values in conditionals section.

Ternary Operator

It works same as if function.

Syntax: cond ? then : else

Returns then if the cond evaluates to be true (greater than zero), otherwise returns else.

  • cond must be of type of UInt8, and then and else must have the lowest common type.

  • then and else can be NULL

See also

multiIf

Allows you to write the CASE operator more compactly in the query.

Syntax

multiIf(cond_1, then_1, cond_2, then_2, ..., else)

You can use the short_circuit_function_evaluation setting to calculate the multiIf function according to a short scheme. If this setting is enabled, then_i expression is evaluated only on rows where ((NOT cond_1) AND (NOT cond_2) AND ... AND (NOT cond_{i-1}) AND cond_i) is true, cond_i will be evaluated only on rows where ((NOT cond_1) AND (NOT cond_2) AND ... AND (NOT cond_{i-1})) is true. For example, an exception about division by zero is not thrown when executing the query SELECT multiIf(number = 2, intDiv(1, number), number = 5) FROM numbers(10).

Arguments

  • cond_N — The condition for the function to return then_N.

  • then_N — The result of the function when executed.

  • else — The result of the function if none of the conditions is met.

The function accepts 2N+1 parameters.

Returned values

The function returns one of the values then_N or else, depending on the conditions cond_N.

Example

Again using LEFT_RIGHT table.

SELECT
    left,
    right,
    multiIf(left < right, 'left is smaller', left > right, 'left is greater', left = right, 'Both equal', 'Null value') AS result
FROM LEFT_RIGHT

┌─left─┬─right─┬─result──────────┐
│ ᴺᵁᴸᴸ │     4 │ Null value      │
│    1 │     3 │ left is smaller │
│    2 │     2 │ Both equal      │
│    3 │     1 │ left is greater │
│    4 │  ᴺᵁᴸᴸ │ Null value      │
└──────┴───────┴─────────────────┘

Case

DATES AND TIMES

Support for time zones.

All functions for working with the date and time that have a logical use for the time zone can accept a second optional time zone argument. Example: Asia/Yekaterinburg. In this case, they use the specified time zone instead of the local (default) one.

SELECT
    toDateTime('2016-06-15 23:00:00') AS time,
    toDate(time) AS date_local,
    toDate(time, 'Asia/Yekaterinburg') AS date_yekat,
    toString(time, 'US/Samoa') AS time_samoa
┌────────────────time─┬─date_local─┬─date_yekat─┬─time_samoa──────────┐
│ 2016-06-15 23:00:00 │ 2016-06-15 │ 2016-06-16 │ 2016-06-15 09:00:00 │
└─────────────────────┴────────────┴────────────┴─────────────────────┘

timeZone

Returns the timezone of the server. If it is executed in the context of a distributed table, then it generates a normal column with values relevant to each shard. Otherwise it produces a constant value.

Syntax

timeZone()

Alias: timezone.

Returned value

  • Timezone.

Type: String.

toTimeZone

Converts time or date and time to the specified time zone. The time zone is an attribute of the Date and DateTime data types. The internal value (number of seconds) of the table field or of the resultset's column does not change, the column's type changes and its string representation changes accordingly.

Syntax

toTimezone(value, timezone)

Alias: toTimezone.

Arguments

  • value — Time or date and time. DateTime64.

  • timezone — Timezone for the returned value. String. This argument is a constant, because toTimezone changes the timezone of a column (timezone is an attribute of DateTime* types).

Returned value

  • Date and time.

Type: DateTime.

Example

Query:

SELECT toDateTime('2019-01-01 00:00:00', 'UTC') AS time_utc,
    toTypeName(time_utc) AS type_utc,
    toInt32(time_utc) AS int32utc,
    toTimeZone(time_utc, 'Asia/Yekaterinburg') AS time_yekat,
    toTypeName(time_yekat) AS type_yekat,
    toInt32(time_yekat) AS int32yekat,
    toTimeZone(time_utc, 'US/Samoa') AS time_samoa,
    toTypeName(time_samoa) AS type_samoa,
    toInt32(time_samoa) AS int32samoa
FORMAT Vertical;

Result:

Row 1:
──────
time_utc:   2019-01-01 00:00:00
type_utc:   DateTime('UTC')
int32utc:   1546300800
time_yekat: 2019-01-01 05:00:00
type_yekat: DateTime('Asia/Yekaterinburg')
int32yekat: 1546300800
time_samoa: 2018-12-31 13:00:00
type_samoa: DateTime('US/Samoa')
int32samoa: 1546300800

toTimeZone(time_utc, 'Asia/Yekaterinburg') changes the DateTime('UTC') type to DateTime('Asia/Yekaterinburg'). The value (Unixtimestamp) 1546300800 stays the same, but the string representation (the result of the toString() function) changes from time_utc: 2019-01-01 00:00:00 to time_yekat: 2019-01-01 05:00:00.

timeZoneOf

Returns the timezone name of DateTime or DateTime64 data types.

Syntax

timeZoneOf(value)

Alias: timezoneOf.

Arguments

Returned value

  • Timezone name.

Type: String.

Example

Query:

SELECT timezoneOf(now());

Result:

┌─timezoneOf(now())─┐
│ Etc/UTC           │
└───────────────────┘

timeZoneOffset

Returns a timezone offset in seconds from UTC. The function takes into account daylight saving time and historical timezone changes at the specified date and time. IANA timezone database is used to calculate the offset.

Syntax

timeZoneOffset(value)

Alias: timezoneOffset.

Arguments

Returned value

  • Offset from UTC in seconds.

Type: Int32.

Example

Query:

SELECT toDateTime('2021-04-21 10:20:30', 'America/New_York') AS Time, toTypeName(Time) AS Type,
       timeZoneOffset(Time) AS Offset_in_seconds, (Offset_in_seconds / 3600) AS Offset_in_hours;

Result:

┌────────────────Time─┬─Type─────────────────────────┬─Offset_in_seconds─┬─Offset_in_hours─┐
│ 2021-04-21 10:20:30 │ DateTime('America/New_York') │            -14400 │              -4 │
└─────────────────────┴──────────────────────────────┴───────────────────┴─────────────────┘

toYear

Converts a date or date with time to a UInt16 number containing the year number (AD).

Alias: YEAR.

toQuarter

Converts a date or date with time to a UInt8 number containing the quarter number.

Alias: QUARTER.

toMonth

Converts a date or date with time to a UInt8 number containing the month number (1-12).

Alias: MONTH.

toDayOfYear

Converts a date or date with time to a UInt16 number containing the number of the day of the year (1-366).

Alias: DAYOFYEAR.

toDayOfMonth

Converts a date or date with time to a UInt8 number containing the number of the day of the month (1-31).

Aliases: DAYOFMONTH, DAY.

toDayOfWeek

Converts a date or date with time to a UInt8 number containing the number of the day of the week (Monday is 1, and Sunday is 7).

Alias: DAYOFWEEK.

toHour

Converts a date with time to a UInt8 number containing the number of the hour in 24-hour time (0-23). This function assumes that if clocks are moved ahead, it is by one hour and occurs at 2 a.m., and if clocks are moved back, it is by one hour and occurs at 3 a.m. (which is not always true – even in Moscow the clocks were twice changed at a different time).

Alias: HOUR.

toMinute

Converts a date with time to a UInt8 number containing the number of the minute of the hour (0-59).

Alias: MINUTE.

toSecond

Converts a date with time to a UInt8 number containing the number of the second in the minute (0-59). Leap seconds are not accounted for.

Alias: SECOND.

toUnixTimestamp

For DateTime argument: converts value to the number with type UInt32 -- Unix Timestamp (https://en.wikipedia.org/wiki/Unix_time). For String argument: converts the input string to the datetime according to the timezone (optional second argument, server timezone is used by default) and returns the corresponding unix timestamp.

Syntax

toUnixTimestamp(datetime)
toUnixTimestamp(str, [timezone])

Returned value

  • Returns the unix timestamp.

Type: UInt32.

Example

Query:

SELECT toUnixTimestamp('2017-11-05 08:07:47', 'Asia/Tokyo') AS unix_timestamp

Result:

┌─unix_timestamp─┐
│     1509836867 │
└────────────────┘

NOTE

The return type of toStartOf*, toLastDayOfMonth, toMonday, timeSlot functions described below is determined by the configuration parameter enable_extended_results_for_datetime_functions which is 0 by default.

Behavior for

  • enable_extended_results_for_datetime_functions = 0: Functions toStartOfYear, toStartOfISOYear, toStartOfQuarter, toStartOfMonth, toStartOfWeek, toLastDayOfMonth, toMonday return Date or DateTime. Functions toStartOfDay, toStartOfHour, toStartOfFifteenMinutes, toStartOfTenMinutes, toStartOfFiveMinutes, toStartOfMinute, timeSlot return DateTime. Though these functions can take values of the extended types Date32 and DateTime64 as an argument, passing them a time outside the normal range (year 1970 to 2149 for Date / 2106 for DateTime) will produce wrong results.

  • enable_extended_results_for_datetime_functions = 1:

    • Functions toStartOfYear, toStartOfISOYear, toStartOfQuarter, toStartOfMonth, toStartOfWeek, toLastDayOfMonth, toMonday return Date or DateTime if their argument is a Date or DateTime, and they return Date32 or DateTime64 if their argument is a Date32 or DateTime64.

    • Functions toStartOfDay, toStartOfHour, toStartOfFifteenMinutes, toStartOfTenMinutes, toStartOfFiveMinutes, toStartOfMinute, timeSlot return DateTime if their argument is a Date or DateTime, and they return DateTime64 if their argument is a Date32 or DateTime64.

toStartOfYear

Rounds down a date or date with time to the first day of the year. Returns the date.

toStartOfISOYear

Rounds down a date or date with time to the first day of ISO year. Returns the date.

toStartOfQuarter

Rounds down a date or date with time to the first day of the quarter. The first day of the quarter is either 1 January, 1 April, 1 July, or 1 October. Returns the date.

toStartOfMonth

Rounds down a date or date with time to the first day of the month. Returns the date.

NOTE

The behavior of parsing incorrect dates is implementation specific. ClickHouse may return zero date, throw an exception or do “natural” overflow.

If toLastDayOfMonth is called with an argument of type Date greater then 2149-05-31, the result will be calculated from the argument 2149-05-31 instead.

toMonday

Rounds down a date or date with time to the nearest Monday. Returns the date.

toStartOfWeek(t[,mode])

Rounds down a date or date with time to the nearest Sunday or Monday by mode. Returns the date. The mode argument works exactly like the mode argument to toWeek(). For the single-argument syntax, a mode value of 0 is used.

toStartOfDay

Rounds down a date with time to the start of the day.

toStartOfHour

Rounds down a date with time to the start of the hour.

toStartOfMinute

Rounds down a date with time to the start of the minute.

toStartOfSecond

Truncates sub-seconds.

Syntax

toStartOfSecond(value, [timezone])

Arguments

  • value — Date and time. DateTime64.

  • timezoneTimezone for the returned value (optional). If not specified, the function uses the timezone of the value parameter. String.

Returned value

  • Input value without sub-seconds.

Type: DateTime64.

Examples

Query without timezone:

WITH toDateTime64('2020-01-01 10:20:30.999', 3) AS dt64
SELECT toStartOfSecond(dt64);

Result:

┌───toStartOfSecond(dt64)─┐
│ 2020-01-01 10:20:30.000 │
└─────────────────────────┘

Query with timezone:

WITH toDateTime64('2020-01-01 10:20:30.999', 3) AS dt64
SELECT toStartOfSecond(dt64, 'Asia/Istanbul');

Result:

┌─toStartOfSecond(dt64, 'Asia/Istanbul')─┐
│                2020-01-01 13:20:30.000 │
└────────────────────────────────────────┘

See also

  • Timezone server configuration parameter.

toStartOfFiveMinutes

Rounds down a date with time to the start of the five-minute interval.

toStartOfTenMinutes

Rounds down a date with time to the start of the ten-minute interval.

toStartOfFifteenMinutes

Rounds down the date with time to the start of the fifteen-minute interval.

toStartOfInterval(time_or_data, INTERVAL x unit [, time_zone])

This is a generalization of other functions named toStartOf*. For example, toStartOfInterval(t, INTERVAL 1 year) returns the same as toStartOfYear(t), toStartOfInterval(t, INTERVAL 1 month) returns the same as toStartOfMonth(t), toStartOfInterval(t, INTERVAL 1 day) returns the same as toStartOfDay(t), toStartOfInterval(t, INTERVAL 15 minute) returns the same as toStartOfFifteenMinutes(t) etc.

toTime

Converts a date with time to a certain fixed date, while preserving the time.

toRelativeYearNum

Converts a date with time or date to the number of the year, starting from a certain fixed point in the past.

toRelativeQuarterNum

Converts a date with time or date to the number of the quarter, starting from a certain fixed point in the past.

toRelativeMonthNum

Converts a date with time or date to the number of the month, starting from a certain fixed point in the past.

toRelativeWeekNum

Converts a date with time or date to the number of the week, starting from a certain fixed point in the past.

toRelativeDayNum

Converts a date with time or date to the number of the day, starting from a certain fixed point in the past.

toRelativeHourNum

Converts a date with time or date to the number of the hour, starting from a certain fixed point in the past.

toRelativeMinuteNum

Converts a date with time or date to the number of the minute, starting from a certain fixed point in the past.

toRelativeSecondNum

Converts a date with time or date to the number of the second, starting from a certain fixed point in the past.

toISOYear

Converts a date or date with time to a UInt16 number containing the ISO Year number.

toISOWeek

Converts a date or date with time to a UInt8 number containing the ISO Week number.

toWeek(date[,mode])

This function returns the week number for date or datetime. The two-argument form of toWeek() enables you to specify whether the week starts on Sunday or Monday and whether the return value should be in the range from 0 to 53 or from 1 to 53. If the mode argument is omitted, the default mode is 0. toISOWeek()is a compatibility function that is equivalent to toWeek(date,3). The following table describes how the mode argument works.

ModeFirst day of weekRangeWeek 1 is the first week …

0

Sunday

0-53

with a Sunday in this year

1

Monday

0-53

with 4 or more days this year

2

Sunday

1-53

with a Sunday in this year

3

Monday

1-53

with 4 or more days this year

4

Sunday

0-53

with 4 or more days this year

5

Monday

0-53

with a Monday in this year

6

Sunday

1-53

with 4 or more days this year

7

Monday

1-53

with a Monday in this year

8

Sunday

1-53

contains January 1

9

Monday

1-53

contains January 1

For mode values with a meaning of “with 4 or more days this year,” weeks are numbered according to ISO 8601:1988:

  • If the week containing January 1 has 4 or more days in the new year, it is week 1.

  • Otherwise, it is the last week of the previous year, and the next week is week 1.

For mode values with a meaning of “contains January 1”, the week contains January 1 is week 1. It does not matter how many days in the new year the week contained, even if it contained only one day.

toWeek(date, [, mode][, Timezone])

Arguments

  • date – Date or DateTime.

  • mode – Optional parameter, Range of values is [0,9], default is 0.

  • Timezone – Optional parameter, it behaves like any other conversion function.

Example

SELECT toDate('2016-12-27') AS date, toWeek(date) AS week0, toWeek(date,1) AS week1, toWeek(date,9) AS week9;
┌───────date─┬─week0─┬─week1─┬─week9─┐
│ 2016-12-27 │    52 │    52 │     1 │
└────────────┴───────┴───────┴───────┘

toYearWeek(date[,mode])

Returns year and week for a date. The year in the result may be different from the year in the date argument for the first and the last week of the year.

The mode argument works exactly like the mode argument to toWeek(). For the single-argument syntax, a mode value of 0 is used.

toISOYear()is a compatibility function that is equivalent to intDiv(toYearWeek(date,3),100).

Example

SELECT toDate('2016-12-27') AS date, toYearWeek(date) AS yearWeek0, toYearWeek(date,1) AS yearWeek1, toYearWeek(date,9) AS yearWeek9;
┌───────date─┬─yearWeek0─┬─yearWeek1─┬─yearWeek9─┐
│ 2016-12-27 │    201652 │    201652 │    201701 │
└────────────┴───────────┴───────────┴───────────┘

date_trunc

Truncates date and time data to the specified part of date.

Syntax

date_trunc(unit, value[, timezone])

Alias: dateTrunc.

Arguments

  • unit — The type of interval to truncate the result. String Literal. Possible values:

    • second

    • minute

    • hour

    • day

    • week

    • month

    • quarter

    • year

  • value — Date and time. DateTime or DateTime64.

  • timezoneTimezone name for the returned value (optional). If not specified, the function uses the timezone of the value parameter. String.

Returned value

  • Value, truncated to the specified part of date.

Type: DateTime.

Example

Query without timezone:

SELECT now(), date_trunc('hour', now());

Result:

┌───────────────now()─┬─date_trunc('hour', now())─┐
│ 2020-09-28 10:40:45 │       2020-09-28 10:00:00 │
└─────────────────────┴───────────────────────────┘

Query with the specified timezone:

SELECT now(), date_trunc('hour', now(), 'Asia/Istanbul');

Result:

┌───────────────now()─┬─date_trunc('hour', now(), 'Asia/Istanbul')─┐
│ 2020-09-28 10:46:26 │                        2020-09-28 13:00:00 │
└─────────────────────┴────────────────────────────────────────────┘

See Also

date_add

Adds the time interval or date interval to the provided date or date with time.

Syntax

date_add(unit, value, date)

Aliases: dateAdd, DATE_ADD.

Arguments

  • unit — The type of interval to add. String. Possible values:

    • second

    • minute

    • hour

    • day

    • week

    • month

    • quarter

    • year

  • value — Value of interval to add. Int.

  • date — The date or date with time to which value is added. Date or DateTime.

Returned value

Date or date with time obtained by adding value, expressed in unit, to date.

Type: Date or DateTime.

Example

Query:

SELECT date_add(YEAR, 3, toDate('2018-01-01'));

Result:

┌─plus(toDate('2018-01-01'), toIntervalYear(3))─┐
│                                    2021-01-01 │
└───────────────────────────────────────────────┘

date_diff

Returns the difference between two dates or dates with time values. The difference is calculated using relative units, e.g. the difference between 2022-01-01 and 2021-12-29 is 3 days for day unit (see toRelativeDayNum), 1 month for month unit (see toRelativeMonthNum), 1 year for year unit (see toRelativeYearNum).

Syntax

date_diff('unit', startdate, enddate, [timezone])

Aliases: dateDiff, DATE_DIFF.

Arguments

  • unit — The type of interval for result. String. Possible values:

    • second

    • minute

    • hour

    • day

    • week

    • month

    • quarter

    • year

  • startdate — The first time value to subtract (the subtrahend). Date, Date32, DateTime or DateTime64.

  • enddate — The second time value to subtract from (the minuend). Date, Date32, DateTime or DateTime64.

  • timezoneTimezone name (optional). If specified, it is applied to both startdate and enddate. If not specified, timezones of startdate and enddate are used. If they are not the same, the result is unspecified. String.

Returned value

Difference between enddate and startdate expressed in unit.

Type: Int.

Example

Query:

SELECT dateDiff('hour', toDateTime('2018-01-01 22:00:00'), toDateTime('2018-01-02 23:00:00'));

Result:

┌─dateDiff('hour', toDateTime('2018-01-01 22:00:00'), toDateTime('2018-01-02 23:00:00'))─┐
│                                                                                     25 │
└────────────────────────────────────────────────────────────────────────────────────────┘

Query:

SELECT
    toDate('2022-01-01') AS e,
    toDate('2021-12-29') AS s,
    dateDiff('day', s, e) AS day_diff,
    dateDiff('month', s, e) AS month__diff,
    dateDiff('year', s, e) AS year_diff;

Result:

┌──────────e─┬──────────s─┬─day_diff─┬─month__diff─┬─year_diff─┐
│ 2022-01-01 │ 2021-12-29 │        3 │           1 │         1 │
└────────────┴────────────┴──────────┴─────────────┴───────────┘

date_sub

Subtracts the time interval or date interval from the provided date or date with time.

Syntax

date_sub(unit, value, date)

Aliases: dateSub, DATE_SUB.

Arguments

  • unit — The type of interval to subtract. String. Possible values:

    • second

    • minute

    • hour

    • day

    • week

    • month

    • quarter

    • year

  • value — Value of interval to subtract. Int.

  • date — The date or date with time from which value is subtracted. Date or DateTime.

Returned value

Date or date with time obtained by subtracting value, expressed in unit, from date.

Type: Date or DateTime.

Example

Query:

SELECT date_sub(YEAR, 3, toDate('2018-01-01'));

Result:

┌─minus(toDate('2018-01-01'), toIntervalYear(3))─┐
│                                     2015-01-01 │
└────────────────────────────────────────────────┘

timestamp_add

Adds the specified time value with the provided date or date time value.

Syntax

timestamp_add(date, INTERVAL value unit)

Aliases: timeStampAdd, TIMESTAMP_ADD.

Arguments

  • date — Date or date with time. Date or DateTime.

  • value — Value of interval to add. Int.

  • unit — The type of interval to add. String. Possible values:

    • second

    • minute

    • hour

    • day

    • week

    • month

    • quarter

    • year

Returned value

Date or date with time with the specified value expressed in unit added to date.

Type: Date or DateTime.

Example

Query:

select timestamp_add(toDate('2018-01-01'), INTERVAL 3 MONTH);

Result:

┌─plus(toDate('2018-01-01'), toIntervalMonth(3))─┐
│                                     2018-04-01 │
└────────────────────────────────────────────────┘

timestamp_sub

Subtracts the time interval from the provided date or date with time.

Syntax

timestamp_sub(unit, value, date)

Aliases: timeStampSub, TIMESTAMP_SUB.

Arguments

  • unit — The type of interval to subtract. String. Possible values:

    • second

    • minute

    • hour

    • day

    • week

    • month

    • quarter

    • year

  • value — Value of interval to subtract. Int.

  • date — Date or date with time. Date or DateTime.

Returned value

Date or date with time obtained by subtracting value, expressed in unit, from date.

Type: Date or DateTime.

Example

Query:

select timestamp_sub(MONTH, 5, toDateTime('2018-12-18 01:02:03'));

Result:

┌─minus(toDateTime('2018-12-18 01:02:03'), toIntervalMonth(5))─┐
│                                          2018-07-18 01:02:03 │
└──────────────────────────────────────────────────────────────┘

now

Returns the current date and time at the moment of query analysis. The function is a constant expression.

Syntax

now([timezone])

Arguments

Returned value

  • Current date and time.

Type: DateTime.

Example

Query without timezone:

SELECT now();

Result:

┌───────────────now()─┐
│ 2020-10-17 07:42:09 │
└─────────────────────┘

Query with the specified timezone:

SELECT now('Asia/Istanbul');

Result:

┌─now('Asia/Istanbul')─┐
│  2020-10-17 10:42:23 │
└──────────────────────┘

Returns the current date and time with sub-second precision at the moment of query analysis. The function is a constant expression.

Syntax

now64([scale], [timezone])

Arguments

  • scale - Tick size (precision): 10-precision seconds. Valid range: [ 0 : 9 ]. Typically are used - 3 (default) (milliseconds), 6 (microseconds), 9 (nanoseconds).

  • timezoneTimezone name for the returned value (optional). String.

Returned value

  • Current date and time with sub-second precision.

Type: DateTime64.

Example

SELECT now64(), now64(9, 'Asia/Istanbul');

Result:

┌─────────────────now64()─┬─────now64(9, 'Asia/Istanbul')─┐
│ 2022-08-21 19:34:26.196 │ 2022-08-21 22:34:26.196542766 │
└─────────────────────────┴───────────────────────────────┘

today

Accepts zero arguments and returns the current date at one of the moments of query analysis. The same as ‘toDate(now())’.

yesterday

Accepts zero arguments and returns yesterday’s date at one of the moments of query analysis. The same as ‘today() - 1’.

timeSlot

Rounds the time to the half hour.

toYYYYMM

Converts a date or date with time to a UInt32 number containing the year and month number (YYYY * 100 + MM).

toYYYYMMDD

Converts a date or date with time to a UInt32 number containing the year and month number (YYYY * 10000 + MM * 100 + DD).

toYYYYMMDDhhmmss

Converts a date or date with time to a UInt64 number containing the year and month number (YYYY * 10000000000 + MM * 100000000 + DD * 1000000 + hh * 10000 + mm * 100 + ss).

addYears, addMonths, addWeeks, addDays, addHours, addMinutes, addSeconds, addQuarters

Function adds a Date/DateTime interval to a Date/DateTime and then return the Date/DateTime. For example:

WITH
    toDate('2018-01-01') AS date,
    toDateTime('2018-01-01 00:00:00') AS date_time
SELECT
    addYears(date, 1) AS add_years_with_date,
    addYears(date_time, 1) AS add_years_with_date_time
┌─add_years_with_date─┬─add_years_with_date_time─┐
│          2019-01-01 │      2019-01-01 00:00:00 │
└─────────────────────┴──────────────────────────┘

subtractYears, subtractMonths, subtractWeeks, subtractDays, subtractHours, subtractMinutes, subtractSeconds, subtractQuarters

Function subtract a Date/DateTime interval to a Date/DateTime and then return the Date/DateTime. For example:

WITH
    toDate('2019-01-01') AS date,
    toDateTime('2019-01-01 00:00:00') AS date_time
SELECT
    subtractYears(date, 1) AS subtract_years_with_date,
    subtractYears(date_time, 1) AS subtract_years_with_date_time
┌─subtract_years_with_date─┬─subtract_years_with_date_time─┐
│               2018-01-01 │           2018-01-01 00:00:00 │
└──────────────────────────┴───────────────────────────────┘

formatDateTime

Formats a Time according to the given Format string. Format is a constant expression, so you cannot have multiple formats for a single result column.

formatDateTime uses MySQL datetime format style, refer to https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-format.

Syntax

formatDateTime(Time, Format[, Timezone])

Returned value(s)

Returns time and date values according to the determined format.

Replacement fields Using replacement fields, you can define a pattern for the resulting string. “Example” column shows formatting result for 2018-01-02 22:33:44.

PlaceholderDescriptionExample

%C

year divided by 100 and truncated to integer (00-99)

20

%d

day of the month, zero-padded (01-31)

02

%D

Short MM/DD/YY date, equivalent to %m/%d/%y

01/02/18

%e

day of the month, space-padded ( 1-31)

2

%f

fractional second from the fractional part of DateTime64

1234560

%F

short YYYY-MM-DD date, equivalent to %Y-%m-%d

2018-01-02

%G

four-digit year format for ISO week number, calculated from the week-based year defined by the ISO 8601 standard, normally useful only with %V

2018

%g

two-digit year format, aligned to ISO 8601, abbreviated from four-digit notation

18

%H

hour in 24h format (00-23)

22

%I

hour in 12h format (01-12)

10

%j

day of the year (001-366)

002

%m

month as a decimal number (01-12)

01

%M

minute (00-59)

33

%n

new-line character (‘’)

%p

AM or PM designation

PM

%Q

Quarter (1-4)

1

%R

24-hour HH:MM time, equivalent to %H:%M

22:33

%S

second (00-59)

44

%t

horizontal-tab character (’)

%T

ISO 8601 time format (HH:MM:SS), equivalent to %H:%M:%S

22:33:44

%u

ISO 8601 weekday as number with Monday as 1 (1-7)

2

%V

ISO 8601 week number (01-53)

01

%w

weekday as a decimal number with Sunday as 0 (0-6)

2

%y

Year, last two digits (00-99)

18

%Y

Year

2018

%z

Time offset from UTC as +HHMM or -HHMM

-0500

%%

a % sign

%

Example

Query:

SELECT formatDateTime(toDate('2010-01-04'), '%g')

Result:

┌─formatDateTime(toDate('2010-01-04'), '%g')─┐
│ 10                                         │
└────────────────────────────────────────────┘

Query:

SELECT formatDateTime(toDateTime64('2010-01-04 12:34:56.123456', 7), '%f')

Result:

┌─formatDateTime(toDateTime64('2010-01-04 12:34:56.123456', 7), '%f')─┐
│ 1234560                                                             │
└─────────────────────────────────────────────────────────────────────┘

See Also

formatDateTimeInJodaSyntax

Similar to formatDateTime, except that it formats datetime in Joda style instead of MySQL style. Refer to https://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html.

Replacement fields

Using replacement fields, you can define a pattern for the resulting string.

PlaceholderDescriptionPresentationExamples

G

era

text

AD

C

century of era (>=0)

number

20

Y

year of era (>=0)

year

1996

x

weekyear(not supported yet)

year

1996

w

week of weekyear(not supported yet)

number

27

e

day of week

number

2

E

day of week

text

Tuesday; Tue

y

year

year

1996

D

day of year

number

189

M

month of year

month

July; Jul; 07

d

day of month

number

10

a

halfday of day

text

PM

K

hour of halfday (0~11)

number

0

h

clockhour of halfday (1~12)

number

12

H

hour of day (0~23)

number

0

k

clockhour of day (1~24)

number

24

m

minute of hour

number

30

s

second of minute

number

55

S

fraction of second(not supported yet)

number

978

z

time zone(short name not supported yet)

text

Pacific Standard Time; PST

Z

time zone offset/id(not supported yet)

zone

-0800; -08:00; America/Los_Angeles

'

escape for text

delimiter

''

single quote

literal

'

Example

Query:

SELECT formatDateTimeInJodaSyntax(toDateTime('2010-01-04 12:34:56'), 'yyyy-MM-dd HH:mm:ss')

Result:

┌─formatDateTimeInJodaSyntax(toDateTime('2010-01-04 12:34:56'), 'yyyy-MM-dd HH:mm:ss')─┐
│ 2010-01-04 12:34:56                                                                     │
└─────────────────────────────────────────────────────────────────────────────────────────┘

dateName

Returns specified part of date.

Syntax

dateName(date_part, date)

Arguments

  • date_part — Date part. Possible values: 'year', 'quarter', 'month', 'week', 'dayofyear', 'day', 'weekday', 'hour', 'minute', 'second'. String.

  • date — Date. Date, Date32, DateTime or DateTime64.

  • timezone — Timezone. Optional. String.

Returned value

  • The specified part of date.

Type: String

Example

Query:

WITH toDateTime('2021-04-14 11:22:33') AS date_value
SELECT
    dateName('year', date_value),
    dateName('month', date_value),
    dateName('day', date_value);

Result:

┌─dateName('year', date_value)─┬─dateName('month', date_value)─┬─dateName('day', date_value)─┐
│ 2021                         │ April                         │ 14                          │
└──────────────────────────────┴───────────────────────────────┴─────────────────────────────┘

FROM_UNIXTIME

Function converts Unix timestamp to a calendar date and a time of a day. When there is only a single argument of Integer type, it acts in the same way as toDateTime and return DateTime type.

FROM_UNIXTIME uses MySQL datetime format style, refer to https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-format.

Alias: fromUnixTimestamp.

Example:

Query:

SELECT FROM_UNIXTIME(423543535);

Result:

┌─FROM_UNIXTIME(423543535)─┐
│      1983-06-04 10:58:55 │
└──────────────────────────┘

When there are two or three arguments, the first an Integer, Date, Date32, DateTime or DateTime64, the second a constant format string and the third an optional constant time zone string — it acts in the same way as formatDateTime and return String type.

For example:

SELECT FROM_UNIXTIME(1234334543, '%Y-%m-%d %R:%S') AS DateTime;
┌─DateTime────────────┐
│ 2009-02-11 14:42:23 │
└─────────────────────┘

See Also

fromUnixTimestampInJodaSyntax

Similar to FROM_UNIXTIME, except that it formats time in Joda style instead of MySQL style. Refer to https://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html.

Example: Query:

SELECT fromUnixTimestampInJodaSyntax(1669804872, 'yyyy-MM-dd HH:mm:ss', 'UTC');

Result:

┌─fromUnixTimestampInJodaSyntax(1669804872, 'yyyy-MM-dd HH:mm:ss', 'UTC')─┐
│ 2022-11-30 10:41:12                                                        │
└────────────────────────────────────────────────────────────────────────────┘

toModifiedJulianDay

Converts a Proleptic Gregorian calendar date in text form YYYY-MM-DD to a Modified Julian Day number in Int32. This function supports date from 0000-01-01 to 9999-12-31. It raises an exception if the argument cannot be parsed as a date, or the date is invalid.

Syntax

toModifiedJulianDay(date)

Arguments

Returned value

  • Modified Julian Day number.

Type: Int32.

Example

Query:

SELECT toModifiedJulianDay('2020-01-01');

Result:

┌─toModifiedJulianDay('2020-01-01')─┐
│                             58849 │
└───────────────────────────────────┘

toModifiedJulianDayOrNull

Similar to toModifiedJulianDay(), but instead of raising exceptions it returns NULL.

Syntax

toModifiedJulianDayOrNull(date)

Arguments

Returned value

  • Modified Julian Day number.

Type: Nullable(Int32).

Example

Query:

SELECT toModifiedJulianDayOrNull('2020-01-01');

Result:

┌─toModifiedJulianDayOrNull('2020-01-01')─┐
│                                   58849 │
└─────────────────────────────────────────┘

fromModifiedJulianDay

Converts a Modified Julian Day number to a Proleptic Gregorian calendar date in text form YYYY-MM-DD. This function supports day number from -678941 to 2973119 (which represent 0000-01-01 and 9999-12-31 respectively). It raises an exception if the day number is outside of the supported range.

Syntax

fromModifiedJulianDay(day)

Arguments

Returned value

  • Date in text form.

Type: String

Example

Query:

SELECT fromModifiedJulianDay(58849);

Result:

┌─fromModifiedJulianDay(58849)─┐
│ 2020-01-01                   │
└──────────────────────────────┘

fromModifiedJulianDayOrNull

Similar to fromModifiedJulianDayOrNull(), but instead of raising exceptions it returns NULL.

Syntax

fromModifiedJulianDayOrNull(day)

Arguments

Returned value

  • Date in text form.

Type: Nullable(String)

Example

Query:

SELECT fromModifiedJulianDayOrNull(58849);

Result:

┌─fromModifiedJulianDayOrNull(58849)─┐
│ 2020-01-01                         │
└────────────────────────────────────┘

DICTIONARIES

For information on connecting and configuring dictionaries, see Dictionaries.

dictGet, dictGetOrDefault, dictGetOrNull

Retrieves values from a dictionary.

dictGet('dict_name', attr_names, id_expr)
dictGetOrDefault('dict_name', attr_names, id_expr, default_value_expr)
dictGetOrNull('dict_name', attr_name, id_expr)

Arguments

  • dict_name — Name of the dictionary. String literal.

  • attr_names — Name of the column of the dictionary, String literal, or tuple of column names, Tuple(String literal).

  • id_expr — Key value. Expression returning dictionary key-type value or Tuple-type value depending on the dictionary configuration.

  • default_value_expr — Values returned if the dictionary does not contain a row with the id_expr key. Expression or Tuple(Expression), returning the value (or values) in the data types configured for the attr_names attribute.

Returned value

  • If ClickHouse parses the attribute successfully in the attribute’s data type, functions return the value of the dictionary attribute that corresponds to id_expr.

  • If there is no the key, corresponding to id_expr, in the dictionary, then:

    - `dictGet` returns the content of the `<null_value>` element specified for the attribute in the dictionary configuration.
    - `dictGetOrDefault` returns the value passed as the `default_value_expr` parameter.
    - `dictGetOrNull` returns `NULL` in case key was not found in dictionary.

ClickHouse throws an exception if it cannot parse the value of the attribute or the value does not match the attribute data type.

Example for simple key dictionary

Create a text file ext-dict-test.csv containing the following:

1,1
2,2

The first column is id, the second column is c1.

Configure the dictionary:

<clickhouse>
    <dictionary>
        <name>ext-dict-test</name>
        <source>
            <file>
                <path>/path-to/ext-dict-test.csv</path>
                <format>CSV</format>
            </file>
        </source>
        <layout>
            <flat />
        </layout>
        <structure>
            <id>
                <name>id</name>
            </id>
            <attribute>
                <name>c1</name>
                <type>UInt32</type>
                <null_value></null_value>
            </attribute>
        </structure>
        <lifetime>0</lifetime>
    </dictionary>
</clickhouse>

Perform the query:

SELECT
    dictGetOrDefault('ext-dict-test', 'c1', number + 1, toUInt32(number * 10)) AS val,
    toTypeName(val) AS type
FROM system.numbers
LIMIT 3;
┌─val─┬─type───┐
│   1 │ UInt32 │
│   2 │ UInt32 │
│  20 │ UInt32 │
└─────┴────────┘

Example for complex key dictionary

Create a text file ext-dict-mult.csv containing the following:

1,1,'1'
2,2,'2'
3,3,'3'

The first column is id, the second is c1, the third is c2.

Configure the dictionary:

<clickhouse>
    <dictionary>
        <name>ext-dict-mult</name>
        <source>
            <file>
                <path>/path-to/ext-dict-mult.csv</path>
                <format>CSV</format>
            </file>
        </source>
        <layout>
            <flat />
        </layout>
        <structure>
            <id>
                <name>id</name>
            </id>
            <attribute>
                <name>c1</name>
                <type>UInt32</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>c2</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
        </structure>
        <lifetime>0</lifetime>
    </dictionary>
</clickhouse>

Perform the query:

SELECT
    dictGet('ext-dict-mult', ('c1','c2'), number + 1) AS val,
    toTypeName(val) AS type
FROM system.numbers
LIMIT 3;
┌─val─────┬─type──────────────────┐
│ (1,'1') │ Tuple(UInt8, String)  │
│ (2,'2') │ Tuple(UInt8, String)  │
│ (3,'3') │ Tuple(UInt8, String)  │
└─────────┴───────────────────────┘

Example for range key dictionary

Input table:

CREATE TABLE range_key_dictionary_source_table
(
    key UInt64,
    start_date Date,
    end_date Date,
    value String,
    value_nullable Nullable(String)
)
ENGINE = TinyLog();

INSERT INTO range_key_dictionary_source_table VALUES(1, toDate('2019-05-20'), toDate('2019-05-20'), 'First', 'First');
INSERT INTO range_key_dictionary_source_table VALUES(2, toDate('2019-05-20'), toDate('2019-05-20'), 'Second', NULL);
INSERT INTO range_key_dictionary_source_table VALUES(3, toDate('2019-05-20'), toDate('2019-05-20'), 'Third', 'Third');

Create the dictionary:

CREATE DICTIONARY range_key_dictionary
(
    key UInt64,
    start_date Date,
    end_date Date,
    value String,
    value_nullable Nullable(String)
)
PRIMARY KEY key
SOURCE(CLICKHOUSE(HOST 'localhost' PORT tcpPort() TABLE 'range_key_dictionary_source_table'))
LIFETIME(MIN 1 MAX 1000)
LAYOUT(RANGE_HASHED())
RANGE(MIN start_date MAX end_date);

Perform the query:

SELECT
    (number, toDate('2019-05-20')),
    dictHas('range_key_dictionary', number, toDate('2019-05-20')),
    dictGetOrNull('range_key_dictionary', 'value', number, toDate('2019-05-20')),
    dictGetOrNull('range_key_dictionary', 'value_nullable', number, toDate('2019-05-20')),
    dictGetOrNull('range_key_dictionary', ('value', 'value_nullable'), number, toDate('2019-05-20'))
FROM system.numbers LIMIT 5 FORMAT TabSeparated;

Result:

(0,'2019-05-20')        0       \N      \N      (NULL,NULL)
(1,'2019-05-20')        1       First   First   ('First','First')
(2,'2019-05-20')        1       Second  \N      ('Second',NULL)
(3,'2019-05-20')        1       Third   Third   ('Third','Third')
(4,'2019-05-20')        0       \N      \N      (NULL,NULL)

See Also

dictHas

Checks whether a key is present in a dictionary.

dictHas('dict_name', id_expr)

Arguments

  • dict_name — Name of the dictionary. String literal.

  • id_expr — Key value. Expression returning dictionary key-type value or Tuple-type value depending on the dictionary configuration.

Returned value

  • 0, if there is no key.

  • 1, if there is a key.

Type: UInt8.

dictGetHierarchy

Creates an array, containing all the parents of a key in the hierarchical dictionary.

Syntax

dictGetHierarchy('dict_name', key)

Arguments

Returned value

  • Parents for the key.

Type: Array(UInt64).

dictIsIn

Checks the ancestor of a key through the whole hierarchical chain in the dictionary.

dictIsIn('dict_name', child_id_expr, ancestor_id_expr)

Arguments

  • dict_name — Name of the dictionary. String literal.

  • child_id_expr — Key to be checked. Expression returning a UInt64-type value.

  • ancestor_id_expr — Alleged ancestor of the child_id_expr key. Expression returning a UInt64-type value.

Returned value

  • 0, if child_id_expr is not a child of ancestor_id_expr.

  • 1, if child_id_expr is a child of ancestor_id_expr or if child_id_expr is an ancestor_id_expr.

Type: UInt8.

Other Functions

ClickHouse supports specialized functions that convert dictionary attribute values to a specific data type regardless of the dictionary configuration.

Functions:

  • dictGetInt8, dictGetInt16, dictGetInt32, dictGetInt64

  • dictGetUInt8, dictGetUInt16, dictGetUInt32, dictGetUInt64

  • dictGetFloat32, dictGetFloat64

  • dictGetDate

  • dictGetDateTime

  • dictGetUUID

  • dictGetString

All these functions have the OrDefault modification. For example, dictGetDateOrDefault.

Syntax:

dictGet[Type]('dict_name', 'attr_name', id_expr)
dictGet[Type]OrDefault('dict_name', 'attr_name', id_expr, default_value_expr)

Arguments

  • dict_name — Name of the dictionary. String literal.

  • attr_name — Name of the column of the dictionary. String literal.

  • id_expr — Key value. Expression returning a UInt64 or Tuple-type value depending on the dictionary configuration.

  • default_value_expr — Value returned if the dictionary does not contain a row with the id_expr key. Expression returning the value in the data type configured for the attr_name attribute.

Returned value

  • If ClickHouse parses the attribute successfully in the attribute’s data type, functions return the value of the dictionary attribute that corresponds to id_expr.

  • If there is no requested id_expr in the dictionary then:

    - `dictGet[Type]` returns the content of the `<null_value>` element specified for the attribute in the dictionary configuration.
    - `dictGet[Type]OrDefault` returns the value passed as the `default_value_expr` parameter.

ClickHouse throws an exception if it cannot parse the value of the attribute or the value does not match the attribute data type.

ENCODING

char

Returns the string with the length as the number of passed arguments and each byte has the value of corresponding argument. Accepts multiple arguments of numeric types. If the value of argument is out of range of UInt8 data type, it is converted to UInt8 with possible rounding and overflow.

Syntax

char(number_1, [number_2, ..., number_n]);

Arguments

  • number_1, number_2, ..., number_n — Numerical arguments interpreted as integers. Types: Int, Float.

Returned value

  • a string of given bytes.

Type: String.

Example

Query:

SELECT char(104.1, 101, 108.9, 108.9, 111) AS hello;

Result:

┌─hello─┐
│ hello │
└───────┘

You can construct a string of arbitrary encoding by passing the corresponding bytes. Here is example for UTF-8:

Query:

SELECT char(0xD0, 0xBF, 0xD1, 0x80, 0xD0, 0xB8, 0xD0, 0xB2, 0xD0, 0xB5, 0xD1, 0x82) AS hello;

Result:

┌─hello──┐
│ привет │
└────────┘

Query:

SELECT char(0xE4, 0xBD, 0xA0, 0xE5, 0xA5, 0xBD) AS hello;

Result:

┌─hello─┐
│ 你好  │
└───────┘

hex

Returns a string containing the argument’s hexadecimal representation.

Alias: HEX.

Syntax

hex(arg)

The function is using uppercase letters A-F and not using any prefixes (like 0x) or suffixes (like h).

For integer arguments, it prints hex digits (“nibbles”) from the most significant to least significant (big-endian or “human-readable” order). It starts with the most significant non-zero byte (leading zero bytes are omitted) but always prints both digits of every byte even if the leading digit is zero.

Values of type Date and DateTime are formatted as corresponding integers (the number of days since Epoch for Date and the value of Unix Timestamp for DateTime).

For String and FixedString, all bytes are simply encoded as two hexadecimal numbers. Zero bytes are not omitted.

Values of Float and Decimal types are encoded as their representation in memory. As we support little-endian architecture, they are encoded in little-endian. Zero leading/trailing bytes are not omitted.

Values of UUID type are encoded as big-endian order string.

Arguments

Returned value

  • A string with the hexadecimal representation of the argument.

Type: String.

Examples

Query:

SELECT hex(1);

Result:

01

Query:

SELECT hex(toFloat32(number)) AS hex_presentation FROM numbers(15, 2);

Result:

┌─hex_presentation─┐
│ 00007041         │
│ 00008041         │
└──────────────────┘

Query:

SELECT hex(toFloat64(number)) AS hex_presentation FROM numbers(15, 2);

Result:

┌─hex_presentation─┐
│ 0000000000002E40 │
│ 0000000000003040 │
└──────────────────┘

Query:

SELECT lower(hex(toUUID('61f0c404-5cb3-11e7-907b-a6006ad3dba0'))) as uuid_hex

Result:

┌─uuid_hex─────────────────────────┐
│ 61f0c4045cb311e7907ba6006ad3dba0 │
└──────────────────────────────────┘

unhex

Performs the opposite operation of hex. It interprets each pair of hexadecimal digits (in the argument) as a number and converts it to the byte represented by the number. The return value is a binary string (BLOB).

If you want to convert the result to a number, you can use the reverse and reinterpretAs<Type> functions.

NOTE

If unhex is invoked from within the clickhouse-client, binary strings display using UTF-8.

Alias: UNHEX.

Syntax

unhex(arg)

Arguments

  • arg — A string containing any number of hexadecimal digits. Type: String, FixedString.

Supports both uppercase and lowercase letters A-F. The number of hexadecimal digits does not have to be even. If it is odd, the last digit is interpreted as the least significant half of the 00-0F byte. If the argument string contains anything other than hexadecimal digits, some implementation-defined result is returned (an exception isn’t thrown). For a numeric argument the inverse of hex(N) is not performed by unhex().

Returned value

  • A binary string (BLOB).

Type: String.

Example

Query:

SELECT unhex('303132'), UNHEX('4D7953514C');

Result:

┌─unhex('303132')─┬─unhex('4D7953514C')─┐
│ 012             │ MySQL               │
└─────────────────┴─────────────────────┘

Query:

SELECT reinterpretAsUInt64(reverse(unhex('FFF'))) AS num;

Result:

┌──num─┐
│ 4095 │
└──────┘

bitmaskToList(num)

Accepts an integer. Returns a string containing the list of powers of two that total the source number when summed. They are comma-separated without spaces in text format, in ascending order.

bitmaskToArray(num)

Accepts an integer. Returns an array of UInt64 numbers containing the list of powers of two that total the source number when summed. Numbers in the array are in ascending order.

ENCRYPTION

These functions implement encryption and decryption of data with AES (Advanced Encryption Standard) algorithm.

Key length depends on encryption mode. It is 16, 24, and 32 bytes long for -128-, -196-, and -256- modes respectively.

Initialization vector length is always 16 bytes (bytes in excess of 16 are ignored).

Note that these functions work slowly until ClickHouse 21.1.

encrypt

This function encrypts data using these modes:

  • aes-128-ecb, aes-192-ecb, aes-256-ecb

  • aes-128-cbc, aes-192-cbc, aes-256-cbc

  • aes-128-ofb, aes-192-ofb, aes-256-ofb

  • aes-128-gcm, aes-192-gcm, aes-256-gcm

  • aes-128-ctr, aes-192-ctr, aes-256-ctr

Syntax

encrypt('mode', 'plaintext', 'key' [, iv, aad])

Arguments

  • mode — Encryption mode. String.

  • plaintext — Text thats need to be encrypted. String.

  • key — Encryption key. String.

  • iv — Initialization vector. Required for -gcm modes, optinal for others. String.

  • aad — Additional authenticated data. It isn't encrypted, but it affects decryption. Works only in -gcm modes, for others would throw an exception. String.

Returned value

  • Ciphertext binary string. String.

Examples

Create this table:

Query:

CREATE TABLE encryption_test
(
    `comment` String,
    `secret` String
)
ENGINE = Memory;

Insert some data (please avoid storing the keys/ivs in the database as this undermines the whole concept of encryption), also storing 'hints' is unsafe too and used only for illustrative purposes:

Query:

INSERT INTO encryption_test VALUES('aes-256-ofb no IV', encrypt('aes-256-ofb', 'Secret', '12345678910121314151617181920212')),\
('aes-256-ofb no IV, different key', encrypt('aes-256-ofb', 'Secret', 'keykeykeykeykeykeykeykeykeykeyke')),\
('aes-256-ofb with IV', encrypt('aes-256-ofb', 'Secret', '12345678910121314151617181920212', 'iviviviviviviviv')),\
('aes-256-cbc no IV', encrypt('aes-256-cbc', 'Secret', '12345678910121314151617181920212'));

Query:

SELECT comment, hex(secret) FROM encryption_test;

Result:

┌─comment──────────────────────────┬─hex(secret)──────────────────────┐
│ aes-256-ofb no IV                │ B4972BDC4459                     │
│ aes-256-ofb no IV, different key │ 2FF57C092DC9                     │
│ aes-256-ofb with IV              │ 5E6CB398F653                     │
│ aes-256-cbc no IV                │ 1BC0629A92450D9E73A00E7D02CF4142 │
└──────────────────────────────────┴──────────────────────────────────┘

Example with -gcm:

Query:

INSERT INTO encryption_test VALUES('aes-256-gcm', encrypt('aes-256-gcm', 'Secret', '12345678910121314151617181920212', 'iviviviviviviviv')), \
('aes-256-gcm with AAD', encrypt('aes-256-gcm', 'Secret', '12345678910121314151617181920212', 'iviviviviviviviv', 'aad'));

SELECT comment, hex(secret) FROM encryption_test WHERE comment LIKE '%gcm%';

Result:

┌─comment──────────────┬─hex(secret)──────────────────────────────────┐
│ aes-256-gcm          │ A8A3CCBC6426CFEEB60E4EAE03D3E94204C1B09E0254 │
│ aes-256-gcm with AAD │ A8A3CCBC6426D9A1017A0A932322F1852260A4AD6837 │
└──────────────────────┴──────────────────────────────────────────────┘

aes_encrypt_mysql

Compatible with mysql encryption and resulting ciphertext can be decrypted with AES_DECRYPT function.

Will produce the same ciphertext as encrypt on equal inputs. But when key or iv are longer than they should normally be, aes_encrypt_mysql will stick to what MySQL's aes_encrypt does: 'fold' key and ignore excess bits of iv.

Supported encryption modes:

  • aes-128-ecb, aes-192-ecb, aes-256-ecb

  • aes-128-cbc, aes-192-cbc, aes-256-cbc

  • aes-128-ofb, aes-192-ofb, aes-256-ofb

Syntax

aes_encrypt_mysql('mode', 'plaintext', 'key' [, iv])

Arguments

  • mode — Encryption mode. String.

  • plaintext — Text that needs to be encrypted. String.

  • key — Encryption key. If key is longer than required by mode, MySQL-specific key folding is performed. String.

  • iv — Initialization vector. Optional, only first 16 bytes are taken into account String.

Returned value

  • Ciphertext binary string. String.

Examples

Given equal input encrypt and aes_encrypt_mysql produce the same ciphertext:

Query:

SELECT encrypt('aes-256-ofb', 'Secret', '12345678910121314151617181920212', 'iviviviviviviviv') = aes_encrypt_mysql('aes-256-ofb', 'Secret', '12345678910121314151617181920212', 'iviviviviviviviv') AS ciphertexts_equal;

Result:

┌─ciphertexts_equal─┐
│                 1 │
└───────────────────┘

But encrypt fails when key or iv is longer than expected:

Query:

SELECT encrypt('aes-256-ofb', 'Secret', '123456789101213141516171819202122', 'iviviviviviviviv123');

Result:

Received exception from server (version 22.6.1):
Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: Invalid key size: 33 expected 32: While processing encrypt('aes-256-ofb', 'Secret', '123456789101213141516171819202122', 'iviviviviviviviv123').

While aes_encrypt_mysql produces MySQL-compatitalbe output:

Query:

SELECT hex(aes_encrypt_mysql('aes-256-ofb', 'Secret', '123456789101213141516171819202122', 'iviviviviviviviv123')) AS ciphertext;

Result:

┌─ciphertext───┐
│ 24E9E4966469 │
└──────────────┘

Notice how supplying even longer IV produces the same result

Query:

SELECT hex(aes_encrypt_mysql('aes-256-ofb', 'Secret', '123456789101213141516171819202122', 'iviviviviviviviv123456')) AS ciphertext

Result:

┌─ciphertext───┐
│ 24E9E4966469 │
└──────────────┘

Which is binary equal to what MySQL produces on same inputs:

mysql> SET  block_encryption_mode='aes-256-ofb';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT aes_encrypt('Secret', '123456789101213141516171819202122', 'iviviviviviviviv123456') as ciphertext;
+------------------------+
| ciphertext             |
+------------------------+
| 0x24E9E4966469         |
+------------------------+
1 row in set (0.00 sec)

decrypt

This function decrypts ciphertext into a plaintext using these modes:

  • aes-128-ecb, aes-192-ecb, aes-256-ecb

  • aes-128-cbc, aes-192-cbc, aes-256-cbc

  • aes-128-ofb, aes-192-ofb, aes-256-ofb

  • aes-128-gcm, aes-192-gcm, aes-256-gcm

  • aes-128-ctr, aes-192-ctr, aes-256-ctr

Syntax

decrypt('mode', 'ciphertext', 'key' [, iv, aad])

Arguments

  • mode — Decryption mode. String.

  • ciphertext — Encrypted text that needs to be decrypted. String.

  • key — Decryption key. String.

  • iv — Initialization vector. Required for -gcm modes, optinal for others. String.

  • aad — Additional authenticated data. Won't decrypt if this value is incorrect. Works only in -gcm modes, for others would throw an exception. String.

Returned value

Examples

Re-using table from encrypt.

Query:

SELECT comment, hex(secret) FROM encryption_test;

Result:

┌─comment──────────────┬─hex(secret)──────────────────────────────────┐
│ aes-256-gcm          │ A8A3CCBC6426CFEEB60E4EAE03D3E94204C1B09E0254 │
│ aes-256-gcm with AAD │ A8A3CCBC6426D9A1017A0A932322F1852260A4AD6837 │
└──────────────────────┴──────────────────────────────────────────────┘
┌─comment──────────────────────────┬─hex(secret)──────────────────────┐
│ aes-256-ofb no IV                │ B4972BDC4459                     │
│ aes-256-ofb no IV, different key │ 2FF57C092DC9                     │
│ aes-256-ofb with IV              │ 5E6CB398F653                     │
│ aes-256-cbc no IV                │ 1BC0629A92450D9E73A00E7D02CF4142 │
└──────────────────────────────────┴──────────────────────────────────┘

Now let's try to decrypt all that data.

Query:

SELECT comment, decrypt('aes-256-cfb128', secret, '12345678910121314151617181920212') as plaintext FROM encryption_test

Result:

┌─comment──────────────┬─plaintext──┐
│ aes-256-gcm          │ OQ�E
                             �t�7T�\���\�   │
│ aes-256-gcm with AAD │ OQ�E
                             �\��si����;�o�� │
└──────────────────────┴────────────┘
┌─comment──────────────────────────┬─plaintext─┐
│ aes-256-ofb no IV                │ Secret    │
│ aes-256-ofb no IV, different key │ �4�
                                        �         │
│ aes-256-ofb with IV              │ ���6�~        │
 │aes-256-cbc no IV                │ �2*4�h3c�4w��@
└──────────────────────────────────┴───────────┘

Notice how only a portion of the data was properly decrypted, and the rest is gibberish since either mode, key, or iv were different upon encryption.

aes_decrypt_mysql

Compatible with mysql encryption and decrypts data encrypted with AES_ENCRYPT function.

Will produce same plaintext as decrypt on equal inputs. But when key or iv are longer than they should normally be, aes_decrypt_mysql will stick to what MySQL's aes_decrypt does: 'fold' key and ignore excess bits of IV.

Supported decryption modes:

  • aes-128-ecb, aes-192-ecb, aes-256-ecb

  • aes-128-cbc, aes-192-cbc, aes-256-cbc

  • aes-128-cfb128

  • aes-128-ofb, aes-192-ofb, aes-256-ofb

Syntax

aes_decrypt_mysql('mode', 'ciphertext', 'key' [, iv])

Arguments

  • mode — Decryption mode. String.

  • ciphertext — Encrypted text that needs to be decrypted. String.

  • key — Decryption key. String.

  • iv — Initialization vector. Optinal. String.

Returned value

Examples

Let's decrypt data we've previously encrypted with MySQL:

mysql> SET  block_encryption_mode='aes-256-ofb';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT aes_encrypt('Secret', '123456789101213141516171819202122', 'iviviviviviviviv123456') as ciphertext;
+------------------------+
| ciphertext             |
+------------------------+
| 0x24E9E4966469         |
+------------------------+
1 row in set (0.00 sec)

Query:

SELECT aes_decrypt_mysql('aes-256-ofb', unhex('24E9E4966469'), '123456789101213141516171819202122', 'iviviviviviviviv123456') AS plaintext

Result:

┌─plaintext─┐
│ Secret    │
└───────────┘

FILE

file

Reads file as a String. The file content is not parsed, so any information is read as one string and placed into the specified column.

Syntax

file(path[, default])

Arguments

  • path — The relative path to the file from user_files_path. Path to file support following wildcards: *, ?, {abc,def} and {N..M} where N, M — numbers, 'abc', 'def' — strings.

  • default — The value that will be returned in the case when a file does not exist or cannot be accessed. Data types supported: String and NULL.

Example

Inserting data from files a.txt and b.txt into a table as strings:

Query:

INSERT INTO table SELECT file('a.txt'), file('b.txt');

See Also

GEOGRAPHICAL COORDINATES

greatCircleDistance

Calculates the distance between two points on the Earth’s surface using the great-circle formula.

greatCircleDistance(lon1Deg, lat1Deg, lon2Deg, lat2Deg)

Input parameters

  • lon1Deg — Longitude of the first point in degrees. Range: [-180°, 180°].

  • lat1Deg — Latitude of the first point in degrees. Range: [-90°, 90°].

  • lon2Deg — Longitude of the second point in degrees. Range: [-180°, 180°].

  • lat2Deg — Latitude of the second point in degrees. Range: [-90°, 90°].

Positive values correspond to North latitude and East longitude, and negative values correspond to South latitude and West longitude.

Returned value

The distance between two points on the Earth’s surface, in meters.

Generates an exception when the input parameter values fall outside of the range.

Example

SELECT greatCircleDistance(55.755831, 37.617673, -55.755831, -37.617673)
┌─greatCircleDistance(55.755831, 37.617673, -55.755831, -37.617673)─┐
│                                                14132374.194975413 │
└───────────────────────────────────────────────────────────────────┘

geoDistance

Similar to greatCircleDistance but calculates the distance on WGS-84 ellipsoid instead of sphere. This is more precise approximation of the Earth Geoid. The performance is the same as for greatCircleDistance (no performance drawback). It is recommended to use geoDistance to calculate the distances on Earth.

Technical note: for close enough points we calculate the distance using planar approximation with the metric on the tangent plane at the midpoint of the coordinates.

greatCircleAngle

Calculates the central angle between two points on the Earth’s surface using the great-circle formula.

greatCircleAngle(lon1Deg, lat1Deg, lon2Deg, lat2Deg)

Input parameters

  • lon1Deg — Longitude of the first point in degrees.

  • lat1Deg — Latitude of the first point in degrees.

  • lon2Deg — Longitude of the second point in degrees.

  • lat2Deg — Latitude of the second point in degrees.

Returned value

The central angle between two points in degrees.

Example

SELECT greatCircleAngle(0, 0, 45, 0) AS arc
┌─arc─┐
│  45 │
└─────┘

pointInEllipses

Checks whether the point belongs to at least one of the ellipses. Coordinates are geometric in the Cartesian coordinate system.

pointInEllipses(x, y, x₀, y₀, a₀, b₀,...,xₙ, yₙ, aₙ, bₙ)

Input parameters

  • x, y — Coordinates of a point on the plane.

  • xᵢ, yᵢ — Coordinates of the center of the i-th ellipsis.

  • aᵢ, bᵢ — Axes of the i-th ellipsis in units of x, y coordinates.

The input parameters must be 2+4⋅n, where n is the number of ellipses.

Returned values

1 if the point is inside at least one of the ellipses; 0if it is not.

Example

SELECT pointInEllipses(10., 10., 10., 9.1, 1., 0.9999)
┌─pointInEllipses(10., 10., 10., 9.1, 1., 0.9999)─┐
│                                               1 │
└─────────────────────────────────────────────────┘

pointInPolygon

Checks whether the point belongs to the polygon on the plane.

pointInPolygon((x, y), [(a, b), (c, d) ...], ...)

Input values

  • (x, y) — Coordinates of a point on the plane. Data type — Tuple — A tuple of two numbers.

  • [(a, b), (c, d) ...] — Polygon vertices. Data type — Array. Each vertex is represented by a pair of coordinates (a, b). Vertices should be specified in a clockwise or counterclockwise order. The minimum number of vertices is 3. The polygon must be constant.

  • The function also supports polygons with holes (cut out sections). In this case, add polygons that define the cut out sections using additional arguments of the function. The function does not support non-simply-connected polygons.

Returned values

1 if the point is inside the polygon, 0 if it is not. If the point is on the polygon boundary, the function may return either 0 or 1.

Example

SELECT pointInPolygon((3., 3.), [(6, 0), (8, 4), (5, 8), (0, 2)]) AS res
┌─res─┐
│   1 │
└─────┘

GEOHASH

Geohash is the geocode system, which subdivides Earth’s surface into buckets of grid shape and encodes each cell into a short string of letters and digits. It is a hierarchical data structure, so the longer is the geohash string, the more precise is the geographic location.

If you need to manually convert geographic coordinates to geohash strings, you can use geohash.org.

geohashEncode

Encodes latitude and longitude as a geohash-string.

geohashEncode(longitude, latitude, [precision])

Input values

  • longitude - longitude part of the coordinate you want to encode. Floating in range[-180°, 180°]

  • latitude - latitude part of the coordinate you want to encode. Floating in range [-90°, 90°]

  • precision - Optional, length of the resulting encoded string, defaults to 12. Integer in range [1, 12]. Any value less than 1 or greater than 12 is silently converted to 12.

Returned values

  • alphanumeric String of encoded coordinate (modified version of the base32-encoding alphabet is used).

Example

SELECT geohashEncode(-5.60302734375, 42.593994140625, 0) AS res;
┌─res──────────┐
│ ezs42d000000 │
└──────────────┘

geohashDecode

Decodes any geohash-encoded string into longitude and latitude.

Input values

  • encoded string - geohash-encoded string.

Returned values

  • (longitude, latitude) - 2-tuple of Float64 values of longitude and latitude.

Example

SELECT geohashDecode('ezs42') AS res;
┌─res─────────────────────────────┐
│ (-5.60302734375,42.60498046875) │
└─────────────────────────────────┘

geohashesInBox

Returns an array of geohash-encoded strings of given precision that fall inside and intersect boundaries of given box, basically a 2D grid flattened into array.

Syntax

geohashesInBox(longitude_min, latitude_min, longitude_max, latitude_max, precision)

Arguments

  • longitude_min — Minimum longitude. Range: [-180°, 180°]. Type: Float.

  • latitude_min — Minimum latitude. Range: [-90°, 90°]. Type: Float.

  • longitude_max — Maximum longitude. Range: [-180°, 180°]. Type: Float.

  • latitude_max — Maximum latitude. Range: [-90°, 90°]. Type: Float.

  • precision — Geohash precision. Range: [1, 12]. Type: UInt8.

NOTE

All coordinate parameters must be of the same type: either Float32 or Float64.

Returned values

  • Array of precision-long strings of geohash-boxes covering provided area, you should not rely on order of items.

  • [] - Empty array if minimum latitude and longitude values aren’t less than corresponding maximum values.

Type: Array(String).

NOTE

Function throws an exception if resulting array is over 10’000’000 items long.

Example

Query:

SELECT geohashesInBox(24.48, 40.56, 24.785, 40.81, 4) AS thasos;

Result:

┌─thasos──────────────────────────────────────┐
│ ['sx1q','sx1r','sx32','sx1w','sx1x','sx38'] │
└─────────────────────────────────────────────┘

H3 INDEXES

H3 is a geographical indexing system where Earth’s surface divided into a grid of even hexagonal cells. This system is hierarchical, i. e. each hexagon on the top level ("parent") can be split into seven even but smaller ones ("children"), and so on.

The level of the hierarchy is called resolution and can receive a value from 0 till 15, where 0 is the base level with the largest and coarsest cells.

A latitude and longitude pair can be transformed to a 64-bit H3 index, identifying a grid cell.

The H3 index is used primarily for bucketing locations and other geospatial manipulations.

The full description of the H3 system is available at the Uber Engeneering site.

h3IsValid

Verifies whether the number is a valid H3 index.

Syntax

h3IsValid(h3index)

Parameter

  • h3index — Hexagon index number. Type: UInt64.

Returned values

  • 1 — The number is a valid H3 index.

  • 0 — The number is not a valid H3 index.

Type: UInt8.

Example

Query:

SELECT h3IsValid(630814730351855103) AS h3IsValid;

Result:

┌─h3IsValid─┐
│         1 │
└───────────┘

h3GetResolution

Defines the resolution of the given H3 index.

Syntax

h3GetResolution(h3index)

Parameter

  • h3index — Hexagon index number. Type: UInt64.

Returned values

  • Index resolution. Range: [0, 15].

  • If the index is not valid, the function returns a random value. Use h3IsValid to verify the index.

Type: UInt8.

Example

Query:

SELECT h3GetResolution(639821929606596015) AS resolution;

Result:

┌─resolution─┐
│         14 │
└────────────┘

h3EdgeAngle

Calculates the average length of the H3 hexagon edge in grades.

Syntax

h3EdgeAngle(resolution)

Parameter

  • resolution — Index resolution. Type: UInt8. Range: [0, 15].

Returned values

  • The average length of the H3 hexagon edge in grades. Type: Float64.

Example

Query:

SELECT h3EdgeAngle(10) AS edgeAngle;

Result:

┌───────h3EdgeAngle(10)─┐
│ 0.0005927224846720883 │
└───────────────────────┘

h3EdgeLengthM

Calculates the average length of the H3 hexagon edge in meters.

Syntax

h3EdgeLengthM(resolution)

Parameter

  • resolution — Index resolution. Type: UInt8. Range: [0, 15].

Returned values

  • The average length of the H3 hexagon edge in meters. Type: Float64.

Example

Query:

SELECT h3EdgeLengthM(15) AS edgeLengthM;

Result:

┌─edgeLengthM─┐
│ 0.509713273 │
└─────────────┘

geoToH3

Returns H3 point index (lon, lat) with specified resolution.

Syntax

geoToH3(lon, lat, resolution)

Arguments

  • lon — Longitude. Type: Float64.

  • lat — Latitude. Type: Float64.

  • resolution — Index resolution. Range: [0, 15]. Type: UInt8.

Returned values

  • Hexagon index number.

  • 0 in case of error.

Type: UInt64.

Example

Query:

SELECT geoToH3(37.79506683, 55.71290588, 15) AS h3Index;

Result:

┌────────────h3Index─┐
│ 644325524701193974 │
└────────────────────┘

h3kRing

Lists all the H3 hexagons in the raduis of k from the given hexagon in random order.

Syntax

h3kRing(h3index, k)

Arguments

  • h3index — Hexagon index number. Type: UInt64.

  • k — Radius. Type: integer

Returned values

  • Array of H3 indexes.

Type: Array(UInt64).

Example

Query:

SELECT arrayJoin(h3kRing(644325529233966508, 1)) AS h3index;

Result:

┌────────────h3index─┐
│ 644325529233966508 │
│ 644325529233966497 │
│ 644325529233966510 │
│ 644325529233966504 │
│ 644325529233966509 │
│ 644325529233966355 │
│ 644325529233966354 │
└────────────────────┘

h3GetBaseCell

Returns the base cell number of the H3 index.

Syntax

h3GetBaseCell(index)

Parameter

  • index — Hexagon index number. Type: UInt64.

Returned value

  • Hexagon base cell number.

Type: UInt8.

Example

Query:

SELECT h3GetBaseCell(612916788725809151) AS basecell;

Result:

┌─basecell─┐
│       12 │
└──────────┘

h3HexAreaM2

Returns average hexagon area in square meters at the given resolution.

Syntax

h3HexAreaM2(resolution)

Parameter

  • resolution — Index resolution. Range: [0, 15]. Type: UInt8.

Returned value

  • Area in square meters.

Type: Float64.

Example

Query:

SELECT h3HexAreaM2(13) AS area;

Result:

┌─area─┐
│ 43.9 │
└──────┘

h3IndexesAreNeighbors

Returns whether or not the provided H3 indexes are neighbors.

Syntax

h3IndexesAreNeighbors(index1, index2)

Arguments

  • index1 — Hexagon index number. Type: UInt64.

  • index2 — Hexagon index number. Type: UInt64.

Returned value

  • 1 — Indexes are neighbours.

  • 0 — Indexes are not neighbours.

Type: UInt8.

Example

Query:

SELECT h3IndexesAreNeighbors(617420388351344639, 617420388352655359) AS n;

Result:

┌─n─┐
│ 1 │
└───┘

h3ToChildren

Returns an array of child indexes for the given H3 index.

Syntax

h3ToChildren(index, resolution)

Arguments

  • index — Hexagon index number. Type: UInt64.

  • resolution — Index resolution. Range: [0, 15]. Type: UInt8.

Returned values

  • Array of the child H3-indexes.

Type: Array(UInt64).

Example

Query:

SELECT h3ToChildren(599405990164561919, 6) AS children;

Result:

┌─children───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ [603909588852408319,603909588986626047,603909589120843775,603909589255061503,603909589389279231,603909589523496959,603909589657714687] │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

h3ToParent

Returns the parent (coarser) index containing the given H3 index.

Syntax

h3ToParent(index, resolution)

Arguments

  • index — Hexagon index number. Type: UInt64.

  • resolution — Index resolution. Range: [0, 15]. Type: UInt8.

Returned value

  • Parent H3 index.

Type: UInt64.

Example

Query:

SELECT h3ToParent(599405990164561919, 3) AS parent;

Result:

┌─────────────parent─┐
│ 590398848891879423 │
└────────────────────┘

h3ToString

Converts the H3Index representation of the index to the string representation.

h3ToString(index)

Parameter

  • index — Hexagon index number. Type: UInt64.

Returned value

  • String representation of the H3 index.

Type: String.

Example

Query:

SELECT h3ToString(617420388352917503) AS h3_string;

Result:

┌─h3_string───────┐
│ 89184926cdbffff │
└─────────────────┘

stringToH3

Converts the string representation to the H3Index (UInt64) representation.

Syntax

stringToH3(index_str)

Parameter

  • index_str — String representation of the H3 index. Type: String.

Returned value

  • Hexagon index number. Returns 0 on error. Type: UInt64.

Example

Query:

SELECT stringToH3('89184926cc3ffff') AS index;

Result:

┌──────────────index─┐
│ 617420388351344639 │
└────────────────────┘

HASH

Hash functions can be used for the deterministic pseudo-random shuffling of elements.

Simhash is a hash function, which returns close hash values for close (similar) arguments.

halfMD5

Interprets all the input parameters as strings and calculates the MD5 hash value for each of them. Then combines hashes, takes the first 8 bytes of the hash of the resulting string, and interprets them as UInt64 in big-endian byte order.

halfMD5(par1, ...)

The function is relatively slow (5 million short strings per second per processor core). Consider using the sipHash64 function instead.

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).

Returned Value

A UInt64 data type hash value.

Example

SELECT halfMD5(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS halfMD5hash, toTypeName(halfMD5hash) AS type;
┌────────halfMD5hash─┬─type───┐
│ 186182704141653334 │ UInt64 │
└────────────────────┴────────┘

MD5

Calculates the MD5 from a string and returns the resulting set of bytes as FixedString(16). If you do not need MD5 in particular, but you need a decent cryptographic 128-bit hash, use the ‘sipHash128’ function instead. If you want to get the same result as output by the md5sum utility, use lower(hex(MD5(s))).

sipHash64

Produces a 64-bit SipHash hash value.

sipHash64(par1,...)

This is a cryptographic hash function. It works at least three times faster than the MD5 function.

Function interprets all the input parameters as strings and calculates the hash value for each of them. Then combines hashes by the following algorithm:

  1. After hashing all the input parameters, the function gets the array of hashes.

  2. Function takes the first and the second elements and calculates a hash for the array of them.

  3. Then the function takes the hash value, calculated at the previous step, and the third element of the initial hash array, and calculates a hash for the array of them.

  4. The previous step is repeated for all the remaining elements of the initial hash array.

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).

Returned Value

A UInt64 data type hash value.

Example

SELECT sipHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS SipHash, toTypeName(SipHash) AS type;
┌──────────────SipHash─┬─type───┐
│ 13726873534472839665 │ UInt64 │
└──────────────────────┴────────┘

sipHash128

Produces a 128-bit SipHash hash value. Differs from sipHash64 in that the final xor-folding state is done up to 128 bits.

Syntax

sipHash128(par1,...)

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).

Returned value

A 128-bit SipHash hash value.

Type: FixedString(16).

Example

Query:

SELECT hex(sipHash128('foo', '\x01', 3));

Result:

┌─hex(sipHash128('foo', '', 3))────┐
│ 9DE516A64A414D4B1B609415E4523F24 │
└──────────────────────────────────┘

cityHash64

Produces a 64-bit CityHash hash value.

cityHash64(par1,...)

This is a fast non-cryptographic hash function. It uses the CityHash algorithm for string parameters and implementation-specific fast non-cryptographic hash function for parameters with other data types. The function uses the CityHash combinator to get the final results.

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).

Returned Value

A UInt64 data type hash value.

Examples

Call example:

SELECT cityHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS CityHash, toTypeName(CityHash) AS type;
┌─────────────CityHash─┬─type───┐
│ 12072650598913549138 │ UInt64 │
└──────────────────────┴────────┘

The following example shows how to compute the checksum of the entire table with accuracy up to the row order:

SELECT groupBitXor(cityHash64(*)) FROM table

intHash32

Calculates a 32-bit hash code from any type of integer. This is a relatively fast non-cryptographic hash function of average quality for numbers.

intHash64

Calculates a 64-bit hash code from any type of integer. It works faster than intHash32. Average quality.

SHA1, SHA224, SHA256, SHA512

Calculates SHA-1, SHA-224, SHA-256, SHA-512 hash from a string and returns the resulting set of bytes as FixedString.

Syntax

SHA1('s')
...
SHA512('s')

The function works fairly slowly (SHA-1 processes about 5 million short strings per second per processor core, while SHA-224 and SHA-256 process about 2.2 million). We recommend using this function only in cases when you need a specific hash function and you can’t select it. Even in these cases, we recommend applying the function offline and pre-calculating values when inserting them into the table, instead of applying it in SELECT queries.

Arguments

  • s — Input string for SHA hash calculation. String.

Returned value

  • SHA hash as a hex-unencoded FixedString. SHA-1 returns as FixedString(20), SHA-224 as FixedString(28), SHA-256 — FixedString(32), SHA-512 — FixedString(64).

Type: FixedString.

Example

Use the hex function to represent the result as a hex-encoded string.

Query:

SELECT hex(SHA1('abc'));

Result:

┌─hex(SHA1('abc'))─────────────────────────┐
│ A9993E364706816ABA3E25717850C26C9CD0D89D │
└──────────────────────────────────────────┘

URLHash(url[, N])

A fast, decent-quality non-cryptographic hash function for a string obtained from a URL using some type of normalization. URLHash(s) – Calculates a hash from a string without one of the trailing symbols /,? or # at the end, if present. URLHash(s, N) – Calculates a hash from a string up to the N level in the URL hierarchy, without one of the trailing symbols /,? or # at the end, if present. Levels are the same as in URLHierarchy.

farmFingerprint64

farmHash64

Produces a 64-bit FarmHash or Fingerprint value. farmFingerprint64 is preferred for a stable and portable value.

farmFingerprint64(par1, ...)
farmHash64(par1, ...)

These functions use the Fingerprint64 and Hash64 methods respectively from all available methods.

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data)..

Returned Value

A UInt64 data type hash value.

Example

SELECT farmHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS FarmHash, toTypeName(FarmHash) AS type;
┌─────────────FarmHash─┬─type───┐
│ 17790458267262532859 │ UInt64 │
└──────────────────────┴────────┘

javaHash

Calculates JavaHash from a string, Byte, Short, Integer, Long. This hash function is neither fast nor having a good quality. The only reason to use it is when this algorithm is already used in another system and you have to calculate exactly the same result.

Note that Java only support calculating signed integers hash, so if you want to calculate unsigned integers hash you must cast it to proper signed ClickHouse types.

Syntax

SELECT javaHash('')

Returned value

A Int32 data type hash value.

Example

Query:

SELECT javaHash(toInt32(123));

Result:

┌─javaHash(toInt32(123))─┐
│               123      │
└────────────────────────┘

Query:

SELECT javaHash('Hello, world!');

Result:

┌─javaHash('Hello, world!')─┐
│               -1880044555 │
└───────────────────────────┘

javaHashUTF16LE

Calculates JavaHash from a string, assuming it contains bytes representing a string in UTF-16LE encoding.

Syntax

javaHashUTF16LE(stringUtf16le)

Arguments

  • stringUtf16le — a string in UTF-16LE encoding.

Returned value

A Int32 data type hash value.

Example

Correct query with UTF-16LE encoded string.

Query:

SELECT javaHashUTF16LE(convertCharset('test', 'utf-8', 'utf-16le'));

Result:

┌─javaHashUTF16LE(convertCharset('test', 'utf-8', 'utf-16le'))─┐
│                                                      3556498 │
└──────────────────────────────────────────────────────────────┘

hiveHash

Calculates HiveHash from a string.

SELECT hiveHash('')

This is just JavaHash with zeroed out sign bit. This function is used in Apache Hive for versions before 3.0. This hash function is neither fast nor having a good quality. The only reason to use it is when this algorithm is already used in another system and you have to calculate exactly the same result.

Returned value

A Int32 data type hash value.

Type: hiveHash.

Example

Query:

SELECT hiveHash('Hello, world!');

Result:

┌─hiveHash('Hello, world!')─┐
│                 267439093 │
└───────────────────────────┘

metroHash64

Produces a 64-bit MetroHash hash value.

metroHash64(par1, ...)

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).

Returned Value

A UInt64 data type hash value.

Example

SELECT metroHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS MetroHash, toTypeName(MetroHash) AS type;
┌────────────MetroHash─┬─type───┐
│ 14235658766382344533 │ UInt64 │
└──────────────────────┴────────┘

jumpConsistentHash

Calculates JumpConsistentHash form a UInt64. Accepts two arguments: a UInt64-type key and the number of buckets. Returns Int32. For more information, see the link: JumpConsistentHash

murmurHash2_32, murmurHash2_64

Produces a MurmurHash2 hash value.

murmurHash2_32(par1, ...)
murmurHash2_64(par1, ...)

Arguments

Both functions take a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).

Returned Value

  • The murmurHash2_32 function returns hash value having the UInt32 data type.

  • The murmurHash2_64 function returns hash value having the UInt64 data type.

Example

SELECT murmurHash2_64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS MurmurHash2, toTypeName(MurmurHash2) AS type;
┌──────────MurmurHash2─┬─type───┐
│ 11832096901709403633 │ UInt64 │
└──────────────────────┴────────┘

gccMurmurHash

Calculates a 64-bit MurmurHash2 hash value using the same hash seed as gcc. It is portable between CLang and GCC builds.

Syntax

gccMurmurHash(par1, ...)

Arguments

Returned value

  • Calculated hash value.

Type: UInt64.

Example

Query:

SELECT
    gccMurmurHash(1, 2, 3) AS res1,
    gccMurmurHash(('a', [1, 2, 3], 4, (4, ['foo', 'bar'], 1, (1, 2)))) AS res2

Result:

┌─────────────────res1─┬────────────────res2─┐
│ 12384823029245979431 │ 1188926775431157506 │
└──────────────────────┴─────────────────────┘

murmurHash3_32, murmurHash3_64

Produces a MurmurHash3 hash value.

murmurHash3_32(par1, ...)
murmurHash3_64(par1, ...)

Arguments

Both functions take a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).

Returned Value

  • The murmurHash3_32 function returns a UInt32 data type hash value.

  • The murmurHash3_64 function returns a UInt64 data type hash value.

Example

SELECT murmurHash3_32(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS MurmurHash3, toTypeName(MurmurHash3) AS type;
┌─MurmurHash3─┬─type───┐
│     2152717 │ UInt32 │
└─────────────┴────────┘

murmurHash3_128

Produces a 128-bit MurmurHash3 hash value.

Syntax

murmurHash3_128(expr)

Arguments

Returned value

A 128-bit MurmurHash3 hash value.

Type: FixedString(16).

Example

Query:

SELECT hex(murmurHash3_128('foo', 'foo', 'foo'));

Result:

┌─hex(murmurHash3_128('foo', 'foo', 'foo'))─┐
│ F8F7AD9B6CD4CF117A71E277E2EC2931          │
└───────────────────────────────────────────┘

xxHash32, xxHash64

Calculates xxHash from a string. It is proposed in two flavors, 32 and 64 bits.

SELECT xxHash32('')

OR

SELECT xxHash64('')

Returned value

A UInt32 or UInt64 data type hash value.

Type: UInt32 for xxHash32 and UInt64 for xxHash64.

Example

Query:

SELECT xxHash32('Hello, world!');

Result:

┌─xxHash32('Hello, world!')─┐
│                 834093149 │
└───────────────────────────┘

See Also

ngramSimHash

Splits a ASCII string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case sensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes of two strings, the more likely these strings are the same.

Syntax

ngramSimHash(string[, ngramsize])

Arguments

  • string — String. String.

  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

  • Hash value.

Type: UInt64.

Example

Query:

SELECT ngramSimHash('ClickHouse') AS Hash;

Result:

┌───────Hash─┐
│ 1627567969 │
└────────────┘

ngramSimHashCaseInsensitive

Splits a ASCII string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case insensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes of two strings, the more likely these strings are the same.

Syntax

ngramSimHashCaseInsensitive(string[, ngramsize])

Arguments

  • string — String. String.

  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

  • Hash value.

Type: UInt64.

Example

Query:

SELECT ngramSimHashCaseInsensitive('ClickHouse') AS Hash;

Result:

┌──────Hash─┐
│ 562180645 │
└───────────┘

ngramSimHashUTF8

Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case sensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes of two strings, the more likely these strings are the same.

Syntax

ngramSimHashUTF8(string[, ngramsize])

Arguments

  • string — String. String.

  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

  • Hash value.

Type: UInt64.

Example

Query:

SELECT ngramSimHashUTF8('ClickHouse') AS Hash;

Result:

┌───────Hash─┐
│ 1628157797 │
└────────────┘

ngramSimHashCaseInsensitiveUTF8

Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case insensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes of two strings, the more likely these strings are the same.

Syntax

ngramSimHashCaseInsensitiveUTF8(string[, ngramsize])

Arguments

  • string — String. String.

  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

  • Hash value.

Type: UInt64.

Example

Query:

SELECT ngramSimHashCaseInsensitiveUTF8('ClickHouse') AS Hash;

Result:

┌───────Hash─┐
│ 1636742693 │
└────────────┘

wordShingleSimHash

Splits a ASCII string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case sensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes of two strings, the more likely these strings are the same.

Syntax

wordShingleSimHash(string[, shinglesize])

Arguments

  • string — String. String.

  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

  • Hash value.

Type: UInt64.

Example

Query:

SELECT wordShingleSimHash('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Hash;

Result:

┌───────Hash─┐
│ 2328277067 │
└────────────┘

wordShingleSimHashCaseInsensitive

Splits a ASCII string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case insensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes of two strings, the more likely these strings are the same.

Syntax

wordShingleSimHashCaseInsensitive(string[, shinglesize])

Arguments

  • string — String. String.

  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

  • Hash value.

Type: UInt64.

Example

Query:

SELECT wordShingleSimHashCaseInsensitive('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Hash;

Result:

┌───────Hash─┐
│ 2194812424 │
└────────────┘

wordShingleSimHashUTF8

Splits a UTF-8 string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case sensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes of two strings, the more likely these strings are the same.

Syntax

wordShingleSimHashUTF8(string[, shinglesize])

Arguments

  • string — String. String.

  • shinglesize — The size of a word shingle. Optinal. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

  • Hash value.

Type: UInt64.

Example

Query:

SELECT wordShingleSimHashUTF8('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Hash;

Result:

┌───────Hash─┐
│ 2328277067 │
└────────────┘

wordShingleSimHashCaseInsensitiveUTF8

Splits a UTF-8 string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case insensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes of two strings, the more likely these strings are the same.

Syntax

wordShingleSimHashCaseInsensitiveUTF8(string[, shinglesize])

Arguments

  • string — String. String.

  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

  • Hash value.

Type: UInt64.

Example

Query:

SELECT wordShingleSimHashCaseInsensitiveUTF8('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Hash;

Result:

┌───────Hash─┐
│ 2194812424 │
└────────────┘

ngramMinHash

Splits a ASCII string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.

Syntax

ngramMinHash(string[, ngramsize, hashnum])

Arguments

  • string — String. String.

  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two hashes — the minimum and the maximum.

Type: Tuple(UInt64, UInt64).

Example

Query:

SELECT ngramMinHash('ClickHouse') AS Tuple;

Result:

┌─Tuple──────────────────────────────────────┐
│ (18333312859352735453,9054248444481805918) │
└────────────────────────────────────────────┘

ngramMinHashCaseInsensitive

Splits a ASCII string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.

Syntax

ngramMinHashCaseInsensitive(string[, ngramsize, hashnum])

Arguments

  • string — String. String.

  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two hashes — the minimum and the maximum.

Type: Tuple(UInt64, UInt64).

Example

Query:

SELECT ngramMinHashCaseInsensitive('ClickHouse') AS Tuple;

Result:

┌─Tuple──────────────────────────────────────┐
│ (2106263556442004574,13203602793651726206) │
└────────────────────────────────────────────┘

ngramMinHashUTF8

Splits a UTF-8 string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.

Syntax

ngramMinHashUTF8(string[, ngramsize, hashnum])

Arguments

  • string — String. String.

  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two hashes — the minimum and the maximum.

Type: Tuple(UInt64, UInt64).

Example

Query:

SELECT ngramMinHashUTF8('ClickHouse') AS Tuple;

Result:

┌─Tuple──────────────────────────────────────┐
│ (18333312859352735453,6742163577938632877) │
└────────────────────────────────────────────┘

ngramMinHashCaseInsensitiveUTF8

Splits a UTF-8 string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.

Syntax

ngramMinHashCaseInsensitiveUTF8(string [, ngramsize, hashnum])

Arguments

  • string — String. String.

  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two hashes — the minimum and the maximum.

Type: Tuple(UInt64, UInt64).

Example

Query:

SELECT ngramMinHashCaseInsensitiveUTF8('ClickHouse') AS Tuple;

Result:

┌─Tuple───────────────────────────────────────┐
│ (12493625717655877135,13203602793651726206) │
└─────────────────────────────────────────────┘

ngramMinHashArg

Splits a ASCII string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHash function with the same input. Is case sensitive.

Syntax

ngramMinHashArg(string[, ngramsize, hashnum])

Arguments

  • string — String. String.

  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two tuples with hashnum n-grams each.

Type: Tuple(Tuple(String), Tuple(String)).

Example

Query:

SELECT ngramMinHashArg('ClickHouse') AS Tuple;

Result:

┌─Tuple─────────────────────────────────────────────────────────────────────────┐
│ (('ous','ick','lic','Hou','kHo','use'),('Hou','lic','ick','ous','ckH','Cli')) │
└───────────────────────────────────────────────────────────────────────────────┘

ngramMinHashArgCaseInsensitive

Splits a ASCII string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashCaseInsensitive function with the same input. Is case insensitive.

Syntax

ngramMinHashArgCaseInsensitive(string[, ngramsize, hashnum])

Arguments

  • string — String. String.

  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two tuples with hashnum n-grams each.

Type: Tuple(Tuple(String), Tuple(String)).

Example

Query:

SELECT ngramMinHashArgCaseInsensitive('ClickHouse') AS Tuple;

Result:

┌─Tuple─────────────────────────────────────────────────────────────────────────┐
│ (('ous','ick','lic','kHo','use','Cli'),('kHo','lic','ick','ous','ckH','Hou')) │
└───────────────────────────────────────────────────────────────────────────────┘

ngramMinHashArgUTF8

Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashUTF8 function with the same input. Is case sensitive.

Syntax

ngramMinHashArgUTF8(string[, ngramsize, hashnum])

Arguments

  • string — String. String.

  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two tuples with hashnum n-grams each.

Type: Tuple(Tuple(String), Tuple(String)).

Example

Query:

SELECT ngramMinHashArgUTF8('ClickHouse') AS Tuple;

Result:

┌─Tuple─────────────────────────────────────────────────────────────────────────┐
│ (('ous','ick','lic','Hou','kHo','use'),('kHo','Hou','lic','ick','ous','ckH')) │
└───────────────────────────────────────────────────────────────────────────────┘

ngramMinHashArgCaseInsensitiveUTF8

Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashCaseInsensitiveUTF8 function with the same input. Is case insensitive.

Syntax

ngramMinHashArgCaseInsensitiveUTF8(string[, ngramsize, hashnum])

Arguments

  • string — String. String.

  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two tuples with hashnum n-grams each.

Type: Tuple(Tuple(String), Tuple(String)).

Example

Query:

SELECT ngramMinHashArgCaseInsensitiveUTF8('ClickHouse') AS Tuple;

Result:

┌─Tuple─────────────────────────────────────────────────────────────────────────┐
│ (('ckH','ous','ick','lic','kHo','use'),('kHo','lic','ick','ous','ckH','Hou')) │
└───────────────────────────────────────────────────────────────────────────────┘

wordShingleMinHash

Splits a ASCII string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.

Syntax

wordShingleMinHash(string[, shinglesize, hashnum])

Arguments

  • string — String. String.

  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two hashes — the minimum and the maximum.

Type: Tuple(UInt64, UInt64).

Example

Query:

SELECT wordShingleMinHash('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Tuple;

Result:

┌─Tuple──────────────────────────────────────┐
│ (16452112859864147620,5844417301642981317) │
└────────────────────────────────────────────┘

wordShingleMinHashCaseInsensitive

Splits a ASCII string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.

Syntax

wordShingleMinHashCaseInsensitive(string[, shinglesize, hashnum])

Arguments

  • string — String. String.

  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two hashes — the minimum and the maximum.

Type: Tuple(UInt64, UInt64).

Example

Query:

SELECT wordShingleMinHashCaseInsensitive('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Tuple;

Result:

┌─Tuple─────────────────────────────────────┐
│ (3065874883688416519,1634050779997673240) │
└───────────────────────────────────────────┘

wordShingleMinHashUTF8

Splits a UTF-8 string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.

Syntax

wordShingleMinHashUTF8(string[, shinglesize, hashnum])

Arguments

  • string — String. String.

  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two hashes — the minimum and the maximum.

Type: Tuple(UInt64, UInt64).

Example

Query:

SELECT wordShingleMinHashUTF8('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Tuple;

Result:

┌─Tuple──────────────────────────────────────┐
│ (16452112859864147620,5844417301642981317) │
└────────────────────────────────────────────┘

wordShingleMinHashCaseInsensitiveUTF8

Splits a UTF-8 string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.

Syntax

wordShingleMinHashCaseInsensitiveUTF8(string[, shinglesize, hashnum])

Arguments

  • string — String. String.

  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two hashes — the minimum and the maximum.

Type: Tuple(UInt64, UInt64).

Example

Query:

SELECT wordShingleMinHashCaseInsensitiveUTF8('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Tuple;

Result:

┌─Tuple─────────────────────────────────────┐
│ (3065874883688416519,1634050779997673240) │
└───────────────────────────────────────────┘

wordShingleMinHashArg

Splits a ASCII string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordshingleMinHash function with the same input. Is case sensitive.

Syntax

wordShingleMinHashArg(string[, shinglesize, hashnum])

Arguments

  • string — String. String.

  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two tuples with hashnum word shingles each.

Type: Tuple(Tuple(String), Tuple(String)).

Example

Query:

SELECT wordShingleMinHashArg('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).', 1, 3) AS Tuple;

Result:

┌─Tuple─────────────────────────────────────────────────────────────────┐
│ (('OLAP','database','analytical'),('online','oriented','processing')) │
└───────────────────────────────────────────────────────────────────────┘

wordShingleMinHashArgCaseInsensitive

Splits a ASCII string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashCaseInsensitive function with the same input. Is case insensitive.

Syntax

wordShingleMinHashArgCaseInsensitive(string[, shinglesize, hashnum])

Arguments

  • string — String. String.

  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two tuples with hashnum word shingles each.

Type: Tuple(Tuple(String), Tuple(String)).

Example

Query:

SELECT wordShingleMinHashArgCaseInsensitive('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).', 1, 3) AS Tuple;

Result:

┌─Tuple──────────────────────────────────────────────────────────────────┐
│ (('queries','database','analytical'),('oriented','processing','DBMS')) │
└────────────────────────────────────────────────────────────────────────┘

wordShingleMinHashArgUTF8

Splits a UTF-8 string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashUTF8 function with the same input. Is case sensitive.

Syntax

wordShingleMinHashArgUTF8(string[, shinglesize, hashnum])

Arguments

  • string — String. String.

  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two tuples with hashnum word shingles each.

Type: Tuple(Tuple(String), Tuple(String)).

Example

Query:

SELECT wordShingleMinHashArgUTF8('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).', 1, 3) AS Tuple;

Result:

┌─Tuple─────────────────────────────────────────────────────────────────┐
│ (('OLAP','database','analytical'),('online','oriented','processing')) │
└───────────────────────────────────────────────────────────────────────┘

wordShingleMinHashArgCaseInsensitiveUTF8

Splits a UTF-8 string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashCaseInsensitiveUTF8 function with the same input. Is case insensitive.

Syntax

wordShingleMinHashArgCaseInsensitiveUTF8(string[, shinglesize, hashnum])

Arguments

  • string — String. String.

  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

  • Tuple with two tuples with hashnum word shingles each.

Type: Tuple(Tuple(String), Tuple(String)).

Example

Query:

SELECT wordShingleMinHashArgCaseInsensitiveUTF8('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).', 1, 3) AS Tuple;

Result:

┌─Tuple──────────────────────────────────────────────────────────────────┐
│ (('queries','database','analytical'),('oriented','processing','DBMS')) │
└────────────────────────────────────────────────────────────────────────┘

INTROSPECTION FUNCTIONS

You can use functions described in this chapter to introspect ELF and DWARF for query profiling.

WARNING

These functions are slow and may impose security considerations.

For proper operation of introspection functions:

  • Install the clickhouse-common-static-dbg package.

  • Set the allow_introspection_functions setting to 1.

    For security reasons introspection functions are disabled by default.

ClickHouse saves profiler reports to the trace_log system table. Make sure the table and profiler are configured properly.

addressToLine

Converts virtual memory address inside ClickHouse server process to the filename and the line number in ClickHouse source code.

If you use official ClickHouse packages, you need to install the clickhouse-common-static-dbg package.

Syntax

addressToLine(address_of_binary_instruction)

Arguments

  • address_of_binary_instruction (UInt64) — Address of instruction in a running process.

Returned value

  • Source code filename and the line number in this file delimited by colon.

    For example, `/build/obj-x86_64-linux-gnu/../src/Common/ThreadPool.cpp:199`, where `199` is a line number.
  • Name of a binary, if the function couldn’t find the debug information.

  • Empty string, if the address is not valid.

Type: String.

Example

Enabling introspection functions:

SET allow_introspection_functions=1;

Selecting the first string from the trace_log system table:

SELECT * FROM system.trace_log LIMIT 1 \G;
Row 1:
──────
event_date:              2019-11-19
event_time:              2019-11-19 18:57:23
revision:                54429
timer_type:              Real
thread_number:           48
query_id:                421b6855-1858-45a5-8f37-f383409d6d72
trace:                   [140658411141617,94784174532828,94784076370703,94784076372094,94784076361020,94784175007680,140658411116251,140658403895439]

The trace field contains the stack trace at the moment of sampling.

Getting the source code filename and the line number for a single address:

SELECT addressToLine(94784076370703) \G;
Row 1:
──────
addressToLine(94784076370703): /build/obj-x86_64-linux-gnu/../src/Common/ThreadPool.cpp:199

Applying the function to the whole stack trace:

SELECT
    arrayStringConcat(arrayMap(x -> addressToLine(x), trace), '\n') AS trace_source_code_lines
FROM system.trace_log
LIMIT 1
\G

The arrayMap function allows to process each individual element of the trace array by the addressToLine function. The result of this processing you see in the trace_source_code_lines column of output.

Row 1:
──────
trace_source_code_lines: /lib/x86_64-linux-gnu/libpthread-2.27.so
/usr/lib/debug/usr/bin/clickhouse
/build/obj-x86_64-linux-gnu/../src/Common/ThreadPool.cpp:199
/build/obj-x86_64-linux-gnu/../src/Common/ThreadPool.h:155
/usr/include/c++/9/bits/atomic_base.h:551
/usr/lib/debug/usr/bin/clickhouse
/lib/x86_64-linux-gnu/libpthread-2.27.so
/build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:97

addressToLineWithInlines

Similar to addressToLine, but it will return an Array with all inline functions, and will be much slower as a price.

If you use official ClickHouse packages, you need to install the clickhouse-common-static-dbg package.

Syntax

addressToLineWithInlines(address_of_binary_instruction)

Arguments

  • address_of_binary_instruction (UInt64) — Address of instruction in a running process.

Returned value

  • Array which first element is source code filename and the line number in this file delimited by colon. And from second element, inline functions' source code filename and line number and function name are listed.

  • Array with single element which is name of a binary, if the function couldn’t find the debug information.

  • Empty array, if the address is not valid.

Type: Array(String).

Example

Enabling introspection functions:

SET allow_introspection_functions=1;

Applying the function to address.

SELECT addressToLineWithInlines(531055181::UInt64);
┌─addressToLineWithInlines(CAST('531055181', 'UInt64'))────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ ['./src/Functions/addressToLineWithInlines.cpp:98','./build_normal_debug/./src/Functions/addressToLineWithInlines.cpp:176:DB::(anonymous namespace)::FunctionAddressToLineWithInlines::implCached(unsigned long) const'] │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Applying the function to the whole stack trace:

SELECT
    ta, addressToLineWithInlines(arrayJoin(trace) as ta)
FROM system.trace_log
WHERE
    query_id = '5e173544-2020-45de-b645-5deebe2aae54';

The arrayJoin functions will split array to rows.

┌────────ta─┬─addressToLineWithInlines(arrayJoin(trace))───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 365497529 │ ['./build_normal_debug/./contrib/libcxx/include/string_view:252']                                                                                                                                                        │
│ 365593602 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:191']                                                                                                                                                                      │
│ 365593866 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0']                                                                                                                                                                        │
│ 365592528 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0']                                                                                                                                                                        │
│ 365591003 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:477']                                                                                                                                                                      │
│ 365590479 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:442']                                                                                                                                                                      │
│ 365590600 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:457']                                                                                                                                                                      │
│ 365598941 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0']                                                                                                                                                                        │
│ 365607098 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0']                                                                                                                                                                        │
│ 365590571 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:451']                                                                                                                                                                      │
│ 365598941 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0']                                                                                                                                                                        │
│ 365607098 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0']                                                                                                                                                                        │
│ 365590571 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:451']                                                                                                                                                                      │
│ 365598941 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0']                                                                                                                                                                        │
│ 365607098 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0']                                                                                                                                                                        │
│ 365590571 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:451']                                                                                                                                                                      │
│ 365598941 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0']                                                                                                                                                                        │
│ 365597289 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:807']                                                                                                                                                                      │
│ 365599840 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:1118']                                                                                                                                                                     │
│ 531058145 │ ['./build_normal_debug/./src/Functions/addressToLineWithInlines.cpp:152']                                                                                                                                                │
│ 531055181 │ ['./src/Functions/addressToLineWithInlines.cpp:98','./build_normal_debug/./src/Functions/addressToLineWithInlines.cpp:176:DB::(anonymous namespace)::FunctionAddressToLineWithInlines::implCached(unsigned long) const'] │
│ 422333613 │ ['./build_normal_debug/./src/Functions/IFunctionAdaptors.h:21']                                                                                                                                                          │
│ 586866022 │ ['./build_normal_debug/./src/Functions/IFunction.cpp:216']                                                                                                                                                               │
│ 586869053 │ ['./build_normal_debug/./src/Functions/IFunction.cpp:264']                                                                                                                                                               │
│ 586873237 │ ['./build_normal_debug/./src/Functions/IFunction.cpp:334']                                                                                                                                                               │
│ 597901620 │ ['./build_normal_debug/./src/Interpreters/ExpressionActions.cpp:601']                                                                                                                                                    │
│ 597898534 │ ['./build_normal_debug/./src/Interpreters/ExpressionActions.cpp:718']                                                                                                                                                    │
│ 630442912 │ ['./build_normal_debug/./src/Processors/Transforms/ExpressionTransform.cpp:23']                                                                                                                                          │
│ 546354050 │ ['./build_normal_debug/./src/Processors/ISimpleTransform.h:38']                                                                                                                                                          │
│ 626026993 │ ['./build_normal_debug/./src/Processors/ISimpleTransform.cpp:89']                                                                                                                                                        │
│ 626294022 │ ['./build_normal_debug/./src/Processors/Executors/ExecutionThreadContext.cpp:45']                                                                                                                                        │
│ 626293730 │ ['./build_normal_debug/./src/Processors/Executors/ExecutionThreadContext.cpp:63']                                                                                                                                        │
│ 626169525 │ ['./build_normal_debug/./src/Processors/Executors/PipelineExecutor.cpp:213']                                                                                                                                             │
│ 626170308 │ ['./build_normal_debug/./src/Processors/Executors/PipelineExecutor.cpp:178']                                                                                                                                             │
│ 626166348 │ ['./build_normal_debug/./src/Processors/Executors/PipelineExecutor.cpp:329']                                                                                                                                             │
│ 626163461 │ ['./build_normal_debug/./src/Processors/Executors/PipelineExecutor.cpp:84']                                                                                                                                              │
│ 626323536 │ ['./build_normal_debug/./src/Processors/Executors/PullingAsyncPipelineExecutor.cpp:85']                                                                                                                                  │
│ 626323277 │ ['./build_normal_debug/./src/Processors/Executors/PullingAsyncPipelineExecutor.cpp:112']                                                                                                                                 │
│ 626323133 │ ['./build_normal_debug/./contrib/libcxx/include/type_traits:3682']                                                                                                                                                       │
│ 626323041 │ ['./build_normal_debug/./contrib/libcxx/include/tuple:1415']                                                                                                                                                             │
└───────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

addressToSymbol

Converts virtual memory address inside ClickHouse server process to the symbol from ClickHouse object files.

Syntax

addressToSymbol(address_of_binary_instruction)

Arguments

  • address_of_binary_instruction (UInt64) — Address of instruction in a running process.

Returned value

  • Symbol from ClickHouse object files.

  • Empty string, if the address is not valid.

Type: String.

Example

Enabling introspection functions:

SET allow_introspection_functions=1;

Selecting the first string from the trace_log system table:

SELECT * FROM system.trace_log LIMIT 1 \G;
Row 1:
──────
event_date:    2019-11-20
event_time:    2019-11-20 16:57:59
revision:      54429
timer_type:    Real
thread_number: 48
query_id:      724028bf-f550-45aa-910d-2af6212b94ac
trace:         [94138803686098,94138815010911,94138815096522,94138815101224,94138815102091,94138814222988,94138806823642,94138814457211,94138806823642,94138814457211,94138806823642,94138806795179,94138806796144,94138753770094,94138753771646,94138753760572,94138852407232,140399185266395,140399178045583]

The trace field contains the stack trace at the moment of sampling.

Getting a symbol for a single address:

SELECT addressToSymbol(94138803686098) \G;
Row 1:
──────
addressToSymbol(94138803686098): _ZNK2DB24IAggregateFunctionHelperINS_20AggregateFunctionSumImmNS_24AggregateFunctionSumDataImEEEEE19addBatchSinglePlaceEmPcPPKNS_7IColumnEPNS_5ArenaE

Applying the function to the whole stack trace:

SELECT
    arrayStringConcat(arrayMap(x -> addressToSymbol(x), trace), '\n') AS trace_symbols
FROM system.trace_log
LIMIT 1
\G

The arrayMap function allows to process each individual element of the trace array by the addressToSymbols function. The result of this processing you see in the trace_symbols column of output.

Row 1:
──────
trace_symbols: _ZNK2DB24IAggregateFunctionHelperINS_20AggregateFunctionSumImmNS_24AggregateFunctionSumDataImEEEEE19addBatchSinglePlaceEmPcPPKNS_7IColumnEPNS_5ArenaE
_ZNK2DB10Aggregator21executeWithoutKeyImplERPcmPNS0_28AggregateFunctionInstructionEPNS_5ArenaE
_ZN2DB10Aggregator14executeOnBlockESt6vectorIN3COWINS_7IColumnEE13immutable_ptrIS3_EESaIS6_EEmRNS_22AggregatedDataVariantsERS1_IPKS3_SaISC_EERS1_ISE_SaISE_EERb
_ZN2DB10Aggregator14executeOnBlockERKNS_5BlockERNS_22AggregatedDataVariantsERSt6vectorIPKNS_7IColumnESaIS9_EERS6_ISB_SaISB_EERb
_ZN2DB10Aggregator7executeERKSt10shared_ptrINS_17IBlockInputStreamEERNS_22AggregatedDataVariantsE
_ZN2DB27AggregatingBlockInputStream8readImplEv
_ZN2DB17IBlockInputStream4readEv
_ZN2DB26ExpressionBlockInputStream8readImplEv
_ZN2DB17IBlockInputStream4readEv
_ZN2DB26ExpressionBlockInputStream8readImplEv
_ZN2DB17IBlockInputStream4readEv
_ZN2DB28AsynchronousBlockInputStream9calculateEv
_ZNSt17_Function_handlerIFvvEZN2DB28AsynchronousBlockInputStream4nextEvEUlvE_E9_M_invokeERKSt9_Any_data
_ZN14ThreadPoolImplI20ThreadFromGlobalPoolE6workerESt14_List_iteratorIS0_E
_ZZN20ThreadFromGlobalPoolC4IZN14ThreadPoolImplIS_E12scheduleImplIvEET_St8functionIFvvEEiSt8optionalImEEUlvE1_JEEEOS4_DpOT0_ENKUlvE_clEv
_ZN14ThreadPoolImplISt6threadE6workerESt14_List_iteratorIS0_E
execute_native_thread_routine
start_thread
clone

demangle

Converts a symbol that you can get using the addressToSymbol function to the C++ function name.

Syntax

demangle(symbol)

Arguments

  • symbol (String) — Symbol from an object file.

Returned value

  • Name of the C++ function.

  • Empty string if a symbol is not valid.

Type: String.

Example

Enabling introspection functions:

SET allow_introspection_functions=1;

Selecting the first string from the trace_log system table:

SELECT * FROM system.trace_log LIMIT 1 \G;
Row 1:
──────
event_date:    2019-11-20
event_time:    2019-11-20 16:57:59
revision:      54429
timer_type:    Real
thread_number: 48
query_id:      724028bf-f550-45aa-910d-2af6212b94ac
trace:         [94138803686098,94138815010911,94138815096522,94138815101224,94138815102091,94138814222988,94138806823642,94138814457211,94138806823642,94138814457211,94138806823642,94138806795179,94138806796144,94138753770094,94138753771646,94138753760572,94138852407232,140399185266395,140399178045583]

The trace field contains the stack trace at the moment of sampling.

Getting a function name for a single address:

SELECT demangle(addressToSymbol(94138803686098)) \G;
Row 1:
──────
demangle(addressToSymbol(94138803686098)): DB::IAggregateFunctionHelper<DB::AggregateFunctionSum<unsigned long, unsigned long, DB::AggregateFunctionSumData<unsigned long> > >::addBatchSinglePlace(unsigned long, char*, DB::IColumn const**, DB::Arena*) const

Applying the function to the whole stack trace:

SELECT
    arrayStringConcat(arrayMap(x -> demangle(addressToSymbol(x)), trace), '\n') AS trace_functions
FROM system.trace_log
LIMIT 1
\G

The arrayMap function allows to process each individual element of the trace array by the demangle function. The result of this processing you see in the trace_functions column of output.

Row 1:
──────
trace_functions: DB::IAggregateFunctionHelper<DB::AggregateFunctionSum<unsigned long, unsigned long, DB::AggregateFunctionSumData<unsigned long> > >::addBatchSinglePlace(unsigned long, char*, DB::IColumn const**, DB::Arena*) const
DB::Aggregator::executeWithoutKeyImpl(char*&, unsigned long, DB::Aggregator::AggregateFunctionInstruction*, DB::Arena*) const
DB::Aggregator::executeOnBlock(std::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn> > >, unsigned long, DB::AggregatedDataVariants&, std::vector<DB::IColumn const*, std::allocator<DB::IColumn const*> >&, std::vector<std::vector<DB::IColumn const*, std::allocator<DB::IColumn const*> >, std::allocator<std::vector<DB::IColumn const*, std::allocator<DB::IColumn const*> > > >&, bool&)
DB::Aggregator::executeOnBlock(DB::Block const&, DB::AggregatedDataVariants&, std::vector<DB::IColumn const*, std::allocator<DB::IColumn const*> >&, std::vector<std::vector<DB::IColumn const*, std::allocator<DB::IColumn const*> >, std::allocator<std::vector<DB::IColumn const*, std::allocator<DB::IColumn const*> > > >&, bool&)
DB::Aggregator::execute(std::shared_ptr<DB::IBlockInputStream> const&, DB::AggregatedDataVariants&)
DB::AggregatingBlockInputStream::readImpl()
DB::IBlockInputStream::read()
DB::ExpressionBlockInputStream::readImpl()
DB::IBlockInputStream::read()
DB::ExpressionBlockInputStream::readImpl()
DB::IBlockInputStream::read()
DB::AsynchronousBlockInputStream::calculate()
std::_Function_handler<void (), DB::AsynchronousBlockInputStream::next()::{lambda()#1}>::_M_invoke(std::_Any_data const&)
ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::_List_iterator<ThreadFromGlobalPool>)
ThreadFromGlobalPool::ThreadFromGlobalPool<ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::function<void ()>, int, std::optional<unsigned long>)::{lambda()#3}>(ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::function<void ()>, int, std::optional<unsigned long>)::{lambda()#3}&&)::{lambda()#1}::operator()() const
ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>)
execute_native_thread_routine
start_thread
clone

tid

Returns id of the thread, in which current Block is processed.

Syntax

tid()

Returned value

Example

Query:

SELECT tid();

Result:

┌─tid()─┐
│  3878 │
└───────┘

logTrace

Emits trace log message to server log for each Block.

Syntax

logTrace('message')

Arguments

  • message — Message that is emitted to server log. String.

Returned value

  • Always returns 0.

Example

Query:

SELECT logTrace('logTrace message');

Result:

┌─logTrace('logTrace message')─┐
│                            0 │
└──────────────────────────────┘

IP ADDRESSES

IPv4NumToString(num)

Takes a UInt32 number. Interprets it as an IPv4 address in big endian. Returns a string containing the corresponding IPv4 address in the format A.B.C.d (dot-separated numbers in decimal form).

Alias: INET_NTOA.

IPv4StringToNum(s)

The reverse function of IPv4NumToString. If the IPv4 address has an invalid format, it throws exception.

Alias: INET_ATON.

IPv4NumToStringClassC(num)

Similar to IPv4NumToString, but using xxx instead of the last octet.

Example:

SELECT
    IPv4NumToStringClassC(ClientIP) AS k,
    count() AS c
FROM test.hits
GROUP BY k
ORDER BY c DESC
LIMIT 10
┌─k──────────────┬─────c─┐
│ 83.149.9.xxx   │ 26238 │
│ 217.118.81.xxx │ 26074 │
│ 213.87.129.xxx │ 25481 │
│ 83.149.8.xxx   │ 24984 │
│ 217.118.83.xxx │ 22797 │
│ 78.25.120.xxx  │ 22354 │
│ 213.87.131.xxx │ 21285 │
│ 78.25.121.xxx  │ 20887 │
│ 188.162.65.xxx │ 19694 │
│ 83.149.48.xxx  │ 17406 │
└────────────────┴───────┘

Since using ‘xxx’ is highly unusual, this may be changed in the future. We recommend that you do not rely on the exact format of this fragment.

IPv6NumToString(x)

Accepts a FixedString(16) value containing the IPv6 address in binary format. Returns a string containing this address in text format. IPv6-mapped IPv4 addresses are output in the format ::ffff:111.222.33.44.

Alias: INET6_NTOA.

Examples:

SELECT IPv6NumToString(toFixedString(unhex('2A0206B8000000000000000000000011'), 16)) AS addr;
┌─addr─────────┐
│ 2a02:6b8::11 │
└──────────────┘
SELECT
    IPv6NumToString(ClientIP6 AS k),
    count() AS c
FROM hits_all
WHERE EventDate = today() AND substring(ClientIP6, 1, 12) != unhex('00000000000000000000FFFF')
GROUP BY k
ORDER BY c DESC
LIMIT 10
┌─IPv6NumToString(ClientIP6)──────────────┬─────c─┐
│ 2a02:2168:aaa:bbbb::2                   │ 24695 │
│ 2a02:2698:abcd:abcd:abcd:abcd:8888:5555 │ 22408 │
│ 2a02:6b8:0:fff::ff                      │ 16389 │
│ 2a01:4f8:111:6666::2                    │ 16016 │
│ 2a02:2168:888:222::1                    │ 15896 │
│ 2a01:7e00::ffff:ffff:ffff:222           │ 14774 │
│ 2a02:8109:eee:ee:eeee:eeee:eeee:eeee    │ 14443 │
│ 2a02:810b:8888:888:8888:8888:8888:8888  │ 14345 │
│ 2a02:6b8:0:444:4444:4444:4444:4444      │ 14279 │
│ 2a01:7e00::ffff:ffff:ffff:ffff          │ 13880 │
└─────────────────────────────────────────┴───────┘
SELECT
    IPv6NumToString(ClientIP6 AS k),
    count() AS c
FROM hits_all
WHERE EventDate = today()
GROUP BY k
ORDER BY c DESC
LIMIT 10
┌─IPv6NumToString(ClientIP6)─┬──────c─┐
│ ::ffff:94.26.111.111       │ 747440 │
│ ::ffff:37.143.222.4        │ 529483 │
│ ::ffff:5.166.111.99        │ 317707 │
│ ::ffff:46.38.11.77         │ 263086 │
│ ::ffff:79.105.111.111      │ 186611 │
│ ::ffff:93.92.111.88        │ 176773 │
│ ::ffff:84.53.111.33        │ 158709 │
│ ::ffff:217.118.11.22       │ 154004 │
│ ::ffff:217.118.11.33       │ 148449 │
│ ::ffff:217.118.11.44       │ 148243 │
└────────────────────────────┴────────┘

IPv6StringToNum

The reverse function of IPv6NumToString. If the IPv6 address has an invalid format, it throws exception.

If the input string contains a valid IPv4 address, returns its IPv6 equivalent. HEX can be uppercase or lowercase.

Alias: INET6_ATON.

Syntax

IPv6StringToNum(string)

Argument

  • string — IP address. String.

Returned value

  • IPv6 address in binary format.

Type: FixedString(16).

Example

Query:

SELECT addr, cutIPv6(IPv6StringToNum(addr), 0, 0) FROM (SELECT ['notaddress', '127.0.0.1', '1111::ffff'] AS addr) ARRAY JOIN addr;

Result:

┌─addr───────┬─cutIPv6(IPv6StringToNum(addr), 0, 0)─┐
│ notaddress │ ::                                   │
│ 127.0.0.1  │ ::ffff:127.0.0.1                     │
│ 1111::ffff │ 1111::ffff                           │
└────────────┴──────────────────────────────────────┘

See Also

IPv4ToIPv6(x)

Takes a UInt32 number. Interprets it as an IPv4 address in big endian. Returns a FixedString(16) value containing the IPv6 address in binary format. Examples:

SELECT IPv6NumToString(IPv4ToIPv6(IPv4StringToNum('192.168.0.1'))) AS addr;
┌─addr───────────────┐
│ ::ffff:192.168.0.1 │
└────────────────────┘

cutIPv6(x, bytesToCutForIPv6, bytesToCutForIPv4)

Accepts a FixedString(16) value containing the IPv6 address in binary format. Returns a string containing the address of the specified number of bytes removed in text format. For example:

WITH
    IPv6StringToNum('2001:0DB8:AC10:FE01:FEED:BABE:CAFE:F00D') AS ipv6,
    IPv4ToIPv6(IPv4StringToNum('192.168.0.1')) AS ipv4
SELECT
    cutIPv6(ipv6, 2, 0),
    cutIPv6(ipv4, 0, 2)
┌─cutIPv6(ipv6, 2, 0)─────────────────┬─cutIPv6(ipv4, 0, 2)─┐
│ 2001:db8:ac10:fe01:feed:babe:cafe:0 │ ::ffff:192.168.0.0  │
└─────────────────────────────────────┴─────────────────────┘

IPv4CIDRToRange(ipv4, Cidr),

Accepts an IPv4 and an UInt8 value containing the CIDR. Return a tuple with two IPv4 containing the lower range and the higher range of the subnet.

SELECT IPv4CIDRToRange(toIPv4('192.168.5.2'), 16);
┌─IPv4CIDRToRange(toIPv4('192.168.5.2'), 16)─┐
│ ('192.168.0.0','192.168.255.255')          │
└────────────────────────────────────────────┘

IPv6CIDRToRange(ipv6, Cidr),

Accepts an IPv6 and an UInt8 value containing the CIDR. Return a tuple with two IPv6 containing the lower range and the higher range of the subnet.

SELECT IPv6CIDRToRange(toIPv6('2001:0db8:0000:85a3:0000:0000:ac1f:8001'), 32);
┌─IPv6CIDRToRange(toIPv6('2001:0db8:0000:85a3:0000:0000:ac1f:8001'), 32)─┐
│ ('2001:db8::','2001:db8:ffff:ffff:ffff:ffff:ffff:ffff')                │
└────────────────────────────────────────────────────────────────────────┘

toIPv4(string)

An alias to IPv4StringToNum() that takes a string form of IPv4 address and returns value of IPv4 type, which is binary equal to value returned by IPv4StringToNum().

WITH
    '171.225.130.45' as IPv4_string
SELECT
    toTypeName(IPv4StringToNum(IPv4_string)),
    toTypeName(toIPv4(IPv4_string))
┌─toTypeName(IPv4StringToNum(IPv4_string))─┬─toTypeName(toIPv4(IPv4_string))─┐
│ UInt32                                   │ IPv4                            │
└──────────────────────────────────────────┴─────────────────────────────────┘
WITH
    '171.225.130.45' as IPv4_string
SELECT
    hex(IPv4StringToNum(IPv4_string)),
    hex(toIPv4(IPv4_string))
┌─hex(IPv4StringToNum(IPv4_string))─┬─hex(toIPv4(IPv4_string))─┐
│ ABE1822D                          │ ABE1822D                 │
└───────────────────────────────────┴──────────────────────────┘

toIPv6

Converts a string form of IPv6 address to IPv6 type. If the IPv6 address has an invalid format, returns an empty value. Similar to IPv6StringToNum function, which converts IPv6 address to binary format.

If the input string contains a valid IPv4 address, then the IPv6 equivalent of the IPv4 address is returned.

Syntax

toIPv6(string)

Argument

Returned value

  • IP address.

Type: IPv6.

Examples

Query:

WITH '2001:438:ffff::407d:1bc1' AS IPv6_string
SELECT
    hex(IPv6StringToNum(IPv6_string)),
    hex(toIPv6(IPv6_string));

Result:

┌─hex(IPv6StringToNum(IPv6_string))─┬─hex(toIPv6(IPv6_string))─────────┐
│ 20010438FFFF000000000000407D1BC1  │ 20010438FFFF000000000000407D1BC1 │
└───────────────────────────────────┴──────────────────────────────────┘

Query:

SELECT toIPv6('127.0.0.1');

Result:

┌─toIPv6('127.0.0.1')─┐
│ ::ffff:127.0.0.1    │
└─────────────────────┘

isIPv4String

Determines whether the input string is an IPv4 address or not. If string is IPv6 address returns 0.

Syntax

isIPv4String(string)

Arguments

  • string — IP address. String.

Returned value

  • 1 if string is IPv4 address, 0 otherwise.

Type: UInt8.

Examples

Query:

SELECT addr, isIPv4String(addr) FROM ( SELECT ['0.0.0.0', '127.0.0.1', '::ffff:127.0.0.1'] AS addr ) ARRAY JOIN addr;

Result:

┌─addr─────────────┬─isIPv4String(addr)─┐
│ 0.0.0.0          │                  1 │
│ 127.0.0.1        │                  1 │
│ ::ffff:127.0.0.1 │                  0 │
└──────────────────┴────────────────────┘

isIPv6String

Determines whether the input string is an IPv6 address or not. If string is IPv4 address returns 0.

Syntax

isIPv6String(string)

Arguments

  • string — IP address. String.

Returned value

  • 1 if string is IPv6 address, 0 otherwise.

Type: UInt8.

Examples

Query:

SELECT addr, isIPv6String(addr) FROM ( SELECT ['::', '1111::ffff', '::ffff:127.0.0.1', '127.0.0.1'] AS addr ) ARRAY JOIN addr;

Result:

┌─addr─────────────┬─isIPv6String(addr)─┐
│ ::               │                  1 │
│ 1111::ffff       │                  1 │
│ ::ffff:127.0.0.1 │                  1 │
│ 127.0.0.1        │                  0 │
└──────────────────┴────────────────────┘

isIPAddressInRange

Determines if an IP address is contained in a network represented in the CIDR notation. Returns 1 if true, or 0 otherwise.

Syntax

isIPAddressInRange(address, prefix)

This function accepts both IPv4 and IPv6 addresses (and networks) represented as strings. It returns 0 if the IP version of the address and the CIDR don't match.

Arguments

  • address — An IPv4 or IPv6 address. String.

  • prefix — An IPv4 or IPv6 network prefix in CIDR. String.

Returned value

  • 1 or 0.

Type: UInt8.

Example

Query:

SELECT isIPAddressInRange('127.0.0.1', '127.0.0.0/8');

Result:

┌─isIPAddressInRange('127.0.0.1', '127.0.0.0/8')─┐
│                                              1 │
└────────────────────────────────────────────────┘

Query:

SELECT isIPAddressInRange('127.0.0.1', 'ffff::/16');

Result:

┌─isIPAddressInRange('127.0.0.1', 'ffff::/16')─┐
│                                            0 │
└──────────────────────────────────────────────┘

Query:

SELECT isIPAddressInRange('::ffff:192.168.0.1', '::ffff:192.168.0.4/128');

Result:

┌─isIPAddressInRange('::ffff:192.168.0.1', '::ffff:192.168.0.4/128')─┐
│                                                                  0 │
└────────────────────────────────────────────────────────────────────┘

JSON

ClickHouse has special functions for working with this JSON. All the JSON functions are based on strong assumptions about what the JSON can be, but they try to do as little as possible to get the job done.

The following assumptions are made:

  1. The field name (function argument) must be a constant.

  2. The field name is somehow canonically encoded in JSON. For example: visitParamHas('{"abc":"def"}', 'abc') = 1, but visitParamHas('{"\\u0061\\u0062\\u0063":"def"}', 'abc') = 0

  3. Fields are searched for on any nesting level, indiscriminately. If there are multiple matching fields, the first occurrence is used.

  4. The JSON does not have space characters outside of string literals.

visitParamHas(params, name)

Checks whether there is a field with the name name.

Alias: simpleJSONHas.

visitParamExtractUInt(params, name)

Parses UInt64 from the value of the field named name. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns 0.

Alias: simpleJSONExtractUInt.

visitParamExtractInt(params, name)

The same as for Int64.

Alias: simpleJSONExtractInt.

visitParamExtractFloat(params, name)

The same as for Float64.

Alias: simpleJSONExtractFloat.

visitParamExtractBool(params, name)

Parses a true/false value. The result is UInt8.

Alias: simpleJSONExtractBool.

visitParamExtractRaw(params, name)

Returns the value of a field, including separators.

Alias: simpleJSONExtractRaw.

Examples:

visitParamExtractRaw('{"abc":"\\n\\u0000"}', 'abc') = '"\\n\\u0000"';
visitParamExtractRaw('{"abc":{"def":[1,2,3]}}', 'abc') = '{"def":[1,2,3]}';

visitParamExtractString(params, name)

Parses the string in double quotes. The value is unescaped. If unescaping failed, it returns an empty string.

Alias: simpleJSONExtractString.

Examples:

visitParamExtractString('{"abc":"\\n\\u0000"}', 'abc') = '\n\0';
visitParamExtractString('{"abc":"\\u263a"}', 'abc') = '☺';
visitParamExtractString('{"abc":"\\u263"}', 'abc') = '';
visitParamExtractString('{"abc":"hello}', 'abc') = '';

There is currently no support for code points in the format \uXXXX\uYYYY that are not from the basic multilingual plane (they are converted to CESU-8 instead of UTF-8).

The following functions are based on simdjson designed for more complex JSON parsing requirements. The assumption 2 mentioned above still applies.

isValidJSON(json)

Checks that passed string is a valid json.

Examples:

SELECT isValidJSON('{"a": "hello", "b": [-100, 200.0, 300]}') = 1
SELECT isValidJSON('not a json') = 0

JSONHas(json[, indices_or_keys]…)

If the value exists in the JSON document, 1 will be returned.

If the value does not exist, 0 will be returned.

Examples:

SELECT JSONHas('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 1
SELECT JSONHas('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 4) = 0

indices_or_keys is a list of zero or more arguments each of them can be either string or integer.

  • String = access object member by key.

  • Positive integer = access the n-th member/key from the beginning.

  • Negative integer = access the n-th member/key from the end.

Minimum index of the element is 1. Thus the element 0 does not exist.

You may use integers to access both JSON arrays and JSON objects.

So, for example:

SELECT JSONExtractKey('{"a": "hello", "b": [-100, 200.0, 300]}', 1) = 'a'
SELECT JSONExtractKey('{"a": "hello", "b": [-100, 200.0, 300]}', 2) = 'b'
SELECT JSONExtractKey('{"a": "hello", "b": [-100, 200.0, 300]}', -1) = 'b'
SELECT JSONExtractKey('{"a": "hello", "b": [-100, 200.0, 300]}', -2) = 'a'
SELECT JSONExtractString('{"a": "hello", "b": [-100, 200.0, 300]}', 1) = 'hello'

JSONLength(json[, indices_or_keys]…)

Return the length of a JSON array or a JSON object.

If the value does not exist or has a wrong type, 0 will be returned.

Examples:

SELECT JSONLength('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 3
SELECT JSONLength('{"a": "hello", "b": [-100, 200.0, 300]}') = 2

JSONType(json[, indices_or_keys]…)

Return the type of a JSON value.

If the value does not exist, Null will be returned.

Examples:

SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}') = 'Object'
SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') = 'String'
SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 'Array'

JSONExtractUInt(json[, indices_or_keys]…)

JSONExtractInt(json[, indices_or_keys]…)

JSONExtractFloat(json[, indices_or_keys]…)

JSONExtractBool(json[, indices_or_keys]…)

Parses a JSON and extract a value. These functions are similar to visitParam functions.

If the value does not exist or has a wrong type, 0 will be returned.

Examples:

SELECT JSONExtractInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 1) = -100
SELECT JSONExtractFloat('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 2) = 200.0
SELECT JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', -1) = 300

JSONExtractString(json[, indices_or_keys]…)

Parses a JSON and extract a string. This function is similar to visitParamExtractString functions.

If the value does not exist or has a wrong type, an empty string will be returned.

The value is unescaped. If unescaping failed, it returns an empty string.

Examples:

SELECT JSONExtractString('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') = 'hello'
SELECT JSONExtractString('{"abc":"\\n\\u0000"}', 'abc') = '\n\0'
SELECT JSONExtractString('{"abc":"\\u263a"}', 'abc') = '☺'
SELECT JSONExtractString('{"abc":"\\u263"}', 'abc') = ''
SELECT JSONExtractString('{"abc":"hello}', 'abc') = ''

JSONExtract(json[, indices_or_keys…], Return_type)

Parses a JSON and extract a value of the given ClickHouse data type.

This is a generalization of the previous JSONExtract<type> functions. This means JSONExtract(..., 'String') returns exactly the same as JSONExtractString(), JSONExtract(..., 'Float64') returns exactly the same as JSONExtractFloat().

Examples:

SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'Tuple(String, Array(Float64))') = ('hello',[-100,200,300])
SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'Tuple(b Array(Float64), a String)') = ([-100,200,300],'hello')
SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 'Array(Nullable(Int8))') = [-100, NULL, NULL]
SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 4, 'Nullable(Int64)') = NULL
SELECT JSONExtract('{"passed": true}', 'passed', 'UInt8') = 1
SELECT JSONExtract('{"day": "Thursday"}', 'day', 'Enum8(\'Sunday\' = 0, \'Monday\' = 1, \'Tuesday\' = 2, \'Wednesday\' = 3, \'Thursday\' = 4, \'Friday\' = 5, \'Saturday\' = 6)') = 'Thursday'
SELECT JSONExtract('{"day": 5}', 'day', 'Enum8(\'Sunday\' = 0, \'Monday\' = 1, \'Tuesday\' = 2, \'Wednesday\' = 3, \'Thursday\' = 4, \'Friday\' = 5, \'Saturday\' = 6)') = 'Friday'

JSONExtractKeysAndValues(json[, indices_or_keys…], Value_type)

Parses key-value pairs from a JSON where the values are of the given ClickHouse data type.

Example:

SELECT JSONExtractKeysAndValues('{"x": {"a": 5, "b": 7, "c": 11}}', 'x', 'Int8') = [('a',5),('b',7),('c',11)];

JSONExtractKeys

Parses a JSON string and extracts the keys.

Syntax

JSONExtractKeys(json[, a, b, c...])

Arguments

  • jsonString with valid JSON.

  • a, b, c... — Comma-separated indices or keys that specify the path to the inner field in a nested JSON object. Each argument can be either a String to get the field by the key or an Integer to get the N-th field (indexed from 1, negative integers count from the end). If not set, the whole JSON is parsed as the top-level object. Optional parameter.

Returned value

Array with the keys of the JSON.

Type: Array(String).

Example

Query:

SELECT JSONExtractKeys('{"a": "hello", "b": [-100, 200.0, 300]}');

Result:

text
┌─JSONExtractKeys('{"a": "hello", "b": [-100, 200.0, 300]}')─┐
│ ['a','b']                                                  │
└────────────────────────────────────────────────────────────┘

JSONExtractRaw(json[, indices_or_keys]…)

Returns a part of JSON as unparsed string.

If the part does not exist or has a wrong type, an empty string will be returned.

Example:

SELECT JSONExtractRaw('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = '[-100, 200.0, 300]';

JSONExtractArrayRaw(json[, indices_or_keys…])

Returns an array with elements of JSON array, each represented as unparsed string.

If the part does not exist or isn’t array, an empty array will be returned.

Example:

SELECT JSONExtractArrayRaw('{"a": "hello", "b": [-100, 200.0, "hello"]}', 'b') = ['-100', '200.0', '"hello"'];

JSONExtractKeysAndValuesRaw

Extracts raw data from a JSON object.

Syntax

JSONExtractKeysAndValuesRaw(json[, p, a, t, h])

Arguments

  • jsonString with valid JSON.

  • p, a, t, h — Comma-separated indices or keys that specify the path to the inner field in a nested JSON object. Each argument can be either a string to get the field by the key or an integer to get the N-th field (indexed from 1, negative integers count from the end). If not set, the whole JSON is parsed as the top-level object. Optional parameter.

Returned values

  • Array with ('key', 'value') tuples. Both tuple members are strings.

  • Empty array if the requested object does not exist, or input JSON is invalid.

Type: Array(Tuple(String, String).

Examples

Query:

SELECT JSONExtractKeysAndValuesRaw('{"a": [-100, 200.0], "b":{"c": {"d": "hello", "f": "world"}}}');

Result:

┌─JSONExtractKeysAndValuesRaw('{"a": [-100, 200.0], "b":{"c": {"d": "hello", "f": "world"}}}')─┐
│ [('a','[-100,200]'),('b','{"c":{"d":"hello","f":"world"}}')]                                 │
└──────────────────────────────────────────────────────────────────────────────────────────────┘

Query:

SELECT JSONExtractKeysAndValuesRaw('{"a": [-100, 200.0], "b":{"c": {"d": "hello", "f": "world"}}}', 'b');

Result:

┌─JSONExtractKeysAndValuesRaw('{"a": [-100, 200.0], "b":{"c": {"d": "hello", "f": "world"}}}', 'b')─┐
│ [('c','{"d":"hello","f":"world"}')]                                                               │
└───────────────────────────────────────────────────────────────────────────────────────────────────┘

Query:

SELECT JSONExtractKeysAndValuesRaw('{"a": [-100, 200.0], "b":{"c": {"d": "hello", "f": "world"}}}', -1, 'c');

Result:

┌─JSONExtractKeysAndValuesRaw('{"a": [-100, 200.0], "b":{"c": {"d": "hello", "f": "world"}}}', -1, 'c')─┐
│ [('d','"hello"'),('f','"world"')]                                                                     │
└───────────────────────────────────────────────────────────────────────────────────────────────────────┘

JSON_EXISTS(json, path)

If the value exists in the JSON document, 1 will be returned.

If the value does not exist, 0 will be returned.

Examples:

SELECT JSON_EXISTS('{"hello":1}', '$.hello');
SELECT JSON_EXISTS('{"hello":{"world":1}}', '$.hello.world');
SELECT JSON_EXISTS('{"hello":["world"]}', '$.hello[*]');
SELECT JSON_EXISTS('{"hello":["world"]}', '$.hello[0]');

NOTE

Before version 21.11 the order of arguments was wrong, i.e. JSON_EXISTS(path, json)

JSON_QUERY(json, path)

Parses a JSON and extract a value as JSON array or JSON object.

If the value does not exist, an empty string will be returned.

Example:

SELECT JSON_QUERY('{"hello":"world"}', '$.hello');
SELECT JSON_QUERY('{"array":[[0, 1, 2, 3, 4, 5], [0, -1, -2, -3, -4, -5]]}', '$.array[*][0 to 2, 4]');
SELECT JSON_QUERY('{"hello":2}', '$.hello');
SELECT toTypeName(JSON_QUERY('{"hello":2}', '$.hello'));

Result:

["world"]
[0, 1, 4, 0, -1, -4]
[2]
String

NOTE

Before version 21.11 the order of arguments was wrong, i.e. JSON_QUERY(path, json)

JSON_VALUE(json, path)

Parses a JSON and extract a value as JSON scalar.

If the value does not exist, an empty string will be returned.

Example:

SELECT JSON_VALUE('{"hello":"world"}', '$.hello');
SELECT JSON_VALUE('{"array":[[0, 1, 2, 3, 4, 5], [0, -1, -2, -3, -4, -5]]}', '$.array[*][0 to 2, 4]');
SELECT JSON_VALUE('{"hello":2}', '$.hello');
SELECT toTypeName(JSON_VALUE('{"hello":2}', '$.hello'));

Result:

world
0
2
String

NOTE

Before version 21.11 the order of arguments was wrong, i.e. JSON_VALUE(path, json)

toJSONString

Serializes a value to its JSON representation. Various data types and nested structures are supported. 64-bit integers or bigger (like UInt64 or Int128) are enclosed in quotes by default. output_format_json_quote_64bit_integers controls this behavior. Special values NaN and inf are replaced with null. Enable output_format_json_quote_denormals setting to show them. When serializing an Enum value, the function outputs its name.

Syntax

toJSONString(value)

Arguments

  • value — Value to serialize. Value may be of any data type.

Returned value

  • JSON representation of the value.

Type: String.

Example

The first example shows serialization of a Map. The second example shows some special values wrapped into a Tuple.

Query:

SELECT toJSONString(map('key1', 1, 'key2', 2));
SELECT toJSONString(tuple(1.25, NULL, NaN, +inf, -inf, [])) SETTINGS output_format_json_quote_denormals = 1;

Result:

{"key1":1,"key2":2}
[1.25,null,"nan","inf","-inf",[]]

MACHINE LEARNING FUNCTIONS

evalMLMethod

Prediction using fitted regression models uses evalMLMethod function. See link in linearRegression.

stochasticLinearRegression

The stochasticLinearRegression aggregate function implements stochastic gradient descent method using linear model and MSE loss function. Uses evalMLMethod to predict on new data.

stochasticLogisticRegression

The stochasticLogisticRegression aggregate function implements stochastic gradient descent method for binary classification problem. Uses evalMLMethod to predict on new data.

MAPS

map

Arranges key:value pairs into Map(key, value) data type.

Syntax

map(key1, value1[, key2, value2, ...])

Arguments

Returned value

  • Data structure as key:value pairs.

Type: Map(key, value).

Examples

Query:

SELECT map('key1', number, 'key2', number * 2) FROM numbers(3);

Result:

┌─map('key1', number, 'key2', multiply(number, 2))─┐
│ {'key1':0,'key2':0}                              │
│ {'key1':1,'key2':2}                              │
│ {'key1':2,'key2':4}                              │
└──────────────────────────────────────────────────┘

Query:

CREATE TABLE table_map (a Map(String, UInt64)) ENGINE = MergeTree() ORDER BY a;
INSERT INTO table_map SELECT map('key1', number, 'key2', number * 2) FROM numbers(3);
SELECT a['key2'] FROM table_map;

Result:

┌─arrayElement(a, 'key2')─┐
│                       0 │
│                       2 │
│                       4 │
└─────────────────────────┘

See Also

mapAdd

Collect all the keys and sum corresponding values.

Syntax

mapAdd(arg1, arg2 [, ...])

Arguments

Arguments are maps or tuples of two arrays, where items in the first array represent keys, and the second array contains values for the each key. All key arrays should have same type, and all value arrays should contain items which are promoted to the one type (Int64, UInt64 or Float64). The common promoted type is used as a type for the result array.

Returned value

  • Depending on the arguments returns one map or tuple, where the first array contains the sorted keys and the second array contains values.

Example

Query with a tuple:

SELECT mapAdd(([toUInt8(1), 2], [1, 1]), ([toUInt8(1), 2], [1, 1])) as res, toTypeName(res) as type;

Result:

┌─res───────────┬─type───────────────────────────────┐
│ ([1,2],[2,2]) │ Tuple(Array(UInt8), Array(UInt64)) │
└───────────────┴────────────────────────────────────┘

Query with Map type:

SELECT mapAdd(map(1,1), map(1,1));

Result:

┌─mapAdd(map(1, 1), map(1, 1))─┐
│ {1:2}                        │
└──────────────────────────────┘

mapSubtract

Collect all the keys and subtract corresponding values.

Syntax

mapSubtract(Tuple(Array, Array), Tuple(Array, Array) [, ...])

Arguments

Arguments are maps or tuples of two arrays, where items in the first array represent keys, and the second array contains values for the each key. All key arrays should have same type, and all value arrays should contain items which are promote to the one type (Int64, UInt64 or Float64). The common promoted type is used as a type for the result array.

Returned value

  • Depending on the arguments returns one map or tuple, where the first array contains the sorted keys and the second array contains values.

Example

Query with a tuple map:

SELECT mapSubtract(([toUInt8(1), 2], [toInt32(1), 1]), ([toUInt8(1), 2], [toInt32(2), 1])) as res, toTypeName(res) as type;

Result:

┌─res────────────┬─type──────────────────────────────┐
│ ([1,2],[-1,0]) │ Tuple(Array(UInt8), Array(Int64)) │
└────────────────┴───────────────────────────────────┘

Query with Map type:

SELECT mapSubtract(map(1,1), map(1,1));

Result:

┌─mapSubtract(map(1, 1), map(1, 1))─┐
│ {1:0}                             │
└───────────────────────────────────┘

mapPopulateSeries

Fills missing keys in the maps (key and value array pair), where keys are integers. Also, it supports specifying the max key, which is used to extend the keys array.

Syntax

mapPopulateSeries(keys, values[, max])
mapPopulateSeries(map[, max])

Generates a map (a tuple with two arrays or a value of Map type, depending on the arguments), where keys are a series of numbers, from minimum to maximum keys (or max argument if it specified) taken from the map with a step size of one, and corresponding values. If the value is not specified for the key, then it uses the default value in the resulting map. For repeated keys, only the first value (in order of appearing) gets associated with the key.

For array arguments the number of elements in keys and values must be the same for each row.

Arguments

Arguments are maps or two arrays, where the first array represent keys, and the second array contains values for the each key.

Mapped arrays:

or

  • map — Map with integer keys. Map.

Returned value

  • Depending on the arguments returns a map or a tuple of two arrays: keys in sorted order, and values the corresponding keys.

Example

Query with mapped arrays:

SELECT mapPopulateSeries([1,2,4], [11,22,44], 5) AS res, toTypeName(res) AS type;

Result:

┌─res──────────────────────────┬─type──────────────────────────────┐
│ ([1,2,3,4,5],[11,22,0,44,0]) │ Tuple(Array(UInt8), Array(UInt8)) │
└──────────────────────────────┴───────────────────────────────────┘

Query with Map type:

SELECT mapPopulateSeries(map(1, 10, 5, 20), 6);

Result:

┌─mapPopulateSeries(map(1, 10, 5, 20), 6)─┐
│ {1:10,2:0,3:0,4:0,5:20,6:0}             │
└─────────────────────────────────────────┘

mapContains

Determines whether the map contains the key parameter.

Syntax

mapContains(map, key)

Parameters

  • map — Map. Map.

  • key — Key. Type matches the type of keys of map parameter.

Returned value

  • 1 if map contains key, 0 if not.

Type: UInt8.

Example

Query:

CREATE TABLE test (a Map(String,String)) ENGINE = Memory;

INSERT INTO test VALUES ({'name':'eleven','age':'11'}), ({'number':'twelve','position':'6.0'});

SELECT mapContains(a, 'name') FROM test;

Result:

┌─mapContains(a, 'name')─┐
│                      1 │
│                      0 │
└────────────────────────┘

mapKeys

Returns all keys from the map parameter.

Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1 the function reads only keys subcolumn instead of reading and processing the whole column data. The query SELECT mapKeys(m) FROM table transforms to SELECT m.keys FROM table.

Syntax

mapKeys(map)

Parameters

  • map — Map. Map.

Returned value

  • Array containing all keys from the map.

Type: Array.

Example

Query:

CREATE TABLE test (a Map(String,String)) ENGINE = Memory;

INSERT INTO test VALUES ({'name':'eleven','age':'11'}), ({'number':'twelve','position':'6.0'});

SELECT mapKeys(a) FROM test;

Result:

┌─mapKeys(a)────────────┐
│ ['name','age']        │
│ ['number','position'] │
└───────────────────────┘

mapValues

Returns all values from the map parameter.

Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1 the function reads only values subcolumn instead of reading and processing the whole column data. The query SELECT mapValues(m) FROM table transforms to SELECT m.values FROM table.

Syntax

mapValues(map)

Parameters

  • map — Map. Map.

Returned value

  • Array containing all the values from map.

Type: Array.

Example

Query:

CREATE TABLE test (a Map(String,String)) ENGINE = Memory;

INSERT INTO test VALUES ({'name':'eleven','age':'11'}), ({'number':'twelve','position':'6.0'});

SELECT mapValues(a) FROM test;

Result:

┌─mapValues(a)─────┐
│ ['eleven','11']  │
│ ['twelve','6.0'] │
└──────────────────┘

MATHEMATICAL

All the functions return a Float64 number. The accuracy of the result is close to the maximum precision possible, but the result might not coincide with the machine representable number nearest to the corresponding real number.

e()

Returns a Float64 number that is close to the number e.

exp(x)

Accepts a numeric argument and returns a Float64 number close to the exponent of the argument.

log(x), ln(x)

Accepts a numeric argument and returns a Float64 number close to the natural logarithm of the argument.

exp2(x)

Accepts a numeric argument and returns a Float64 number close to 2 to the power of x.

log2(x)

Accepts a numeric argument and returns a Float64 number close to the binary logarithm of the argument.

exp10(x)

Accepts a numeric argument and returns a Float64 number close to 10 to the power of x.

log10(x)

Accepts a numeric argument and returns a Float64 number close to the decimal logarithm of the argument.

sqrt(x)

Accepts a numeric argument and returns a Float64 number close to the square root of the argument.

cbrt(x)

Accepts a numeric argument and returns a Float64 number close to the cubic root of the argument.

erf(x)

If ‘x’ is non-negative, then erf(x / σ√2) is the probability that a random variable having a normal distribution with standard deviation ‘σ’ takes the value that is separated from the expected value by more than ‘x’.

Example (three sigma rule):

SELECT erf(3 / sqrt(2));
┌─erf(divide(3, sqrt(2)))─┐
│      0.9973002039367398 │
└─────────────────────────┘

erfc(x)

Accepts a numeric argument and returns a Float64 number close to 1 - erf(x), but without loss of precision for large ‘x’ values.

lgamma(x)

The logarithm of the gamma function.

tgamma(x)

Gamma function.

sin(x)

The sine.

cos(x)

The cosine.

tan(x)

The tangent.

asin(x)

The arc sine.

acos(x)

The arc cosine.

atan(x)

The arc tangent.

pow(x, y), power(x, y)

Takes two numeric arguments x and y. Returns a Float64 number close to x to the power of y.

intExp2

Accepts a numeric argument and returns a UInt64 number close to 2 to the power of x.

intExp10

Accepts a numeric argument and returns a UInt64 number close to 10 to the power of x.

cosh(x)

Hyperbolic cosine.

Syntax

cosh(x)

Arguments

  • x — The angle, in radians. Values from the interval: -∞ < x < +∞. Float64.

Returned value

  • Values from the interval: 1 <= cosh(x) < +∞.

Type: Float64.

Example

Query:

SELECT cosh(0);

Result:

┌─cosh(0)──┐
│        1 │
└──────────┘

acosh(x)

Inverse hyperbolic cosine.

Syntax

acosh(x)

Arguments

  • x — Hyperbolic cosine of angle. Values from the interval: 1 <= x < +∞. Float64.

Returned value

  • The angle, in radians. Values from the interval: 0 <= acosh(x) < +∞.

Type: Float64.

Example

Query:

SELECT acosh(1);

Result:

┌─acosh(1)─┐
│        0 │
└──────────┘

See Also

sinh(x)

Hyperbolic sine.

Syntax

sinh(x)

Arguments

  • x — The angle, in radians. Values from the interval: -∞ < x < +∞. Float64.

Returned value

  • Values from the interval: -∞ < sinh(x) < +∞.

Type: Float64.

Example

Query:

SELECT sinh(0);

Result:

┌─sinh(0)──┐
│        0 │
└──────────┘

asinh(x)

Inverse hyperbolic sine.

Syntax

asinh(x)

Arguments

  • x — Hyperbolic sine of angle. Values from the interval: -∞ < x < +∞. Float64.

Returned value

  • The angle, in radians. Values from the interval: -∞ < asinh(x) < +∞.

Type: Float64.

Example

Query:

SELECT asinh(0);

Result:

┌─asinh(0)─┐
│        0 │
└──────────┘

See Also

atanh(x)

Inverse hyperbolic tangent.

Syntax

atanh(x)

Arguments

  • x — Hyperbolic tangent of angle. Values from the interval: –1 < x < 1. Float64.

Returned value

  • The angle, in radians. Values from the interval: -∞ < atanh(x) < +∞.

Type: Float64.

Example

Query:

SELECT atanh(0);

Result:

┌─atanh(0)─┐
│        0 │
└──────────┘

atan2(y, x)

The function calculates the angle in the Euclidean plane, given in radians, between the positive x axis and the ray to the point (x, y) ≠ (0, 0).

Syntax

atan2(y, x)

Arguments

  • y — y-coordinate of the point through which the ray passes. Float64.

  • x — x-coordinate of the point through which the ray passes. Float64.

Returned value

  • The angle θ such that −π < θ ≤ π, in radians.

Type: Float64.

Example

Query:

SELECT atan2(1, 1);

Result:

┌────────atan2(1, 1)─┐
│ 0.7853981633974483 │
└────────────────────┘

hypot(x, y)

Calculates the length of the hypotenuse of a right-angle triangle. The function avoids problems that occur when squaring very large or very small numbers.

Syntax

hypot(x, y)

Arguments

  • x — The first cathetus of a right-angle triangle. Float64.

  • y — The second cathetus of a right-angle triangle. Float64.

Returned value

  • The length of the hypotenuse of a right-angle triangle.

Type: Float64.

Example

Query:

SELECT hypot(1, 1);

Result:

┌────────hypot(1, 1)─┐
│ 1.4142135623730951 │
└────────────────────┘

log1p(x)

Calculates log(1+x). The function log1p(x) is more accurate than log(1+x) for small values of x.

Syntax

log1p(x)

Arguments

  • x — Values from the interval: -1 < x < +∞. Float64.

Returned value

  • Values from the interval: -∞ < log1p(x) < +∞.

Type: Float64.

Example

Query:

SELECT log1p(0);

Result:

┌─log1p(0)─┐
│        0 │
└──────────┘

See Also

sign(x)

Returns the sign of a real number.

Syntax

sign(x)

Arguments

  • x — Values from -∞ to +∞. Support all numeric types in ClickHouse.

Returned value

  • -1 for x < 0

  • 0 for x = 0

  • 1 for x > 0

Examples

Sign for the zero value:

SELECT sign(0);

Result:

┌─sign(0)─┐
│       0 │
└─────────┘

Sign for the positive value:

SELECT sign(1);

Result:

┌─sign(1)─┐
│       1 │
└─────────┘

Sign for the negative value:

SELECT sign(-1);

Result:

┌─sign(-1)─┐
│       -1 │
└──────────┘

NULLABLE

isNull

Checks whether the argument is NULL.

isNull(x)

Alias: ISNULL.

Arguments

  • x — A value with a non-compound data type.

Returned value

  • 1 if x is NULL.

  • 0 if x is not NULL.

Example

Input table

┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 2 │    3 │
└───┴──────┘

Query

SELECT x FROM t_null WHERE isNull(y);
┌─x─┐
│ 1 │
└───┘

isNotNull

Checks whether the argument is NULL.

isNotNull(x)

Arguments:

  • x — A value with a non-compound data type.

Returned value

  • 0 if x is NULL.

  • 1 if x is not NULL.

Example

Input table

┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 2 │    3 │
└───┴──────┘

Query

SELECT x FROM t_null WHERE isNotNull(y);
┌─x─┐
│ 2 │
└───┘

coalesce

Checks from left to right whether NULL arguments were passed and returns the first non-NULL argument.

coalesce(x,...)

Arguments:

  • Any number of parameters of a non-compound type. All parameters must be compatible by data type.

Returned values

  • The first non-NULL argument.

  • NULL, if all arguments are NULL.

Example

Consider a list of contacts that may specify multiple ways to contact a customer.

┌─name─────┬─mail─┬─phone─────┬──icq─┐
│ client 1 │ ᴺᵁᴸᴸ │ 123-45-67 │  123 │
│ client 2 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ      │ ᴺᵁᴸᴸ │
└──────────┴──────┴───────────┴──────┘

The mail and phone fields are of type String, but the icq field is UInt32, so it needs to be converted to String.

Get the first available contact method for the customer from the contact list:

SELECT name, coalesce(mail, phone, CAST(icq,'Nullable(String)')) FROM aBook;
┌─name─────┬─coalesce(mail, phone, CAST(icq, 'Nullable(String)'))─┐
│ client 1 │ 123-45-67                                            │
│ client 2 │ ᴺᵁᴸᴸ                                                 │
└──────────┴──────────────────────────────────────────────────────┘

ifNull

Returns an alternative value if the main argument is NULL.

ifNull(x,alt)

Arguments:

  • x — The value to check for NULL.

  • alt — The value that the function returns if x is NULL.

Returned values

  • The value x, if x is not NULL.

  • The value alt, if x is NULL.

Example

SELECT ifNull('a', 'b');
┌─ifNull('a', 'b')─┐
│ a                │
└──────────────────┘
SELECT ifNull(NULL, 'b');
┌─ifNull(NULL, 'b')─┐
│ b                 │
└───────────────────┘

nullIf

Returns NULL if the arguments are equal.

nullIf(x, y)

Arguments:

x, y — Values for comparison. They must be compatible types, or ClickHouse will generate an exception.

Returned values

  • NULL, if the arguments are equal.

  • The x value, if the arguments are not equal.

Example

SELECT nullIf(1, 1);
┌─nullIf(1, 1)─┐
│         ᴺᵁᴸᴸ │
└──────────────┘
SELECT nullIf(1, 2);
┌─nullIf(1, 2)─┐
│            1 │
└──────────────┘

assumeNotNull

Results in an equivalent non-Nullable value for a Nullable type. In case the original value is NULL the result is undetermined. See also ifNull and coalesce functions.

assumeNotNull(x)

Arguments:

  • x — The original value.

Returned values

  • The original value from the non-Nullable type, if it is not NULL.

  • Implementation specific result if the original value was NULL.

Example

Consider the t_null table.

SHOW CREATE TABLE t_null;
┌─statement─────────────────────────────────────────────────────────────────┐
│ CREATE TABLE default.t_null ( x Int8,  y Nullable(Int8)) ENGINE = TinyLog │
└───────────────────────────────────────────────────────────────────────────┘
┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 2 │    3 │
└───┴──────┘

Apply the assumeNotNull function to the y column.

SELECT assumeNotNull(y) FROM t_null;
┌─assumeNotNull(y)─┐
│                0 │
│                3 │
└──────────────────┘
SELECT toTypeName(assumeNotNull(y)) FROM t_null;
┌─toTypeName(assumeNotNull(y))─┐
│ Int8                         │
│ Int8                         │
└──────────────────────────────┘

toNullable

Converts the argument type to Nullable.

toNullable(x)

Arguments:

  • x — The value of any non-compound type.

Returned value

  • The input value with a Nullable type.

Example

SELECT toTypeName(10);
┌─toTypeName(10)─┐
│ UInt8          │
└────────────────┘
SELECT toTypeName(toNullable(10));
┌─toTypeName(toNullable(10))─┐
│ Nullable(UInt8)            │
└────────────────────────────┘

OTHERS

hostName()

Returns a string with the name of the host that this function was performed on. For distributed processing, this is the name of the remote server host, if the function is performed on a remote server. If it is executed in the context of a distributed table, then it generates a normal column with values relevant to each shard. Otherwise it produces a constant value.

getMacro

Gets a named value from the macros section of the server configuration.

Syntax

getMacro(name);

Arguments

  • name — Name to retrieve from the macros section. String.

Returned value

  • Value of the specified macro.

Type: String.

Example

The example macros section in the server configuration file:

<macros>
    <test>Value</test>
</macros>

Query:

SELECT getMacro('test');

Result:

┌─getMacro('test')─┐
│ Value            │
└──────────────────┘

An alternative way to get the same value:

SELECT * FROM system.macros
WHERE macro = 'test';
┌─macro─┬─substitution─┐
│ test  │ Value        │
└───────┴──────────────┘

FQDN

Returns the fully qualified domain name.

Syntax

fqdn();

This function is case-insensitive.

Returned value

  • String with the fully qualified domain name.

Type: String.

Example

Query:

SELECT FQDN();

Result:

┌─FQDN()──────────────────────────┐
│ clickhouse.ru-central1.internal │
└─────────────────────────────────┘

basename

Extracts the trailing part of a string after the last slash or backslash. This function if often used to extract the filename from a path.

basename( expr )

Arguments

  • expr — Expression resulting in a String type value. All the backslashes must be escaped in the resulting value.

Returned Value

A string that contains:

  • The trailing part of a string after the last slash or backslash.

    If the input string contains a path ending with slash or backslash, for example, `/` or `c:\`, the function returns an empty string.
  • The original string if there are no slashes or backslashes.

Example

SELECT 'some/long/path/to/file' AS a, basename(a)
┌─a──────────────────────┬─basename('some\\long\\path\\to\\file')─┐
│ some\long\path\to\file │ file                                   │
└────────────────────────┴────────────────────────────────────────┘
SELECT 'some\\long\\path\\to\\file' AS a, basename(a)
┌─a──────────────────────┬─basename('some\\long\\path\\to\\file')─┐
│ some\long\path\to\file │ file                                   │
└────────────────────────┴────────────────────────────────────────┘
SELECT 'some-file-name' AS a, basename(a)
┌─a──────────────┬─basename('some-file-name')─┐
│ some-file-name │ some-file-name             │
└────────────────┴────────────────────────────┘

visibleWidth(x)

Calculates the approximate width when outputting values to the console in text format (tab-separated). This function is used by the system for implementing Pretty formats.

NULL is represented as a string corresponding to NULL in Pretty formats.

SELECT visibleWidth(NULL)
┌─visibleWidth(NULL)─┐
│                  4 │
└────────────────────┘

toTypeName(x)

Returns a string containing the type name of the passed argument.

If NULL is passed to the function as input, then it returns the Nullable(Nothing) type, which corresponds to an internal NULL representation in ClickHouse.

blockSize()

Gets the size of the block. In ClickHouse, queries are always run on blocks (sets of column parts). This function allows getting the size of the block that you called it for.

byteSize

Returns estimation of uncompressed byte size of its arguments in memory.

Syntax

byteSize(argument [, ...])

Arguments

  • argument — Value.

Returned value

  • Estimation of byte size of the arguments in memory.

Type: UInt64.

Examples

For String arguments the funtion returns the string length + 9 (terminating zero + length).

Query:

SELECT byteSize('string');

Result:

┌─byteSize('string')─┐
│                 15 │
└────────────────────┘

Query:

CREATE TABLE test
(
    `key` Int32,
    `u8` UInt8,
    `u16` UInt16,
    `u32` UInt32,
    `u64` UInt64,
    `i8` Int8,
    `i16` Int16,
    `i32` Int32,
    `i64` Int64,
    `f32` Float32,
    `f64` Float64
)
ENGINE = MergeTree
ORDER BY key;

INSERT INTO test VALUES(1, 8, 16, 32, 64,  -8, -16, -32, -64, 32.32, 64.64);

SELECT key, byteSize(u8) AS `byteSize(UInt8)`, byteSize(u16) AS `byteSize(UInt16)`, byteSize(u32) AS `byteSize(UInt32)`, byteSize(u64) AS `byteSize(UInt64)`, byteSize(i8) AS `byteSize(Int8)`, byteSize(i16) AS `byteSize(Int16)`, byteSize(i32) AS `byteSize(Int32)`, byteSize(i64) AS `byteSize(Int64)`, byteSize(f32) AS `byteSize(Float32)`, byteSize(f64) AS `byteSize(Float64)` FROM test ORDER BY key ASC FORMAT Vertical;

Result:

Row 1:
──────
key:               1
byteSize(UInt8):   1
byteSize(UInt16):  2
byteSize(UInt32):  4
byteSize(UInt64):  8
byteSize(Int8):    1
byteSize(Int16):   2
byteSize(Int32):   4
byteSize(Int64):   8
byteSize(Float32): 4
byteSize(Float64): 8

If the function takes multiple arguments, it returns their combined byte size.

Query:

SELECT byteSize(NULL, 1, 0.3, '');

Result:

┌─byteSize(NULL, 1, 0.3, '')─┐
│                         19 │
└────────────────────────────┘

materialize(x)

Turns a constant into a full column containing just one value. In ClickHouse, full columns and constants are represented differently in memory. Functions work differently for constant arguments and normal arguments (different code is executed), although the result is almost always the same. This function is for debugging this behavior.

ignore(…)

Accepts any arguments, including NULL. Always returns 0. However, the argument is still evaluated. This can be used for benchmarks.

sleep(seconds)

Sleeps ‘seconds’ seconds on each data block. You can specify an integer or a floating-point number.

sleepEachRow(seconds)

Sleeps ‘seconds’ seconds on each row. You can specify an integer or a floating-point number.

currentDatabase()

Returns the name of the current database. You can use this function in table engine parameters in a CREATE TABLE query where you need to specify the database.

currentUser()

Returns the login of current user. Login of user, that initiated query, will be returned in case distibuted query.

SELECT currentUser();

Alias: user(), USER().

Returned values

  • Login of current user.

  • Login of user that initiated query in case of disributed query.

Type: String.

Example

Query:

SELECT currentUser();

Result:

┌─currentUser()─┐
│ default       │
└───────────────┘

isConstant

Checks whether the argument is a constant expression.

A constant expression means an expression whose resulting value is known at the query analysis (i.e. before execution). For example, expressions over literals are constant expressions.

The function is intended for development, debugging and demonstration.

Syntax

isConstant(x)

Arguments

  • x — Expression to check.

Returned values

  • 1x is constant.

  • 0x is non-constant.

Type: UInt8.

Examples

Query:

SELECT isConstant(x + 1) FROM (SELECT 43 AS x)

Result:

┌─isConstant(plus(x, 1))─┐
│                      1 │
└────────────────────────┘

Query:

WITH 3.14 AS pi SELECT isConstant(cos(pi))

Result:

┌─isConstant(cos(pi))─┐
│                   1 │
└─────────────────────┘

Query:

SELECT isConstant(number) FROM numbers(1)

Result:

┌─isConstant(number)─┐
│                  0 │
└────────────────────┘

isFinite(x)

Accepts Float32 and Float64 and returns UInt8 equal to 1 if the argument is not infinite and not a NaN, otherwise 0.

isInfinite(x)

Accepts Float32 and Float64 and returns UInt8 equal to 1 if the argument is infinite, otherwise 0. Note that 0 is returned for a NaN.

ifNotFinite

Checks whether floating point value is finite.

Syntax

ifNotFinite(x,y)

Arguments

  • x — Value to be checked for infinity. Type: Float*.

  • y — Fallback value. Type: Float*.

Returned value

  • x if x is finite.

  • y if x is not finite.

Example

Query:

SELECT 1/0 as infimum, ifNotFinite(infimum,42)

Result:

┌─infimum─┬─ifNotFinite(divide(1, 0), 42)─┐
│     inf │                            42 │
└─────────┴───────────────────────────────┘

You can get similar result by using ternary operator: isFinite(x) ? x : y.

isNaN(x)

Accepts Float32 and Float64 and returns UInt8 equal to 1 if the argument is a NaN, otherwise 0.

hasColumnInTable([‘hostname’[, ‘username’[, ‘password’]],] ‘database’, ‘table’, ‘column’)

Accepts constant strings: database name, table name, and column name. Returns a UInt8 constant expression equal to 1 if there is a column, otherwise 0. If the hostname parameter is set, the test will run on a remote server. The function throws an exception if the table does not exist. For elements in a nested data structure, the function checks for the existence of a column. For the nested data structure itself, the function returns 0.

bar

Allows building a unicode-art diagram.

bar(x, min, max, width) draws a band with a width proportional to (x - min) and equal to width characters when x = max.

Arguments

  • x — Size to display.

  • min, max — Integer constants. The value must fit in Int64.

  • width — Constant, positive integer, can be fractional.

The band is drawn with accuracy to one eighth of a symbol.

Example:

SELECT
    toHour(EventTime) AS h,
    count() AS c,
    bar(c, 0, 600000, 20) AS bar
FROM test.hits
GROUP BY h
ORDER BY h ASC
┌──h─┬──────c─┬─bar────────────────┐
│  0 │ 292907 │ █████████▋         │
│  1 │ 180563 │ ██████             │
│  2 │ 114861 │ ███▋               │
│  3 │  85069 │ ██▋                │
│  4 │  68543 │ ██▎                │
│  5 │  78116 │ ██▌                │
│  6 │ 113474 │ ███▋               │
│  7 │ 170678 │ █████▋             │
│  8 │ 278380 │ █████████▎         │
│  9 │ 391053 │ █████████████      │
│ 10 │ 457681 │ ███████████████▎   │
│ 11 │ 493667 │ ████████████████▍  │
│ 12 │ 509641 │ ████████████████▊  │
│ 13 │ 522947 │ █████████████████▍ │
│ 14 │ 539954 │ █████████████████▊ │
│ 15 │ 528460 │ █████████████████▌ │
│ 16 │ 539201 │ █████████████████▊ │
│ 17 │ 523539 │ █████████████████▍ │
│ 18 │ 506467 │ ████████████████▊  │
│ 19 │ 520915 │ █████████████████▎ │
│ 20 │ 521665 │ █████████████████▍ │
│ 21 │ 542078 │ ██████████████████ │
│ 22 │ 493642 │ ████████████████▍  │
│ 23 │ 400397 │ █████████████▎     │
└────┴────────┴────────────────────┘

transform

Transforms a value according to the explicitly defined mapping of some elements to other ones. There are two variations of this function:

transform(x, array_from, array_to, default)

x – What to transform.

array_from – Constant array of values for converting.

array_to – Constant array of values to convert the values in ‘from’ to.

default – Which value to use if ‘x’ is not equal to any of the values in ‘from’.

array_from and array_to – Arrays of the same size.

Types:

transform(T, Array(T), Array(U), U) -> U

T and U can be numeric, string, or Date or DateTime types. Where the same letter is indicated (T or U), for numeric types these might not be matching types, but types that have a common type. For example, the first argument can have the Int64 type, while the second has the Array(UInt16) type.

If the ‘x’ value is equal to one of the elements in the ‘array_from’ array, it returns the existing element (that is numbered the same) from the ‘array_to’ array. Otherwise, it returns ‘default’. If there are multiple matching elements in ‘array_from’, it returns one of the matches.

Example:

SELECT
    transform(SearchEngineID, [2, 3], ['Yandex', 'Google'], 'Other') AS title,
    count() AS c
FROM test.hits
WHERE SearchEngineID != 0
GROUP BY title
ORDER BY c DESC
┌─title─────┬──────c─┐
│ Yandex    │ 498635 │
│ Google    │ 229872 │
│ Other     │ 104472 │
└───────────┴────────┘

transform(x, array_from, array_to)

Differs from the first variation in that the ‘default’ argument is omitted. If the ‘x’ value is equal to one of the elements in the ‘array_from’ array, it returns the matching element (that is numbered the same) from the ‘array_to’ array. Otherwise, it returns ‘x’.

Types:

transform(T, Array(T), Array(T)) -> T

Example:

SELECT
    transform(domain(Referer), ['yandex.ru', 'google.ru', 'vkontakte.ru'], ['www.yandex', 'example.com', 'vk.com']) AS s,
    count() AS c
FROM test.hits
GROUP BY domain(Referer)
ORDER BY count() DESC
LIMIT 10
┌─s──────────────┬───────c─┐
│                │ 2906259 │
│ www.yandex     │  867767 │
│ ███████.ru     │  313599 │
│ mail.yandex.ru │  107147 │
│ ██████.ru      │  100355 │
│ █████████.ru   │   65040 │
│ news.yandex.ru │   64515 │
│ ██████.net     │   59141 │
│ example.com    │   57316 │
└────────────────┴─────────┘

formatReadableDecimalSize(x)

Accepts the size (number of bytes). Returns a rounded size with a suffix (KB, MB, etc.) as a string.

Example:

SELECT
    arrayJoin([1, 1024, 1024*1024, 192851925]) AS filesize_bytes,
    formatReadableDecimalSize(filesize_bytes) AS filesize
┌─filesize_bytes─┬─filesize───┐
│              1 │ 1.00 B     │
│           1024 │ 1.02 KB   │
│        1048576 │ 1.05 MB   │
│      192851925 │ 192.85 MB │
└────────────────┴────────────┘

formatReadableSize(x)

Accepts the size (number of bytes). Returns a rounded size with a suffix (KiB, MiB, etc.) as a string.

Example:

SELECT
    arrayJoin([1, 1024, 1024*1024, 192851925]) AS filesize_bytes,
    formatReadableSize(filesize_bytes) AS filesize
┌─filesize_bytes─┬─filesize───┐
│              1 │ 1.00 B     │
│           1024 │ 1.00 KiB   │
│        1048576 │ 1.00 MiB   │
│      192851925 │ 183.92 MiB │
└────────────────┴────────────┘

formatReadableQuantity(x)

Accepts the number. Returns a rounded number with a suffix (thousand, million, billion, etc.) as a string.

It is useful for reading big numbers by human.

Example:

SELECT
    arrayJoin([1024, 1234 * 1000, (4567 * 1000) * 1000, 98765432101234]) AS number,
    formatReadableQuantity(number) AS number_for_humans
┌─────────number─┬─number_for_humans─┐
│           1024 │ 1.02 thousand     │
│        1234000 │ 1.23 million      │
│     4567000000 │ 4.57 billion      │
│ 98765432101234 │ 98.77 trillion    │
└────────────────┴───────────────────┘

formatReadableTimeDelta

Accepts the time delta in seconds. Returns a time delta with (year, month, day, hour, minute, second) as a string.

Syntax

formatReadableTimeDelta(column[, maximum_unit])

Arguments

  • column — A column with numeric time delta.

  • maximum_unit — Optional. Maximum unit to show. Acceptable values seconds, minutes, hours, days, months, years.

Example:

SELECT
    arrayJoin([100, 12345, 432546534]) AS elapsed,
    formatReadableTimeDelta(elapsed) AS time_delta
┌────elapsed─┬─time_delta ─────────────────────────────────────────────────────┐
│        100 │ 1 minute and 40 seconds                                         │
│      12345 │ 3 hours, 25 minutes and 45 seconds                              │
│  432546534 │ 13 years, 8 months, 17 days, 7 hours, 48 minutes and 54 seconds │
└────────────┴─────────────────────────────────────────────────────────────────┘
SELECT
    arrayJoin([100, 12345, 432546534]) AS elapsed,
    formatReadableTimeDelta(elapsed, 'minutes') AS time_delta
┌────elapsed─┬─time_delta ─────────────────────────────────────────────────────┐
│        100 │ 1 minute and 40 seconds                                         │
│      12345 │ 205 minutes and 45 seconds                                      │
│  432546534 │ 7209108 minutes and 54 seconds                                  │
└────────────┴─────────────────────────────────────────────────────────────────┘

least(a, b)

Returns the smallest value from a and b.

greatest(a, b)

Returns the largest value of a and b.

uptime()

Returns the server’s uptime in seconds. If it is executed in the context of a distributed table, then it generates a normal column with values relevant to each shard. Otherwise it produces a constant value.

version()

Returns the version of the server as a string. If it is executed in the context of a distributed table, then it generates a normal column with values relevant to each shard. Otherwise it produces a constant value.

blockNumber

Returns the sequence number of the data block where the row is located.

rowNumberInBlock

Returns the ordinal number of the row in the data block. Different data blocks are always recalculated.

rowNumberInAllBlocks()

Returns the ordinal number of the row in the data block. This function only considers the affected data blocks.

neighbor

The window function that provides access to a row at a specified offset which comes before or after the current row of a given column.

Syntax

neighbor(column, offset[, default_value])

The result of the function depends on the affected data blocks and the order of data in the block.

WARNING

It can reach the neighbor rows only inside the currently processed data block.

The rows order used during the calculation of neighbor can differ from the order of rows returned to the user. To prevent that you can make a subquery with ORDER BY and call the function from outside the subquery.

Arguments

  • column — A column name or scalar expression.

  • offset — The number of rows forwards or backwards from the current row of column. Int64.

  • default_value — Optional. The value to be returned if offset goes beyond the scope of the block. Type of data blocks affected.

Returned values

  • Value for column in offset distance from current row if offset value is not outside block bounds.

  • Default value for column if offset value is outside block bounds. If default_value is given, then it will be used.

Type: type of data blocks affected or default value type.

Example

Query:

SELECT number, neighbor(number, 2) FROM system.numbers LIMIT 10;

Result:

┌─number─┬─neighbor(number, 2)─┐
│      0 │                   2 │
│      1 │                   3 │
│      2 │                   4 │
│      3 │                   5 │
│      4 │                   6 │
│      5 │                   7 │
│      6 │                   8 │
│      7 │                   9 │
│      8 │                   0 │
│      9 │                   0 │
└────────┴─────────────────────┘

Query:

SELECT number, neighbor(number, 2, 999) FROM system.numbers LIMIT 10;

Result:

┌─number─┬─neighbor(number, 2, 999)─┐
│      0 │                        2 │
│      1 │                        3 │
│      2 │                        4 │
│      3 │                        5 │
│      4 │                        6 │
│      5 │                        7 │
│      6 │                        8 │
│      7 │                        9 │
│      8 │                      999 │
│      9 │                      999 │
└────────┴──────────────────────────┘

This function can be used to compute year-over-year metric value:

Query:

WITH toDate('2018-01-01') AS start_date
SELECT
    toStartOfMonth(start_date + (number * 32)) AS month,
    toInt32(month) % 100 AS money,
    neighbor(money, -12) AS prev_year,
    round(prev_year / money, 2) AS year_over_year
FROM numbers(16)

Result:

┌──────month─┬─money─┬─prev_year─┬─year_over_year─┐
│ 2018-01-01 │    32 │         0 │              0 │
│ 2018-02-01 │    63 │         0 │              0 │
│ 2018-03-01 │    91 │         0 │              0 │
│ 2018-04-01 │    22 │         0 │              0 │
│ 2018-05-01 │    52 │         0 │              0 │
│ 2018-06-01 │    83 │         0 │              0 │
│ 2018-07-01 │    13 │         0 │              0 │
│ 2018-08-01 │    44 │         0 │              0 │
│ 2018-09-01 │    75 │         0 │              0 │
│ 2018-10-01 │     5 │         0 │              0 │
│ 2018-11-01 │    36 │         0 │              0 │
│ 2018-12-01 │    66 │         0 │              0 │
│ 2019-01-01 │    97 │        32 │           0.33 │
│ 2019-02-01 │    28 │        63 │           2.25 │
│ 2019-03-01 │    56 │        91 │           1.62 │
│ 2019-04-01 │    87 │        22 │           0.25 │
└────────────┴───────┴───────────┴────────────────┘

runningDifference(x)

Calculates the difference between successive row values ​​in the data block. Returns 0 for the first row and the difference from the previous row for each subsequent row.

WARNING

It can reach the previous row only inside the currently processed data block.

The result of the function depends on the affected data blocks and the order of data in the block.

The rows order used during the calculation of runningDifference can differ from the order of rows returned to the user. To prevent that you can make a subquery with ORDER BY and call the function from outside the subquery.

Example:

SELECT
    EventID,
    EventTime,
    runningDifference(EventTime) AS delta
FROM
(
    SELECT
        EventID,
        EventTime
    FROM events
    WHERE EventDate = '2016-11-24'
    ORDER BY EventTime ASC
    LIMIT 5
)
┌─EventID─┬───────────EventTime─┬─delta─┐
│    1106 │ 2016-11-24 00:00:04 │     0 │
│    1107 │ 2016-11-24 00:00:05 │     1 │
│    1108 │ 2016-11-24 00:00:05 │     0 │
│    1109 │ 2016-11-24 00:00:09 │     4 │
│    1110 │ 2016-11-24 00:00:10 │     1 │
└─────────┴─────────────────────┴───────┘

Please note - block size affects the result. With each new block, the runningDifference state is reset.

SELECT
    number,
    runningDifference(number + 1) AS diff
FROM numbers(100000)
WHERE diff != 1
┌─number─┬─diff─┐
│      0 │    0 │
└────────┴──────┘
┌─number─┬─diff─┐
│  65536 │    0 │
└────────┴──────┘
set max_block_size=100000 -- default value is 65536!

SELECT
    number,
    runningDifference(number + 1) AS diff
FROM numbers(100000)
WHERE diff != 1
┌─number─┬─diff─┐
│      0 │    0 │
└────────┴──────┘

runningDifferenceStartingWithFirstValue

Same as for runningDifference, the difference is the value of the first row, returned the value of the first row, and each subsequent row returns the difference from the previous row.

runningConcurrency

Calculates the number of concurrent events. Each event has a start time and an end time. The start time is included in the event, while the end time is excluded. Columns with a start time and an end time must be of the same data type. The function calculates the total number of active (concurrent) events for each event start time.

WARNING

Events must be ordered by the start time in ascending order. If this requirement is violated the function raises an exception. Every data block is processed separately. If events from different data blocks overlap then they can not be processed correctly.

Syntax

runningConcurrency(start, end)

Arguments

Returned values

  • The number of concurrent events at each event start time.

Type: UInt32

Example

Consider the table:

┌──────start─┬────────end─┐
│ 2021-03-03 │ 2021-03-11 │
│ 2021-03-06 │ 2021-03-12 │
│ 2021-03-07 │ 2021-03-08 │
│ 2021-03-11 │ 2021-03-12 │
└────────────┴────────────┘

Query:

SELECT start, runningConcurrency(start, end) FROM example_table;

Result:

┌──────start─┬─runningConcurrency(start, end)─┐
│ 2021-03-03 │                              1 │
│ 2021-03-06 │                              2 │
│ 2021-03-07 │                              3 │
│ 2021-03-11 │                              2 │
└────────────┴────────────────────────────────┘

MACNumToString(num)

Accepts a UInt64 number. Interprets it as a MAC address in big endian. Returns a string containing the corresponding MAC address in the format AA:BB:CC:DD:EE:FF (colon-separated numbers in hexadecimal form).

MACStringToNum(s)

The inverse function of MACNumToString. If the MAC address has an invalid format, it returns 0.

MACStringToOUI(s)

Accepts a MAC address in the format AA:BB:CC:DD:EE:FF (colon-separated numbers in hexadecimal form). Returns the first three octets as a UInt64 number. If the MAC address has an invalid format, it returns 0.

getSizeOfEnumType

Returns the number of fields in Enum.

getSizeOfEnumType(value)

Arguments:

  • value — Value of type Enum.

Returned values

  • The number of fields with Enum input values.

  • An exception is thrown if the type is not Enum.

Example

SELECT getSizeOfEnumType( CAST('a' AS Enum8('a' = 1, 'b' = 2) ) ) AS x
┌─x─┐
│ 2 │
└───┘

blockSerializedSize

Returns size on disk (without taking into account compression).

blockSerializedSize(value[, value[, ...]])

Arguments

  • value — Any value.

Returned values

  • The number of bytes that will be written to disk for block of values (without compression).

Example

Query:

SELECT blockSerializedSize(maxState(1)) as x

Result:

┌─x─┐
│ 2 │
└───┘

toColumnTypeName

Returns the name of the class that represents the data type of the column in RAM.

toColumnTypeName(value)

Arguments:

  • value — Any type of value.

Returned values

  • A string with the name of the class that is used for representing the value data type in RAM.

Example of the difference betweentoTypeName ' and ' toColumnTypeName

SELECT toTypeName(CAST('2018-01-01 01:02:03' AS DateTime))
┌─toTypeName(CAST('2018-01-01 01:02:03', 'DateTime'))─┐
│ DateTime                                            │
└─────────────────────────────────────────────────────┘
SELECT toColumnTypeName(CAST('2018-01-01 01:02:03' AS DateTime))
┌─toColumnTypeName(CAST('2018-01-01 01:02:03', 'DateTime'))─┐
│ Const(UInt32)                                             │
└───────────────────────────────────────────────────────────┘

The example shows that the DateTime data type is stored in memory as Const(UInt32).

dumpColumnStructure

Outputs a detailed description of data structures in RAM

dumpColumnStructure(value)

Arguments:

  • value — Any type of value.

Returned values

  • A string describing the structure that is used for representing the value data type in RAM.

Example

SELECT dumpColumnStructure(CAST('2018-01-01 01:02:03', 'DateTime'))
┌─dumpColumnStructure(CAST('2018-01-01 01:02:03', 'DateTime'))─┐
│ DateTime, Const(size = 1, UInt32(size = 1))                  │
└──────────────────────────────────────────────────────────────┘

defaultValueOfArgumentType

Outputs the default value for the data type.

Does not include default values for custom columns set by the user.

defaultValueOfArgumentType(expression)

Arguments:

  • expression — Arbitrary type of value or an expression that results in a value of an arbitrary type.

Returned values

  • 0 for numbers.

  • Empty string for strings.

  • ᴺᵁᴸᴸ for Nullable.

Example

SELECT defaultValueOfArgumentType( CAST(1 AS Int8) )
┌─defaultValueOfArgumentType(CAST(1, 'Int8'))─┐
│                                           0 │
└─────────────────────────────────────────────┘
SELECT defaultValueOfArgumentType( CAST(1 AS Nullable(Int8) ) )
┌─defaultValueOfArgumentType(CAST(1, 'Nullable(Int8)'))─┐
│                                                  ᴺᵁᴸᴸ │
└───────────────────────────────────────────────────────┘

defaultValueOfTypeName

Outputs the default value for given type name.

Does not include default values for custom columns set by the user.

defaultValueOfTypeName(type)

Arguments:

  • type — A string representing a type name.

Returned values

  • 0 for numbers.

  • Empty string for strings.

  • ᴺᵁᴸᴸ for Nullable.

Example

SELECT defaultValueOfTypeName('Int8')
┌─defaultValueOfTypeName('Int8')─┐
│                              0 │
└────────────────────────────────┘
SELECT defaultValueOfTypeName('Nullable(Int8)')
┌─defaultValueOfTypeName('Nullable(Int8)')─┐
│                                     ᴺᵁᴸᴸ │
└──────────────────────────────────────────┘

indexHint

The function is intended for debugging and introspection purposes. The function ignores it's argument and always returns 1. Arguments are not even evaluated.

But for the purpose of index analysis, the argument of this function is analyzed as if it was present directly without being wrapped inside indexHint function. This allows to select data in index ranges by the corresponding condition but without further filtering by this condition. The index in ClickHouse is sparse and using indexHint will yield more data than specifying the same condition directly.

Syntax

SELECT * FROM table WHERE indexHint(<expression>)

Returned value

  1. Type: Uint8.

Example

Here is the example of test data from the table ontime.

Input table:

SELECT count() FROM ontime
┌─count()─┐
│ 4276457 │
└─────────┘

The table has indexes on the fields (FlightDate, (Year, FlightDate)).

Create a query, where the index is not used.

Query:

SELECT FlightDate AS k, count() FROM ontime GROUP BY k ORDER BY k

ClickHouse processed the entire table (Processed 4.28 million rows).

Result:

┌──────────k─┬─count()─┐
│ 2017-01-01 │   13970 │
│ 2017-01-02 │   15882 │
........................
│ 2017-09-28 │   16411 │
│ 2017-09-29 │   16384 │
│ 2017-09-30 │   12520 │
└────────────┴─────────┘

To apply the index, select a specific date.

Query:

SELECT FlightDate AS k, count() FROM ontime WHERE k = '2017-09-15' GROUP BY k ORDER BY k

By using the index, ClickHouse processed a significantly smaller number of rows (Processed 32.74 thousand rows).

Result:

┌──────────k─┬─count()─┐
│ 2017-09-15 │   16428 │
└────────────┴─────────┘

Now wrap the expression k = '2017-09-15' into indexHint function.

Query:

SELECT
    FlightDate AS k,
    count()
FROM ontime
WHERE indexHint(k = '2017-09-15')
GROUP BY k
ORDER BY k ASC

ClickHouse used the index in the same way as the previous time (Processed 32.74 thousand rows). The expression k = '2017-09-15' was not used when generating the result. In examle the indexHint function allows to see adjacent dates.

Result:

┌──────────k─┬─count()─┐
│ 2017-09-14 │    7071 │
│ 2017-09-15 │   16428 │
│ 2017-09-16 │    1077 │
│ 2017-09-30 │    8167 │
└────────────┴─────────┘

replicate

Creates an array with a single value.

Used for internal implementation of arrayJoin.

SELECT replicate(x, arr);

Arguments:

  • arr — Original array. ClickHouse creates a new array of the same length as the original and fills it with the value x.

  • x — The value that the resulting array will be filled with.

Returned value

An array filled with the value x.

Type: Array.

Example

Query:

SELECT replicate(1, ['a', 'b', 'c'])

Result:

┌─replicate(1, ['a', 'b', 'c'])─┐
│ [1,1,1]                       │
└───────────────────────────────┘

filesystemAvailable

Returns amount of remaining space on the filesystem where the files of the databases located. It is always smaller than total free space (filesystemFree) because some space is reserved for OS.

Syntax

filesystemAvailable()

Returned value

  • The amount of remaining space available in bytes.

Type: UInt64.

Example

Query:

SELECT formatReadableSize(filesystemAvailable()) AS "Available space", toTypeName(filesystemAvailable()) AS "Type";

Result:

┌─Available space─┬─Type───┐
│ 30.75 GiB       │ UInt64 │
└─────────────────┴────────┘

filesystemFree

Returns total amount of the free space on the filesystem where the files of the databases located. See also filesystemAvailable

Syntax

filesystemFree()

Returned value

  • Amount of free space in bytes.

Type: UInt64.

Example

Query:

SELECT formatReadableSize(filesystemFree()) AS "Free space", toTypeName(filesystemFree()) AS "Type";

Result:

┌─Free space─┬─Type───┐
│ 32.39 GiB  │ UInt64 │
└────────────┴────────┘

filesystemCapacity

Returns the capacity of the filesystem in bytes. For evaluation, the path to the data directory must be configured.

Syntax

filesystemCapacity()

Returned value

  • Capacity information of the filesystem in bytes.

Type: UInt64.

Example

Query:

SELECT formatReadableSize(filesystemCapacity()) AS "Capacity", toTypeName(filesystemCapacity()) AS "Type"

Result:

┌─Capacity──┬─Type───┐
│ 39.32 GiB │ UInt64 │
└───────────┴────────┘

initializeAggregation

Calculates result of aggregate function based on single value. It is intended to use this function to initialize aggregate functions with combinator -State. You can create states of aggregate functions and insert them to columns of type AggregateFunction or use initialized aggregates as default values.

Syntax

initializeAggregation (aggregate_function, arg1, arg2, ..., argN)

Arguments

  • aggregate_function — Name of the aggregation function to initialize. String.

  • arg — Arguments of aggregate function.

Returned value(s)

  • Result of aggregation for every row passed to the function.

The return type is the same as the return type of function, that initializeAgregation takes as first argument.

Example

Query:

SELECT uniqMerge(state) FROM (SELECT initializeAggregation('uniqState', number % 3) AS state FROM numbers(10000));

Result:

┌─uniqMerge(state)─┐
│                3 │
└──────────────────┘

Query:

SELECT finalizeAggregation(state), toTypeName(state) FROM (SELECT initializeAggregation('sumState', number % 3) AS state FROM numbers(5));

Result:

┌─finalizeAggregation(state)─┬─toTypeName(state)─────────────┐
│                          0 │ AggregateFunction(sum, UInt8) │
│                          1 │ AggregateFunction(sum, UInt8) │
│                          2 │ AggregateFunction(sum, UInt8) │
│                          0 │ AggregateFunction(sum, UInt8) │
│                          1 │ AggregateFunction(sum, UInt8) │
└────────────────────────────┴───────────────────────────────┘

Example with AggregatingMergeTree table engine and AggregateFunction column:

CREATE TABLE metrics
(
    key UInt64,
    value AggregateFunction(sum, UInt64) DEFAULT initializeAggregation('sumState', toUInt64(0))
)
ENGINE = AggregatingMergeTree
ORDER BY key
INSERT INTO metrics VALUES (0, initializeAggregation('sumState', toUInt64(42)))

See Also

finalizeAggregation

Takes state of aggregate function. Returns result of aggregation (or finalized state when using-State combinator).

Syntax

finalizeAggregation(state)

Arguments

Returned value(s)

  • Value/values that was aggregated.

Type: Value of any types that was aggregated.

Examples

Query:

SELECT finalizeAggregation(( SELECT countState(number) FROM numbers(10)));

Result:

┌─finalizeAggregation(_subquery16)─┐
│                               10 │
└──────────────────────────────────┘

Query:

SELECT finalizeAggregation(( SELECT sumState(number) FROM numbers(10)));

Result:

┌─finalizeAggregation(_subquery20)─┐
│                               45 │
└──────────────────────────────────┘

Note that NULL values are ignored.

Query:

SELECT finalizeAggregation(arrayReduce('anyState', [NULL, 2, 3]));

Result:

┌─finalizeAggregation(arrayReduce('anyState', [NULL, 2, 3]))─┐
│                                                          2 │
└────────────────────────────────────────────────────────────┘

Combined example:

Query:

WITH initializeAggregation('sumState', number) AS one_row_sum_state
SELECT
    number,
    finalizeAggregation(one_row_sum_state) AS one_row_sum,
    runningAccumulate(one_row_sum_state) AS cumulative_sum
FROM numbers(10);

Result:

┌─number─┬─one_row_sum─┬─cumulative_sum─┐
│      0 │           0 │              0 │
│      1 │           1 │              1 │
│      2 │           2 │              3 │
│      3 │           3 │              6 │
│      4 │           4 │             10 │
│      5 │           5 │             15 │
│      6 │           6 │             21 │
│      7 │           7 │             28 │
│      8 │           8 │             36 │
│      9 │           9 │             45 │
└────────┴─────────────┴────────────────┘

See Also

runningAccumulate

Accumulates states of an aggregate function for each row of a data block.

WARNING

The state is reset for each new data block.

Syntax

runningAccumulate(agg_state[, grouping]);

Arguments

  • agg_state — State of the aggregate function. AggregateFunction.

  • grouping — Grouping key. Optional. The state of the function is reset if the grouping value is changed. It can be any of the supported data types for which the equality operator is defined.

Returned value

  • Each resulting row contains a result of the aggregate function, accumulated for all the input rows from 0 to the current position. runningAccumulate resets states for each new data block or when the grouping value changes.

Type depends on the aggregate function used.

Examples

Consider how you can use runningAccumulate to find the cumulative sum of numbers without and with grouping.

Query:

SELECT k, runningAccumulate(sum_k) AS res FROM (SELECT number as k, sumState(k) AS sum_k FROM numbers(10) GROUP BY k ORDER BY k);

Result:

┌─k─┬─res─┐
│ 0 │   0 │
│ 1 │   1 │
│ 2 │   3 │
│ 3 │   6 │
│ 4 │  10 │
│ 5 │  15 │
│ 6 │  21 │
│ 7 │  28 │
│ 8 │  36 │
│ 9 │  45 │
└───┴─────┘

The subquery generates sumState for every number from 0 to 9. sumState returns the state of the sum function that contains the sum of a single number.

The whole query does the following:

  1. For the first row, runningAccumulate takes sumState(0) and returns 0.

  2. For the second row, the function merges sumState(0) and sumState(1) resulting in sumState(0 + 1), and returns 1 as a result.

  3. For the third row, the function merges sumState(0 + 1) and sumState(2) resulting in sumState(0 + 1 + 2), and returns 3 as a result.

  4. The actions are repeated until the block ends.

The following example shows the groupping parameter usage:

Query:

SELECT
    grouping,
    item,
    runningAccumulate(state, grouping) AS res
FROM
(
    SELECT
        toInt8(number / 4) AS grouping,
        number AS item,
        sumState(number) AS state
    FROM numbers(15)
    GROUP BY item
    ORDER BY item ASC
);

Result:

┌─grouping─┬─item─┬─res─┐
│        0 │    0 │   0 │
│        0 │    1 │   1 │
│        0 │    2 │   3 │
│        0 │    3 │   6 │
│        1 │    4 │   4 │
│        1 │    5 │   9 │
│        1 │    6 │  15 │
│        1 │    7 │  22 │
│        2 │    8 │   8 │
│        2 │    9 │  17 │
│        2 │   10 │  27 │
│        2 │   11 │  38 │
│        3 │   12 │  12 │
│        3 │   13 │  25 │
│        3 │   14 │  39 │
└──────────┴──────┴─────┘

As you can see, runningAccumulate merges states for each group of rows separately.

joinGet

The function lets you extract data from the table the same way as from a dictionary.

Gets data from Join tables using the specified join key.

Only supports tables created with the ENGINE = Join(ANY, LEFT, <join_keys>) statement.

Syntax

joinGet(join_storage_table_name, `value_column`, join_keys)

Arguments

  • join_storage_table_name — an identifier indicates where search is performed. The identifier is searched in the default database (see parameter default_database in the config file). To override the default database, use the USE db_name or specify the database and the table through the separator db_name.db_table, see the example.

  • value_column — name of the column of the table that contains required data.

  • join_keys — list of keys.

Returned value

Returns list of values corresponded to list of keys.

If certain does not exist in source table then 0 or null will be returned based on join_use_nulls setting.

More info about join_use_nulls in Join operation.

Example

Input table:

CREATE DATABASE db_test
CREATE TABLE db_test.id_val(`id` UInt32, `val` UInt32) ENGINE = Join(ANY, LEFT, id) SETTINGS join_use_nulls = 1
INSERT INTO db_test.id_val VALUES (1,11)(2,12)(4,13)
┌─id─┬─val─┐
│  4 │  13 │
│  2 │  12 │
│  1 │  11 │
└────┴─────┘

Query:

SELECT joinGet(db_test.id_val,'val',toUInt32(number)) from numbers(4) SETTINGS join_use_nulls = 1

Result:

┌─joinGet(db_test.id_val, 'val', toUInt32(number))─┐
│                                                0 │
│                                               11 │
│                                               12 │
│                                                0 │
└──────────────────────────────────────────────────┘

catboostEvaluate(path_to_model, feature_1, feature_2, …, feature_n)

Evaluate external catboost model. CatBoost is an open-source gradient boosting library developed by Yandex for machine learing. Accepts a path to a catboost model and model arguments (features). Returns Float64.

SELECT feat1, ..., feat_n, catboostEvaluate('/path/to/model.bin', feat_1, ..., feat_n) AS prediction
FROM data_table

Prerequisites

  1. Build the catboost evaluation library

Before evaluating catboost models, the libcatboostmodel.<so|dylib> library must be made available. See CatBoost documentation how to compile it.

Next, specify the path to libcatboostmodel.<so|dylib> in the clickhouse configuration:

<clickhouse>
...
    <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>
...
</clickhouse>

For security and isolation reasons, the model evaluation does not run in the server process but in the clickhouse-library-bridge process. At the first execution of catboostEvaluate(), the server starts the library bridge process if it is not running already. Both processes communicate using a HTTP interface. By default, port 9012 is used. A different port can be specified as follows - this is useful if port 9012 is already assigned to a different service.

<library_bridge>
    <port>9019</port>
</library_bridge>
  1. Train a catboost model using libcatboost

See Training and applying models for how to train catboost models from a training data set.

throwIf(x[, message[, error_code]])

Throw an exception if the argument is non zero. message - is an optional parameter: a constant string providing a custom error message error_code - is an optional parameter: a constant integer providing a custom error code

To use the error_code argument, configuration parameter allow_custom_error_code_in_throwif must be enabled.

SELECT throwIf(number = 3, 'Too many') FROM numbers(10);
↙ Progress: 0.00 rows, 0.00 B (0.00 rows/s., 0.00 B/s.) Received exception from server (version 19.14.1):
Code: 395. DB::Exception: Received from localhost:9000. DB::Exception: Too many.

identity

Returns the same value that was used as its argument. Used for debugging and testing, allows to cancel using index, and get the query performance of a full scan. When query is analyzed for possible use of index, the analyzer does not look inside identity functions. Also constant folding is not applied too.

Syntax

identity(x)

Example

Query:

SELECT identity(42)

Result:

┌─identity(42)─┐
│           42 │
└──────────────┘

getSetting

Returns the current value of a custom setting.

Syntax

getSetting('custom_setting');

Parameter

  • custom_setting — The setting name. String.

Returned value

  • The setting current value.

Example

SET custom_a = 123;
SELECT getSetting('custom_a');

Result

123

See Also

isDecimalOverflow

Checks whether the Decimal value is out of its (or specified) precision.

Syntax

isDecimalOverflow(d, [p])

Arguments

  • d — value. Decimal.

  • p — precision. Optional. If omitted, the initial precision of the first argument is used. Using of this paratemer could be helpful for data extraction to another DBMS or file. UInt8.

Returned values

  • 1 — Decimal value has more digits then it's precision allow,

  • 0 — Decimal value satisfies the specified precision.

Example

Query:

SELECT isDecimalOverflow(toDecimal32(1000000000, 0), 9),
       isDecimalOverflow(toDecimal32(1000000000, 0)),
       isDecimalOverflow(toDecimal32(-1000000000, 0), 9),
       isDecimalOverflow(toDecimal32(-1000000000, 0));

Result:

1   1   1   1

countDigits

Returns number of decimal digits you need to represent the value.

Syntax

countDigits(x)

Arguments

Returned value

Number of digits.

Type: UInt8.

NOTE

For Decimal values takes into account their scales: calculates result over underlying integer type which is (value * scale). For example: countDigits(42) = 2, countDigits(42.000) = 5, countDigits(0.04200) = 4. I.e. you may check decimal overflow for Decimal64 with countDecimal(x) > 18. It's a slow variant of isDecimalOverflow.

Example

Query:

SELECT countDigits(toDecimal32(1, 9)), countDigits(toDecimal32(-1, 9)),
       countDigits(toDecimal64(1, 18)), countDigits(toDecimal64(-1, 18)),
       countDigits(toDecimal128(1, 38)), countDigits(toDecimal128(-1, 38));

Result:

10  10  19  19  39  39

errorCodeToName

Returned value

  • Variable name for the error code.

Type: LowCardinality(String).

Syntax

errorCodeToName(1)

Result:

UNSUPPORTED_METHOD

tcpPort

Returns native interface TCP port number listened by this server. If it is executed in the context of a distributed table, then it generates a normal column, otherwise it produces a constant value.

Syntax

tcpPort()

Arguments

  • None.

Returned value

  • The TCP port number.

Type: UInt16.

Example

Query:

SELECT tcpPort();

Result:

┌─tcpPort()─┐
│      9000 │
└───────────┘

See Also

RANDOM NUMBER AND STRING

All the functions accept zero arguments or one argument. If an argument is passed, it can be any type, and its value is not used for anything. The only purpose of this argument is to prevent common subexpression elimination, so that two different instances of the same function return different columns with different random numbers.

NOTE

Non-cryptographic generators of pseudo-random numbers are used.

rand, rand32

Returns a pseudo-random UInt32 number, evenly distributed among all UInt32-type numbers.

Uses a linear congruential generator.

rand64

Returns a pseudo-random UInt64 number, evenly distributed among all UInt64-type numbers.

Uses a linear congruential generator.

randCanonical

The function generates pseudo random results with independent and identically distributed uniformly distributed values in [0, 1).

Non-deterministic. Return type is Float64.

randConstant

Produces a constant column with a random value.

Syntax

randConstant([x])

Arguments

Returned value

  • Pseudo-random number.

Type: UInt32.

Example

Query:

SELECT rand(), rand(1), rand(number), randConstant(), randConstant(1), randConstant(number)
FROM numbers(3)

Result:

┌─────rand()─┬────rand(1)─┬─rand(number)─┬─randConstant()─┬─randConstant(1)─┬─randConstant(number)─┐
│ 3047369878 │ 4132449925 │   4044508545 │     2740811946 │      4229401477 │           1924032898 │
│ 2938880146 │ 1267722397 │   4154983056 │     2740811946 │      4229401477 │           1924032898 │
│  956619638 │ 4238287282 │   1104342490 │     2740811946 │      4229401477 │           1924032898 │
└────────────┴────────────┴──────────────┴────────────────┴─────────────────┴──────────────────────┘

randomString

randomFixedString

randomPrintableASCII

randomStringUTF8

fuzzBits

Syntax

fuzzBits([s], [prob])

Inverts bits of s, each with probability prob.

Arguments

  • s - String or FixedString

  • prob - constant Float32/64

Returned value Fuzzed string with same as s type.

Example

SELECT fuzzBits(materialize('abacaba'), 0.1)
FROM numbers(3)

Result:

┌─fuzzBits(materialize('abacaba'), 0.1)─┐
│ abaaaja                               │
│ a*cjab+                               │
│ aeca2A                                │
└───────────────────────────────────────┘

REPLACING IN STRINGS

NOTE

Functions for searching and other manipulations with strings are described separately.

replaceOne(haystack, pattern, replacement)

Replaces the first occurrence of the substring ‘pattern’ (if it exists) in ‘haystack’ by the ‘replacement’ string. ‘pattern’ and ‘replacement’ must be constants.

replaceAll(haystack, pattern, replacement), replace(haystack, pattern, replacement)

Replaces all occurrences of the substring ‘pattern’ in ‘haystack’ by the ‘replacement’ string.

replaceRegexpOne(haystack, pattern, replacement)

Replaces the first occurrence of the substring matching the regular expression ‘pattern’ in ‘haystack‘ by the ‘replacement‘ string. ‘pattern‘ must be a constant re2 regular expression. ‘replacement’ must be a plain constant string or a constant string containing substitutions \0-\9. Substitutions \1-\9 correspond to the 1st to 9th capturing group (submatch), substitution \0 corresponds to the entire match. To use a verbatim \ character in the ‘pattern‘ or ‘replacement‘ string, escape it using \. Also keep in mind that string literals require an extra escaping.

Example 1. Converting ISO dates to American format:

SELECT DISTINCT
    EventDate,
    replaceRegexpOne(toString(EventDate), '(\\d{4})-(\\d{2})-(\\d{2})', '\\2/\\3/\\1') AS res
FROM test.hits
LIMIT 7
FORMAT TabSeparated
2014-03-17      03/17/2014
2014-03-18      03/18/2014
2014-03-19      03/19/2014
2014-03-20      03/20/2014
2014-03-21      03/21/2014
2014-03-22      03/22/2014
2014-03-23      03/23/2014

Example 2. Copying a string ten times:

SELECT replaceRegexpOne('Hello, World!', '.*', '\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0') AS res
┌─res────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World! │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

replaceRegexpAll(haystack, pattern, replacement)

Like ‘replaceRegexpOne‘, but replaces all occurrences of the pattern. Example:

SELECT replaceRegexpAll('Hello, World!', '.', '\\0\\0') AS res
┌─res────────────────────────┐
│ HHeelllloo,,  WWoorrlldd!! │
└────────────────────────────┘

As an exception, if a regular expression worked on an empty substring, the replacement is not made more than once. Example:

SELECT replaceRegexpAll('Hello, World!', '^', 'here: ') AS res
┌─res─────────────────┐
│ here: Hello, World! │
└─────────────────────┘

regexpQuoteMeta(s)

The function adds a backslash before some predefined characters in the string. Predefined characters: \0, \\, |, (, ), ^, $, ., [, ], ?, *, +, {, :, -. This implementation slightly differs from re2::RE2::QuoteMeta. It escapes zero byte as \0 instead of \x00 and it escapes only required characters. For more information, see the link: RE2

ROUNDING

floor(x[, N])

Returns the largest round number that is less than or equal to x. A round number is a multiple of 1/10N, or the nearest number of the appropriate data type if 1 / 10N isn’t exact. ‘N’ is an integer constant, optional parameter. By default it is zero, which means to round to an integer. ‘N’ may be negative.

Examples: floor(123.45, 1) = 123.4, floor(123.45, -1) = 120.

x is any numeric type. The result is a number of the same type. For integer arguments, it makes sense to round with a negative N value (for non-negative N, the function does not do anything). If rounding causes overflow (for example, floor(-128, -1)), an implementation-specific result is returned.

ceil(x[, N]), ceiling(x[, N])

Returns the smallest round number that is greater than or equal to x. In every other way, it is the same as the floor function (see above).

trunc(x[, N]), truncate(x[, N])

Returns the round number with largest absolute value that has an absolute value less than or equal to x‘s. In every other way, it is the same as the ’floor’ function (see above).

round(x[, N])

Rounds a value to a specified number of decimal places.

The function returns the nearest number of the specified order. In case when given number has equal distance to surrounding numbers, the function uses banker’s rounding for float number types and rounds away from zero for the other number types (Decimal).

round(expression [, decimal_places])

Arguments

  • expression — A number to be rounded. Can be any expression returning the numeric data type.

  • decimal-places — An integer value.

    • If decimal-places > 0 then the function rounds the value to the right of the decimal point.

    • If decimal-places < 0 then the function rounds the value to the left of the decimal point.

    • If decimal-places = 0 then the function rounds the value to integer. In this case the argument can be omitted.

Returned value:

The rounded number of the same type as the input number.

Examples

Example of use with Float

SELECT number / 2 AS x, round(x) FROM system.numbers LIMIT 3
┌───x─┬─round(divide(number, 2))─┐
│   0 │                        0 │
│ 0.5 │                        0 │
│   1 │                        1 │
└─────┴──────────────────────────┘

Example of use with Decimal

SELECT cast(number / 2 AS  Decimal(10,4)) AS x, round(x) FROM system.numbers LIMIT 3
┌──────x─┬─round(CAST(divide(number, 2), 'Decimal(10, 4)'))─┐
│ 0.0000 │                                           0.0000 │
│ 0.5000 │                                           1.0000 │
│ 1.0000 │                                           1.0000 │
└────────┴──────────────────────────────────────────────────┘

Examples of rounding

Rounding to the nearest number.

round(3.2, 0) = 3
round(4.1267, 2) = 4.13
round(22,-1) = 20
round(467,-2) = 500
round(-467,-2) = -500

Banker’s rounding.

round(3.5) = 4
round(4.5) = 4
round(3.55, 1) = 3.6
round(3.65, 1) = 3.6

See Also

roundBankers

Rounds a number to a specified decimal position.

  • If the rounding number is halfway between two numbers, the function uses banker’s rounding.

    Banker's rounding is a method of rounding fractional numbers. When the rounding number is halfway between two numbers, it's rounded to the nearest even digit at the specified decimal position. For example: 3.5 rounds up to 4, 2.5 rounds down to 2.
    
    It's the default rounding method for floating point numbers defined in [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754#Roundings_to_nearest). The [round](#rounding_functions-round) function performs the same rounding for floating point numbers. The `roundBankers` function also rounds integers the same way, for example, `roundBankers(45, -1) = 40`.
  • In other cases, the function rounds numbers to the nearest integer.

Using banker’s rounding, you can reduce the effect that rounding numbers has on the results of summing or subtracting these numbers.

For example, sum numbers 1.5, 2.5, 3.5, 4.5 with different rounding:

  • No rounding: 1.5 + 2.5 + 3.5 + 4.5 = 12.

  • Banker’s rounding: 2 + 2 + 4 + 4 = 12.

  • Rounding to the nearest integer: 2 + 3 + 4 + 5 = 14.

Syntax

roundBankers(expression [, decimal_places])

Arguments

  • expression — A number to be rounded. Can be any expression returning the numeric data type.

  • decimal-places — Decimal places. An integer number.

    • decimal-places > 0 — The function rounds the number to the given position right of the decimal point. Example: roundBankers(3.55, 1) = 3.6.

    • decimal-places < 0 — The function rounds the number to the given position left of the decimal point. Example: roundBankers(24.55, -1) = 20.

    • decimal-places = 0 — The function rounds the number to an integer. In this case the argument can be omitted. Example: roundBankers(2.5) = 2.

Returned value

A value rounded by the banker’s rounding method.

Examples

Example of use

Query:

 SELECT number / 2 AS x, roundBankers(x, 0) AS b fROM system.numbers limit 10

Result:

┌───x─┬─b─┐
│   0 │ 0 │
│ 0.5 │ 0 │
│   1 │ 1 │
│ 1.5 │ 2 │
│   2 │ 2 │
│ 2.5 │ 2 │
│   3 │ 3 │
│ 3.5 │ 4 │
│   4 │ 4 │
│ 4.5 │ 4 │
└─────┴───┘

Examples of Banker’s rounding

roundBankers(0.4) = 0
roundBankers(-3.5) = -4
roundBankers(4.5) = 4
roundBankers(3.55, 1) = 3.6
roundBankers(3.65, 1) = 3.6
roundBankers(10.35, 1) = 10.4
roundBankers(10.755, 2) = 10.76

See Also

roundToExp2(num)

Accepts a number. If the number is less than one, it returns 0. Otherwise, it rounds the number down to the nearest (whole non-negative) degree of two.

roundDuration(num)

Accepts a number. If the number is less than one, it returns 0. Otherwise, it rounds the number down to numbers from the set: 1, 10, 30, 60, 120, 180, 240, 300, 600, 1200, 1800, 3600, 7200, 18000, 36000.

roundAge(num)

Accepts a number. If the number is less than 18, it returns 0. Otherwise, it rounds the number down to a number from the set: 18, 25, 35, 45, 55.

roundDown(num, arr)

Accepts a number and rounds it down to an element in the specified array. If the value is less than the lowest bound, the lowest bound is returned.

SEARCHING IN STRINGS

NOTE

Functions for replacing and other manipulations with strings are described separately.

position(haystack, needle), locate(haystack, needle)

Searches for the substring needle in the string haystack.

Returns the position (in bytes) of the found substring in the string, starting from 1.

For a case-insensitive search, use the function positionCaseInsensitive.

Syntax

position(haystack, needle[, start_pos])
position(needle IN haystack)

Alias: locate(haystack, needle[, start_pos]).

NOTE

Syntax of position(needle IN haystack) provides SQL-compatibility, the function works the same way as to position(haystack, needle).

Arguments

  • haystack — String, in which substring will to be searched. String.

  • needle — Substring to be searched. String.

  • start_pos – Position of the first character in the string to start search. UInt. Optional.

Returned values

  • Starting position in bytes (counting from 1), if substring was found.

  • 0, if the substring was not found.

Type: Integer.

Examples

The phrase “Hello, world!” contains a set of bytes representing a single-byte encoded text. The function returns some expected result:

Query:

SELECT position('Hello, world!', '!');

Result:

┌─position('Hello, world!', '!')─┐
│                             13 │
└────────────────────────────────┘
SELECT
    position('Hello, world!', 'o', 1),
    position('Hello, world!', 'o', 7)
┌─position('Hello, world!', 'o', 1)─┬─position('Hello, world!', 'o', 7)─┐
│                                 5 │                                 9 │
└───────────────────────────────────┴───────────────────────────────────┘

The same phrase in Russian contains characters which can’t be represented using a single byte. The function returns some unexpected result (use positionUTF8 function for multi-byte encoded text):

Query:

SELECT position('Привет, мир!', '!');

Result:

┌─position('Привет, мир!', '!')─┐
│                            21 │
└───────────────────────────────┘

Examples for POSITION(needle IN haystack) syntax

Query:

SELECT 3 = position('c' IN 'abc');

Result:

┌─equals(3, position('abc', 'c'))─┐
│                               1 │
└─────────────────────────────────┘

Query:

SELECT 6 = position('/' IN s) FROM (SELECT 'Hello/World' AS s);

Result:

┌─equals(6, position(s, '/'))─┐
│                           1 │
└─────────────────────────────┘

positionCaseInsensitive

The same as position returns the position (in bytes) of the found substring in the string, starting from 1. Use the function for a case-insensitive search.

Works under the assumption that the string contains a set of bytes representing a single-byte encoded text. If this assumption is not met and a character can’t be represented using a single byte, the function does not throw an exception and returns some unexpected result. If character can be represented using two bytes, it will use two bytes and so on.

Syntax

positionCaseInsensitive(haystack, needle[, start_pos])

Arguments

  • haystack — String, in which substring will to be searched. String.

  • needle — Substring to be searched. String.

  • start_pos — Optional parameter, position of the first character in the string to start search. UInt.

Returned values

  • Starting position in bytes (counting from 1), if substring was found.

  • 0, if the substring was not found.

Type: Integer.

Example

Query:

SELECT positionCaseInsensitive('Hello, world!', 'hello');

Result:

┌─positionCaseInsensitive('Hello, world!', 'hello')─┐
│                                                 1 │
└───────────────────────────────────────────────────┘

positionUTF8

Returns the position (in Unicode points) of the found substring in the string, starting from 1.

Works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, the function does not throw an exception and returns some unexpected result. If character can be represented using two Unicode points, it will use two and so on.

For a case-insensitive search, use the function positionCaseInsensitiveUTF8.

Syntax

positionUTF8(haystack, needle[, start_pos])

Arguments

  • haystack — String, in which substring will to be searched. String.

  • needle — Substring to be searched. String.

  • start_pos — Optional parameter, position of the first character in the string to start search. UInt

Returned values

  • Starting position in Unicode points (counting from 1), if substring was found.

  • 0, if the substring was not found.

Type: Integer.

Examples

The phrase “Hello, world!” in Russian contains a set of Unicode points representing a single-point encoded text. The function returns some expected result:

Query:

SELECT positionUTF8('Привет, мир!', '!');

Result:

┌─positionUTF8('Привет, мир!', '!')─┐
│                                12 │
└───────────────────────────────────┘

The phrase “Salut, étudiante!”, where character é can be represented using a one point (U+00E9) or two points (U+0065U+0301) the function can be returned some unexpected result:

Query for the letter é, which is represented one Unicode point U+00E9:

SELECT positionUTF8('Salut, étudiante!', '!');

Result:

┌─positionUTF8('Salut, étudiante!', '!')─┐
│                                     17 │
└────────────────────────────────────────┘

Query for the letter é, which is represented two Unicode points U+0065U+0301:

SELECT positionUTF8('Salut, étudiante!', '!');

Result:

┌─positionUTF8('Salut, étudiante!', '!')─┐
│                                     18 │
└────────────────────────────────────────┘

positionCaseInsensitiveUTF8

The same as positionUTF8, but is case-insensitive. Returns the position (in Unicode points) of the found substring in the string, starting from 1.

Works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, the function does not throw an exception and returns some unexpected result. If character can be represented using two Unicode points, it will use two and so on.

Syntax

positionCaseInsensitiveUTF8(haystack, needle[, start_pos])

Arguments

  • haystack — String, in which substring will to be searched. String.

  • needle — Substring to be searched. String.

  • start_pos — Optional parameter, position of the first character in the string to start search. UInt

Returned value

  • Starting position in Unicode points (counting from 1), if substring was found.

  • 0, if the substring was not found.

Type: Integer.

Example

Query:

SELECT positionCaseInsensitiveUTF8('Привет, мир!', 'Мир');

Result:

┌─positionCaseInsensitiveUTF8('Привет, мир!', 'Мир')─┐
│                                                  9 │
└────────────────────────────────────────────────────┘

multiSearchAllPositions

The same as position but returns Array of positions (in bytes) of the found corresponding substrings in the string. Positions are indexed starting from 1.

The search is performed on sequences of bytes without respect to string encoding and collation.

  • For case-insensitive ASCII search, use the function multiSearchAllPositionsCaseInsensitive.

  • For search in UTF-8, use the function multiSearchAllPositionsUTF8.

  • For case-insensitive UTF-8 search, use the function multiSearchAllPositionsCaseInsensitiveUTF8.

Syntax

multiSearchAllPositions(haystack, [needle1, needle2, ..., needlen])

Arguments

  • haystack — String, in which substring will to be searched. String.

  • needle — Substring to be searched. String.

Returned values

  • Array of starting positions in bytes (counting from 1), if the corresponding substring was found and 0 if not found.

Example

Query:

SELECT multiSearchAllPositions('Hello, World!', ['hello', '!', 'world']);

Result:

┌─multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])─┐
│ [0,13,0]                                                          │
└───────────────────────────────────────────────────────────────────┘

multiSearchAllPositionsUTF8

See multiSearchAllPositions.

multiSearchFirstPosition(haystack, [needle1, needle2, …, needlen])

The same as position but returns the leftmost offset of the string haystack that is matched to some of the needles.

For a case-insensitive search or/and in UTF-8 format use functions multiSearchFirstPositionCaseInsensitive, multiSearchFirstPositionUTF8, multiSearchFirstPositionCaseInsensitiveUTF8.

multiSearchFirstIndex(haystack, [needle1, needle2, …, needlen])

Returns the index i (starting from 1) of the leftmost found needlei in the string haystack and 0 otherwise.

For a case-insensitive search or/and in UTF-8 format use functions multiSearchFirstIndexCaseInsensitive, multiSearchFirstIndexUTF8, multiSearchFirstIndexCaseInsensitiveUTF8.

multiSearchAny(haystack, [needle1, needle2, …, needlen])

Returns 1, if at least one string needlei matches the string haystack and 0 otherwise.

For a case-insensitive search or/and in UTF-8 format use functions multiSearchAnyCaseInsensitive, multiSearchAnyUTF8, multiSearchAnyCaseInsensitiveUTF8.

NOTE

In all multiSearch* functions the number of needles should be less than 28 because of implementation specification.

match(haystack, pattern)

Checks whether the string matches the regular expression pattern in re2 syntax. Re2 has a more limited syntax than Perl regular expressions.

Returns 0 if it does not match, or 1 if it matches.

Matching is based on UTF-8, e.g. . matches the Unicode code point ¥ which is represented in UTF-8 using two bytes. The regular expression must not contain null bytes. If the haystack or pattern contain a sequence of bytes that are not valid UTF-8, then the behavior is undefined. No automatic Unicode normalization is performed, if you need it you can use the normalizeUTF8*() functions for that.

For patterns to search for substrings in a string, it is better to use LIKE or ‘position’, since they work much faster.

multiMatchAny(haystack, [pattern1, pattern2, …, patternn])

The same as match, but returns 0 if none of the regular expressions are matched and 1 if any of the patterns matches. It uses hyperscan library. For patterns to search substrings in a string, it is better to use multiSearchAny since it works much faster.

NOTE

The length of any of the haystack string must be less than 232 bytes otherwise the exception is thrown. This restriction takes place because of hyperscan API.

multiMatchAnyIndex(haystack, [pattern1, pattern2, …, patternn])

The same as multiMatchAny, but returns any index that matches the haystack.

multiMatchAllIndices(haystack, [pattern1, pattern2, …, patternn])

The same as multiMatchAny, but returns the array of all indices that match the haystack in any order.

multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, …, patternn])

The same as multiMatchAny, but returns 1 if any pattern matches the haystack within a constant edit distance. This function relies on the experimental feature of hyperscan library, and can be slow for some corner cases. The performance depends on the edit distance value and patterns used, but it's always more expensive compared to a non-fuzzy variants.

multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, …, patternn])

The same as multiFuzzyMatchAny, but returns any index that matches the haystack within a constant edit distance.

multiFuzzyMatchAllIndices(haystack, distance, [pattern1, pattern2, …, patternn])

The same as multiFuzzyMatchAny, but returns the array of all indices in any order that match the haystack within a constant edit distance.

NOTE

multiFuzzyMatch* functions do not support UTF-8 regular expressions, and such expressions are treated as bytes because of hyperscan restriction.

NOTE

To turn off all functions that use hyperscan, use setting SET allow_hyperscan = 0;.

extract(haystack, pattern)

Extracts a fragment of a string using a regular expression. If ‘haystack’ does not match the ‘pattern’ regex, an empty string is returned. If the regex does not contain subpatterns, it takes the fragment that matches the entire regex. Otherwise, it takes the fragment that matches the first subpattern.

extractAll(haystack, pattern)

Extracts all the fragments of a string using a regular expression. If ‘haystack’ does not match the ‘pattern’ regex, an empty string is returned. Returns an array of strings consisting of all matches to the regex. In general, the behavior is the same as the ‘extract’ function (it takes the first subpattern, or the entire expression if there isn’t a subpattern).

extractAllGroupsHorizontal

Matches all groups of the haystack string using the pattern regular expression. Returns an array of arrays, where the first array includes all fragments matching the first group, the second array - matching the second group, etc.

NOTE

extractAllGroupsHorizontal function is slower than extractAllGroupsVertical.

Syntax

extractAllGroupsHorizontal(haystack, pattern)

Arguments

  • haystack — Input string. Type: String.

  • pattern — Regular expression with re2 syntax. Must contain groups, each group enclosed in parentheses. If pattern contains no groups, an exception is thrown. Type: String.

Returned value

If haystack does not match the pattern regex, an array of empty arrays is returned.

Example

Query:

SELECT extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');

Result:

┌─extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','def','ghi'],['111','222','333']]                                                │
└──────────────────────────────────────────────────────────────────────────────────────────┘

See Also

extractAllGroupsVertical

Matches all groups of the haystack string using the pattern regular expression. Returns an array of arrays, where each array includes matching fragments from every group. Fragments are grouped in order of appearance in the haystack.

Syntax

extractAllGroupsVertical(haystack, pattern)

Arguments

  • haystack — Input string. Type: String.

  • pattern — Regular expression with re2 syntax. Must contain groups, each group enclosed in parentheses. If pattern contains no groups, an exception is thrown. Type: String.

Returned value

If haystack does not match the pattern regex, an empty array is returned.

Example

Query:

SELECT extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');

Result:

┌─extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','111'],['def','222'],['ghi','333']]                                            │
└────────────────────────────────────────────────────────────────────────────────────────┘

See Also

like(haystack, pattern), haystack LIKE pattern operator

Checks whether a string matches a simple regular expression. The regular expression can contain the metasymbols % and _.

% indicates any quantity of any bytes (including zero characters).

_ indicates any one byte.

Use the backslash (\) for escaping metasymbols. See the note on escaping in the description of the ‘match’ function.

Matching is based on UTF-8, e.g. _ matches the Unicode code point ¥ which is represented in UTF-8 using two bytes. If the haystack or pattern contain a sequence of bytes that are not valid UTF-8, then the behavior is undefined. No automatic Unicode normalization is performed, if you need it you can use the normalizeUTF8*() functions for that.

For regular expressions like %needle%, the code is more optimal and works as fast as the position function. For other regular expressions, the code is the same as for the ‘match’ function.

notLike(haystack, pattern), haystack NOT LIKE pattern operator

The same thing as ‘like’, but negative.

ilike

Case insensitive variant of like function. You can use ILIKE operator instead of the ilike function.

The function ignores the language, e.g. for Turkish (i/İ), the result might be incorrect.

Syntax

ilike(haystack, pattern)

Arguments

  • haystack — Input string. String.

  • pattern — If pattern does not contain percent signs or underscores, then the pattern only represents the string itself. An underscore (_) in pattern stands for (matches) any single character. A percent sign (%) matches any sequence of zero or more characters.

Some pattern examples:

'abc' ILIKE 'abc'    true
'abc' ILIKE 'a%'     true
'abc' ILIKE '_b_'    true
'abc' ILIKE 'c'      false

Returned values

  • True, if the string matches pattern.

  • False, if the string does not match pattern.

Example

Input table:

┌─id─┬─name─────┬─days─┐
│  1 │ January  │   31 │
│  2 │ February │   29 │
│  3 │ March    │   31 │
│  4 │ April    │   30 │
└────┴──────────┴──────┘

Query:

SELECT * FROM Months WHERE ilike(name, '%j%');

Result:

┌─id─┬─name────┬─days─┐
│  1 │ January │   31 │
└────┴─────────┴──────┘

See Also

ngramDistance(haystack, needle)

Calculates the 4-gram distance between haystack and needle: counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities. Returns float number from 0 to 1 – the closer to zero, the more strings are similar to each other. If the constant needle or haystack is more than 32Kb, throws an exception. If some of the non-constant haystack or needle strings are more than 32Kb, the distance is always one.

For case-insensitive search or/and in UTF-8 format use functions ngramDistanceCaseInsensitive, ngramDistanceUTF8, ngramDistanceCaseInsensitiveUTF8.

ngramSearch(haystack, needle)

Same as ngramDistance but calculates the non-symmetric difference between needle and haystack – the number of n-grams from needle minus the common number of n-grams normalized by the number of needle n-grams. The closer to one, the more likely needle is in the haystack. Can be useful for fuzzy string search.

For case-insensitive search or/and in UTF-8 format use functions ngramSearchCaseInsensitive, ngramSearchUTF8, ngramSearchCaseInsensitiveUTF8.

NOTE

For UTF-8 case we use 3-gram distance. All these are not perfectly fair n-gram distances. We use 2-byte hashes to hash n-grams and then calculate the (non-)symmetric difference between these hash tables – collisions may occur. With UTF-8 case-insensitive format we do not use fair tolower function – we zero the 5-th bit (starting from zero) of each codepoint byte and first bit of zeroth byte if bytes more than one – this works for Latin and mostly for all Cyrillic letters.

countSubstrings

Returns the number of substring occurrences.

For a case-insensitive search, use countSubstringsCaseInsensitive or countSubstringsCaseInsensitiveUTF8 functions.

Syntax

countSubstrings(haystack, needle[, start_pos])

Arguments

  • haystack — The string to search in. String.

  • needle — The substring to search for. String.

  • start_pos – Position of the first character in the string to start search. Optional. UInt.

Returned values

  • Number of occurrences.

Type: UInt64.

Examples

Query:

SELECT countSubstrings('foobar.com', '.');

Result:

┌─countSubstrings('foobar.com', '.')─┐
│                                  1 │
└────────────────────────────────────┘

Query:

SELECT countSubstrings('aaaa', 'aa');

Result:

┌─countSubstrings('aaaa', 'aa')─┐
│                             2 │
└───────────────────────────────┘

Query:

SELECT countSubstrings('abc___abc', 'abc', 4);

Result:

┌─countSubstrings('abc___abc', 'abc', 4)─┐
│                                      1 │
└────────────────────────────────────────┘

countSubstringsCaseInsensitive

Returns the number of substring occurrences case-insensitive.

Syntax

countSubstringsCaseInsensitive(haystack, needle[, start_pos])

Arguments

  • haystack — The string to search in. String.

  • needle — The substring to search for. String.

  • start_pos — Position of the first character in the string to start search. Optional. UInt.

Returned values

  • Number of occurrences.

Type: UInt64.

Examples

Query:

SELECT countSubstringsCaseInsensitive('aba', 'B');

Result:

┌─countSubstringsCaseInsensitive('aba', 'B')─┐
│                                          1 │
└────────────────────────────────────────────┘

Query:

SELECT countSubstringsCaseInsensitive('foobar.com', 'CoM');

Result:

┌─countSubstringsCaseInsensitive('foobar.com', 'CoM')─┐
│                                                   1 │
└─────────────────────────────────────────────────────┘

Query:

SELECT countSubstringsCaseInsensitive('abC___abC', 'aBc', 2);

Result:

┌─countSubstringsCaseInsensitive('abC___abC', 'aBc', 2)─┐
│                                                     1 │
└───────────────────────────────────────────────────────┘

countSubstringsCaseInsensitiveUTF8

Returns the number of substring occurrences in UTF-8 case-insensitive.

Syntax

SELECT countSubstringsCaseInsensitiveUTF8(haystack, needle[, start_pos])

Arguments

  • haystack — The string to search in. String.

  • needle — The substring to search for. String.

  • start_pos — Position of the first character in the string to start search. Optional. UInt.

Returned values

  • Number of occurrences.

Type: UInt64.

Examples

Query:

SELECT countSubstringsCaseInsensitiveUTF8('абв', 'A');

Result:

┌─countSubstringsCaseInsensitiveUTF8('абв', 'A')─┐
│                                              1 │
└────────────────────────────────────────────────┘

Query:

SELECT countSubstringsCaseInsensitiveUTF8('аБв__АбВ__абв', 'Абв');

Result:

┌─countSubstringsCaseInsensitiveUTF8('аБв__АбВ__абв', 'Абв')─┐
│                                                          3 │
└────────────────────────────────────────────────────────────┘

countMatches(haystack, pattern)

Returns the number of regular expression matches for a pattern in a haystack.

Syntax

countMatches(haystack, pattern)

Arguments

Returned value

  • The number of matches.

Type: UInt64.

Examples

Query:

SELECT countMatches('foobar.com', 'o+');

Result:

┌─countMatches('foobar.com', 'o+')─┐
│                                2 │
└──────────────────────────────────┘

Query:

SELECT countMatches('aaaa', 'aa');

Result:

┌─countMatches('aaaa', 'aa')────┐
│                             2 │
└───────────────────────────────┘

SPLITTING AND MERGING

splitByChar(separator, s[, max_substrings])

Splits a string into substrings separated by a specified character. It uses a constant string separator which consists of exactly one character. Returns an array of selected substrings. Empty substrings may be selected if the separator occurs at the beginning or end of the string, or if there are multiple consecutive separators.

Syntax

splitByChar(separator, s[, max_substrings]))

Arguments

  • separator — The separator which should contain exactly one character. String.

  • s — The string to split. String.

  • max_substrings — An optional Int64 defaulting to 0. When max_substrings > 0, the returned substrings will be no more than max_substrings, otherwise the function will return as many substrings as possible.

Returned value(s)

Returns an array of selected substrings. Empty substrings may be selected when:

  • A separator occurs at the beginning or end of the string;

  • There are multiple consecutive separators;

  • The original string s is empty.

Type: Array(String).

Example

SELECT splitByChar(',', '1,2,3,abcde');
┌─splitByChar(',', '1,2,3,abcde')─┐
│ ['1','2','3','abcde']           │
└─────────────────────────────────┘

splitByString(separator, s[, max_substrings])

Splits a string into substrings separated by a string. It uses a constant string separator of multiple characters as the separator. If the string separator is empty, it will split the string s into an array of single characters.

Syntax

splitByString(separator, s[, max_substrings]))

Arguments

  • separator — The separator. String.

  • s — The string to split. String.

  • max_substrings — An optional Int64 defaulting to 0. When max_substrings > 0, the returned substrings will be no more than max_substrings, otherwise the function will return as many substrings as possible.

Returned value(s)

Returns an array of selected substrings. Empty substrings may be selected when:

Type: Array(String).

  • A non-empty separator occurs at the beginning or end of the string;

  • There are multiple consecutive non-empty separators;

  • The original string s is empty while the separator is not empty.

Example

SELECT splitByString(', ', '1, 2 3, 4,5, abcde');
┌─splitByString(', ', '1, 2 3, 4,5, abcde')─┐
│ ['1','2 3','4,5','abcde']                 │
└───────────────────────────────────────────┘
SELECT splitByString('', 'abcde');
┌─splitByString('', 'abcde')─┐
│ ['a','b','c','d','e']      │
└────────────────────────────┘

arrayStringConcat(arr[, separator])

Concatenates string representations of values listed in the array with the separator. separator is an optional parameter: a constant string, set to an empty string by default. Returns the string.

alphaTokens(s[, max_substrings]), splitByAlpha(s[, max_substrings])

Selects substrings of consecutive bytes from the ranges a-z and A-Z.Returns an array of substrings.

Syntax

alphaTokens(s[, max_substrings]))
splitByAlpha(s[, max_substrings])

Arguments

  • s — The string to split. String.

  • max_substrings — An optional Int64 defaulting to 0. When max_substrings > 0, the returned substrings will be no more than max_substrings, otherwise the function will return as many substrings as possible.

Returned value(s)

Returns an array of selected substrings.

Type: Array(String).

Example

SELECT alphaTokens('abca1abc');
┌─alphaTokens('abca1abc')─┐
│ ['abca','abc']          │
└─────────────────────────┘

extractAllGroups(text, regexp)

Extracts all groups from non-overlapping substrings matched by a regular expression.

Syntax

extractAllGroups(text, regexp)

Arguments

Returned values

  • If the function finds at least one matching group, it returns Array(Array(String)) column, clustered by group_id (1 to N, where N is number of capturing groups in regexp).

  • If there is no matching group, returns an empty array.

Type: Array.

Example

Query:

SELECT extractAllGroups('abc=123, 8="hkl"', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');

Result:

┌─extractAllGroups('abc=123, 8="hkl"', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','123'],['8','"hkl"']]                                         │
└───────────────────────────────────────────────────────────────────────┘

STRINGS

NOTE

Functions for searching and replacing in strings are described separately.

empty

Checks whether the input string is empty.

Syntax

empty(x)

A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.

The function also works for arrays or UUID.

Arguments

Returned value

  • Returns 1 for an empty string or 0 for a non-empty string.

Type: UInt8.

Example

Query:

SELECT empty('');

Result:

┌─empty('')─┐
│         1 │
└───────────┘

notEmpty

Checks whether the input string is non-empty.

Syntax

notEmpty(x)

A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.

The function also works for arrays or UUID.

Arguments

Returned value

  • Returns 1 for a non-empty string or 0 for an empty string string.

Type: UInt8.

Example

Query:

SELECT notEmpty('text');

Result:

┌─notEmpty('text')─┐
│                1 │
└──────────────────┘

length

Returns the length of a string in bytes (not in characters, and not in code points). The result type is UInt64. The function also works for arrays.

lengthUTF8

Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it does not throw an exception). The result type is UInt64.

char_length, CHAR_LENGTH

Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it does not throw an exception). The result type is UInt64.

character_length, CHARACTER_LENGTH

Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it does not throw an exception). The result type is UInt64.

leftPad

Pads the current string from the left with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similarly to the MySQL LPAD function.

Syntax

leftPad('string', 'length'[, 'pad_string'])

Arguments

  • string — Input string that needs to be padded. String.

  • length — The length of the resulting string. UInt. If the value is less than the input string length, then the input string is returned as-is.

  • pad_string — The string to pad the input string with. String. Optional. If not specified, then the input string is padded with spaces.

Returned value

  • The resulting string of the given length.

Type: String.

Example

Query:

SELECT leftPad('abc', 7, '*'), leftPad('def', 7);

Result:

┌─leftPad('abc', 7, '*')─┬─leftPad('def', 7)─┐
│ ****abc                │     def           │
└────────────────────────┴───────────────────┘

leftPadUTF8

Pads the current string from the left with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similarly to the MySQL LPAD function. While in the leftPad function the length is measured in bytes, here in the leftPadUTF8 function it is measured in code points.

Syntax

leftPadUTF8('string','length'[, 'pad_string'])

Arguments

  • string — Input string that needs to be padded. String.

  • length — The length of the resulting string. UInt. If the value is less than the input string length, then the input string is returned as-is.

  • pad_string — The string to pad the input string with. String. Optional. If not specified, then the input string is padded with spaces.

Returned value

  • The resulting string of the given length.

Type: String.

Example

Query:

SELECT leftPadUTF8('абвг', 7, '*'), leftPadUTF8('дежз', 7);

Result:

┌─leftPadUTF8('абвг', 7, '*')─┬─leftPadUTF8('дежз', 7)─┐
│ ***абвг                     │    дежз                │
└─────────────────────────────┴────────────────────────┘

rightPad

Pads the current string from the right with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similarly to the MySQL RPAD function.

Syntax

rightPad('string', 'length'[, 'pad_string'])

Arguments

  • string — Input string that needs to be padded. String.

  • length — The length of the resulting string. UInt. If the value is less than the input string length, then the input string is returned as-is.

  • pad_string — The string to pad the input string with. String. Optional. If not specified, then the input string is padded with spaces.

Returned value

  • The resulting string of the given length.

Type: String.

Example

Query:

SELECT rightPad('abc', 7, '*'), rightPad('abc', 7);

Result:

┌─rightPad('abc', 7, '*')─┬─rightPad('abc', 7)─┐
│ abc****                 │ abc                │
└─────────────────────────┴────────────────────┘

rightPadUTF8

Pads the current string from the right with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similarly to the MySQL RPAD function. While in the rightPad function the length is measured in bytes, here in the rightPadUTF8 function it is measured in code points.

Syntax

rightPadUTF8('string','length'[, 'pad_string'])

Arguments

  • string — Input string that needs to be padded. String.

  • length — The length of the resulting string. UInt. If the value is less than the input string length, then the input string is returned as-is.

  • pad_string — The string to pad the input string with. String. Optional. If not specified, then the input string is padded with spaces.

Returned value

  • The resulting string of the given length.

Type: String.

Example

Query:

SELECT rightPadUTF8('абвг', 7, '*'), rightPadUTF8('абвг', 7);

Result:

┌─rightPadUTF8('абвг', 7, '*')─┬─rightPadUTF8('абвг', 7)─┐
│ абвг***                      │ абвг                    │
└──────────────────────────────┴─────────────────────────┘

lower, lcase

Converts ASCII Latin symbols in a string to lowercase.

upper, ucase

Converts ASCII Latin symbols in a string to uppercase.

lowerUTF8

Converts a string to lowercase, assuming the string contains a set of bytes that make up a UTF-8 encoded text. It does not detect the language. E.g. for Turkish the result might not be exactly correct (i/İ vs. i/I). If the length of the UTF-8 byte sequence is different for upper and lower case of a code point, the result may be incorrect for this code point. If the string contains a sequence of bytes that are not valid UTF-8, then the behavior is undefined.

upperUTF8

Converts a string to uppercase, assuming the string contains a set of bytes that make up a UTF-8 encoded text. It does not detect the language. E.g. for Turkish the result might not be exactly correct (i/İ vs. i/I). If the length of the UTF-8 byte sequence is different for upper and lower case of a code point, the result may be incorrect for this code point. If the string contains a sequence of bytes that are not valid UTF-8, then the behavior is undefined.

isValidUTF8

Returns 1, if the set of bytes is valid UTF-8 encoded, otherwise 0.

toValidUTF8

Replaces invalid UTF-8 characters by the (U+FFFD) character. All running in a row invalid characters are collapsed into the one replacement character.

toValidUTF8(input_string)

Arguments

  • input_string — Any set of bytes represented as the String data type object.

Returned value: Valid UTF-8 string.

Example

SELECT toValidUTF8('\x61\xF0\x80\x80\x80b');
┌─toValidUTF8('a����b')─┐
│ a�b                   │
└───────────────────────┘

repeat

Repeats a string as many times as specified and concatenates the replicated values as a single string.

Alias: REPEAT.

Syntax

repeat(s, n)

Arguments

  • s — The string to repeat. String.

  • n — The number of times to repeat the string. UInt.

Returned value

The single string, which contains the string s repeated n times. If n \< 1, the function returns empty string.

Type: String.

Example

Query:

SELECT repeat('abc', 10);

Result:

┌─repeat('abc', 10)──────────────┐
│ abcabcabcabcabcabcabcabcabcabc │
└────────────────────────────────┘

reverse

Reverses the string (as a sequence of bytes).

reverseUTF8

Reverses a sequence of Unicode code points, assuming that the string contains a set of bytes representing a UTF-8 text. Otherwise, it does something else (it does not throw an exception).

format(pattern, s0, s1, …)

Formatting constant pattern with the string listed in the arguments. pattern is a simplified Python format pattern. Format string contains “replacement fields” surrounded by curly braces {}. Anything that is not contained in braces is considered literal text, which is copied unchanged to the output. If you need to include a brace character in the literal text, it can be escaped by doubling: {{ '{{' }} and {{ '}}' }}. Field names can be numbers (starting from zero) or empty (then they are treated as consequence numbers).

SELECT format('{1} {0} {1}', 'World', 'Hello')
┌─format('{1} {0} {1}', 'World', 'Hello')─┐
│ Hello World Hello                       │
└─────────────────────────────────────────┘
SELECT format('{} {}', 'Hello', 'World')
┌─format('{} {}', 'Hello', 'World')─┐
│ Hello World                       │
└───────────────────────────────────┘

concat

Concatenates the strings listed in the arguments, without a separator.

Syntax

concat(s1, s2, ...)

Arguments

Values of type String or FixedString.

Returned values

Returns the String that results from concatenating the arguments.

If any of argument values is NULL, concat returns NULL.

Example

Query:

SELECT concat('Hello, ', 'World!');

Result:

┌─concat('Hello, ', 'World!')─┐
│ Hello, World!               │
└─────────────────────────────┘

concatAssumeInjective

Same as concat, the difference is that you need to ensure that concat(s1, s2, ...) → sn is injective, it will be used for optimization of GROUP BY.

The function is named “injective” if it always returns different result for different values of arguments. In other words: different arguments never yield identical result.

Syntax

concatAssumeInjective(s1, s2, ...)

Arguments

Values of type String or FixedString.

Returned values

Returns the String that results from concatenating the arguments.

If any of argument values is NULL, concatAssumeInjective returns NULL.

Example

Input table:

CREATE TABLE key_val(`key1` String, `key2` String, `value` UInt32) ENGINE = TinyLog;
INSERT INTO key_val VALUES ('Hello, ','World',1), ('Hello, ','World',2), ('Hello, ','World!',3), ('Hello',', World!',2);
SELECT * from key_val;
┌─key1────┬─key2─────┬─value─┐
│ Hello,  │ World    │     1 │
│ Hello,  │ World    │     2 │
│ Hello,  │ World!   │     3 │
│ Hello   │ , World! │     2 │
└─────────┴──────────┴───────┘

Query:

SELECT concat(key1, key2), sum(value) FROM key_val GROUP BY concatAssumeInjective(key1, key2);

Result:

┌─concat(key1, key2)─┬─sum(value)─┐
│ Hello, World!      │          3 │
│ Hello, World!      │          2 │
│ Hello, World       │          3 │
└────────────────────┴────────────┘

substring(s, offset, length), mid(s, offset, length), substr(s, offset, length)

Returns a substring starting with the byte from the ‘offset’ index that is ‘length’ bytes long. Character indexing starts from one (as in standard SQL).

substringUTF8(s, offset, length)

The same as ‘substring’, but for Unicode code points. Works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, it returns some result (it does not throw an exception).

appendTrailingCharIfAbsent(s, c)

If the ‘s’ string is non-empty and does not contain the ‘c’ character at the end, it appends the ‘c’ character to the end.

convertCharset(s, from, to)

Returns the string ‘s’ that was converted from the encoding in ‘from’ to the encoding in ‘to’.

base58Encode(plaintext)

Accepts a String and encodes it using Base58 encoding scheme using "Bitcoin" alphabet.

Syntax

base58Encode(plaintext)

Arguments

  • plaintextString column or constant.

Returned value

  • A string containing encoded value of 1st argument.

Type: String.

Example

Query:

SELECT base58Encode('Encoded');

Result:

┌─base58Encode('Encoded')─┐
│ 3dc8KtHrwM              │
└─────────────────────────┘

base64Encode(s)

Encodes ‘s’ FixedString or String into base64.

Alias: TO_BASE64.

base64Decode(s)

Decode base64-encoded FixedString or String ‘s’ into original string. In case of failure raises an exception.

Alias: FROM_BASE64.

tryBase64Decode(s)

Similar to base64Decode, but returns an empty string in case of error.

endsWith(s, suffix)

Returns whether to end with the specified suffix. Returns 1 if the string ends with the specified suffix, otherwise it returns 0.

startsWith(str, prefix)

Returns 1 whether string starts with the specified prefix, otherwise it returns 0.

SELECT startsWith('Spider-Man', 'Spi');

Returned values

  • 1, if the string starts with the specified prefix.

  • 0, if the string does not start with the specified prefix.

Example

Query:

SELECT startsWith('Hello, world!', 'He');

Result:

┌─startsWith('Hello, world!', 'He')─┐
│                                 1 │
└───────────────────────────────────┘

trim

Removes all specified characters from the start or end of a string. By default removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string.

Syntax

trim([[LEADING|TRAILING|BOTH] trim_character FROM] input_string)

Arguments

  • trim_character — Specified characters for trim. String.

  • input_string — String for trim. String.

Returned value

A string without leading and (or) trailing specified characters.

Type: String.

Example

Query:

SELECT trim(BOTH ' ()' FROM '(   Hello, world!   )');

Result:

┌─trim(BOTH ' ()' FROM '(   Hello, world!   )')─┐
│ Hello, world!                                 │
└───────────────────────────────────────────────┘

trimLeft

Removes all consecutive occurrences of common whitespace (ASCII character 32) from the beginning of a string. It does not remove other kinds of whitespace characters (tab, no-break space, etc.).

Syntax

trimLeft(input_string)

Alias: ltrim(input_string).

Arguments

  • input_string — string to trim. String.

Returned value

A string without leading common whitespaces.

Type: String.

Example

Query:

SELECT trimLeft('     Hello, world!     ');

Result:

┌─trimLeft('     Hello, world!     ')─┐
│ Hello, world!                       │
└─────────────────────────────────────┘

trimRight

Removes all consecutive occurrences of common whitespace (ASCII character 32) from the end of a string. It does not remove other kinds of whitespace characters (tab, no-break space, etc.).

Syntax

trimRight(input_string)

Alias: rtrim(input_string).

Arguments

  • input_string — string to trim. String.

Returned value

A string without trailing common whitespaces.

Type: String.

Example

Query:

SELECT trimRight('     Hello, world!     ');

Result:

┌─trimRight('     Hello, world!     ')─┐
│      Hello, world!                   │
└──────────────────────────────────────┘

trimBoth

Removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string. It does not remove other kinds of whitespace characters (tab, no-break space, etc.).

Syntax

trimBoth(input_string)

Alias: trim(input_string).

Arguments

  • input_string — string to trim. String.

Returned value

A string without leading and trailing common whitespaces.

Type: String.

Example

Query:

SELECT trimBoth('     Hello, world!     ');

Result:

┌─trimBoth('     Hello, world!     ')─┐
│ Hello, world!                       │
└─────────────────────────────────────┘

CRC32(s)

Returns the CRC32 checksum of a string, using CRC-32-IEEE 802.3 polynomial and initial value 0xffffffff (zlib implementation).

The result type is UInt32.

CRC32IEEE(s)

Returns the CRC32 checksum of a string, using CRC-32-IEEE 802.3 polynomial.

The result type is UInt32.

CRC64(s)

Returns the CRC64 checksum of a string, using CRC-64-ECMA polynomial.

The result type is UInt64.

normalizeQuery

Replaces literals, sequences of literals and complex aliases with placeholders.

Syntax

normalizeQuery(x)

Arguments

  • x — Sequence of characters. String.

Returned value

  • Sequence of characters with placeholders.

Type: String.

Example

Query:

SELECT normalizeQuery('[1, 2, 3, x]') AS query;

Result:

┌─query────┐
│ [?.., x] │
└──────────┘

normalizedQueryHash

Returns identical 64bit hash values without the values of literals for similar queries. It helps to analyze query log.

Syntax

normalizedQueryHash(x)

Arguments

  • x — Sequence of characters. String.

Returned value

  • Hash value.

Type: UInt64.

Example

Query:

SELECT normalizedQueryHash('SELECT 1 AS `xyz`') != normalizedQueryHash('SELECT 1 AS `abc`') AS res;

Result:

┌─res─┐
│   1 │
└─────┘

normalizeUTF8NFC

Converts a string to NFC normalized form, assuming the string contains a set of bytes that make up a UTF-8 encoded text.

Syntax

normalizeUTF8NFC(words)

Arguments

  • words — Input string that contains UTF-8 encoded text. String.

Returned value

  • String transformed to NFC normalization form.

Type: String.

Example

Query:

SELECT length('â'), normalizeUTF8NFC('â') AS nfc, length(nfc) AS nfc_len;

Result:

┌─length('â')─┬─nfc─┬─nfc_len─┐
│           2 │ â   │       2 │
└─────────────┴─────┴─────────┘

normalizeUTF8NFD

Converts a string to NFD normalized form, assuming the string contains a set of bytes that make up a UTF-8 encoded text.

Syntax

normalizeUTF8NFD(words)

Arguments

  • words — Input string that contains UTF-8 encoded text. String.

Returned value

  • String transformed to NFD normalization form.

Type: String.

Example

Query:

SELECT length('â'), normalizeUTF8NFD('â') AS nfd, length(nfd) AS nfd_len;

Result:

┌─length('â')─┬─nfd─┬─nfd_len─┐
│           2 │ â   │       3 │
└─────────────┴─────┴─────────┘

normalizeUTF8NFKC

Converts a string to NFKC normalized form, assuming the string contains a set of bytes that make up a UTF-8 encoded text.

Syntax

normalizeUTF8NFKC(words)

Arguments

  • words — Input string that contains UTF-8 encoded text. String.

Returned value

  • String transformed to NFKC normalization form.

Type: String.

Example

Query:

SELECT length('â'), normalizeUTF8NFKC('â') AS nfkc, length(nfkc) AS nfkc_len;

Result:

┌─length('â')─┬─nfkc─┬─nfkc_len─┐
│           2 │ â    │        2 │
└─────────────┴──────┴──────────┘

normalizeUTF8NFKD

Converts a string to NFKD normalized form, assuming the string contains a set of bytes that make up a UTF-8 encoded text.

Syntax

normalizeUTF8NFKD(words)

Arguments

  • words — Input string that contains UTF-8 encoded text. String.

Returned value

  • String transformed to NFKD normalization form.

Type: String.

Example

Query:

SELECT length('â'), normalizeUTF8NFKD('â') AS nfkd, length(nfkd) AS nfkd_len;

Result:

┌─length('â')─┬─nfkd─┬─nfkd_len─┐
│           2 │ â    │        3 │
└─────────────┴──────┴──────────┘

encodeXMLComponent

Escapes characters to place string into XML text node or attribute.

The following five XML predefined entities will be replaced: <, &, >, ", '.

Syntax

encodeXMLComponent(x)

Arguments

  • x — The sequence of characters. String.

Returned value

  • The sequence of characters with escape characters.

Type: String.

Example

Query:

SELECT encodeXMLComponent('Hello, "world"!');
SELECT encodeXMLComponent('<123>');
SELECT encodeXMLComponent('&clickhouse');
SELECT encodeXMLComponent('\'foo\'');

Result:

Hello, &quot;world&quot;!
&lt;123&gt;
&amp;clickhouse
&apos;foo&apos;

decodeXMLComponent

Replaces XML predefined entities with characters. Predefined entities are &quot; &amp; &apos; &gt; &lt; This function also replaces numeric character references with Unicode characters. Both decimal (like &#10003;) and hexadecimal (&#x2713;) forms are supported.

Syntax

decodeXMLComponent(x)

Arguments

  • x — A sequence of characters. String.

Returned value

  • The sequence of characters after replacement.

Type: String.

Example

Query:

SELECT decodeXMLComponent('&apos;foo&apos;');
SELECT decodeXMLComponent('&lt; &#x3A3; &gt;');

Result:

'foo'
< Σ >

See Also

extractTextFromHTML

A function to extract text from HTML or XHTML. It does not necessarily 100% conform to any of the HTML, XML or XHTML standards, but the implementation is reasonably accurate and it is fast. The rules are the following:

  1. Comments are skipped. Example: <!-- test -->. Comment must end with -->. Nested comments are not possible. Note: constructions like <!--> and <!---> are not valid comments in HTML but they are skipped by other rules.

  2. CDATA is pasted verbatim. Note: CDATA is XML/XHTML specific. But it is processed for "best-effort" approach.

  3. script and style elements are removed with all their content. Note: it is assumed that closing tag cannot appear inside content. For example, in JS string literal has to be escaped like "<\/script>". Note: comments and CDATA are possible inside script or style - then closing tags are not searched inside CDATA. Example: <script><![CDATA[</script>]]></script>. But they are still searched inside comments. Sometimes it becomes complicated: <script>var x = "<!--"; </script> var y = "-->"; alert(x + y);</script> Note: script and style can be the names of XML namespaces - then they are not treated like usual script or style elements. Example: <script:a>Hello</script:a>. Note: whitespaces are possible after closing tag name: </script > but not before: < / script>.

  4. Other tags or tag-like elements are skipped without inner content. Example: <a>.</a> Note: it is expected that this HTML is illegal: <a test=">"></a> Note: it also skips something like tags: <>, <!>, etc. Note: tag without end is skipped to the end of input: <hello

  5. HTML and XML entities are not decoded. They must be processed by separate function.

  6. Whitespaces in the text are collapsed or inserted by specific rules.

    • Whitespaces at the beginning and at the end are removed.

    • Consecutive whitespaces are collapsed.

    • But if the text is separated by other elements and there is no whitespace, it is inserted.

    • It may cause unnatural examples: Hello<b>world</b>, Hello<!-- -->world - there is no whitespace in HTML, but the function inserts it. Also consider: Hello<p>world</p>, Hello<br>world. This behavior is reasonable for data analysis, e.g. to convert HTML to a bag of words.

  7. Also note that correct handling of whitespaces requires the support of <pre></pre> and CSS display and white-space properties.

Syntax

extractTextFromHTML(x)

Arguments

Returned value

  • Extracted text.

Type: String.

Example

The first example contains several tags and a comment and also shows whitespace processing. The second example shows CDATA and script tag processing. In the third example text is extracted from the full HTML response received by the url function.

Query:

SELECT extractTextFromHTML(' <p> A text <i>with</i><b>tags</b>. <!-- comments --> </p> ');
SELECT extractTextFromHTML('<![CDATA[The content within <b>CDATA</b>]]> <script>alert("Script");</script>');
SELECT extractTextFromHTML(html) FROM url('http://www.donothingfor2minutes.com/', RawBLOB, 'html String');

Result:

A text with tags .
The content within <b>CDATA</b>
Do Nothing for 2 Minutes 2:00 &nbsp;

TUPLES

tuple

A function that allows grouping multiple columns. For columns with the types T1, T2, …, it returns a Tuple(T1, T2, …) type tuple containing these columns. There is no cost to execute the function. Tuples are normally used as intermediate values for an argument of IN operators, or for creating a list of formal parameters of lambda functions. Tuples can’t be written to a table.

The function implements the operator (x, y, …).

Syntax

tuple(x, y, …)

tupleElement

A function that allows getting a column from a tuple. ‘N’ is the column index, starting from 1. ‘N’ must be a constant. ‘N’ must be a strict postive integer no greater than the size of the tuple. There is no cost to execute the function.

The function implements the operator x.N.

Syntax

tupleElement(tuple, n)

untuple

Performs syntactic substitution of tuple elements in the call location.

Syntax

untuple(x)

You can use the EXCEPT expression to skip columns as a result of the query.

Arguments

  • x — A tuple function, column, or tuple of elements. Tuple.

Returned value

  • None.

Examples

Input table:

┌─key─┬─v1─┬─v2─┬─v3─┬─v4─┬─v5─┬─v6────────┐
│   1 │ 10 │ 20 │ 40 │ 30 │ 15 │ (33,'ab') │
│   2 │ 25 │ 65 │ 70 │ 40 │  6 │ (44,'cd') │
│   3 │ 57 │ 30 │ 20 │ 10 │  5 │ (55,'ef') │
│   4 │ 55 │ 12 │  7 │ 80 │ 90 │ (66,'gh') │
│   5 │ 30 │ 50 │ 70 │ 25 │ 55 │ (77,'kl') │
└─────┴────┴────┴────┴────┴────┴───────────┘

Example of using a Tuple-type column as the untuple function parameter:

Query:

SELECT untuple(v6) FROM kv;

Result:

┌─_ut_1─┬─_ut_2─┐
│    33 │ ab    │
│    44 │ cd    │
│    55 │ ef    │
│    66 │ gh    │
│    77 │ kl    │
└───────┴───────┘

Note: the names are implementation specific and are subject to change. You should not assume specific names of the columns after application of the untuple.

Example of using an EXCEPT expression:

Query:

SELECT untuple((* EXCEPT (v2, v3),)) FROM kv;

Result:

┌─key─┬─v1─┬─v4─┬─v5─┬─v6────────┐
│   1 │ 10 │ 30 │ 15 │ (33,'ab') │
│   2 │ 25 │ 40 │  6 │ (44,'cd') │
│   3 │ 57 │ 10 │  5 │ (55,'ef') │
│   4 │ 55 │ 80 │ 90 │ (66,'gh') │
│   5 │ 30 │ 25 │ 55 │ (77,'kl') │
└─────┴────┴────┴────┴───────────┘

See Also

tupleHammingDistance

Returns the Hamming Distance between two tuples of the same size.

Syntax

tupleHammingDistance(tuple1, tuple2)

Arguments

  • tuple1 — First tuple. Tuple.

  • tuple2 — Second tuple. Tuple.

Tuples should have the same type of the elements.

Returned value

  • The Hamming distance.

Type: The result type is calculed the same way it is for Arithmetic functions, based on the number of elements in the input tuples.

SELECT
    toTypeName(tupleHammingDistance(tuple(0), tuple(0))) AS t1,
    toTypeName(tupleHammingDistance((0, 0), (0, 0))) AS t2,
    toTypeName(tupleHammingDistance((0, 0, 0), (0, 0, 0))) AS t3,
    toTypeName(tupleHammingDistance((0, 0, 0, 0), (0, 0, 0, 0))) AS t4,
    toTypeName(tupleHammingDistance((0, 0, 0, 0, 0), (0, 0, 0, 0, 0))) AS t5
┌─t1────┬─t2─────┬─t3─────┬─t4─────┬─t5─────┐
│ UInt8 │ UInt16 │ UInt32 │ UInt64 │ UInt64 │
└───────┴────────┴────────┴────────┴────────┘

Examples

Query:

SELECT tupleHammingDistance((1, 2, 3), (3, 2, 1)) AS HammingDistance;

Result:

┌─HammingDistance─┐
│               2 │
└─────────────────┘

Can be used with MinHash functions for detection of semi-duplicate strings:

SELECT tupleHammingDistance(wordShingleMinHash(string), wordShingleMinHashCaseInsensitive(string)) as HammingDistance FROM (SELECT 'ClickHouse is a column-oriented database management system for online analytical processing of queries.' AS string);

Result:

┌─HammingDistance─┐
│               2 │
└─────────────────┘

TYPE CONVERSION

Common Issues of Numeric Conversions

When you convert a value from one to another data type, you should remember that in common case, it is an unsafe operation that can lead to a data loss. A data loss can occur if you try to fit value from a larger data type to a smaller data type, or if you convert values between different data types.

ClickHouse has the same behavior as C++ programs.

toInt(8|16|32|64|128|256)

Converts an input value to the Int data type. This function family includes:

  • toInt8(expr) — Results in the Int8 data type.

  • toInt16(expr) — Results in the Int16 data type.

  • toInt32(expr) — Results in the Int32 data type.

  • toInt64(expr) — Results in the Int64 data type.

  • toInt128(expr) — Results in the Int128 data type.

  • toInt256(expr) — Results in the Int256 data type.

Arguments

  • exprExpression returning a number or a string with the decimal representation of a number. Binary, octal, and hexadecimal representations of numbers are not supported. Leading zeroes are stripped.

Returned value

Integer value in the Int8, Int16, Int32, Int64, Int128 or Int256 data type.

Functions use rounding towards zero, meaning they truncate fractional digits of numbers.

The behavior of functions for the NaN and Inf arguments is undefined. Remember about numeric conversions issues, when using the functions.

Example

Query:

SELECT toInt64(nan), toInt32(32), toInt16('16'), toInt8(8.8);

Result:

┌─────────toInt64(nan)─┬─toInt32(32)─┬─toInt16('16')─┬─toInt8(8.8)─┐
│ -9223372036854775808 │          32 │            16 │           8 │
└──────────────────────┴─────────────┴───────────────┴─────────────┘

toInt(8|16|32|64|128|256)OrZero

It takes an argument of type String and tries to parse it into Int (8 | 16 | 32 | 64 | 128 | 256). If failed, returns 0.

Example

Query:

SELECT toInt64OrZero('123123'), toInt8OrZero('123qwe123');

Result:

┌─toInt64OrZero('123123')─┬─toInt8OrZero('123qwe123')─┐
│                  123123 │                         0 │
└─────────────────────────┴───────────────────────────┘

toInt(8|16|32|64|128|256)OrNull

It takes an argument of type String and tries to parse it into Int (8 | 16 | 32 | 64 | 128 | 256). If failed, returns NULL.

Example

Query:

SELECT toInt64OrNull('123123'), toInt8OrNull('123qwe123');

Result:

┌─toInt64OrNull('123123')─┬─toInt8OrNull('123qwe123')─┐
│                  123123 │                      ᴺᵁᴸᴸ │
└─────────────────────────┴───────────────────────────┘

toInt(8|16|32|64|128|256)OrDefault

It takes an argument of type String and tries to parse it into Int (8 | 16 | 32 | 64 | 128 | 256). If failed, returns the default type value.

Example

Query:

SELECT toInt64OrDefault('123123', cast('-1' as Int64)), toInt8OrDefault('123qwe123', cast('-1' as Int8));

Result:

┌─toInt64OrDefault('123123', CAST('-1', 'Int64'))─┬─toInt8OrDefault('123qwe123', CAST('-1', 'Int8'))─┐
│                                          123123 │                                               -1 │
└─────────────────────────────────────────────────┴──────────────────────────────────────────────────┘

toUInt(8|16|32|64|256)

Converts an input value to the UInt data type. This function family includes:

  • toUInt8(expr) — Results in the UInt8 data type.

  • toUInt16(expr) — Results in the UInt16 data type.

  • toUInt32(expr) — Results in the UInt32 data type.

  • toUInt64(expr) — Results in the UInt64 data type.

  • toUInt256(expr) — Results in the UInt256 data type.

Arguments

  • exprExpression returning a number or a string with the decimal representation of a number. Binary, octal, and hexadecimal representations of numbers are not supported. Leading zeroes are stripped.

Returned value

Integer value in the UInt8, UInt16, UInt32, UInt64 or UInt256 data type.

Functions use rounding towards zero, meaning they truncate fractional digits of numbers.

The behavior of functions for negative arguments and for the NaN and Inf arguments is undefined. If you pass a string with a negative number, for example '-32', ClickHouse raises an exception. Remember about numeric conversions issues, when using the functions.

Example

Query:

SELECT toUInt64(nan), toUInt32(-32), toUInt16('16'), toUInt8(8.8);

Result:

┌───────toUInt64(nan)─┬─toUInt32(-32)─┬─toUInt16('16')─┬─toUInt8(8.8)─┐
│ 9223372036854775808 │    4294967264 │             16 │            8 │
└─────────────────────┴───────────────┴────────────────┴──────────────┘

toUInt(8|16|32|64|256)OrZero

toUInt(8|16|32|64|256)OrNull

toUInt(8|16|32|64|256)OrDefault

toFloat(32|64)

toFloat(32|64)OrZero

toFloat(32|64)OrNull

toFloat(32|64)OrDefault

toDate

Converts the argument to Date data type.

If the argument is DateTime or DateTime64, it truncates it, leaving the date component of the DateTime:

SELECT
    now() AS x,
    toDate(x)
┌───────────────────x─┬─toDate(now())─┐
│ 2022-12-30 13:44:17 │    2022-12-30 │
└─────────────────────┴───────────────┘

If the argument is a string, it is parsed as Date or DateTime. If it was parsed as DateTime, the date component is being used:

SELECT
    toDate('2022-12-30') AS x,
    toTypeName(x)
┌──────────x─┬─toTypeName(toDate('2022-12-30'))─┐
│ 2022-12-30 │ Date                             │
└────────────┴──────────────────────────────────┘

1 row in set. Elapsed: 0.001 sec. 
SELECT
    toDate('2022-12-30 01:02:03') AS x,
    toTypeName(x)
┌──────────x─┬─toTypeName(toDate('2022-12-30 01:02:03'))─┐
│ 2022-12-30 │ Date                                      │
└────────────┴───────────────────────────────────────────┘

If the argument is a number and it looks like a UNIX timestamp (is greater than 65535), it is interpreted as a DateTime, then truncated to Date in the current timezone. The timezone argument can be specified as a second argument of the function. The truncation to Date depends on the timezone:

SELECT
    now() AS current_time,
    toUnixTimestamp(current_time) AS ts,
    toDateTime(ts) AS time_Amsterdam,
    toDateTime(ts, 'Pacific/Apia') AS time_Samoa,
    toDate(time_Amsterdam) AS date_Amsterdam,
    toDate(time_Samoa) AS date_Samoa,
    toDate(ts) AS date_Amsterdam_2,
    toDate(ts, 'Pacific/Apia') AS date_Samoa_2
Row 1:
──────
current_time:     2022-12-30 13:51:54
ts:               1672404714
time_Amsterdam:   2022-12-30 13:51:54
time_Samoa:       2022-12-31 01:51:54
date_Amsterdam:   2022-12-30
date_Samoa:       2022-12-31
date_Amsterdam_2: 2022-12-30
date_Samoa_2:     2022-12-31

The example above demonstrates how the same UNIX timestamp can be interpreted as different dates in different time zones.

If the argument is a number and it is smaller than 65536, it is interpreted as the number of days since 1970-01-01 (a UNIX day) and converted to Date. It corresponds to the internal numeric representation of the Date data type. Example:

SELECT toDate(12345)
┌─toDate(12345)─┐
│    2003-10-20 │
└───────────────┘

This conversion does not depend on timezones.

If the argument does not fit in the range of the Date type, it results in an implementation-defined behavior, that can saturate to the maximum supported date or overflow:

SELECT toDate(10000000000.)
┌─toDate(10000000000.)─┐
│           2106-02-07 │
└──────────────────────┘

The function toDate can be also written in alternative forms:

SELECT
    now() AS time,
    toDate(time),
    DATE(time),
    CAST(time, 'Date')
┌────────────────time─┬─toDate(now())─┬─DATE(now())─┬─CAST(now(), 'Date')─┐
│ 2022-12-30 13:54:58 │    2022-12-30 │  2022-12-30 │          2022-12-30 │
└─────────────────────┴───────────────┴─────────────┴─────────────────────┘

Have a nice day working with dates and times.

toDateOrZero

toDateOrNull

toDateOrDefault

toDateTime

toDateTimeOrZero

toDateTimeOrNull

toDateTimeOrDefault

toDate32

Converts the argument to the Date32 data type. If the value is outside the range, toDate32 returns the border values supported by Date32. If the argument has Date type, borders of Date are taken into account.

Syntax

toDate32(expr)

Arguments

Returned value

  • A calendar date.

Type: Date32.

Example

  1. The value is within the range:

SELECT toDate32('1955-01-01') AS value, toTypeName(value);
┌──────value─┬─toTypeName(toDate32('1925-01-01'))─┐
│ 1955-01-01 │ Date32                             │
└────────────┴────────────────────────────────────┘
  1. The value is outside the range:

SELECT toDate32('1899-01-01') AS value, toTypeName(value);
┌──────value─┬─toTypeName(toDate32('1899-01-01'))─┐
│ 1900-01-01 │ Date32                             │
└────────────┴────────────────────────────────────┘
  1. With Date-type argument:

SELECT toDate32(toDate('1899-01-01')) AS value, toTypeName(value);
┌──────value─┬─toTypeName(toDate32(toDate('1899-01-01')))─┐
│ 1970-01-01 │ Date32                                     │
└────────────┴────────────────────────────────────────────┘

toDate32OrZero

The same as toDate32 but returns the min value of Date32 if an invalid argument is received.

Example

Query:

SELECT toDate32OrZero('1899-01-01'), toDate32OrZero('');

Result:

┌─toDate32OrZero('1899-01-01')─┬─toDate32OrZero('')─┐
│                   1900-01-01 │         1900-01-01 │
└──────────────────────────────┴────────────────────┘

toDate32OrNull

The same as toDate32 but returns NULL if an invalid argument is received.

Example

Query:

SELECT toDate32OrNull('1955-01-01'), toDate32OrNull('');

Result:

┌─toDate32OrNull('1955-01-01')─┬─toDate32OrNull('')─┐
│                   1955-01-01 │               ᴺᵁᴸᴸ │
└──────────────────────────────┴────────────────────┘

toDate32OrDefault

Converts the argument to the Date32 data type. If the value is outside the range, toDate32OrDefault returns the lower border value supported by Date32. If the argument has Date type, borders of Date are taken into account. Returns default value if an invalid argument is received.

Example

Query:

SELECT
    toDate32OrDefault('1930-01-01', toDate32('2020-01-01')),
    toDate32OrDefault('xx1930-01-01', toDate32('2020-01-01'));

Result:

┌─toDate32OrDefault('1930-01-01', toDate32('2020-01-01'))─┬─toDate32OrDefault('xx1930-01-01', toDate32('2020-01-01'))─┐
│                                              1930-01-01 │                                                2020-01-01 │
└─────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────┘

toDateTime64

Converts the argument to the DateTime64 data type.

Syntax

toDateTime64(expr, scale, [timezone])

Arguments

  • expr — The value. String, UInt32, Float or DateTime.

  • scale - Tick size (precision): 10-precision seconds. Valid range: [ 0 : 9 ].

  • timezone - Time zone of the specified datetime64 object.

Returned value

  • A calendar date and time of day, with sub-second precision.

Type: DateTime64.

Example

  1. The value is within the range:

SELECT toDateTime64('1955-01-01 00:00:00.000', 3) AS value, toTypeName(value);
┌───────────────────value─┬─toTypeName(toDateTime64('1955-01-01 00:00:00.000', 3))─┐
│ 1955-01-01 00:00:00.000 │ DateTime64(3)                                          │
└─────────────────────────┴────────────────────────────────────────────────────────┘
  1. As decimal with precision:

SELECT toDateTime64(1546300800.000, 3) AS value, toTypeName(value);
┌───────────────────value─┬─toTypeName(toDateTime64(1546300800., 3))─┐
│ 2019-01-01 00:00:00.000 │ DateTime64(3)                            │
└─────────────────────────┴──────────────────────────────────────────┘

Without the decimal point the value is still treated as Unix Timestamp in seconds:

SELECT toDateTime64(1546300800000, 3) AS value, toTypeName(value);
┌───────────────────value─┬─toTypeName(toDateTime64(1546300800000, 3))─┐
│ 2282-12-31 00:00:00.000 │ DateTime64(3)                              │
└─────────────────────────┴────────────────────────────────────────────┘
  1. With timezone:

SELECT toDateTime64('2019-01-01 00:00:00', 3, 'Asia/Istanbul') AS value, toTypeName(value);
┌───────────────────value─┬─toTypeName(toDateTime64('2019-01-01 00:00:00', 3, 'Asia/Istanbul'))─┐
│ 2019-01-01 00:00:00.000 │ DateTime64(3, 'Asia/Istanbul')                                      │
└─────────────────────────┴─────────────────────────────────────────────────────────────────────┘

toDecimal(32|64|128|256)

Converts value to the Decimal data type with precision of S. The value can be a number or a string. The S (scale) parameter specifies the number of decimal places.

  • toDecimal32(value, S)

  • toDecimal64(value, S)

  • toDecimal128(value, S)

  • toDecimal256(value, S)

toDecimal(32|64|128|256)OrNull

Converts an input string to a Nullable(Decimal(P,S)) data type value. This family of functions includes:

  • toDecimal32OrNull(expr, S) — Results in Nullable(Decimal32(S)) data type.

  • toDecimal64OrNull(expr, S) — Results in Nullable(Decimal64(S)) data type.

  • toDecimal128OrNull(expr, S) — Results in Nullable(Decimal128(S)) data type.

  • toDecimal256OrNull(expr, S) — Results in Nullable(Decimal256(S)) data type.

These functions should be used instead of toDecimal*() functions, if you prefer to get a NULL value instead of an exception in the event of an input value parsing error.

Arguments

  • exprExpression, returns a value in the String data type. ClickHouse expects the textual representation of the decimal number. For example, '1.111'.

  • S — Scale, the number of decimal places in the resulting value.

Returned value

A value in the Nullable(Decimal(P,S)) data type. The value contains:

  • Number with S decimal places, if ClickHouse interprets the input string as a number.

  • NULL, if ClickHouse can’t interpret the input string as a number or if the input number contains more than S decimal places.

Examples

Query:

SELECT toDecimal32OrNull(toString(-1.111), 5) AS val, toTypeName(val);

Result:

┌────val─┬─toTypeName(toDecimal32OrNull(toString(-1.111), 5))─┐
│ -1.111 │ Nullable(Decimal(9, 5))                            │
└────────┴────────────────────────────────────────────────────┘

Query:

SELECT toDecimal32OrNull(toString(-1.111), 2) AS val, toTypeName(val);

Result:

┌──val─┬─toTypeName(toDecimal32OrNull(toString(-1.111), 2))─┐
│ ᴺᵁᴸᴸ │ Nullable(Decimal(9, 2))                            │
└──────┴────────────────────────────────────────────────────┘

toDecimal(32|64|128|256)OrDefault

Converts an input string to a Decimal(P,S) data type value. This family of functions includes:

  • toDecimal32OrDefault(expr, S) — Results in Decimal32(S) data type.

  • toDecimal64OrDefault(expr, S) — Results in Decimal64(S) data type.

  • toDecimal128OrDefault(expr, S) — Results in Decimal128(S) data type.

  • toDecimal256OrDefault(expr, S) — Results in Decimal256(S) data type.

These functions should be used instead of toDecimal*() functions, if you prefer to get a default value instead of an exception in the event of an input value parsing error.

Arguments

  • exprExpression, returns a value in the String data type. ClickHouse expects the textual representation of the decimal number. For example, '1.111'.

  • S — Scale, the number of decimal places in the resulting value.

Returned value

A value in the Decimal(P,S) data type. The value contains:

  • Number with S decimal places, if ClickHouse interprets the input string as a number.

  • Default Decimal(P,S) data type value, if ClickHouse can’t interpret the input string as a number or if the input number contains more than S decimal places.

Examples

Query:

SELECT toDecimal32OrDefault(toString(-1.111), 5) AS val, toTypeName(val);

Result:

┌────val─┬─toTypeName(toDecimal32OrDefault(toString(-1.111), 5))─┐
│ -1.111 │ Decimal(9, 5)                                         │
└────────┴───────────────────────────────────────────────────────┘

Query:

SELECT toDecimal32OrDefault(toString(-1.111), 2) AS val, toTypeName(val);

Result:

┌─val─┬─toTypeName(toDecimal32OrDefault(toString(-1.111), 2))─┐
│   0 │ Decimal(9, 2)                                         │
└─────┴───────────────────────────────────────────────────────┘

toDecimal(32|64|128|256)OrZero

Converts an input value to the Decimal(P,S) data type. This family of functions includes:

  • toDecimal32OrZero( expr, S) — Results in Decimal32(S) data type.

  • toDecimal64OrZero( expr, S) — Results in Decimal64(S) data type.

  • toDecimal128OrZero( expr, S) — Results in Decimal128(S) data type.

  • toDecimal256OrZero( expr, S) — Results in Decimal256(S) data type.

These functions should be used instead of toDecimal*() functions, if you prefer to get a 0 value instead of an exception in the event of an input value parsing error.

Arguments

  • exprExpression, returns a value in the String data type. ClickHouse expects the textual representation of the decimal number. For example, '1.111'.

  • S — Scale, the number of decimal places in the resulting value.

Returned value

A value in the Nullable(Decimal(P,S)) data type. The value contains:

  • Number with S decimal places, if ClickHouse interprets the input string as a number.

  • 0 with S decimal places, if ClickHouse can’t interpret the input string as a number or if the input number contains more than S decimal places.

Example

Query:

SELECT toDecimal32OrZero(toString(-1.111), 5) AS val, toTypeName(val);

Result:

┌────val─┬─toTypeName(toDecimal32OrZero(toString(-1.111), 5))─┐
│ -1.111 │ Decimal(9, 5)                                      │
└────────┴────────────────────────────────────────────────────┘

Query:

SELECT toDecimal32OrZero(toString(-1.111), 2) AS val, toTypeName(val);

Result:

┌──val─┬─toTypeName(toDecimal32OrZero(toString(-1.111), 2))─┐
│ 0.00 │ Decimal(9, 2)                                      │
└──────┴────────────────────────────────────────────────────┘

toString

Functions for converting between numbers, strings (but not fixed strings), dates, and dates with times. All these functions accept one argument.

When converting to or from a string, the value is formatted or parsed using the same rules as for the TabSeparated format (and almost all other text formats). If the string can’t be parsed, an exception is thrown and the request is canceled.

When converting dates to numbers or vice versa, the date corresponds to the number of days since the beginning of the Unix epoch. When converting dates with times to numbers or vice versa, the date with time corresponds to the number of seconds since the beginning of the Unix epoch.

The date and date-with-time formats for the toDate/toDateTime functions are defined as follows:

YYYY-MM-DD
YYYY-MM-DD hh:mm:ss

As an exception, if converting from UInt32, Int32, UInt64, or Int64 numeric types to Date, and if the number is greater than or equal to 65536, the number is interpreted as a Unix timestamp (and not as the number of days) and is rounded to the date. This allows support for the common occurrence of writing ‘toDate(unix_timestamp)’, which otherwise would be an error and would require writing the more cumbersome ‘toDate(toDateTime(unix_timestamp))’.

Conversion between a date and a date with time is performed the natural way: by adding a null time or dropping the time.

Conversion between numeric types uses the same rules as assignments between different numeric types in C++.

Additionally, the toString function of the DateTime argument can take a second String argument containing the name of the time zone. Example: Asia/Yekaterinburg In this case, the time is formatted according to the specified time zone.

Example

Query:

SELECT
    now() AS now_local,
    toString(now(), 'Asia/Yekaterinburg') AS now_yekat;

Result:

┌───────────now_local─┬─now_yekat───────────┐
│ 2016-06-15 00:11:21 │ 2016-06-15 02:11:21 │
└─────────────────────┴─────────────────────┘

Also see the toUnixTimestamp function.

toFixedString(s, N)

Converts a String type argument to a FixedString(N) type (a string with fixed length N). N must be a constant. If the string has fewer bytes than N, it is padded with null bytes to the right. If the string has more bytes than N, an exception is thrown.

toStringCutToZero(s)

Accepts a String or FixedString argument. Returns the String with the content truncated at the first zero byte found.

Example

Query:

SELECT toFixedString('foo', 8) AS s, toStringCutToZero(s) AS s_cut;

Result:

┌─s─────────────┬─s_cut─┐
│ foo\0\0\0\0\0 │ foo   │
└───────────────┴───────┘

Query:

SELECT toFixedString('foo\0bar', 8) AS s, toStringCutToZero(s) AS s_cut;

Result:

┌─s──────────┬─s_cut─┐
│ foo\0bar\0 │ foo   │
└────────────┴───────┘

reinterpretAsUInt(8|16|32|64)

reinterpretAsInt(8|16|32|64)

reinterpretAsFloat(32|64)

reinterpretAsDate

reinterpretAsDateTime

These functions accept a string and interpret the bytes placed at the beginning of the string as a number in host order (little endian). If the string isn’t long enough, the functions work as if the string is padded with the necessary number of null bytes. If the string is longer than needed, the extra bytes are ignored. A date is interpreted as the number of days since the beginning of the Unix Epoch, and a date with time is interpreted as the number of seconds since the beginning of the Unix Epoch.

reinterpretAsString

This function accepts a number or date or date with time and returns a string containing bytes representing the corresponding value in host order (little endian). Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a string that is one byte long.

reinterpretAsFixedString

This function accepts a number or date or date with time and returns a FixedString containing bytes representing the corresponding value in host order (little endian). Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a FixedString that is one byte long.

reinterpretAsUUID

Accepts 16 bytes string and returns UUID containing bytes representing the corresponding value in network byte order (big-endian). If the string isn't long enough, the function works as if the string is padded with the necessary number of null bytes to the end. If the string is longer than 16 bytes, the extra bytes at the end are ignored.

Syntax

reinterpretAsUUID(fixed_string)

Arguments

Returned value

  • The UUID type value. UUID.

Examples

String to UUID.

Query:

SELECT reinterpretAsUUID(reverse(unhex('000102030405060708090a0b0c0d0e0f')));

Result:

┌─reinterpretAsUUID(reverse(unhex('000102030405060708090a0b0c0d0e0f')))─┐
│                                  08090a0b-0c0d-0e0f-0001-020304050607 │
└───────────────────────────────────────────────────────────────────────┘

Going back and forth from String to UUID.

Query:

WITH
    generateUUIDv4() AS uuid,
    identity(lower(hex(reverse(reinterpretAsString(uuid))))) AS str,
    reinterpretAsUUID(reverse(unhex(str))) AS uuid2
SELECT uuid = uuid2;

Result:

┌─equals(uuid, uuid2)─┐
│                   1 │
└─────────────────────┘

reinterpret(x, T)

Uses the same source in-memory bytes sequence for x value and reinterprets it to destination type.

Syntax

reinterpret(x, type)

Arguments

  • x — Any type.

  • type — Destination type. String.

Returned value

  • Destination type value.

Examples

Query:

SELECT reinterpret(toInt8(-1), 'UInt8') as int_to_uint,
    reinterpret(toInt8(1), 'Float32') as int_to_float,
    reinterpret('1', 'UInt32') as string_to_int;

Result:

┌─int_to_uint─┬─int_to_float─┬─string_to_int─┐
│         255 │        1e-45 │            49 │
└─────────────┴──────────────┴───────────────┘

CAST(x, T)

Converts an input value to the specified data type. Unlike the reinterpret function, CAST tries to present the same value using the new data type. If the conversion can not be done then an exception is raised. Several syntax variants are supported.

Syntax

CAST(x, T)
CAST(x AS t)
x::t

Arguments

  • x — A value to convert. May be of any type.

  • T — The name of the target data type. String.

  • t — The target data type.

Returned value

  • Converted value.

NOTE

If the input value does not fit the bounds of the target type, the result overflows. For example, CAST(-1, 'UInt8') returns 255.

Examples

Query:

SELECT
    CAST(toInt8(-1), 'UInt8') AS cast_int_to_uint,
    CAST(1.5 AS Decimal(3,2)) AS cast_float_to_decimal,
    '1'::Int32 AS cast_string_to_int;

Result:

┌─cast_int_to_uint─┬─cast_float_to_decimal─┬─cast_string_to_int─┐
│              255 │                  1.50 │                  1 │
└──────────────────┴───────────────────────┴────────────────────┘

Query:

SELECT
    '2016-06-15 23:00:00' AS timestamp,
    CAST(timestamp AS DateTime) AS datetime,
    CAST(timestamp AS Date) AS date,
    CAST(timestamp, 'String') AS string,
    CAST(timestamp, 'FixedString(22)') AS fixed_string;

Result:

┌─timestamp───────────┬────────────datetime─┬───────date─┬─string──────────────┬─fixed_string──────────────┐
│ 2016-06-15 23:00:00 │ 2016-06-15 23:00:00 │ 2016-06-15 │ 2016-06-15 23:00:00 │ 2016-06-15 23:00:00\0\0\0 │
└─────────────────────┴─────────────────────┴────────────┴─────────────────────┴───────────────────────────┘

Conversion to FixedString(N) only works for arguments of type String or FixedString.

Type conversion to Nullable and back is supported.

Example

Query:

SELECT toTypeName(x) FROM t_null;

Result:

┌─toTypeName(x)─┐
│ Int8          │
│ Int8          │
└───────────────┘

Query:

SELECT toTypeName(CAST(x, 'Nullable(UInt16)')) FROM t_null;

Result:

┌─toTypeName(CAST(x, 'Nullable(UInt16)'))─┐
│ Nullable(UInt16)                        │
│ Nullable(UInt16)                        │
└─────────────────────────────────────────┘

See also

accurateCast(x, T)

Converts x to the T data type.

The difference from cast(x, T) is that accurateCast does not allow overflow of numeric types during cast if type value x does not fit the bounds of type T. For example, accurateCast(-1, 'UInt8') throws an exception.

Example

Query:

SELECT cast(-1, 'UInt8') as uint8;

Result:

┌─uint8─┐
│   255 │
└───────┘

Query:

SELECT accurateCast(-1, 'UInt8') as uint8;

Result:

Code: 70. DB::Exception: Received from localhost:9000. DB::Exception: Value in column Int8 cannot be safely converted into type UInt8: While processing accurateCast(-1, 'UInt8') AS uint8.

accurateCastOrNull(x, T)

Converts input value x to the specified data type T. Always returns Nullable type and returns NULL if the casted value is not representable in the target type.

Syntax

accurateCastOrNull(x, T)

Parameters

  • x — Input value.

  • T — The name of the returned data type.

Returned value

  • The value, converted to the specified data type T.

Example

Query:

SELECT toTypeName(accurateCastOrNull(5, 'UInt8'));

Result:

┌─toTypeName(accurateCastOrNull(5, 'UInt8'))─┐
│ Nullable(UInt8)                            │
└────────────────────────────────────────────┘

Query:

SELECT
    accurateCastOrNull(-1, 'UInt8') as uint8,
    accurateCastOrNull(128, 'Int8') as int8,
    accurateCastOrNull('Test', 'FixedString(2)') as fixed_string;

Result:

┌─uint8─┬─int8─┬─fixed_string─┐
│  ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ         │
└───────┴──────┴──────────────┘

accurateCastOrDefault(x, T[, default_value])

Converts input value x to the specified data type T. Returns default type value or default_value if specified if the casted value is not representable in the target type.

Syntax

accurateCastOrDefault(x, T)

Parameters

  • x — Input value.

  • T — The name of the returned data type.

  • default_value — Default value of returned data type.

Returned value

  • The value converted to the specified data type T.

Example

Query:

SELECT toTypeName(accurateCastOrDefault(5, 'UInt8'));

Result:

┌─toTypeName(accurateCastOrDefault(5, 'UInt8'))─┐
│ UInt8                                         │
└───────────────────────────────────────────────┘

Query:

SELECT
    accurateCastOrDefault(-1, 'UInt8') as uint8,
    accurateCastOrDefault(-1, 'UInt8', 5) as uint8_default,
    accurateCastOrDefault(128, 'Int8') as int8,
    accurateCastOrDefault(128, 'Int8', 5) as int8_default,
    accurateCastOrDefault('Test', 'FixedString(2)') as fixed_string,
    accurateCastOrDefault('Test', 'FixedString(2)', 'Te') as fixed_string_default;

Result:

┌─uint8─┬─uint8_default─┬─int8─┬─int8_default─┬─fixed_string─┬─fixed_string_default─┐
│     0 │             5 │    0 │            5 │              │ Te                   │
└───────┴───────────────┴──────┴──────────────┴──────────────┴──────────────────────┘

toInterval(Year|Quarter|Month|Week|Day|Hour|Minute|Second)

Converts a Number type argument to an Interval data type.

Syntax

toIntervalSecond(number)
toIntervalMinute(number)
toIntervalHour(number)
toIntervalDay(number)
toIntervalWeek(number)
toIntervalMonth(number)
toIntervalQuarter(number)
toIntervalYear(number)

Arguments

  • number — Duration of interval. Positive integer number.

Returned values

  • The value in Interval data type.

Example

Query:

WITH
    toDate('2019-01-01') AS date,
    INTERVAL 1 WEEK AS interval_week,
    toIntervalWeek(1) AS interval_to_week
SELECT
    date + interval_week,
    date + interval_to_week;

Result:

┌─plus(date, interval_week)─┬─plus(date, interval_to_week)─┐
│                2019-01-08 │                   2019-01-08 │
└───────────────────────────┴──────────────────────────────┘

parseDateTimeBestEffort

parseDateTime32BestEffort

Converts a date and time in the String representation to DateTime data type.

The function parses ISO 8601, RFC 1123 - 5.2.14 RFC-822 Date and Time Specification, ClickHouse’s and some other date and time formats.

Syntax

parseDateTimeBestEffort(time_string [, time_zone])

Arguments

  • time_string — String containing a date and time to convert. String.

  • time_zone — Time zone. The function parses time_string according to the time zone. String.

Supported non-standard formats

  • A string containing 9..10 digit unix timestamp.

  • A string with a date and a time component: YYYYMMDDhhmmss, DD/MM/YYYY hh:mm:ss, DD-MM-YY hh:mm, YYYY-MM-DD hh:mm:ss, etc.

  • A string with a date, but no time component: YYYY, YYYYMM, YYYY*MM, DD/MM/YYYY, DD-MM-YY etc.

  • A string with a day and time: DD, DD hh, DD hh:mm. In this case YYYY-MM are substituted as 2000-01.

  • A string that includes the date and time along with time zone offset information: YYYY-MM-DD hh:mm:ss ±h:mm, etc. For example, 2020-12-12 17:36:00 -5:00.

For all of the formats with separator the function parses months names expressed by their full name or by the first three letters of a month name. Examples: 24/DEC/18, 24-Dec-18, 01-September-2018.

Returned value

  • time_string converted to the DateTime data type.

Examples

Query:

SELECT parseDateTimeBestEffort('23/10/2020 12:12:57')
AS parseDateTimeBestEffort;

Result:

┌─parseDateTimeBestEffort─┐
│     2020-10-23 12:12:57 │
└─────────────────────────┘

Query:

SELECT parseDateTimeBestEffort('Sat, 18 Aug 2018 07:22:16 GMT', 'Asia/Istanbul')
AS parseDateTimeBestEffort;

Result:

┌─parseDateTimeBestEffort─┐
│     2018-08-18 10:22:16 │
└─────────────────────────┘

Query:

SELECT parseDateTimeBestEffort('1284101485')
AS parseDateTimeBestEffort;

Result:

┌─parseDateTimeBestEffort─┐
│     2015-07-07 12:04:41 │
└─────────────────────────┘

Query:

SELECT parseDateTimeBestEffort('2018-10-23 10:12:12')
AS parseDateTimeBestEffort;

Result:

┌─parseDateTimeBestEffort─┐
│     2018-10-23 10:12:12 │
└─────────────────────────┘

Query:

SELECT parseDateTimeBestEffort('10 20:19');

Result:

┌─parseDateTimeBestEffort('10 20:19')─┐
│                 2000-01-10 20:19:00 │
└─────────────────────────────────────┘

See Also

parseDateTimeBestEffortUS

This function behaves like parseDateTimeBestEffort for ISO date formats, e.g. YYYY-MM-DD hh:mm:ss, and other date formats where the month and date components can be unambiguously extracted, e.g. YYYYMMDDhhmmss, YYYY-MM, DD hh, or YYYY-MM-DD hh:mm:ss ±h:mm. If the month and the date components cannot be unambiguously extracted, e.g. MM/DD/YYYY, MM-DD-YYYY, or MM-DD-YY, it prefers the US date format instead of DD/MM/YYYY, DD-MM-YYYY, or DD-MM-YY. As an exception from the latter, if the month is bigger than 12 and smaller or equal than 31, this function falls back to the behavior of parseDateTimeBestEffort, e.g. 15/08/2020 is parsed as 2020-08-15.

parseDateTimeBestEffortOrNull

parseDateTime32BestEffortOrNull

Same as for parseDateTimeBestEffort except that it returns NULL when it encounters a date format that cannot be processed.

parseDateTimeBestEffortOrZero

parseDateTime32BestEffortOrZero

Same as for parseDateTimeBestEffort except that it returns zero date or zero date time when it encounters a date format that cannot be processed.

parseDateTimeBestEffortUSOrNull

Same as parseDateTimeBestEffortUS function except that it returns NULL when it encounters a date format that cannot be processed.

parseDateTimeBestEffortUSOrZero

Same as parseDateTimeBestEffortUS function except that it returns zero date (1970-01-01) or zero date with time (1970-01-01 00:00:00) when it encounters a date format that cannot be processed.

parseDateTime64BestEffort

Same as parseDateTimeBestEffort function but also parse milliseconds and microseconds and returns DateTime data type.

Syntax

parseDateTime64BestEffort(time_string [, precision [, time_zone]])

Parameters

  • time_string — String containing a date or date with time to convert. String.

  • precision — Required precision. 3 — for milliseconds, 6 — for microseconds. Default — 3. Optional. UInt8.

  • time_zoneTimezone. The function parses time_string according to the timezone. Optional. String.

Returned value

  • time_string converted to the DateTime data type.

Examples

Query:

SELECT parseDateTime64BestEffort('2021-01-01') AS a, toTypeName(a) AS t
UNION ALL
SELECT parseDateTime64BestEffort('2021-01-01 01:01:00.12346') AS a, toTypeName(a) AS t
UNION ALL
SELECT parseDateTime64BestEffort('2021-01-01 01:01:00.12346',6) AS a, toTypeName(a) AS t
UNION ALL
SELECT parseDateTime64BestEffort('2021-01-01 01:01:00.12346',3,'Asia/Istanbul') AS a, toTypeName(a) AS t
FORMAT PrettyCompactMonoBlock;

Result:

┌──────────────────────────a─┬─t──────────────────────────────┐
│ 2021-01-01 01:01:00.123000 │ DateTime64(3)                  │
│ 2021-01-01 00:00:00.000000 │ DateTime64(3)                  │
│ 2021-01-01 01:01:00.123460 │ DateTime64(6)                  │
│ 2020-12-31 22:01:00.123000 │ DateTime64(3, 'Asia/Istanbul') │
└────────────────────────────┴────────────────────────────────┘

parseDateTime64BestEffortUS

Same as for parseDateTime64BestEffort, except that this function prefers US date format (MM/DD/YYYY etc.) in case of ambiguity.

parseDateTime64BestEffortOrNull

Same as for parseDateTime64BestEffort except that it returns NULL when it encounters a date format that cannot be processed.

parseDateTime64BestEffortOrZero

Same as for parseDateTime64BestEffort except that it returns zero date or zero date time when it encounters a date format that cannot be processed.

parseDateTime64BestEffortUSOrNull

Same as for parseDateTime64BestEffort, except that this function prefers US date format (MM/DD/YYYY etc.) in case of ambiguity and returns NULL when it encounters a date format that cannot be processed.

parseDateTime64BestEffortUSOrZero

Same as for parseDateTime64BestEffort, except that this function prefers US date format (MM/DD/YYYY etc.) in case of ambiguity and returns zero date or zero date time when it encounters a date format that cannot be processed.

toLowCardinality

Converts input parameter to the LowCardinality version of same data type.

To convert data from the LowCardinality data type use the CAST function. For example, CAST(x as String).

Syntax

toLowCardinality(expr)

Arguments

Returned values

  • Result of expr.

Type: LowCardinality(expr_result_type)

Example

Query:

SELECT toLowCardinality('1');

Result:

┌─toLowCardinality('1')─┐
│ 1                     │
└───────────────────────┘

toUnixTimestamp64Milli

toUnixTimestamp64Micro

toUnixTimestamp64Nano

Converts a DateTime64 to a Int64 value with fixed sub-second precision. Input value is scaled up or down appropriately depending on it precision.

NOTE

The output value is a timestamp in UTC, not in the timezone of DateTime64.

Syntax

toUnixTimestamp64Milli(value)

Arguments

  • value — DateTime64 value with any precision.

Returned value

  • value converted to the Int64 data type.

Examples

Query:

WITH toDateTime64('2019-09-16 19:20:12.345678910', 6) AS dt64
SELECT toUnixTimestamp64Milli(dt64);

Result:

┌─toUnixTimestamp64Milli(dt64)─┐
│                1568650812345 │
└──────────────────────────────┘

Query:

WITH toDateTime64('2019-09-16 19:20:12.345678910', 6) AS dt64
SELECT toUnixTimestamp64Nano(dt64);

Result:

┌─toUnixTimestamp64Nano(dt64)─┐
│         1568650812345678000 │
└─────────────────────────────┘

fromUnixTimestamp64Milli

fromUnixTimestamp64Micro

fromUnixTimestamp64Nano

Converts an Int64 to a DateTime64 value with fixed sub-second precision and optional timezone. Input value is scaled up or down appropriately depending on it’s precision. Please note that input value is treated as UTC timestamp, not timestamp at given (or implicit) timezone.

Syntax

fromUnixTimestamp64Milli(value [, ti])

Arguments

  • valueInt64 value with any precision.

  • timezoneString (optional) timezone name of the result.

Returned value

  • value converted to the DateTime64 data type.

Example

Query:

WITH CAST(1234567891011, 'Int64') AS i64
SELECT fromUnixTimestamp64Milli(i64, 'UTC');

Result:

┌─fromUnixTimestamp64Milli(i64, 'UTC')─┐
│              2009-02-13 23:31:31.011 │
└──────────────────────────────────────┘

formatRow

Converts arbitrary expressions into a string via given format.

Syntax

formatRow(format, x, y, ...)

Arguments

  • format — Text format. For example, CSV, TSV.

  • x,y, ... — Expressions.

Returned value

  • A formatted string. (for text formats it's usually terminated with the new line character).

Example

Query:

SELECT formatRow('CSV', number, 'good')
FROM numbers(3);

Result:

┌─formatRow('CSV', number, 'good')─┐
│ 0,"good"

│ 1,"good"

│ 2,"good"

└──────────────────────────────────┘

Note: If format contains suffix/prefix, it will be written in each row.

Example

Query:

SELECT formatRow('CustomSeparated', number, 'good')
FROM numbers(3)
SETTINGS format_custom_result_before_delimiter='<prefix>\n', format_custom_result_after_delimiter='<suffix>'

Result:

┌─formatRow('CustomSeparated', number, 'good')─┐
│ <prefix>
0   good
<suffix>                   │
│ <prefix>
1   good
<suffix>                   │
│ <prefix>
2   good
<suffix>                   │
└──────────────────────────────────────────────┘

Note: Only row-based formats are supported in this function.

formatRowNoNewline

Converts arbitrary expressions into a string via given format. Differs from formatRow in that this function trims the last if any.

Syntax

formatRowNoNewline(format, x, y, ...)

Arguments

  • format — Text format. For example, CSV, TSV.

  • x,y, ... — Expressions.

Returned value

  • A formatted string.

Example

Query:

SELECT formatRowNoNewline('CSV', number, 'good')
FROM numbers(3);

Result:

┌─formatRowNoNewline('CSV', number, 'good')─┐
│ 0,"good"                                  │
│ 1,"good"                                  │
│ 2,"good"                                  │
└───────────────────────────────────────────┘

URLs

All these functions do not follow the RFC. They are maximally simplified for improved performance.

protocol

Extracts the protocol from a URL.

Examples of typical returned values: http, https, ftp, mailto, tel, magnet…

domain

Extracts the hostname from a URL.

domain(url)

Arguments

The URL can be specified with or without a scheme. Examples:

svn+ssh://some.svn-hosting.com:80/repo/trunk
some.svn-hosting.com:80/repo/trunk
https://clickhouse.com/time/

For these examples, the domain function returns the following results:

some.svn-hosting.com
some.svn-hosting.com
clickhouse.com

Returned values

  • Host name. If ClickHouse can parse the input string as a URL.

  • Empty string. If ClickHouse can’t parse the input string as a URL.

Type: String.

Example

SELECT domain('svn+ssh://some.svn-hosting.com:80/repo/trunk');
┌─domain('svn+ssh://some.svn-hosting.com:80/repo/trunk')─┐
│ some.svn-hosting.com                                   │
└────────────────────────────────────────────────────────┘

domainWithoutWWW

Returns the domain and removes no more than one ‘www.’ from the beginning of it, if present.

topLevelDomain

Extracts the the top-level domain from a URL.

topLevelDomain(url)

Arguments

The URL can be specified with or without a scheme. Examples:

svn+ssh://some.svn-hosting.com:80/repo/trunk
some.svn-hosting.com:80/repo/trunk
https://clickhouse.com/time/

Returned values

  • Domain name. If ClickHouse can parse the input string as a URL.

  • Empty string. If ClickHouse cannot parse the input string as a URL.

Type: String.

Example

SELECT topLevelDomain('svn+ssh://www.some.svn-hosting.com:80/repo/trunk');
┌─topLevelDomain('svn+ssh://www.some.svn-hosting.com:80/repo/trunk')─┐
│ com                                                                │
└────────────────────────────────────────────────────────────────────┘

firstSignificantSubdomain

Returns the “first significant subdomain”. The first significant subdomain is a second-level domain if it is ‘com’, ‘net’, ‘org’, or ‘co’. Otherwise, it is a third-level domain. For example, firstSignificantSubdomain (‘https://news.clickhouse.com/’) = ‘clickhouse’, firstSignificantSubdomain (‘https://news.clickhouse.com.tr/’) = ‘clickhouse’. The list of “insignificant” second-level domains and other implementation details may change in the future.

cutToFirstSignificantSubdomain

Returns the part of the domain that includes top-level subdomains up to the “first significant subdomain” (see the explanation above).

For example:

  • cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/') = 'clickhouse.com.tr'.

  • cutToFirstSignificantSubdomain('www.tr') = 'tr'.

  • cutToFirstSignificantSubdomain('tr') = ''.

cutToFirstSignificantSubdomainWithWWW

Returns the part of the domain that includes top-level subdomains up to the “first significant subdomain”, without stripping "www".

For example:

  • cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/') = 'clickhouse.com.tr'.

  • cutToFirstSignificantSubdomain('www.tr') = 'www.tr'.

  • cutToFirstSignificantSubdomain('tr') = ''.

cutToFirstSignificantSubdomainCustom

Returns the part of the domain that includes top-level subdomains up to the first significant subdomain. Accepts custom TLD list name.

Can be useful if you need fresh TLD list or you have custom.

Configuration example:

<!-- <top_level_domains_path>/var/lib/clickhouse/top_level_domains/</top_level_domains_path> -->
<top_level_domains_lists>
    <!-- https://publicsuffix.org/list/public_suffix_list.dat -->
    <public_suffix_list>public_suffix_list.dat</public_suffix_list>
    <!-- NOTE: path is under top_level_domains_path -->
</top_level_domains_lists>

Syntax

cutToFirstSignificantSubdomain(URL, TLD)

Parameters

Returned value

  • Part of the domain that includes top-level subdomains up to the first significant subdomain.

Type: String.

Example

Query:

SELECT cutToFirstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'public_suffix_list');

Result:

┌─cutToFirstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'public_suffix_list')─┐
│ foo.there-is-no-such-domain                                                                   │
└───────────────────────────────────────────────────────────────────────────────────────────────┘

See Also

cutToFirstSignificantSubdomainCustomWithWWW

Returns the part of the domain that includes top-level subdomains up to the first significant subdomain without stripping www. Accepts custom TLD list name.

Can be useful if you need fresh TLD list or you have custom.

Configuration example:

<!-- <top_level_domains_path>/var/lib/clickhouse/top_level_domains/</top_level_domains_path> -->
<top_level_domains_lists>
    <!-- https://publicsuffix.org/list/public_suffix_list.dat -->
    <public_suffix_list>public_suffix_list.dat</public_suffix_list>
    <!-- NOTE: path is under top_level_domains_path -->
</top_level_domains_lists>

Syntax

cutToFirstSignificantSubdomainCustomWithWWW(URL, TLD)

Parameters

Returned value

  • Part of the domain that includes top-level subdomains up to the first significant subdomain without stripping www.

Type: String.

Example

Query:

SELECT cutToFirstSignificantSubdomainCustomWithWWW('www.foo', 'public_suffix_list');

Result:

┌─cutToFirstSignificantSubdomainCustomWithWWW('www.foo', 'public_suffix_list')─┐
│ www.foo                                                                      │
└──────────────────────────────────────────────────────────────────────────────┘

See Also

firstSignificantSubdomainCustom

Returns the first significant subdomain. Accepts customs TLD list name.

Can be useful if you need fresh TLD list or you have custom.

Configuration example:

<!-- <top_level_domains_path>/var/lib/clickhouse/top_level_domains/</top_level_domains_path> -->
<top_level_domains_lists>
    <!-- https://publicsuffix.org/list/public_suffix_list.dat -->
    <public_suffix_list>public_suffix_list.dat</public_suffix_list>
    <!-- NOTE: path is under top_level_domains_path -->
</top_level_domains_lists>

Syntax

firstSignificantSubdomainCustom(URL, TLD)

Parameters

Returned value

  • First significant subdomain.

Type: String.

Example

Query:

SELECT firstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'public_suffix_list');

Result:

┌─firstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'public_suffix_list')─┐
│ foo                                                                                      │
└──────────────────────────────────────────────────────────────────────────────────────────┘

See Also

port(URL[, default_port = 0])

Returns the port or default_port if there is no port in the URL (or in case of validation error).

path

Returns the path. Example: /top/news.html The path does not include the query string.

pathFull

The same as above, but including query string and fragment. Example: /top/news.html?page=2#comments

queryString

Returns the query string. Example: page=1&lr=213. query-string does not include the initial question mark, as well as # and everything after #.

fragment

Returns the fragment identifier. fragment does not include the initial hash symbol.

queryStringAndFragment

Returns the query string and fragment identifier. Example: page=1#29390.

extractURLParameter(URL, name)

Returns the value of the ‘name’ parameter in the URL, if present. Otherwise, an empty string. If there are many parameters with this name, it returns the first occurrence. This function works under the assumption that the parameter name is encoded in the URL exactly the same way as in the passed argument.

extractURLParameters(URL)

Returns an array of name=value strings corresponding to the URL parameters. The values are not decoded in any way.

extractURLParameterNames(URL)

Returns an array of name strings corresponding to the names of URL parameters. The values are not decoded in any way.

URLHierarchy(URL)

Returns an array containing the URL, truncated at the end by the symbols /,? in the path and query-string. Consecutive separator characters are counted as one. The cut is made in the position after all the consecutive separator characters.

URLPathHierarchy(URL)

The same as above, but without the protocol and host in the result. The / element (root) is not included.

URLPathHierarchy('https://example.com/browse/CONV-6788') =
[
    '/browse/',
    '/browse/CONV-6788'
]

decodeURLComponent(URL)

Returns the decoded URL. Example:

SELECT decodeURLComponent('http://127.0.0.1:8123/?query=SELECT%201%3B') AS DecodedURL;
┌─DecodedURL─────────────────────────────┐
│ http://127.0.0.1:8123/?query=SELECT 1; │
└────────────────────────────────────────┘

netloc

Extracts network locality (username:password@host:port) from a URL.

Syntax

netloc(URL)

Arguments

Returned value

  • username:password@host:port.

Type: String.

Example

Query:

SELECT netloc('http://paul@www.example.com:80/');

Result:

┌─netloc('http://paul@www.example.com:80/')─┐
│ paul@www.example.com:80                   │
└───────────────────────────────────────────┘

cutWWW

Removes no more than one ‘www.’ from the beginning of the URL’s domain, if present.

cutQueryString

Removes query string. The question mark is also removed.

cutFragment

Removes the fragment identifier. The number sign is also removed.

cutQueryStringAndFragment

Removes the query string and fragment identifier. The question mark and number sign are also removed.

cutURLParameter(URL, name)

Removes the name parameter from URL, if present. This function does not encode or decode characters in parameter names, e.g. Client ID and Client%20ID are treated as different parameter names.

Syntax

cutURLParameter(URL, name)

Arguments

Returned value

  • URL with name URL parameter removed.

Type: String.

Example

Query:

SELECT
    cutURLParameter('http://bigmir.net/?a=b&c=d&e=f#g', 'a') as url_without_a,
    cutURLParameter('http://bigmir.net/?a=b&c=d&e=f#g', ['c', 'e']) as url_without_c_and_e;

Result:

┌─url_without_a────────────────┬─url_without_c_and_e──────┐
│ http://bigmir.net/?c=d&e=f#g │ http://bigmir.net/?a=b#g │
└──────────────────────────────┴──────────────────────────┘

UUID

The functions for working with UUID are listed below.

generateUUIDv4

Generates the UUID of version 4.

Syntax

generateUUIDv4([x])

Arguments

Returned value

The UUID type value.

Usage example

This example demonstrates creating a table with the UUID type column and inserting a value into the table.

CREATE TABLE t_uuid (x UUID) ENGINE=TinyLog

INSERT INTO t_uuid SELECT generateUUIDv4()

SELECT * FROM t_uuid
┌────────────────────────────────────x─┐
│ f4bf890f-f9dc-4332-ad5c-0c18e73f28e9 │
└──────────────────────────────────────┘

Usage example if it is needed to generate multiple values in one row

SELECT generateUUIDv4(1), generateUUIDv4(2)
┌─generateUUIDv4(1)────────────────────┬─generateUUIDv4(2)────────────────────┐
│ 2d49dc6e-ddce-4cd0-afb8-790956df54c1 │ 8abf8c13-7dea-4fdf-af3e-0e18767770e6 │
└──────────────────────────────────────┴──────────────────────────────────────┘

toUUID (x)

Converts String type value to UUID type.

toUUID(String)

Returned value

The UUID type value.

Usage example

SELECT toUUID('61f0c404-5cb3-11e7-907b-a6006ad3dba0') AS uuid
┌─────────────────────────────────uuid─┐
│ 61f0c404-5cb3-11e7-907b-a6006ad3dba0 │
└──────────────────────────────────────┘

toUUIDOrNull (x)

It takes an argument of type String and tries to parse it into UUID. If failed, returns NULL.

toUUIDOrNull(String)

Returned value

The Nullable(UUID) type value.

Usage example

SELECT toUUIDOrNull('61f0c404-5cb3-11e7-907b-a6006ad3dba0T') AS uuid
┌─uuid─┐
│ ᴺᵁᴸᴸ │
└──────┘

toUUIDOrZero (x)

It takes an argument of type String and tries to parse it into UUID. If failed, returns zero UUID.

toUUIDOrZero(String)

Returned value

The UUID type value.

Usage example

SELECT toUUIDOrZero('61f0c404-5cb3-11e7-907b-a6006ad3dba0T') AS uuid
┌─────────────────────────────────uuid─┐
│ 00000000-0000-0000-0000-000000000000 │
└──────────────────────────────────────┘

UUIDStringToNum

Accepts string containing 36 characters in the format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, and returns a FixedString(16) as its binary representation, with its format optionally specified by variant (Big-endian by default).

Syntax

UUIDStringToNum(string[, variant = 1])

Arguments

  • string — String of 36 characters or FixedString(36). String.

  • variant — Integer, representing a variant as specified by RFC4122. 1 = Big-endian (default), 2 = Microsoft.

Returned value

FixedString(16)

Usage examples

SELECT
    '612f3c40-5d3b-217e-707b-6a546a3d7b29' AS uuid,
    UUIDStringToNum(uuid) AS bytes
┌─uuid─────────────────────────────────┬─bytes────────────┐
│ 612f3c40-5d3b-217e-707b-6a546a3d7b29 │ a/<@];!~p{jTj={) │
└──────────────────────────────────────┴──────────────────┘
SELECT
    '612f3c40-5d3b-217e-707b-6a546a3d7b29' AS uuid,
    UUIDStringToNum(uuid, 2) AS bytes
┌─uuid─────────────────────────────────┬─bytes────────────┐
│ 612f3c40-5d3b-217e-707b-6a546a3d7b29 │ @</a;]~!p{jTj={) │
└──────────────────────────────────────┴──────────────────┘

UUIDNumToString

Accepts binary containing a binary representation of a UUID, with its format optionally specified by variant (Big-endian by default), and returns a string containing 36 characters in text format.

Syntax

UUIDNumToString(binary[, variant = 1])

Arguments

  • binaryFixedString(16) as a binary representation of a UUID.

  • variant — Integer, representing a variant as specified by RFC4122. 1 = Big-endian (default), 2 = Microsoft.

Returned value

String.

Usage example

SELECT
    'a/<@];!~p{jTj={)' AS bytes,
    UUIDNumToString(toFixedString(bytes, 16)) AS uuid
┌─bytes────────────┬─uuid─────────────────────────────────┐
│ a/<@];!~p{jTj={) │ 612f3c40-5d3b-217e-707b-6a546a3d7b29 │
└──────────────────┴──────────────────────────────────────┘
SELECT
    '@</a;]~!p{jTj={)' AS bytes,
    UUIDNumToString(toFixedString(bytes, 16), 2) AS uuid
┌─bytes────────────┬─uuid─────────────────────────────────┐
│ @</a;]~!p{jTj={) │ 612f3c40-5d3b-217e-707b-6a546a3d7b29 │
└──────────────────┴──────────────────────────────────────┘

Last updated