Functions
There are at least* two types of functions - regular functions (they are just called “functions”) and aggregate functions. These are completely different concepts. Regular functions work as if they are applied to each row separately (for each row, the result of the function does not depend on the other rows). Aggregate functions accumulate a set of values from various rows (i.e. they depend on the entire set of rows).
In this section we discuss regular functions. For aggregate functions, see the section “Aggregate functions”.
* - There is a third type of function that the ‘arrayJoin’ function belongs to; table functions can also be mentioned separately.*
ARITHMETIC
For all arithmetic functions, the result type is calculated as the smallest number type that the result fits in, if there is such a type. The minimum is taken simultaneously based on the number of bits, whether it is signed, and whether it floats. If there are not enough bits, the highest bit type is taken.
Example
Arithmetic functions work for any pair of types from UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, or Float64.
Overflow is produced the same way as in C++.
plus(a, b), a + b operator
Calculates the sum of the numbers. You can also add integer numbers with a date or date and time. In the case of a date, adding an integer means adding the corresponding number of days. For a date with time, it means adding the corresponding number of seconds.
Example
minus(a, b), a - b operator
Calculates the difference. The result is always signed.
You can also calculate integer numbers from a date or date with time. The idea is the same – see above for ‘plus’.
Example
multiply(a, b), a * b operator
Calculates the product of the numbers.
Example
divide(a, b), a / b operator
Calculates the quotient of the numbers. The result type is always a floating-point type. It is not integer division. For integer division, use the ‘intDiv’ function. When dividing by zero you get ‘inf’, ‘-inf’, or ‘nan’.
Example
intDiv(a, b)
Calculates the quotient of the numbers. Divides into integers, rounding down (by the absolute value). An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.
Example
intDivOrZero(a, b)
Differs from ‘intDiv’ in that it returns zero when dividing by zero or when dividing a minimal negative number by minus one.
Example
modulo(a, b), a % b operator
Calculates the remainder when dividing a
by b
. The result type is an integer if both inputs are integers. If one of the inputs is a floating-point number, the result is a floating-point number. The remainder is computed like in C++. Truncated division is used for negative numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.
Example
moduloOrZero(a, b)
Differs from modulo in that it returns zero when the divisor is zero.
Example
negate(a), -a operator
Calculates a number with the reverse sign. The result is always signed.
Example
abs(a)
Calculates the absolute value of the number (a). That is, if a \< 0, it returns -a. For unsigned types it does not do anything. For signed integer types, it returns an unsigned number.
Example
gcd(a, b)
Returns the greatest common divisor of the numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.
Example
lcm(a, b)
Returns the least common multiple of the numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.
Example
ARRAY
empty
Checks whether the input array is empty.
Syntax
An array is considered empty if it does not contain any elements.
NOTE
Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1
the function reads only size0 subcolumn instead of reading and processing the whole array column. The query SELECT empty(arr) FROM TABLE;
transforms to SELECT arr.size0 = 0 FROM TABLE;
.
The function also works for strings or UUID.
Arguments
[x]
— Input array. Array.
Returned value
Returns
1
for an empty array or0
for a non-empty array.
Type: UInt8.
Example
Query:
Result:
notEmpty
Checks whether the input array is non-empty.
Syntax
An array is considered non-empty if it contains at least one element.
NOTE
Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1
the function reads only size0 subcolumn instead of reading and processing the whole array column. The query SELECT notEmpty(arr) FROM table
transforms to SELECT arr.size0 != 0 FROM TABLE
.
The function also works for strings or UUID.
Arguments
[x]
— Input array. Array.
Returned value
Returns
1
for a non-empty array or0
for an empty array.
Type: UInt8.
Example
Query:
Result:
length
Returns the number of items in the array. The result type is UInt64. The function also works for strings.
Syntax
Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1
the function reads only size0 subcolumn instead of reading and processing the whole array column. The query SELECT length(arr) FROM table
transforms to SELECT arr.size0 FROM TABLE
.
Example
emptyArrayUInt8, emptyArrayUInt16, emptyArrayUInt32, emptyArrayUInt64
emptyArrayInt8, emptyArrayInt16, emptyArrayInt32, emptyArrayInt64
emptyArrayFloat32, emptyArrayFloat64
emptyArrayDate, emptyArrayDateTime
emptyArrayString
Accepts zero arguments and returns an empty array of the appropriate type.
emptyArrayToSingle
Accepts an empty array and returns a one-element array that is equal to the default value.
range(end), range([start, ] end [, step])
Returns an array of UInt
numbers from start
to end - 1
by step
.
Syntax
Arguments
start
— The first element of the array. Optional, required ifstep
is used. Default value: 0. UIntend
— The number before which the array is constructed. Required. UIntstep
— Determines the incremental step between each element in the array. Optional. Default value: 1. UInt
Returned value
Array of
UInt
numbers fromstart
toend - 1
bystep
.
Implementation details
All arguments must be positive values:
start
,end
,step
areUInt
data types, as well as elements of the returned array.An exception is thrown if query results in arrays with a total length of more than number of elements specified by the function_range_max_elements_in_block setting.
Examples
Query:
Result:
array(x1, …), operator [x1, …]
Creates an array from the function arguments. The arguments must be constants and have types that have the smallest common type. At least one argument must be passed, because otherwise it isn’t clear which type of array to create. That is, you can’t use this function to create an empty array (to do that, use the ‘emptyArray*’ function described above). Returns an ‘Array(T)’ type result, where ‘T’ is the smallest common type out of the passed arguments.
Example
arrayConcat
Combines arrays passed as arguments.
Arguments
arrays
– Arbitrary number of arguments of Array type. Example
has(arr, elem)
Checks whether the ‘arr’ array has the ‘elem’ element. Returns 0 if the element is not in the array, or 1 if it is.
NULL
is processed as a value.
hasAll
Checks whether one array is a subset of another.
Arguments
set
– Array of any type with a set of elements.subset
– Array of any type with elements that should be tested to be a subset ofset
.
Return values
1
, ifset
contains all of the elements fromsubset
.0
, otherwise.
Peculiar properties
An empty array is a subset of any array.
Null
processed as a value.Order of values in both of arrays does not matter.
Examples
SELECT hasAll([], [])
returns 1.
SELECT hasAll([1, Null], [Null])
returns 1.
SELECT hasAll([1.0, 2, 3, 4], [1, 3])
returns 1.
SELECT hasAll(['a', 'b'], ['a'])
returns 1.
SELECT hasAll([1], ['a'])
returns 0.
SELECT hasAll([[1, 2], [3, 4]], [[1, 2], [3, 5]])
returns 0.
hasAny
Checks whether two arrays have intersection by some elements.
Arguments
array1
– Array of any type with a set of elements.array2
– Array of any type with a set of elements.
Return values
1
, ifarray1
andarray2
have one similar element at least.0
, otherwise.
Peculiar properties
Null
processed as a value.Order of values in both of arrays does not matter.
Examples
SELECT hasAny([1], [])
returns 0
.
SELECT hasAny([Null], [Null, 1])
returns 1
.
SELECT hasAny([-128, 1., 512], [1])
returns 1
.
SELECT hasAny([[1, 2], [3, 4]], ['a', 'c'])
returns 0
.
SELECT hasAll([[1, 2], [3, 4]], [[1, 2], [1, 2]])
returns 1
.
hasSubstr
Checks whether all the elements of array2 appear in array1 in the same exact order. Therefore, the function will return 1, if and only if array1 = prefix + array2 + suffix
.
In other words, the functions will check whether all the elements of array2
are contained in array1
like the hasAll
function. In addition, it will check that the elements are observed in the same order in both array1
and array2
.
For Example:
hasSubstr([1,2,3,4], [2,3])
returns 1. However,hasSubstr([1,2,3,4], [3,2])
will return0
.hasSubstr([1,2,3,4], [1,2,3])
returns 1. However,hasSubstr([1,2,3,4], [1,2,4])
will return0
.
Arguments
array1
– Array of any type with a set of elements.array2
– Array of any type with a set of elements.
Return values
1
, ifarray1
containsarray2
.0
, otherwise.
Peculiar properties
The function will return
1
ifarray2
is empty.Null
processed as a value. In other wordshasSubstr([1, 2, NULL, 3, 4], [2,3])
will return0
. However,hasSubstr([1, 2, NULL, 3, 4], [2,NULL,3])
will return1
Order of values in both of arrays does matter.
Examples
SELECT hasSubstr([], [])
returns 1.
SELECT hasSubstr([1, Null], [Null])
returns 1.
SELECT hasSubstr([1.0, 2, 3, 4], [1, 3])
returns 0.
SELECT hasSubstr(['a', 'b'], ['a'])
returns 1.
SELECT hasSubstr(['a', 'b' , 'c'], ['a', 'b'])
returns 1.
SELECT hasSubstr(['a', 'b' , 'c'], ['a', 'c'])
returns 0.
SELECT hasSubstr([[1, 2], [3, 4], [5, 6]], [[1, 2], [3, 4]])
returns 1.
indexOf(arr, x)
Returns the index of the first ‘x’ element (starting from 1) if it is in the array, or 0 if it is not.
Example:
Elements set to NULL
are handled as normal values.
arrayCount([func,] arr1, …)
Returns the number of elements for which func(arr1[i], …, arrN[i])
returns something other than 0. If func
is not specified, it returns the number of non-zero elements in the array.
Note that the arrayCount
is a higher-order function. You can pass a lambda function to it as the first argument.
Example
countEqual(arr, x)
Returns the number of elements in the array equal to x. Equivalent to arrayCount (elem -> elem = x, arr).
NULL
elements are handled as separate values.
Example
arrayEnumerate(arr)
Returns the array [1, 2, 3, …, length (arr) ]
This function is normally used with ARRAY JOIN. It allows counting something just once for each array after applying ARRAY JOIN. Example:
In this example, Reaches is the number of conversions (the strings received after applying ARRAY JOIN), and Hits is the number of pageviews (strings before ARRAY JOIN). In this particular case, you can get the same result in an easier way:
This function can also be used in higher-order functions. For example, you can use it to get array indexes for elements that match a condition.
arrayEnumerateUniq(arr, …)
Returns an array the same size as the source array, indicating for each element what its position is among elements with the same value. For example: arrayEnumerateUniq([10, 20, 10, 30]) = [1, 1, 2, 1].
This function is useful when using ARRAY JOIN and aggregation of array elements. Example:
In this example, each goal ID has a calculation of the number of conversions (each element in the Goals nested data structure is a goal that was reached, which we refer to as a conversion) and the number of sessions. Without ARRAY JOIN, we would have counted the number of sessions as sum(Sign). But in this particular case, the rows were multiplied by the nested Goals structure, so in order to count each session one time after this, we apply a condition to the value of the arrayEnumerateUniq(Goals.ID) function.
The arrayEnumerateUniq function can take multiple arrays of the same size as arguments. In this case, uniqueness is considered for tuples of elements in the same positions in all the arrays.
This is necessary when using ARRAY JOIN with a nested data structure and further aggregation across multiple elements in this structure.
arrayPopBack
Removes the last item from the array.
Arguments
array
– Array.
Example
arrayPopFront
Removes the first item from the array.
Arguments
array
– Array.
Example
arrayPushBack
Adds one item to the end of the array.
Arguments
array
– Array.single_value
– A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets thesingle_value
type for the data type of the array. For more information about the types of data in ClickHouse, see “Data types”. Can beNULL
. The function adds aNULL
element to an array, and the type of array elements converts toNullable
.
Example
arrayPushFront
Adds one element to the beginning of the array.
Arguments
array
– Array.single_value
– A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets thesingle_value
type for the data type of the array. For more information about the types of data in ClickHouse, see “Data types”. Can beNULL
. The function adds aNULL
element to an array, and the type of array elements converts toNullable
.
Example
arrayResize
Changes the length of the array.
Arguments:
array
— Array.size
— Required length of the array.If
size
is less than the original size of the array, the array is truncated from the right.
If
size
is larger than the initial size of the array, the array is extended to the right withextender
values or default values for the data type of the array items.extender
— Value for extending an array. Can beNULL
.
Returned value:
An array of length size
.
Examples of calls
arraySlice
Returns a slice of the array.
Arguments
array
– Array of data.offset
– Indent from the edge of the array. A positive value indicates an offset on the left, and a negative value is an indent on the right. Numbering of the array items begins with 1.length
– The length of the required slice. If you specify a negative value, the function returns an open slice[offset, array_length - length]
. If you omit the value, the function returns the slice[offset, the_end_of_array]
.
Example
Array elements set to NULL
are handled as normal values.
arraySort([func,] arr, …)
Sorts the elements of the arr
array in ascending order. If the func
function is specified, sorting order is determined by the result of the func
function applied to the elements of the array. If func
accepts multiple arguments, the arraySort
function is passed several arrays that the arguments of func
will correspond to. Detailed examples are shown at the end of arraySort
description.
Example of integer values sorting:
Example of string values sorting:
Consider the following sorting order for the NULL
, NaN
and Inf
values:
-Inf
values are first in the array.NULL
values are last in the array.NaN
values are right beforeNULL
.Inf
values are right beforeNaN
.
Note that arraySort
is a higher-order function. You can pass a lambda function to it as the first argument. In this case, sorting order is determined by the result of the lambda function applied to the elements of the array.
Let’s consider the following example:
For each element of the source array, the lambda function returns the sorting key, that is, [1 –> -1, 2 –> -2, 3 –> -3]. Since the arraySort
function sorts the keys in ascending order, the result is [3, 2, 1]. Thus, the (x) –> -x
lambda function sets the descending order in a sorting.
The lambda function can accept multiple arguments. In this case, you need to pass the arraySort
function several arrays of identical length that the arguments of lambda function will correspond to. The resulting array will consist of elements from the first input array; elements from the next input array(s) specify the sorting keys. For example:
Here, the elements that are passed in the second array ([2, 1]) define a sorting key for the corresponding element from the source array ([‘hello’, ‘world’]), that is, [‘hello’ –> 2, ‘world’ –> 1]. Since the lambda function does not use x
, actual values of the source array do not affect the order in the result. So, ‘hello’ will be the second element in the result, and ‘world’ will be the first.
Other examples are shown below.
NOTE
To improve sorting efficiency, the Schwartzian transform is used.
arrayReverseSort([func,] arr, …)
Sorts the elements of the arr
array in descending order. If the func
function is specified, arr
is sorted according to the result of the func
function applied to the elements of the array, and then the sorted array is reversed. If func
accepts multiple arguments, the arrayReverseSort
function is passed several arrays that the arguments of func
will correspond to. Detailed examples are shown at the end of arrayReverseSort
description.
Example of integer values sorting:
Example of string values sorting:
Consider the following sorting order for the NULL
, NaN
and Inf
values:
Inf
values are first in the array.NULL
values are last in the array.NaN
values are right beforeNULL
.-Inf
values are right beforeNaN
.
Note that the arrayReverseSort
is a higher-order function. You can pass a lambda function to it as the first argument. Example is shown below.
The array is sorted in the following way:
At first, the source array ([1, 2, 3]) is sorted according to the result of the lambda function applied to the elements of the array. The result is an array [3, 2, 1].
Array that is obtained on the previous step, is reversed. So, the final result is [1, 2, 3].
The lambda function can accept multiple arguments. In this case, you need to pass the arrayReverseSort
function several arrays of identical length that the arguments of lambda function will correspond to. The resulting array will consist of elements from the first input array; elements from the next input array(s) specify the sorting keys. For example:
In this example, the array is sorted in the following way:
At first, the source array ([‘hello’, ‘world’]) is sorted according to the result of the lambda function applied to the elements of the arrays. The elements that are passed in the second array ([2, 1]), define the sorting keys for corresponding elements from the source array. The result is an array [‘world’, ‘hello’].
Array that was sorted on the previous step, is reversed. So, the final result is [‘hello’, ‘world’].
Other examples are shown below.
arrayUniq(arr, …)
If one argument is passed, it counts the number of different elements in the array. If multiple arguments are passed, it counts the number of different tuples of elements at corresponding positions in multiple arrays.
If you want to get a list of unique items in an array, you can use arrayReduce(‘groupUniqArray’, arr).
Example
arrayJoin(arr)
A special function. See the section “ArrayJoin function”.
arrayDifference
Calculates the difference between adjacent array elements. Returns an array where the first element will be 0, the second is the difference between a[1] - a[0]
, etc. The type of elements in the resulting array is determined by the type inference rules for subtraction (e.g. UInt8
- UInt8
= Int16
).
Syntax
Arguments
array
– Array.
Returned values
Returns an array of differences between adjacent elements.
Example
Query:
Result:
Example of the overflow due to result type Int64:
Query:
Result:
arrayDistinct
Takes an array, returns an array containing the distinct elements only.
Syntax
Arguments
array
– Array.
Returned values
Returns an array containing the distinct elements.
Example
Query:
Result:
arrayEnumerateDense(arr)
Returns an array of the same size as the source array, indicating where each element first appears in the source array.
Example
arrayIntersect(arr)
Takes multiple arrays, returns an array with elements that are present in all source arrays.
Example
arrayReduce
Applies an aggregate function to array elements and returns its result. The name of the aggregation function is passed as a string in single quotes 'max'
, 'sum'
. When using parametric aggregate functions, the parameter is indicated after the function name in parentheses 'uniqUpTo(6)'
.
Syntax
Arguments
agg_func
— The name of an aggregate function which should be a constant string.arr
— Any number of array type columns as the parameters of the aggregation function.
Returned value
Example
Query:
Result:
If an aggregate function takes multiple arguments, then this function must be applied to multiple arrays of the same size.
Query:
Result:
Example with a parametric aggregate function:
Query:
Result:
arrayReduceInRanges
Applies an aggregate function to array elements in given ranges and returns an array containing the result corresponding to each range. The function will return the same result as multiple arrayReduce(agg_func, arraySlice(arr1, index, length), ...)
.
Syntax
Arguments
agg_func
— The name of an aggregate function which should be a constant string.arr
— Any number of Array type columns as the parameters of the aggregation function.
Returned value
Array containing results of the aggregate function over specified ranges.
Type: Array.
Example
Query:
Result:
arrayReverse(arr)
Returns an array of the same size as the original array containing the elements in reverse order.
Example:
reverse(arr)
Synonym for “arrayReverse”
arrayFlatten
Converts an array of arrays to a flat array.
Function:
Applies to any depth of nested arrays.
Does not change arrays that are already flat.
The flattened array contains all the elements from all source arrays.
Syntax
Alias: flatten
.
Arguments
array_of_arrays
— Array of arrays. For example,[[1,2,3], [4,5]]
.
Examples
arrayCompact
Removes consecutive duplicate elements from an array. The order of result values is determined by the order in the source array.
Syntax
Arguments
arr
— The array to inspect.
Returned value
The array without duplicate.
Type: Array
.
Example
Query:
Result:
arrayZip
Combines multiple arrays into a single array. The resulting array contains the corresponding elements of the source arrays grouped into tuples in the listed order of arguments.
Syntax
Arguments
arrN
— Array.
The function can take any number of arrays of different types. All the input arrays must be of equal size.
Returned value
Array with elements from the source arrays grouped into tuples. Data types in the tuple are the same as types of the input arrays and in the same order as arrays are passed.
Type: Array.
Example
Query:
Result:
arrayAUC
Calculate AUC (Area Under the Curve, which is a concept in machine learning, see more details: https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve).
Syntax
Arguments
arr_scores
— scores prediction model gives.arr_labels
— labels of samples, usually 1 for positive sample and 0 for negtive sample.
Returned value
Returns AUC value with type Float64.
Example
Query:
Result:
arrayMap(func, arr1, …)
Returns an array obtained from the original arrays by application of func(arr1[i], …, arrN[i])
for each element. Arrays arr1
… arrN
must have the same number of elements.
Examples
The following example shows how to create a tuple of elements from different arrays:
Note that the arrayMap
is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.
arrayFilter(func, arr1, …)
Returns an array containing only the elements in arr1
for which func(arr1[i], …, arrN[i])
returns something other than 0.
Examples
Note that the arrayFilter
is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.
arrayFill(func, arr1, …)
Scan through arr1
from the first element to the last element and replace arr1[i]
by arr1[i - 1]
if func(arr1[i], …, arrN[i])
returns 0. The first element of arr1
will not be replaced.
Examples
Note that the arrayFill
is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.
arrayReverseFill(func, arr1, …)
Scan through arr1
from the last element to the first element and replace arr1[i]
by arr1[i + 1]
if func(arr1[i], …, arrN[i])
returns 0. The last element of arr1
will not be replaced.
Examples:
Note that the arrayReverseFill
is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.
arraySplit(func, arr1, …)
Split arr1
into multiple arrays. When func(arr1[i], …, arrN[i])
returns something other than 0, the array will be split on the left hand side of the element. The array will not be split before the first element.
Examples:
Note that the arraySplit
is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.
arrayReverseSplit(func, arr1, …)
Split arr1
into multiple arrays. When func(arr1[i], …, arrN[i])
returns something other than 0, the array will be split on the right hand side of the element. The array will not be split after the last element.
Examples:
Note that the arrayReverseSplit
is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.
arrayExists([func,] arr1, …)
Returns 1 if there is at least one element in arr
for which func(arr1[i], …, arrN[i])
returns something other than 0. Otherwise, it returns 0.
Note that the arrayExists
is a higher-order function. You can pass a lambda function to it as the first argument.
Example
arrayAll([func,] arr1, …)
Returns 1 if func(arr1[i], …, arrN[i])
returns something other than 0 for all the elements in arrays. Otherwise, it returns 0.
Note that the arrayAll
is a higher-order function. You can pass a lambda function to it as the first argument.
Example
arrayFirst(func, arr1, …)
Returns the first element in the arr1
array for which func(arr1[i], …, arrN[i])
returns something other than 0.
Note that the arrayFirst
is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.
Example
arrayFirstIndex(func, arr1, …)
Returns the index of the first element in the arr1
array for which func(arr1[i], …, arrN[i])
returns something other than 0.
Note that the arrayFirstIndex
is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.
Example
arrayMin
Returns the minimum of elements in the source array.
If the func
function is specified, returns the mininum of elements converted by this function.
Note that the arrayMin
is a higher-order function. You can pass a lambda function to it as the first argument.
Syntax
Arguments
func
— Function. Expression.arr
— Array. Array.
Returned value
The minimum of function values (or the array minimum).
Type: if func
is specified, matches func
return value type, else matches the array elements type.
Examples
Query:
Result:
Query:
Result:
arrayMax
Returns the maximum of elements in the source array.
If the func
function is specified, returns the maximum of elements converted by this function.
Note that the arrayMax
is a higher-order function. You can pass a lambda function to it as the first argument.
Syntax
Arguments
func
— Function. Expression.arr
— Array. Array.
Returned value
The maximum of function values (or the array maximum).
Type: if func
is specified, matches func
return value type, else matches the array elements type.
Examples
Query:
Result:
Query:
Result:
arraySum
Returns the sum of elements in the source array.
If the func
function is specified, returns the sum of elements converted by this function.
Note that the arraySum
is a higher-order function. You can pass a lambda function to it as the first argument.
Syntax
Arguments
func
— Function. Expression.arr
— Array. Array.
Returned value
The sum of the function values (or the array sum).
Type: for decimal numbers in source array (or for converted values, if func
is specified) — Decimal128, for floating point numbers — Float64, for numeric unsigned — UInt64, and for numeric signed — Int64.
Examples
Query:
Result:
Query:
Result:
arrayAvg
Returns the average of elements in the source array.
If the func
function is specified, returns the average of elements converted by this function.
Note that the arrayAvg
is a higher-order function. You can pass a lambda function to it as the first argument.
Syntax
Arguments
func
— Function. Expression.arr
— Array. Array.
Returned value
The average of function values (or the array average).
Type: Float64.
Examples
Query:
Result:
Query:
Result:
arrayCumSum([func,] arr1, …)
Returns an array of partial sums of elements in the source array (a running sum). If the func
function is specified, then the values of the array elements are converted by func(arr1[i], …, arrN[i])
before summing.
Example:
Note that the arrayCumSum
is a higher-order function. You can pass a lambda function to it as the first argument.
arrayCumSumNonNegative(arr)
Same as arrayCumSum
, returns an array of partial sums of elements in the source array (a running sum). Different arrayCumSum
, when then returned value contains a value less than zero, the value is replace with zero and the subsequent calculation is performed with zero parameters. For example:
Note that the arraySumNonNegative
is a higher-order function. You can pass a lambda function to it as the first argument.
arrayProduct
Multiplies elements of an array.
Syntax
Arguments
arr
— Array of numeric values.
Returned value
A product of array's elements.
Type: Float64.
Examples
Query:
Result:
Query:
Return value type is always Float64. Result:
BIT
Bit functions work for any pair of types from UInt8
, UInt16
, UInt32
, UInt64
, Int8
, Int16
, Int32
, Int64
, Float32
, or Float64
. Some functions support String
and FixedString
types.
The result type is an integer with bits equal to the maximum bits of its arguments. If at least one of the arguments is signed, the result is a signed number. If an argument is a floating-point number, it is cast to Int64.
bitAnd(a, b)
bitOr(a, b)
bitXor(a, b)
bitNot(a)
bitShiftLeft(a, b)
Shifts the binary representation of a value to the left by a specified number of bit positions.
A FixedString
or a String
is treated as a single multibyte value.
Bits of a FixedString
value are lost as they are shifted out. On the contrary, a String
value is extended with additional bytes, so no bits are lost.
Syntax
Arguments
a
— A value to shift. Integer types, String or FixedString.b
— The number of shift positions. Unsigned integer types, 64 bit types or less are allowed.
Returned value
Shifted value.
The type of the returned value is the same as the type of the input value.
Example
In the following queries bin and hex functions are used to show bits of shifted values.
Result:
bitShiftRight(a, b)
Shifts the binary representation of a value to the right by a specified number of bit positions.
A FixedString
or a String
is treated as a single multibyte value. Note that the length of a String
value is reduced as bits are shifted out.
Syntax
Arguments
a
— A value to shift. Integer types, String or FixedString.b
— The number of shift positions. Unsigned integer types, 64 bit types or less are allowed.
Returned value
Shifted value.
The type of the returned value is the same as the type of the input value.
Example
Query:
Result:
bitRotateLeft(a, b)
bitRotateRight(a, b)
bitTest
Takes any integer and converts it into binary form, returns the value of a bit at specified position. The countdown starts from 0 from the right to the left.
Syntax
Arguments
number
– Integer number.index
– Position of bit.
Returned values
Returns a value of bit at specified position.
Type: UInt8
.
Example
For example, the number 43 in base-2 (binary) numeral system is 101011.
Query:
Result:
Another example:
Query:
Result:
bitTestAll
Returns result of logical conjuction (AND operator) of all bits at given positions. The countdown starts from 0 from the right to the left.
The conjuction for bitwise operations:
0 AND 0 = 0
0 AND 1 = 0
1 AND 0 = 0
1 AND 1 = 1
Syntax
Arguments
number
– Integer number.index1
,index2
,index3
,index4
– Positions of bit. For example, for set of positions (index1
,index2
,index3
,index4
) is true if and only if all of its positions are true (index1
⋀index2
, ⋀index3
⋀index4
).
Returned values
Returns result of logical conjuction.
Type: UInt8
.
Example
For example, the number 43 in base-2 (binary) numeral system is 101011.
Query:
Result:
Another example:
Query:
Result:
bitTestAny
Returns result of logical disjunction (OR operator) of all bits at given positions. The countdown starts from 0 from the right to the left.
The disjunction for bitwise operations:
0 OR 0 = 0
0 OR 1 = 1
1 OR 0 = 1
1 OR 1 = 1
Syntax
Arguments
number
– Integer number.index1
,index2
,index3
,index4
– Positions of bit.
Returned values
Returns result of logical disjuction.
Type: UInt8
.
Example
For example, the number 43 in base-2 (binary) numeral system is 101011.
Query:
Result:
Another example:
Query:
Result:
bitCount
Calculates the number of bits set to one in the binary representation of a number.
Syntax
Arguments
x
— Integer or floating-point number. The function uses the value representation in memory. It allows supporting floating-point numbers.
Returned value
Number of bits set to one in the input number.
The function does not convert input value to a larger type (sign extension). So, for example, bitCount(toUInt8(-1)) = 8
.
Type: UInt8
.
Example
Take for example the number 333. Its binary representation: 0000000101001101.
Query:
Result:
bitHammingDistance
Returns the Hamming Distance between the bit representations of two integer values. Can be used with SimHash functions for detection of semi-duplicate strings. The smaller is the distance, the more likely those strings are the same.
Syntax
Arguments
Returned value
The Hamming distance.
Type: UInt8.
Examples
Query:
Result:
With SimHash:
Result:
BITMAP
Bitmap functions work for two bitmaps Object value calculation, it is to return new bitmap or cardinality while using formula calculation, such as and, or, xor, and not, etc.
There are 2 kinds of construction methods for Bitmap Object. One is to be constructed by aggregation function groupBitmap with -State, the other is to be constructed by Array Object. It is also to convert Bitmap Object to Array Object.
RoaringBitmap is wrapped into a data structure while actual storage of Bitmap objects. When the cardinality is less than or equal to 32, it uses Set objet. When the cardinality is greater than 32, it uses RoaringBitmap object. That is why storage of low cardinality set is faster.
For more information on RoaringBitmap, see: CRoaring.
bitmapBuild
Build a bitmap from unsigned integer array.
Arguments
array
– Unsigned integer array.
Example
bitmapToArray
Convert bitmap to integer array.
Arguments
bitmap
– Bitmap object.
Example
bitmapSubsetInRange
Return subset in specified range (not include the range_end).
Arguments
bitmap
– Bitmap object.range_start
– Range start point. Type: UInt32.range_end
– Range end point (excluded). Type: UInt32.
Example
bitmapSubsetLimit
Creates a subset of bitmap with n elements taken between range_start
and cardinality_limit
.
Syntax
Arguments
bitmap
– Bitmap object.range_start
– The subset starting point. Type: UInt32.cardinality_limit
– The subset cardinality upper limit. Type: UInt32.
Returned value
The subset.
Type: Bitmap object.
Example
Query:
Result:
subBitmap
Returns the bitmap elements, starting from the offset
position. The number of returned elements is limited by the cardinality_limit
parameter. Analog of the substring) string function, but for bitmap.
Syntax
Arguments
bitmap
– The bitmap. Type: Bitmap object.offset
– The position of the first element of the subset. Type: UInt32.cardinality_limit
– The maximum number of elements in the subset. Type: UInt32.
Returned value
The subset.
Type: Bitmap object.
Example
Query:
Result:
bitmapContains
Checks whether the bitmap contains an element.
Arguments
haystack
– Bitmap object, where the function searches.needle
– Value that the function searches. Type: UInt32.
Returned values
0 — If
haystack
does not containneedle
.1 — If
haystack
containsneedle
.
Type: UInt8
.
Example
bitmapHasAny
Checks whether two bitmaps have intersection by some elements.
If you are sure that bitmap2
contains strictly one element, consider using the bitmapContains function. It works more efficiently.
Arguments
bitmap*
– Bitmap object.
Return values
1
, ifbitmap1
andbitmap2
have one similar element at least.0
, otherwise.
Example
bitmapHasAll
Analogous to hasAll(array, array)
returns 1 if the first bitmap contains all the elements of the second one, 0 otherwise. If the second argument is an empty bitmap then returns 1.
Arguments
bitmap
– Bitmap object.
Example
bitmapCardinality
Retrun bitmap cardinality of type UInt64.
Arguments
bitmap
– Bitmap object.
Example
bitmapMin
Retrun the smallest value of type UInt64 in the set, UINT32_MAX if the set is empty.
Arguments
bitmap
– Bitmap object.
Example
bitmapMax
Retrun the greatest value of type UInt64 in the set, 0 if the set is empty.
Arguments
bitmap
– Bitmap object.
Example
bitmapTransform
Transform an array of values in a bitmap to another array of values, the result is a new bitmap.
Arguments
bitmap
– Bitmap object.from_array
– UInt32 array. For idx in range [0, from_array.size()), if bitmap contains from_array[idx], then replace it with to_array[idx]. Note that the result depends on array ordering if there are common elements between from_array and to_array.to_array
– UInt32 array, its size shall be the same to from_array.
Example
bitmapAnd
Two bitmap and calculation, the result is a new bitmap.
Arguments
bitmap
– Bitmap object.
Example
bitmapOr
Two bitmap or calculation, the result is a new bitmap.
Arguments
bitmap
– Bitmap object.
Example
bitmapXor
Two bitmap xor calculation, the result is a new bitmap.
Arguments
bitmap
– Bitmap object.
Example
bitmapAndnot
Two bitmap andnot calculation, the result is a new bitmap.
Arguments
bitmap
– Bitmap object.
Example
bitmapAndCardinality
Two bitmap and calculation, return cardinality of type UInt64.
Arguments
bitmap
– Bitmap object.
Example
bitmapOrCardinality
Two bitmap or calculation, return cardinality of type UInt64.
Arguments
bitmap
– Bitmap object.
Example
bitmapXorCardinality
Two bitmap xor calculation, return cardinality of type UInt64.
Arguments
bitmap
– Bitmap object.
Example
bitmapAndnotCardinality
Two bitmap andnot calculation, return cardinality of type UInt64.
Arguments
bitmap
– Bitmap object.
Example
CONDITIONAL
if
Controls conditional branching. Unlike most systems, ClickHouse always evaluates both expressions then
and else
.
Syntax
If the condition cond
evaluates to a non-zero value, returns the result of the expression then
, and the result of the expression else
, if present, is skipped. If the cond
is zero or NULL
, then the result of the then
expression is skipped and the result of the else
expression, if present, is returned.
You can use the short_circuit_function_evaluation setting to calculate the if
function according to a short scheme. If this setting is enabled, then
expression is evaluated only on rows where cond
is true, else
expression – where cond
is false. For example, an exception about division by zero is not thrown when executing the query SELECT if(number = 0, 0, intDiv(42, number)) FROM numbers(10)
, because intDiv(42, number)
will be evaluated only for numbers that doesn't satisfy condition number = 0
.
Arguments
cond
– The condition for evaluation that can be zero or not. The type is UInt8, Nullable(UInt8) or NULL.then
– The expression to return if condition is met.else
– The expression to return if condition is not met.
Returned values
The function executes then
and else
expressions and returns its result, depending on whether the condition cond
ended up being zero or not.
Example
Query:
Result:
Query:
Result:
then
andelse
must have the lowest common type.
Example:
Take this LEFT_RIGHT
table:
The following query compares left
and right
values:
Note: NULL
values are not used in this example, check NULL values in conditionals section.
Ternary Operator
It works same as if
function.
Syntax: cond ? then : else
Returns then
if the cond
evaluates to be true (greater than zero), otherwise returns else
.
cond
must be of type ofUInt8
, andthen
andelse
must have the lowest common type.then
andelse
can beNULL
See also
multiIf
Allows you to write the CASE operator more compactly in the query.
Syntax
You can use the short_circuit_function_evaluation setting to calculate the multiIf
function according to a short scheme. If this setting is enabled, then_i
expression is evaluated only on rows where ((NOT cond_1) AND (NOT cond_2) AND ... AND (NOT cond_{i-1}) AND cond_i)
is true, cond_i
will be evaluated only on rows where ((NOT cond_1) AND (NOT cond_2) AND ... AND (NOT cond_{i-1}))
is true. For example, an exception about division by zero is not thrown when executing the query SELECT multiIf(number = 2, intDiv(1, number), number = 5) FROM numbers(10)
.
Arguments
cond_N
— The condition for the function to returnthen_N
.then_N
— The result of the function when executed.else
— The result of the function if none of the conditions is met.
The function accepts 2N+1
parameters.
Returned values
The function returns one of the values then_N
or else
, depending on the conditions cond_N
.
Example
Again using LEFT_RIGHT
table.
Case
DATES AND TIMES
Support for time zones.
All functions for working with the date and time that have a logical use for the time zone can accept a second optional time zone argument. Example: Asia/Yekaterinburg. In this case, they use the specified time zone instead of the local (default) one.
timeZone
Returns the timezone of the server. If it is executed in the context of a distributed table, then it generates a normal column with values relevant to each shard. Otherwise it produces a constant value.
Syntax
Alias: timezone
.
Returned value
Timezone.
Type: String.
toTimeZone
Converts time or date and time to the specified time zone. The time zone is an attribute of the Date
and DateTime
data types. The internal value (number of seconds) of the table field or of the resultset's column does not change, the column's type changes and its string representation changes accordingly.
Syntax
Alias: toTimezone
.
Arguments
value
— Time or date and time. DateTime64.timezone
— Timezone for the returned value. String. This argument is a constant, becausetoTimezone
changes the timezone of a column (timezone is an attribute ofDateTime*
types).
Returned value
Date and time.
Type: DateTime.
Example
Query:
Result:
toTimeZone(time_utc, 'Asia/Yekaterinburg')
changes the DateTime('UTC')
type to DateTime('Asia/Yekaterinburg')
. The value (Unixtimestamp) 1546300800 stays the same, but the string representation (the result of the toString() function) changes from time_utc: 2019-01-01 00:00:00
to time_yekat: 2019-01-01 05:00:00
.
timeZoneOf
Returns the timezone name of DateTime or DateTime64 data types.
Syntax
Alias: timezoneOf
.
Arguments
value
— Date and time. DateTime or DateTime64.
Returned value
Timezone name.
Type: String.
Example
Query:
Result:
timeZoneOffset
Returns a timezone offset in seconds from UTC. The function takes into account daylight saving time and historical timezone changes at the specified date and time. IANA timezone database is used to calculate the offset.
Syntax
Alias: timezoneOffset
.
Arguments
value
— Date and time. DateTime or DateTime64.
Returned value
Offset from UTC in seconds.
Type: Int32.
Example
Query:
Result:
toYear
Converts a date or date with time to a UInt16 number containing the year number (AD).
Alias: YEAR
.
toQuarter
Converts a date or date with time to a UInt8 number containing the quarter number.
Alias: QUARTER
.
toMonth
Converts a date or date with time to a UInt8 number containing the month number (1-12).
Alias: MONTH
.
toDayOfYear
Converts a date or date with time to a UInt16 number containing the number of the day of the year (1-366).
Alias: DAYOFYEAR
.
toDayOfMonth
Converts a date or date with time to a UInt8 number containing the number of the day of the month (1-31).
Aliases: DAYOFMONTH
, DAY
.
toDayOfWeek
Converts a date or date with time to a UInt8 number containing the number of the day of the week (Monday is 1, and Sunday is 7).
Alias: DAYOFWEEK
.
toHour
Converts a date with time to a UInt8 number containing the number of the hour in 24-hour time (0-23). This function assumes that if clocks are moved ahead, it is by one hour and occurs at 2 a.m., and if clocks are moved back, it is by one hour and occurs at 3 a.m. (which is not always true – even in Moscow the clocks were twice changed at a different time).
Alias: HOUR
.
toMinute
Converts a date with time to a UInt8 number containing the number of the minute of the hour (0-59).
Alias: MINUTE
.
toSecond
Converts a date with time to a UInt8 number containing the number of the second in the minute (0-59). Leap seconds are not accounted for.
Alias: SECOND
.
toUnixTimestamp
For DateTime argument: converts value to the number with type UInt32 -- Unix Timestamp (https://en.wikipedia.org/wiki/Unix_time). For String argument: converts the input string to the datetime according to the timezone (optional second argument, server timezone is used by default) and returns the corresponding unix timestamp.
Syntax
Returned value
Returns the unix timestamp.
Type: UInt32
.
Example
Query:
Result:
NOTE
The return type of toStartOf*
, toLastDayOfMonth
, toMonday
, timeSlot
functions described below is determined by the configuration parameter enable_extended_results_for_datetime_functions which is 0
by default.
Behavior for
enable_extended_results_for_datetime_functions = 0
: FunctionstoStartOfYear
,toStartOfISOYear
,toStartOfQuarter
,toStartOfMonth
,toStartOfWeek
,toLastDayOfMonth
,toMonday
returnDate
orDateTime
. FunctionstoStartOfDay
,toStartOfHour
,toStartOfFifteenMinutes
,toStartOfTenMinutes
,toStartOfFiveMinutes
,toStartOfMinute
,timeSlot
returnDateTime
. Though these functions can take values of the extended typesDate32
andDateTime64
as an argument, passing them a time outside the normal range (year 1970 to 2149 forDate
/ 2106 forDateTime
) will produce wrong results.enable_extended_results_for_datetime_functions = 1
:Functions
toStartOfYear
,toStartOfISOYear
,toStartOfQuarter
,toStartOfMonth
,toStartOfWeek
,toLastDayOfMonth
,toMonday
returnDate
orDateTime
if their argument is aDate
orDateTime
, and they returnDate32
orDateTime64
if their argument is aDate32
orDateTime64
.Functions
toStartOfDay
,toStartOfHour
,toStartOfFifteenMinutes
,toStartOfTenMinutes
,toStartOfFiveMinutes
,toStartOfMinute
,timeSlot
returnDateTime
if their argument is aDate
orDateTime
, and they returnDateTime64
if their argument is aDate32
orDateTime64
.
toStartOfYear
Rounds down a date or date with time to the first day of the year. Returns the date.
toStartOfISOYear
Rounds down a date or date with time to the first day of ISO year. Returns the date.
toStartOfQuarter
Rounds down a date or date with time to the first day of the quarter. The first day of the quarter is either 1 January, 1 April, 1 July, or 1 October. Returns the date.
toStartOfMonth
Rounds down a date or date with time to the first day of the month. Returns the date.
NOTE
The behavior of parsing incorrect dates is implementation specific. ClickHouse may return zero date, throw an exception or do “natural” overflow.
If toLastDayOfMonth
is called with an argument of type Date
greater then 2149-05-31, the result will be calculated from the argument 2149-05-31 instead.
toMonday
Rounds down a date or date with time to the nearest Monday. Returns the date.
toStartOfWeek(t[,mode])
Rounds down a date or date with time to the nearest Sunday or Monday by mode. Returns the date. The mode argument works exactly like the mode argument to toWeek(). For the single-argument syntax, a mode value of 0 is used.
toStartOfDay
Rounds down a date with time to the start of the day.
toStartOfHour
Rounds down a date with time to the start of the hour.
toStartOfMinute
Rounds down a date with time to the start of the minute.
toStartOfSecond
Truncates sub-seconds.
Syntax
Arguments
value
— Date and time. DateTime64.
Returned value
Input value without sub-seconds.
Type: DateTime64.
Examples
Query without timezone:
Result:
Query with timezone:
Result:
See also
Timezone server configuration parameter.
toStartOfFiveMinutes
Rounds down a date with time to the start of the five-minute interval.
toStartOfTenMinutes
Rounds down a date with time to the start of the ten-minute interval.
toStartOfFifteenMinutes
Rounds down the date with time to the start of the fifteen-minute interval.
toStartOfInterval(time_or_data, INTERVAL x unit [, time_zone])
This is a generalization of other functions named toStartOf*
. For example, toStartOfInterval(t, INTERVAL 1 year)
returns the same as toStartOfYear(t)
, toStartOfInterval(t, INTERVAL 1 month)
returns the same as toStartOfMonth(t)
, toStartOfInterval(t, INTERVAL 1 day)
returns the same as toStartOfDay(t)
, toStartOfInterval(t, INTERVAL 15 minute)
returns the same as toStartOfFifteenMinutes(t)
etc.
toTime
Converts a date with time to a certain fixed date, while preserving the time.
toRelativeYearNum
Converts a date with time or date to the number of the year, starting from a certain fixed point in the past.
toRelativeQuarterNum
Converts a date with time or date to the number of the quarter, starting from a certain fixed point in the past.
toRelativeMonthNum
Converts a date with time or date to the number of the month, starting from a certain fixed point in the past.
toRelativeWeekNum
Converts a date with time or date to the number of the week, starting from a certain fixed point in the past.
toRelativeDayNum
Converts a date with time or date to the number of the day, starting from a certain fixed point in the past.
toRelativeHourNum
Converts a date with time or date to the number of the hour, starting from a certain fixed point in the past.
toRelativeMinuteNum
Converts a date with time or date to the number of the minute, starting from a certain fixed point in the past.
toRelativeSecondNum
Converts a date with time or date to the number of the second, starting from a certain fixed point in the past.
toISOYear
Converts a date or date with time to a UInt16 number containing the ISO Year number.
toISOWeek
Converts a date or date with time to a UInt8 number containing the ISO Week number.
toWeek(date[,mode])
This function returns the week number for date or datetime. The two-argument form of toWeek() enables you to specify whether the week starts on Sunday or Monday and whether the return value should be in the range from 0 to 53 or from 1 to 53. If the mode argument is omitted, the default mode is 0. toISOWeek()
is a compatibility function that is equivalent to toWeek(date,3)
. The following table describes how the mode argument works.
Mode | First day of week | Range | Week 1 is the first week … |
---|---|---|---|
0 | Sunday | 0-53 | with a Sunday in this year |
1 | Monday | 0-53 | with 4 or more days this year |
2 | Sunday | 1-53 | with a Sunday in this year |
3 | Monday | 1-53 | with 4 or more days this year |
4 | Sunday | 0-53 | with 4 or more days this year |
5 | Monday | 0-53 | with a Monday in this year |
6 | Sunday | 1-53 | with 4 or more days this year |
7 | Monday | 1-53 | with a Monday in this year |
8 | Sunday | 1-53 | contains January 1 |
9 | Monday | 1-53 | contains January 1 |
For mode values with a meaning of “with 4 or more days this year,” weeks are numbered according to ISO 8601:1988:
If the week containing January 1 has 4 or more days in the new year, it is week 1.
Otherwise, it is the last week of the previous year, and the next week is week 1.
For mode values with a meaning of “contains January 1”, the week contains January 1 is week 1. It does not matter how many days in the new year the week contained, even if it contained only one day.
Arguments
date
– Date or DateTime.mode
– Optional parameter, Range of values is [0,9], default is 0.Timezone
– Optional parameter, it behaves like any other conversion function.
Example
toYearWeek(date[,mode])
Returns year and week for a date. The year in the result may be different from the year in the date argument for the first and the last week of the year.
The mode argument works exactly like the mode argument to toWeek(). For the single-argument syntax, a mode value of 0 is used.
toISOYear()
is a compatibility function that is equivalent to intDiv(toYearWeek(date,3),100)
.
Example
date_trunc
Truncates date and time data to the specified part of date.
Syntax
Alias: dateTrunc
.
Arguments
unit
— The type of interval to truncate the result. String Literal. Possible values:second
minute
hour
day
week
month
quarter
year
value
— Date and time. DateTime or DateTime64.timezone
— Timezone name for the returned value (optional). If not specified, the function uses the timezone of thevalue
parameter. String.
Returned value
Value, truncated to the specified part of date.
Type: DateTime.
Example
Query without timezone:
Result:
Query with the specified timezone:
Result:
See Also
date_add
Adds the time interval or date interval to the provided date or date with time.
Syntax
Aliases: dateAdd
, DATE_ADD
.
Arguments
unit
— The type of interval to add. String. Possible values:second
minute
hour
day
week
month
quarter
year
value
— Value of interval to add. Int.
Returned value
Date or date with time obtained by adding value
, expressed in unit
, to date
.
Example
Query:
Result:
date_diff
Returns the difference between two dates or dates with time values. The difference is calculated using relative units, e.g. the difference between 2022-01-01
and 2021-12-29
is 3 days for day unit (see toRelativeDayNum), 1 month for month unit (see toRelativeMonthNum), 1 year for year unit (see toRelativeYearNum).
Syntax
Aliases: dateDiff
, DATE_DIFF
.
Arguments
unit
— The type of interval for result. String. Possible values:second
minute
hour
day
week
month
quarter
year
startdate
— The first time value to subtract (the subtrahend). Date, Date32, DateTime or DateTime64.enddate
— The second time value to subtract from (the minuend). Date, Date32, DateTime or DateTime64.timezone
— Timezone name (optional). If specified, it is applied to bothstartdate
andenddate
. If not specified, timezones ofstartdate
andenddate
are used. If they are not the same, the result is unspecified. String.
Returned value
Difference between enddate
and startdate
expressed in unit
.
Type: Int.
Example
Query:
Result:
Query:
Result:
date_sub
Subtracts the time interval or date interval from the provided date or date with time.
Syntax
Aliases: dateSub
, DATE_SUB
.
Arguments
unit
— The type of interval to subtract. String. Possible values:second
minute
hour
day
week
month
quarter
year
value
— Value of interval to subtract. Int.
Returned value
Date or date with time obtained by subtracting value
, expressed in unit
, from date
.
Example
Query:
Result:
timestamp_add
Adds the specified time value with the provided date or date time value.
Syntax
Aliases: timeStampAdd
, TIMESTAMP_ADD
.
Arguments
value
— Value of interval to add. Int.unit
— The type of interval to add. String. Possible values:second
minute
hour
day
week
month
quarter
year
Returned value
Date or date with time with the specified value
expressed in unit
added to date
.
Example
Query:
Result:
timestamp_sub
Subtracts the time interval from the provided date or date with time.
Syntax
Aliases: timeStampSub
, TIMESTAMP_SUB
.
Arguments
unit
— The type of interval to subtract. String. Possible values:second
minute
hour
day
week
month
quarter
year
value
— Value of interval to subtract. Int.
Returned value
Date or date with time obtained by subtracting value
, expressed in unit
, from date
.
Example
Query:
Result:
now
Returns the current date and time at the moment of query analysis. The function is a constant expression.
Syntax
Arguments
timezone
— Timezone name for the returned value (optional). String.
Returned value
Current date and time.
Type: DateTime.
Example
Query without timezone:
Result:
Query with the specified timezone:
Result:
Returns the current date and time with sub-second precision at the moment of query analysis. The function is a constant expression.
Syntax
Arguments
scale
- Tick size (precision): 10-precision seconds. Valid range: [ 0 : 9 ]. Typically are used - 3 (default) (milliseconds), 6 (microseconds), 9 (nanoseconds).timezone
— Timezone name for the returned value (optional). String.
Returned value
Current date and time with sub-second precision.
Type: DateTime64.
Example
Result:
today
Accepts zero arguments and returns the current date at one of the moments of query analysis. The same as ‘toDate(now())’.
yesterday
Accepts zero arguments and returns yesterday’s date at one of the moments of query analysis. The same as ‘today() - 1’.
timeSlot
Rounds the time to the half hour.
toYYYYMM
Converts a date or date with time to a UInt32 number containing the year and month number (YYYY * 100 + MM).
toYYYYMMDD
Converts a date or date with time to a UInt32 number containing the year and month number (YYYY * 10000 + MM * 100 + DD).
toYYYYMMDDhhmmss
Converts a date or date with time to a UInt64 number containing the year and month number (YYYY * 10000000000 + MM * 100000000 + DD * 1000000 + hh * 10000 + mm * 100 + ss).
addYears, addMonths, addWeeks, addDays, addHours, addMinutes, addSeconds, addQuarters
Function adds a Date/DateTime interval to a Date/DateTime and then return the Date/DateTime. For example:
subtractYears, subtractMonths, subtractWeeks, subtractDays, subtractHours, subtractMinutes, subtractSeconds, subtractQuarters
Function subtract a Date/DateTime interval to a Date/DateTime and then return the Date/DateTime. For example:
formatDateTime
Formats a Time according to the given Format string. Format is a constant expression, so you cannot have multiple formats for a single result column.
formatDateTime uses MySQL datetime format style, refer to https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-format.
Syntax
Returned value(s)
Returns time and date values according to the determined format.
Replacement fields Using replacement fields, you can define a pattern for the resulting string. “Example” column shows formatting result for 2018-01-02 22:33:44
.
Placeholder | Description | Example |
---|---|---|
%C | year divided by 100 and truncated to integer (00-99) | 20 |
%d | day of the month, zero-padded (01-31) | 02 |
%D | Short MM/DD/YY date, equivalent to %m/%d/%y | 01/02/18 |
%e | day of the month, space-padded ( 1-31) | 2 |
%f | fractional second from the fractional part of DateTime64 | 1234560 |
%F | short YYYY-MM-DD date, equivalent to %Y-%m-%d | 2018-01-02 |
%G | four-digit year format for ISO week number, calculated from the week-based year defined by the ISO 8601 standard, normally useful only with %V | 2018 |
%g | two-digit year format, aligned to ISO 8601, abbreviated from four-digit notation | 18 |
%H | hour in 24h format (00-23) | 22 |
%I | hour in 12h format (01-12) | 10 |
%j | day of the year (001-366) | 002 |
%m | month as a decimal number (01-12) | 01 |
%M | minute (00-59) | 33 |
%n | new-line character (‘’) | |
%p | AM or PM designation | PM |
%Q | Quarter (1-4) | 1 |
%R | 24-hour HH:MM time, equivalent to %H:%M | 22:33 |
%S | second (00-59) | 44 |
%t | horizontal-tab character (’) | |
%T | ISO 8601 time format (HH:MM:SS), equivalent to %H:%M:%S | 22:33:44 |
%u | ISO 8601 weekday as number with Monday as 1 (1-7) | 2 |
%V | ISO 8601 week number (01-53) | 01 |
%w | weekday as a decimal number with Sunday as 0 (0-6) | 2 |
%y | Year, last two digits (00-99) | 18 |
%Y | Year | 2018 |
%z | Time offset from UTC as +HHMM or -HHMM | -0500 |
%% | a % sign | % |
Example
Query:
Result:
Query:
Result:
See Also
formatDateTimeInJodaSyntax
Similar to formatDateTime, except that it formats datetime in Joda style instead of MySQL style. Refer to https://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html.
Replacement fields
Using replacement fields, you can define a pattern for the resulting string.
Placeholder | Description | Presentation | Examples |
---|---|---|---|
G | era | text | AD |
C | century of era (>=0) | number | 20 |
Y | year of era (>=0) | year | 1996 |
x | weekyear(not supported yet) | year | 1996 |
w | week of weekyear(not supported yet) | number | 27 |
e | day of week | number | 2 |
E | day of week | text | Tuesday; Tue |
y | year | year | 1996 |
D | day of year | number | 189 |
M | month of year | month | July; Jul; 07 |
d | day of month | number | 10 |
a | halfday of day | text | PM |
K | hour of halfday (0~11) | number | 0 |
h | clockhour of halfday (1~12) | number | 12 |
H | hour of day (0~23) | number | 0 |
k | clockhour of day (1~24) | number | 24 |
m | minute of hour | number | 30 |
s | second of minute | number | 55 |
S | fraction of second(not supported yet) | number | 978 |
z | time zone(short name not supported yet) | text | Pacific Standard Time; PST |
Z | time zone offset/id(not supported yet) | zone | -0800; -08:00; America/Los_Angeles |
' | escape for text | delimiter | |
'' | single quote | literal | ' |
Example
Query:
Result:
dateName
Returns specified part of date.
Syntax
Arguments
date_part
— Date part. Possible values: 'year', 'quarter', 'month', 'week', 'dayofyear', 'day', 'weekday', 'hour', 'minute', 'second'. String.date
— Date. Date, Date32, DateTime or DateTime64.timezone
— Timezone. Optional. String.
Returned value
The specified part of date.
Type: String
Example
Query:
Result:
FROM_UNIXTIME
Function converts Unix timestamp to a calendar date and a time of a day. When there is only a single argument of Integer type, it acts in the same way as toDateTime and return DateTime type.
FROM_UNIXTIME uses MySQL datetime format style, refer to https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-format.
Alias: fromUnixTimestamp
.
Example:
Query:
Result:
When there are two or three arguments, the first an Integer, Date, Date32, DateTime or DateTime64, the second a constant format string and the third an optional constant time zone string — it acts in the same way as formatDateTime and return String type.
For example:
See Also
fromUnixTimestampInJodaSyntax
Similar to FROM_UNIXTIME, except that it formats time in Joda style instead of MySQL style. Refer to https://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html.
Example: Query:
Result:
toModifiedJulianDay
Converts a Proleptic Gregorian calendar date in text form YYYY-MM-DD
to a Modified Julian Day number in Int32. This function supports date from 0000-01-01
to 9999-12-31
. It raises an exception if the argument cannot be parsed as a date, or the date is invalid.
Syntax
Arguments
date
— Date in text form. String or FixedString.
Returned value
Modified Julian Day number.
Type: Int32.
Example
Query:
Result:
toModifiedJulianDayOrNull
Similar to toModifiedJulianDay(), but instead of raising exceptions it returns NULL
.
Syntax
Arguments
date
— Date in text form. String or FixedString.
Returned value
Modified Julian Day number.
Type: Nullable(Int32).
Example
Query:
Result:
fromModifiedJulianDay
Converts a Modified Julian Day number to a Proleptic Gregorian calendar date in text form YYYY-MM-DD
. This function supports day number from -678941
to 2973119
(which represent 0000-01-01 and 9999-12-31 respectively). It raises an exception if the day number is outside of the supported range.
Syntax
Arguments
day
— Modified Julian Day number. Any integral types.
Returned value
Date in text form.
Type: String
Example
Query:
Result:
fromModifiedJulianDayOrNull
Similar to fromModifiedJulianDayOrNull(), but instead of raising exceptions it returns NULL
.
Syntax
Arguments
day
— Modified Julian Day number. Any integral types.
Returned value
Date in text form.
Type: Nullable(String)
Example
Query:
Result:
DICTIONARIES
For information on connecting and configuring dictionaries, see Dictionaries.
dictGet, dictGetOrDefault, dictGetOrNull
Retrieves values from a dictionary.
Arguments
dict_name
— Name of the dictionary. String literal.attr_names
— Name of the column of the dictionary, String literal, or tuple of column names, Tuple(String literal).id_expr
— Key value. Expression returning dictionary key-type value or Tuple-type value depending on the dictionary configuration.default_value_expr
— Values returned if the dictionary does not contain a row with theid_expr
key. Expression or Tuple(Expression), returning the value (or values) in the data types configured for theattr_names
attribute.
Returned value
If ClickHouse parses the attribute successfully in the attribute’s data type, functions return the value of the dictionary attribute that corresponds to
id_expr
.If there is no the key, corresponding to
id_expr
, in the dictionary, then:
ClickHouse throws an exception if it cannot parse the value of the attribute or the value does not match the attribute data type.
Example for simple key dictionary
Create a text file ext-dict-test.csv
containing the following:
The first column is id
, the second column is c1
.
Configure the dictionary:
Perform the query:
Example for complex key dictionary
Create a text file ext-dict-mult.csv
containing the following:
The first column is id
, the second is c1
, the third is c2
.
Configure the dictionary:
Perform the query:
Example for range key dictionary
Input table:
Create the dictionary:
Perform the query:
Result:
See Also
dictHas
Checks whether a key is present in a dictionary.
Arguments
dict_name
— Name of the dictionary. String literal.id_expr
— Key value. Expression returning dictionary key-type value or Tuple-type value depending on the dictionary configuration.
Returned value
0, if there is no key.
1, if there is a key.
Type: UInt8
.
dictGetHierarchy
Creates an array, containing all the parents of a key in the hierarchical dictionary.
Syntax
Arguments
dict_name
— Name of the dictionary. String literal.key
— Key value. Expression returning a UInt64-type value.
Returned value
Parents for the key.
Type: Array(UInt64).
dictIsIn
Checks the ancestor of a key through the whole hierarchical chain in the dictionary.
Arguments
dict_name
— Name of the dictionary. String literal.child_id_expr
— Key to be checked. Expression returning a UInt64-type value.ancestor_id_expr
— Alleged ancestor of thechild_id_expr
key. Expression returning a UInt64-type value.
Returned value
0, if
child_id_expr
is not a child ofancestor_id_expr
.1, if
child_id_expr
is a child ofancestor_id_expr
or ifchild_id_expr
is anancestor_id_expr
.
Type: UInt8
.
Other Functions
ClickHouse supports specialized functions that convert dictionary attribute values to a specific data type regardless of the dictionary configuration.
Functions:
dictGetInt8
,dictGetInt16
,dictGetInt32
,dictGetInt64
dictGetUInt8
,dictGetUInt16
,dictGetUInt32
,dictGetUInt64
dictGetFloat32
,dictGetFloat64
dictGetDate
dictGetDateTime
dictGetUUID
dictGetString
All these functions have the OrDefault
modification. For example, dictGetDateOrDefault
.
Syntax:
Arguments
dict_name
— Name of the dictionary. String literal.attr_name
— Name of the column of the dictionary. String literal.id_expr
— Key value. Expression returning a UInt64 or Tuple-type value depending on the dictionary configuration.default_value_expr
— Value returned if the dictionary does not contain a row with theid_expr
key. Expression returning the value in the data type configured for theattr_name
attribute.
Returned value
If ClickHouse parses the attribute successfully in the attribute’s data type, functions return the value of the dictionary attribute that corresponds to
id_expr
.If there is no requested
id_expr
in the dictionary then:
ClickHouse throws an exception if it cannot parse the value of the attribute or the value does not match the attribute data type.
ENCODING
char
Returns the string with the length as the number of passed arguments and each byte has the value of corresponding argument. Accepts multiple arguments of numeric types. If the value of argument is out of range of UInt8 data type, it is converted to UInt8 with possible rounding and overflow.
Syntax
Arguments
Returned value
a string of given bytes.
Type: String
.
Example
Query:
Result:
You can construct a string of arbitrary encoding by passing the corresponding bytes. Here is example for UTF-8:
Query:
Result:
Query:
Result:
hex
Returns a string containing the argument’s hexadecimal representation.
Alias: HEX
.
Syntax
The function is using uppercase letters A-F
and not using any prefixes (like 0x
) or suffixes (like h
).
For integer arguments, it prints hex digits (“nibbles”) from the most significant to least significant (big-endian or “human-readable” order). It starts with the most significant non-zero byte (leading zero bytes are omitted) but always prints both digits of every byte even if the leading digit is zero.
Values of type Date and DateTime are formatted as corresponding integers (the number of days since Epoch for Date and the value of Unix Timestamp for DateTime).
For String and FixedString, all bytes are simply encoded as two hexadecimal numbers. Zero bytes are not omitted.
Values of Float and Decimal types are encoded as their representation in memory. As we support little-endian architecture, they are encoded in little-endian. Zero leading/trailing bytes are not omitted.
Values of UUID type are encoded as big-endian order string.
Arguments
Returned value
A string with the hexadecimal representation of the argument.
Type: String.
Examples
Query:
Result:
Query:
Result:
Query:
Result:
Query:
Result:
unhex
Performs the opposite operation of hex. It interprets each pair of hexadecimal digits (in the argument) as a number and converts it to the byte represented by the number. The return value is a binary string (BLOB).
If you want to convert the result to a number, you can use the reverse and reinterpretAs<Type> functions.
NOTE
If unhex
is invoked from within the clickhouse-client
, binary strings display using UTF-8.
Alias: UNHEX
.
Syntax
Arguments
arg
— A string containing any number of hexadecimal digits. Type: String, FixedString.
Supports both uppercase and lowercase letters A-F
. The number of hexadecimal digits does not have to be even. If it is odd, the last digit is interpreted as the least significant half of the 00-0F
byte. If the argument string contains anything other than hexadecimal digits, some implementation-defined result is returned (an exception isn’t thrown). For a numeric argument the inverse of hex(N) is not performed by unhex().
Returned value
A binary string (BLOB).
Type: String.
Example
Query:
Result:
Query:
Result:
bitmaskToList(num)
Accepts an integer. Returns a string containing the list of powers of two that total the source number when summed. They are comma-separated without spaces in text format, in ascending order.
bitmaskToArray(num)
Accepts an integer. Returns an array of UInt64 numbers containing the list of powers of two that total the source number when summed. Numbers in the array are in ascending order.
ENCRYPTION
These functions implement encryption and decryption of data with AES (Advanced Encryption Standard) algorithm.
Key length depends on encryption mode. It is 16, 24, and 32 bytes long for -128-
, -196-
, and -256-
modes respectively.
Initialization vector length is always 16 bytes (bytes in excess of 16 are ignored).
Note that these functions work slowly until ClickHouse 21.1.
encrypt
This function encrypts data using these modes:
aes-128-ecb, aes-192-ecb, aes-256-ecb
aes-128-cbc, aes-192-cbc, aes-256-cbc
aes-128-ofb, aes-192-ofb, aes-256-ofb
aes-128-gcm, aes-192-gcm, aes-256-gcm
aes-128-ctr, aes-192-ctr, aes-256-ctr
Syntax
Arguments
mode
— Encryption mode. String.plaintext
— Text thats need to be encrypted. String.key
— Encryption key. String.iv
— Initialization vector. Required for-gcm
modes, optinal for others. String.aad
— Additional authenticated data. It isn't encrypted, but it affects decryption. Works only in-gcm
modes, for others would throw an exception. String.
Returned value
Ciphertext binary string. String.
Examples
Create this table:
Query:
Insert some data (please avoid storing the keys/ivs in the database as this undermines the whole concept of encryption), also storing 'hints' is unsafe too and used only for illustrative purposes:
Query:
Query:
Result:
Example with -gcm
:
Query:
Result:
aes_encrypt_mysql
Compatible with mysql encryption and resulting ciphertext can be decrypted with AES_DECRYPT function.
Will produce the same ciphertext as encrypt
on equal inputs. But when key
or iv
are longer than they should normally be, aes_encrypt_mysql
will stick to what MySQL's aes_encrypt
does: 'fold' key
and ignore excess bits of iv
.
Supported encryption modes:
aes-128-ecb, aes-192-ecb, aes-256-ecb
aes-128-cbc, aes-192-cbc, aes-256-cbc
aes-128-ofb, aes-192-ofb, aes-256-ofb
Syntax
Arguments
mode
— Encryption mode. String.plaintext
— Text that needs to be encrypted. String.key
— Encryption key. If key is longer than required by mode, MySQL-specific key folding is performed. String.iv
— Initialization vector. Optional, only first 16 bytes are taken into account String.
Returned value
Ciphertext binary string. String.
Examples
Given equal input encrypt
and aes_encrypt_mysql
produce the same ciphertext:
Query:
Result:
But encrypt
fails when key
or iv
is longer than expected:
Query:
Result:
While aes_encrypt_mysql
produces MySQL-compatitalbe output:
Query:
Result:
Notice how supplying even longer IV
produces the same result
Query:
Result:
Which is binary equal to what MySQL produces on same inputs:
decrypt
This function decrypts ciphertext into a plaintext using these modes:
aes-128-ecb, aes-192-ecb, aes-256-ecb
aes-128-cbc, aes-192-cbc, aes-256-cbc
aes-128-ofb, aes-192-ofb, aes-256-ofb
aes-128-gcm, aes-192-gcm, aes-256-gcm
aes-128-ctr, aes-192-ctr, aes-256-ctr
Syntax
Arguments
mode
— Decryption mode. String.ciphertext
— Encrypted text that needs to be decrypted. String.key
— Decryption key. String.iv
— Initialization vector. Required for-gcm
modes, optinal for others. String.aad
— Additional authenticated data. Won't decrypt if this value is incorrect. Works only in-gcm
modes, for others would throw an exception. String.
Returned value
Decrypted String. String.
Examples
Re-using table from encrypt.
Query:
Result:
Now let's try to decrypt all that data.
Query:
Result:
Notice how only a portion of the data was properly decrypted, and the rest is gibberish since either mode
, key
, or iv
were different upon encryption.
aes_decrypt_mysql
Compatible with mysql encryption and decrypts data encrypted with AES_ENCRYPT function.
Will produce same plaintext as decrypt
on equal inputs. But when key
or iv
are longer than they should normally be, aes_decrypt_mysql
will stick to what MySQL's aes_decrypt
does: 'fold' key
and ignore excess bits of IV
.
Supported decryption modes:
aes-128-ecb, aes-192-ecb, aes-256-ecb
aes-128-cbc, aes-192-cbc, aes-256-cbc
aes-128-cfb128
aes-128-ofb, aes-192-ofb, aes-256-ofb
Syntax
Arguments
mode
— Decryption mode. String.ciphertext
— Encrypted text that needs to be decrypted. String.key
— Decryption key. String.iv
— Initialization vector. Optinal. String.
Returned value
Decrypted String. String.
Examples
Let's decrypt data we've previously encrypted with MySQL:
Query:
Result:
FILE
file
Reads file as a String. The file content is not parsed, so any information is read as one string and placed into the specified column.
Syntax
Arguments
path
— The relative path to the file from user_files_path. Path to file support following wildcards:*
,?
,{abc,def}
and{N..M}
whereN
,M
— numbers,'abc', 'def'
— strings.
Example
Inserting data from files a.txt and b.txt into a table as strings:
Query:
See Also
GEOGRAPHICAL COORDINATES
greatCircleDistance
Calculates the distance between two points on the Earth’s surface using the great-circle formula.
Input parameters
lon1Deg
— Longitude of the first point in degrees. Range:[-180°, 180°]
.lat1Deg
— Latitude of the first point in degrees. Range:[-90°, 90°]
.lon2Deg
— Longitude of the second point in degrees. Range:[-180°, 180°]
.lat2Deg
— Latitude of the second point in degrees. Range:[-90°, 90°]
.
Positive values correspond to North latitude and East longitude, and negative values correspond to South latitude and West longitude.
Returned value
The distance between two points on the Earth’s surface, in meters.
Generates an exception when the input parameter values fall outside of the range.
Example
geoDistance
Similar to greatCircleDistance
but calculates the distance on WGS-84 ellipsoid instead of sphere. This is more precise approximation of the Earth Geoid. The performance is the same as for greatCircleDistance
(no performance drawback). It is recommended to use geoDistance
to calculate the distances on Earth.
Technical note: for close enough points we calculate the distance using planar approximation with the metric on the tangent plane at the midpoint of the coordinates.
greatCircleAngle
Calculates the central angle between two points on the Earth’s surface using the great-circle formula.
Input parameters
lon1Deg
— Longitude of the first point in degrees.lat1Deg
— Latitude of the first point in degrees.lon2Deg
— Longitude of the second point in degrees.lat2Deg
— Latitude of the second point in degrees.
Returned value
The central angle between two points in degrees.
Example
pointInEllipses
Checks whether the point belongs to at least one of the ellipses. Coordinates are geometric in the Cartesian coordinate system.
Input parameters
x, y
— Coordinates of a point on the plane.xᵢ, yᵢ
— Coordinates of the center of thei
-th ellipsis.aᵢ, bᵢ
— Axes of thei
-th ellipsis in units of x, y coordinates.
The input parameters must be 2+4⋅n
, where n
is the number of ellipses.
Returned values
1
if the point is inside at least one of the ellipses; 0
if it is not.
Example
pointInPolygon
Checks whether the point belongs to the polygon on the plane.
Input values
(x, y)
— Coordinates of a point on the plane. Data type — Tuple — A tuple of two numbers.[(a, b), (c, d) ...]
— Polygon vertices. Data type — Array. Each vertex is represented by a pair of coordinates(a, b)
. Vertices should be specified in a clockwise or counterclockwise order. The minimum number of vertices is 3. The polygon must be constant.The function also supports polygons with holes (cut out sections). In this case, add polygons that define the cut out sections using additional arguments of the function. The function does not support non-simply-connected polygons.
Returned values
1
if the point is inside the polygon, 0
if it is not. If the point is on the polygon boundary, the function may return either 0 or 1.
Example
GEOHASH
Geohash is the geocode system, which subdivides Earth’s surface into buckets of grid shape and encodes each cell into a short string of letters and digits. It is a hierarchical data structure, so the longer is the geohash string, the more precise is the geographic location.
If you need to manually convert geographic coordinates to geohash strings, you can use geohash.org.
geohashEncode
Encodes latitude and longitude as a geohash-string.
Input values
longitude - longitude part of the coordinate you want to encode. Floating in range
[-180°, 180°]
latitude - latitude part of the coordinate you want to encode. Floating in range
[-90°, 90°]
precision - Optional, length of the resulting encoded string, defaults to
12
. Integer in range[1, 12]
. Any value less than1
or greater than12
is silently converted to12
.
Returned values
alphanumeric
String
of encoded coordinate (modified version of the base32-encoding alphabet is used).
Example
geohashDecode
Decodes any geohash-encoded string into longitude and latitude.
Input values
encoded string - geohash-encoded string.
Returned values
(longitude, latitude) - 2-tuple of
Float64
values of longitude and latitude.
Example
geohashesInBox
Returns an array of geohash-encoded strings of given precision that fall inside and intersect boundaries of given box, basically a 2D grid flattened into array.
Syntax
Arguments
longitude_min
— Minimum longitude. Range:[-180°, 180°]
. Type: Float.latitude_min
— Minimum latitude. Range:[-90°, 90°]
. Type: Float.longitude_max
— Maximum longitude. Range:[-180°, 180°]
. Type: Float.latitude_max
— Maximum latitude. Range:[-90°, 90°]
. Type: Float.precision
— Geohash precision. Range:[1, 12]
. Type: UInt8.
NOTE
All coordinate parameters must be of the same type: either Float32
or Float64
.
Returned values
Array of precision-long strings of geohash-boxes covering provided area, you should not rely on order of items.
[]
- Empty array if minimum latitude and longitude values aren’t less than corresponding maximum values.
NOTE
Function throws an exception if resulting array is over 10’000’000 items long.
Example
Query:
Result:
H3 INDEXES
H3 is a geographical indexing system where Earth’s surface divided into a grid of even hexagonal cells. This system is hierarchical, i. e. each hexagon on the top level ("parent") can be split into seven even but smaller ones ("children"), and so on.
The level of the hierarchy is called resolution
and can receive a value from 0
till 15
, where 0
is the base
level with the largest and coarsest cells.
A latitude and longitude pair can be transformed to a 64-bit H3 index, identifying a grid cell.
The H3 index is used primarily for bucketing locations and other geospatial manipulations.
The full description of the H3 system is available at the Uber Engeneering site.
h3IsValid
Verifies whether the number is a valid H3 index.
Syntax
Parameter
h3index
— Hexagon index number. Type: UInt64.
Returned values
1 — The number is a valid H3 index.
0 — The number is not a valid H3 index.
Type: UInt8.
Example
Query:
Result:
h3GetResolution
Defines the resolution of the given H3 index.
Syntax
Parameter
h3index
— Hexagon index number. Type: UInt64.
Returned values
Index resolution. Range:
[0, 15]
.If the index is not valid, the function returns a random value. Use h3IsValid to verify the index.
Type: UInt8.
Example
Query:
Result:
h3EdgeAngle
Calculates the average length of the H3 hexagon edge in grades.
Syntax
Parameter
resolution
— Index resolution. Type: UInt8. Range:[0, 15]
.
Returned values
Example
Query:
Result:
h3EdgeLengthM
Calculates the average length of the H3 hexagon edge in meters.
Syntax
Parameter
resolution
— Index resolution. Type: UInt8. Range:[0, 15]
.
Returned values
Example
Query:
Result:
geoToH3
Returns H3 point index (lon, lat)
with specified resolution.
Syntax
Arguments
lon
— Longitude. Type: Float64.lat
— Latitude. Type: Float64.resolution
— Index resolution. Range:[0, 15]
. Type: UInt8.
Returned values
Hexagon index number.
0 in case of error.
Type: UInt64.
Example
Query:
Result:
h3kRing
Lists all the H3 hexagons in the raduis of k
from the given hexagon in random order.
Syntax
Arguments
Returned values
Array of H3 indexes.
Example
Query:
Result:
h3GetBaseCell
Returns the base cell number of the H3 index.
Syntax
Parameter
index
— Hexagon index number. Type: UInt64.
Returned value
Hexagon base cell number.
Type: UInt8.
Example
Query:
Result:
h3HexAreaM2
Returns average hexagon area in square meters at the given resolution.
Syntax
Parameter
resolution
— Index resolution. Range:[0, 15]
. Type: UInt8.
Returned value
Area in square meters.
Type: Float64.
Example
Query:
Result:
h3IndexesAreNeighbors
Returns whether or not the provided H3 indexes are neighbors.
Syntax
Arguments
Returned value
1
— Indexes are neighbours.0
— Indexes are not neighbours.
Type: UInt8.
Example
Query:
Result:
h3ToChildren
Returns an array of child indexes for the given H3 index.
Syntax
Arguments
index
— Hexagon index number. Type: UInt64.resolution
— Index resolution. Range:[0, 15]
. Type: UInt8.
Returned values
Array of the child H3-indexes.
Example
Query:
Result:
h3ToParent
Returns the parent (coarser) index containing the given H3 index.
Syntax
Arguments
index
— Hexagon index number. Type: UInt64.resolution
— Index resolution. Range:[0, 15]
. Type: UInt8.
Returned value
Parent H3 index.
Type: UInt64.
Example
Query:
Result:
h3ToString
Converts the H3Index
representation of the index to the string representation.
Parameter
index
— Hexagon index number. Type: UInt64.
Returned value
String representation of the H3 index.
Type: String.
Example
Query:
Result:
stringToH3
Converts the string representation to the H3Index
(UInt64) representation.
Syntax
Parameter
index_str
— String representation of the H3 index. Type: String.
Returned value
Hexagon index number. Returns 0 on error. Type: UInt64.
Example
Query:
Result:
HASH
Hash functions can be used for the deterministic pseudo-random shuffling of elements.
Simhash is a hash function, which returns close hash values for close (similar) arguments.
halfMD5
Interprets all the input parameters as strings and calculates the MD5 hash value for each of them. Then combines hashes, takes the first 8 bytes of the hash of the resulting string, and interprets them as UInt64
in big-endian byte order.
The function is relatively slow (5 million short strings per second per processor core). Consider using the sipHash64 function instead.
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple
with the same data, Map
and the corresponding Array(Tuple(key, value))
type with the same data).
Returned Value
A UInt64 data type hash value.
Example
MD5
Calculates the MD5 from a string and returns the resulting set of bytes as FixedString(16). If you do not need MD5 in particular, but you need a decent cryptographic 128-bit hash, use the ‘sipHash128’ function instead. If you want to get the same result as output by the md5sum utility, use lower(hex(MD5(s))).
sipHash64
Produces a 64-bit SipHash hash value.
This is a cryptographic hash function. It works at least three times faster than the MD5 function.
Function interprets all the input parameters as strings and calculates the hash value for each of them. Then combines hashes by the following algorithm:
After hashing all the input parameters, the function gets the array of hashes.
Function takes the first and the second elements and calculates a hash for the array of them.
Then the function takes the hash value, calculated at the previous step, and the third element of the initial hash array, and calculates a hash for the array of them.
The previous step is repeated for all the remaining elements of the initial hash array.
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple
with the same data, Map
and the corresponding Array(Tuple(key, value))
type with the same data).
Returned Value
A UInt64 data type hash value.
Example
sipHash128
Produces a 128-bit SipHash hash value. Differs from sipHash64 in that the final xor-folding state is done up to 128 bits.
Syntax
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple
with the same data, Map
and the corresponding Array(Tuple(key, value))
type with the same data).
Returned value
A 128-bit SipHash
hash value.
Type: FixedString(16).
Example
Query:
Result:
cityHash64
Produces a 64-bit CityHash hash value.
This is a fast non-cryptographic hash function. It uses the CityHash algorithm for string parameters and implementation-specific fast non-cryptographic hash function for parameters with other data types. The function uses the CityHash combinator to get the final results.
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple
with the same data, Map
and the corresponding Array(Tuple(key, value))
type with the same data).
Returned Value
A UInt64 data type hash value.
Examples
Call example:
The following example shows how to compute the checksum of the entire table with accuracy up to the row order:
intHash32
Calculates a 32-bit hash code from any type of integer. This is a relatively fast non-cryptographic hash function of average quality for numbers.
intHash64
Calculates a 64-bit hash code from any type of integer. It works faster than intHash32. Average quality.
SHA1, SHA224, SHA256, SHA512
Calculates SHA-1, SHA-224, SHA-256, SHA-512 hash from a string and returns the resulting set of bytes as FixedString.
Syntax
The function works fairly slowly (SHA-1 processes about 5 million short strings per second per processor core, while SHA-224 and SHA-256 process about 2.2 million). We recommend using this function only in cases when you need a specific hash function and you can’t select it. Even in these cases, we recommend applying the function offline and pre-calculating values when inserting them into the table, instead of applying it in SELECT
queries.
Arguments
s
— Input string for SHA hash calculation. String.
Returned value
SHA hash as a hex-unencoded FixedString. SHA-1 returns as FixedString(20), SHA-224 as FixedString(28), SHA-256 — FixedString(32), SHA-512 — FixedString(64).
Type: FixedString.
Example
Use the hex function to represent the result as a hex-encoded string.
Query:
Result:
URLHash(url[, N])
A fast, decent-quality non-cryptographic hash function for a string obtained from a URL using some type of normalization. URLHash(s)
– Calculates a hash from a string without one of the trailing symbols /
,?
or #
at the end, if present. URLHash(s, N)
– Calculates a hash from a string up to the N level in the URL hierarchy, without one of the trailing symbols /
,?
or #
at the end, if present. Levels are the same as in URLHierarchy.
farmFingerprint64
farmHash64
Produces a 64-bit FarmHash or Fingerprint value. farmFingerprint64
is preferred for a stable and portable value.
These functions use the Fingerprint64
and Hash64
methods respectively from all available methods.
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple
with the same data, Map
and the corresponding Array(Tuple(key, value))
type with the same data)..
Returned Value
A UInt64 data type hash value.
Example
javaHash
Calculates JavaHash from a string, Byte, Short, Integer, Long. This hash function is neither fast nor having a good quality. The only reason to use it is when this algorithm is already used in another system and you have to calculate exactly the same result.
Note that Java only support calculating signed integers hash, so if you want to calculate unsigned integers hash you must cast it to proper signed ClickHouse types.
Syntax
Returned value
A Int32
data type hash value.
Example
Query:
Result:
Query:
Result:
javaHashUTF16LE
Calculates JavaHash from a string, assuming it contains bytes representing a string in UTF-16LE encoding.
Syntax
Arguments
stringUtf16le
— a string in UTF-16LE encoding.
Returned value
A Int32
data type hash value.
Example
Correct query with UTF-16LE encoded string.
Query:
Result:
hiveHash
Calculates HiveHash
from a string.
This is just JavaHash with zeroed out sign bit. This function is used in Apache Hive for versions before 3.0. This hash function is neither fast nor having a good quality. The only reason to use it is when this algorithm is already used in another system and you have to calculate exactly the same result.
Returned value
A Int32
data type hash value.
Type: hiveHash
.
Example
Query:
Result:
metroHash64
Produces a 64-bit MetroHash hash value.
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple
with the same data, Map
and the corresponding Array(Tuple(key, value))
type with the same data).
Returned Value
A UInt64 data type hash value.
Example
jumpConsistentHash
Calculates JumpConsistentHash form a UInt64. Accepts two arguments: a UInt64-type key and the number of buckets. Returns Int32. For more information, see the link: JumpConsistentHash
murmurHash2_32, murmurHash2_64
Produces a MurmurHash2 hash value.
Arguments
Both functions take a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple
with the same data, Map
and the corresponding Array(Tuple(key, value))
type with the same data).
Returned Value
The
murmurHash2_32
function returns hash value having the UInt32 data type.The
murmurHash2_64
function returns hash value having the UInt64 data type.
Example
gccMurmurHash
Calculates a 64-bit MurmurHash2 hash value using the same hash seed as gcc. It is portable between CLang and GCC builds.
Syntax
Arguments
par1, ...
— A variable number of parameters that can be any of the supported data types.
Returned value
Calculated hash value.
Type: UInt64.
Example
Query:
Result:
murmurHash3_32, murmurHash3_64
Produces a MurmurHash3 hash value.
Arguments
Both functions take a variable number of input parameters. Arguments can be any of the supported data types. For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed Tuple
with the same data, Map
and the corresponding Array(Tuple(key, value))
type with the same data).
Returned Value
The
murmurHash3_32
function returns a UInt32 data type hash value.The
murmurHash3_64
function returns a UInt64 data type hash value.
Example
murmurHash3_128
Produces a 128-bit MurmurHash3 hash value.
Syntax
Arguments
expr
— A list of expressions. String.
Returned value
A 128-bit MurmurHash3
hash value.
Type: FixedString(16).
Example
Query:
Result:
xxHash32, xxHash64
Calculates xxHash
from a string. It is proposed in two flavors, 32 and 64 bits.
Returned value
A UInt32
or UInt64
data type hash value.
Type: UInt32
for xxHash32
and UInt64
for xxHash64
.
Example
Query:
Result:
See Also
ngramSimHash
Splits a ASCII string into n-grams of ngramsize
symbols and returns the n-gram simhash
. Is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes
of two strings, the more likely these strings are the same.
Syntax
Arguments
string
— String. String.ngramsize
— The size of an n-gram. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
Result:
ngramSimHashCaseInsensitive
Splits a ASCII string into n-grams of ngramsize
symbols and returns the n-gram simhash
. Is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes
of two strings, the more likely these strings are the same.
Syntax
Arguments
string
— String. String.ngramsize
— The size of an n-gram. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
Result:
ngramSimHashUTF8
Splits a UTF-8 string into n-grams of ngramsize
symbols and returns the n-gram simhash
. Is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes
of two strings, the more likely these strings are the same.
Syntax
Arguments
string
— String. String.ngramsize
— The size of an n-gram. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
Result:
ngramSimHashCaseInsensitiveUTF8
Splits a UTF-8 string into n-grams of ngramsize
symbols and returns the n-gram simhash
. Is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes
of two strings, the more likely these strings are the same.
Syntax
Arguments
string
— String. String.ngramsize
— The size of an n-gram. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
Result:
wordShingleSimHash
Splits a ASCII string into parts (shingles) of shinglesize
words and returns the word shingle simhash
. Is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes
of two strings, the more likely these strings are the same.
Syntax
Arguments
string
— String. String.shinglesize
— The size of a word shingle. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
Result:
wordShingleSimHashCaseInsensitive
Splits a ASCII string into parts (shingles) of shinglesize
words and returns the word shingle simhash
. Is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes
of two strings, the more likely these strings are the same.
Syntax
Arguments
string
— String. String.shinglesize
— The size of a word shingle. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
Result:
wordShingleSimHashUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize
words and returns the word shingle simhash
. Is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes
of two strings, the more likely these strings are the same.
Syntax
Arguments
string
— String. String.shinglesize
— The size of a word shingle. Optinal. Possible values: any number from1
to25
. Default value:3
. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
Result:
wordShingleSimHashCaseInsensitiveUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize
words and returns the word shingle simhash
. Is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller is the Hamming Distance of the calculated simhashes
of two strings, the more likely these strings are the same.
Syntax
Arguments
string
— String. String.shinglesize
— The size of a word shingle. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
Result:
ngramMinHash
Splits a ASCII string into n-grams of ngramsize
symbols and calculates hash values for each n-gram. Uses hashnum
minimum hashes to calculate the minimum hash and hashnum
maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
Syntax
Arguments
string
— String. String.ngramsize
— The size of an n-gram. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
Result:
ngramMinHashCaseInsensitive
Splits a ASCII string into n-grams of ngramsize
symbols and calculates hash values for each n-gram. Uses hashnum
minimum hashes to calculate the minimum hash and hashnum
maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
Syntax
Arguments
string
— String. String.ngramsize
— The size of an n-gram. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
Result:
ngramMinHashUTF8
Splits a UTF-8 string into n-grams of ngramsize
symbols and calculates hash values for each n-gram. Uses hashnum
minimum hashes to calculate the minimum hash and hashnum
maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
Syntax
Arguments
string
— String. String.ngramsize
— The size of an n-gram. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
Result:
ngramMinHashCaseInsensitiveUTF8
Splits a UTF-8 string into n-grams of ngramsize
symbols and calculates hash values for each n-gram. Uses hashnum
minimum hashes to calculate the minimum hash and hashnum
maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
Syntax
Arguments
string
— String. String.ngramsize
— The size of an n-gram. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
Result:
ngramMinHashArg
Splits a ASCII string into n-grams of ngramsize
symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHash function with the same input. Is case sensitive.
Syntax
Arguments
string
— String. String.ngramsize
— The size of an n-gram. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two tuples with
hashnum
n-grams each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
Result:
ngramMinHashArgCaseInsensitive
Splits a ASCII string into n-grams of ngramsize
symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashCaseInsensitive function with the same input. Is case insensitive.
Syntax
Arguments
string
— String. String.ngramsize
— The size of an n-gram. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two tuples with
hashnum
n-grams each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
Result:
ngramMinHashArgUTF8
Splits a UTF-8 string into n-grams of ngramsize
symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashUTF8 function with the same input. Is case sensitive.
Syntax
Arguments
string
— String. String.ngramsize
— The size of an n-gram. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two tuples with
hashnum
n-grams each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
Result:
ngramMinHashArgCaseInsensitiveUTF8
Splits a UTF-8 string into n-grams of ngramsize
symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashCaseInsensitiveUTF8 function with the same input. Is case insensitive.
Syntax
Arguments
string
— String. String.ngramsize
— The size of an n-gram. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two tuples with
hashnum
n-grams each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
Result:
wordShingleMinHash
Splits a ASCII string into parts (shingles) of shinglesize
words and calculates hash values for each word shingle. Uses hashnum
minimum hashes to calculate the minimum hash and hashnum
maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
Syntax
Arguments
string
— String. String.shinglesize
— The size of a word shingle. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
Result:
wordShingleMinHashCaseInsensitive
Splits a ASCII string into parts (shingles) of shinglesize
words and calculates hash values for each word shingle. Uses hashnum
minimum hashes to calculate the minimum hash and hashnum
maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
Syntax
Arguments
string
— String. String.shinglesize
— The size of a word shingle. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
Result:
wordShingleMinHashUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize
words and calculates hash values for each word shingle. Uses hashnum
minimum hashes to calculate the minimum hash and hashnum
maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
Syntax
Arguments
string
— String. String.shinglesize
— The size of a word shingle. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
Result:
wordShingleMinHashCaseInsensitiveUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize
words and calculates hash values for each word shingle. Uses hashnum
minimum hashes to calculate the minimum hash and hashnum
maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
Syntax
Arguments
string
— String. String.shinglesize
— The size of a word shingle. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
Result:
wordShingleMinHashArg
Splits a ASCII string into parts (shingles) of shinglesize
words each and returns the shingles with minimum and maximum word hashes, calculated by the wordshingleMinHash function with the same input. Is case sensitive.
Syntax
Arguments
string
— String. String.shinglesize
— The size of a word shingle. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two tuples with
hashnum
word shingles each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
Result:
wordShingleMinHashArgCaseInsensitive
Splits a ASCII string into parts (shingles) of shinglesize
words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashCaseInsensitive function with the same input. Is case insensitive.
Syntax
Arguments
string
— String. String.shinglesize
— The size of a word shingle. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two tuples with
hashnum
word shingles each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
Result:
wordShingleMinHashArgUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize
words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashUTF8 function with the same input. Is case sensitive.
Syntax
Arguments
string
— String. String.shinglesize
— The size of a word shingle. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two tuples with
hashnum
word shingles each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
Result:
wordShingleMinHashArgCaseInsensitiveUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize
words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashCaseInsensitiveUTF8 function with the same input. Is case insensitive.
Syntax
Arguments
string
— String. String.shinglesize
— The size of a word shingle. Optional. Possible values: any number from1
to25
. Default value:3
. UInt8.hashnum
— The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from1
to25
. Default value:6
. UInt8.
Returned value
Tuple with two tuples with
hashnum
word shingles each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
Result:
INTROSPECTION FUNCTIONS
You can use functions described in this chapter to introspect ELF and DWARF for query profiling.
WARNING
These functions are slow and may impose security considerations.
For proper operation of introspection functions:
Install the
clickhouse-common-static-dbg
package.Set the allow_introspection_functions setting to 1.
ClickHouse saves profiler reports to the trace_log system table. Make sure the table and profiler are configured properly.
addressToLine
Converts virtual memory address inside ClickHouse server process to the filename and the line number in ClickHouse source code.
If you use official ClickHouse packages, you need to install the clickhouse-common-static-dbg
package.
Syntax
Arguments
address_of_binary_instruction
(UInt64) — Address of instruction in a running process.
Returned value
Source code filename and the line number in this file delimited by colon.
Name of a binary, if the function couldn’t find the debug information.
Empty string, if the address is not valid.
Type: String.
Example
Enabling introspection functions:
Selecting the first string from the trace_log
system table:
The trace
field contains the stack trace at the moment of sampling.
Getting the source code filename and the line number for a single address:
Applying the function to the whole stack trace:
The arrayMap function allows to process each individual element of the trace
array by the addressToLine
function. The result of this processing you see in the trace_source_code_lines
column of output.
addressToLineWithInlines
Similar to addressToLine
, but it will return an Array with all inline functions, and will be much slower as a price.
If you use official ClickHouse packages, you need to install the clickhouse-common-static-dbg
package.
Syntax
Arguments
address_of_binary_instruction
(UInt64) — Address of instruction in a running process.
Returned value
Array which first element is source code filename and the line number in this file delimited by colon. And from second element, inline functions' source code filename and line number and function name are listed.
Array with single element which is name of a binary, if the function couldn’t find the debug information.
Empty array, if the address is not valid.
Type: Array(String).
Example
Enabling introspection functions:
Applying the function to address.
Applying the function to the whole stack trace:
The arrayJoin functions will split array to rows.
addressToSymbol
Converts virtual memory address inside ClickHouse server process to the symbol from ClickHouse object files.
Syntax
Arguments
address_of_binary_instruction
(UInt64) — Address of instruction in a running process.
Returned value
Symbol from ClickHouse object files.
Empty string, if the address is not valid.
Type: String.
Example
Enabling introspection functions:
Selecting the first string from the trace_log
system table:
The trace
field contains the stack trace at the moment of sampling.
Getting a symbol for a single address:
Applying the function to the whole stack trace:
The arrayMap function allows to process each individual element of the trace
array by the addressToSymbols
function. The result of this processing you see in the trace_symbols
column of output.
demangle
Converts a symbol that you can get using the addressToSymbol function to the C++ function name.
Syntax
Arguments
symbol
(String) — Symbol from an object file.
Returned value
Name of the C++ function.
Empty string if a symbol is not valid.
Type: String.
Example
Enabling introspection functions:
Selecting the first string from the trace_log
system table:
The trace
field contains the stack trace at the moment of sampling.
Getting a function name for a single address:
Applying the function to the whole stack trace:
The arrayMap function allows to process each individual element of the trace
array by the demangle
function. The result of this processing you see in the trace_functions
column of output.
tid
Returns id of the thread, in which current Block is processed.
Syntax
Returned value
Current thread id. Uint64.
Example
Query:
Result:
logTrace
Emits trace log message to server log for each Block.
Syntax
Arguments
message
— Message that is emitted to server log. String.
Returned value
Always returns 0.
Example
Query:
Result:
IP ADDRESSES
IPv4NumToString(num)
Takes a UInt32 number. Interprets it as an IPv4 address in big endian. Returns a string containing the corresponding IPv4 address in the format A.B.C.d (dot-separated numbers in decimal form).
Alias: INET_NTOA
.
IPv4StringToNum(s)
The reverse function of IPv4NumToString. If the IPv4 address has an invalid format, it throws exception.
Alias: INET_ATON
.
IPv4NumToStringClassC(num)
Similar to IPv4NumToString, but using xxx instead of the last octet.
Example:
Since using ‘xxx’ is highly unusual, this may be changed in the future. We recommend that you do not rely on the exact format of this fragment.
IPv6NumToString(x)
Accepts a FixedString(16) value containing the IPv6 address in binary format. Returns a string containing this address in text format. IPv6-mapped IPv4 addresses are output in the format ::ffff:111.222.33.44.
Alias: INET6_NTOA
.
Examples:
IPv6StringToNum
The reverse function of IPv6NumToString. If the IPv6 address has an invalid format, it throws exception.
If the input string contains a valid IPv4 address, returns its IPv6 equivalent. HEX can be uppercase or lowercase.
Alias: INET6_ATON
.
Syntax
Argument
string
— IP address. String.
Returned value
IPv6 address in binary format.
Type: FixedString(16).
Example
Query:
Result:
See Also
IPv4ToIPv6(x)
Takes a UInt32
number. Interprets it as an IPv4 address in big endian. Returns a FixedString(16)
value containing the IPv6 address in binary format. Examples:
cutIPv6(x, bytesToCutForIPv6, bytesToCutForIPv4)
Accepts a FixedString(16) value containing the IPv6 address in binary format. Returns a string containing the address of the specified number of bytes removed in text format. For example:
IPv4CIDRToRange(ipv4, Cidr),
Accepts an IPv4 and an UInt8 value containing the CIDR. Return a tuple with two IPv4 containing the lower range and the higher range of the subnet.
IPv6CIDRToRange(ipv6, Cidr),
Accepts an IPv6 and an UInt8 value containing the CIDR. Return a tuple with two IPv6 containing the lower range and the higher range of the subnet.
toIPv4(string)
An alias to IPv4StringToNum()
that takes a string form of IPv4 address and returns value of IPv4 type, which is binary equal to value returned by IPv4StringToNum()
.
toIPv6
Converts a string form of IPv6 address to IPv6 type. If the IPv6 address has an invalid format, returns an empty value. Similar to IPv6StringToNum function, which converts IPv6 address to binary format.
If the input string contains a valid IPv4 address, then the IPv6 equivalent of the IPv4 address is returned.
Syntax
Argument
string
— IP address. String
Returned value
IP address.
Type: IPv6.
Examples
Query:
Result:
Query:
Result:
isIPv4String
Determines whether the input string is an IPv4 address or not. If string
is IPv6 address returns 0
.
Syntax
Arguments
string
— IP address. String.
Returned value
1
ifstring
is IPv4 address,0
otherwise.
Type: UInt8.
Examples
Query:
Result:
isIPv6String
Determines whether the input string is an IPv6 address or not. If string
is IPv4 address returns 0
.
Syntax
Arguments
string
— IP address. String.
Returned value
1
ifstring
is IPv6 address,0
otherwise.
Type: UInt8.
Examples
Query:
Result:
isIPAddressInRange
Determines if an IP address is contained in a network represented in the CIDR notation. Returns 1
if true, or 0
otherwise.
Syntax
This function accepts both IPv4 and IPv6 addresses (and networks) represented as strings. It returns 0
if the IP version of the address and the CIDR don't match.
Arguments
Returned value
1
or0
.
Type: UInt8.
Example
Query:
Result:
Query:
Result:
Query:
Result:
JSON
ClickHouse has special functions for working with this JSON. All the JSON functions are based on strong assumptions about what the JSON can be, but they try to do as little as possible to get the job done.
The following assumptions are made:
The field name (function argument) must be a constant.
The field name is somehow canonically encoded in JSON. For example:
visitParamHas('{"abc":"def"}', 'abc') = 1
, butvisitParamHas('{"\\u0061\\u0062\\u0063":"def"}', 'abc') = 0
Fields are searched for on any nesting level, indiscriminately. If there are multiple matching fields, the first occurrence is used.
The JSON does not have space characters outside of string literals.
visitParamHas(params, name)
Checks whether there is a field with the name
name.
Alias: simpleJSONHas
.
visitParamExtractUInt(params, name)
Parses UInt64 from the value of the field named name
. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns 0.
Alias: simpleJSONExtractUInt
.
visitParamExtractInt(params, name)
The same as for Int64.
Alias: simpleJSONExtractInt
.
visitParamExtractFloat(params, name)
The same as for Float64.
Alias: simpleJSONExtractFloat
.
visitParamExtractBool(params, name)
Parses a true/false value. The result is UInt8.
Alias: simpleJSONExtractBool
.
visitParamExtractRaw(params, name)
Returns the value of a field, including separators.
Alias: simpleJSONExtractRaw
.
Examples:
visitParamExtractString(params, name)
Parses the string in double quotes. The value is unescaped. If unescaping failed, it returns an empty string.
Alias: simpleJSONExtractString
.
Examples:
There is currently no support for code points in the format \uXXXX\uYYYY
that are not from the basic multilingual plane (they are converted to CESU-8 instead of UTF-8).
The following functions are based on simdjson designed for more complex JSON parsing requirements. The assumption 2 mentioned above still applies.
isValidJSON(json)
Checks that passed string is a valid json.
Examples:
JSONHas(json[, indices_or_keys]…)
If the value exists in the JSON document, 1
will be returned.
If the value does not exist, 0
will be returned.
Examples:
indices_or_keys
is a list of zero or more arguments each of them can be either string or integer.
String = access object member by key.
Positive integer = access the n-th member/key from the beginning.
Negative integer = access the n-th member/key from the end.
Minimum index of the element is 1. Thus the element 0 does not exist.
You may use integers to access both JSON arrays and JSON objects.
So, for example:
JSONLength(json[, indices_or_keys]…)
Return the length of a JSON array or a JSON object.
If the value does not exist or has a wrong type, 0
will be returned.
Examples:
JSONType(json[, indices_or_keys]…)
Return the type of a JSON value.
If the value does not exist, Null
will be returned.
Examples:
JSONExtractUInt(json[, indices_or_keys]…)
JSONExtractInt(json[, indices_or_keys]…)
JSONExtractFloat(json[, indices_or_keys]…)
JSONExtractBool(json[, indices_or_keys]…)
Parses a JSON and extract a value. These functions are similar to visitParam
functions.
If the value does not exist or has a wrong type, 0
will be returned.
Examples:
JSONExtractString(json[, indices_or_keys]…)
Parses a JSON and extract a string. This function is similar to visitParamExtractString
functions.
If the value does not exist or has a wrong type, an empty string will be returned.
The value is unescaped. If unescaping failed, it returns an empty string.
Examples:
JSONExtract(json[, indices_or_keys…], Return_type)
Parses a JSON and extract a value of the given ClickHouse data type.
This is a generalization of the previous JSONExtract<type>
functions. This means JSONExtract(..., 'String')
returns exactly the same as JSONExtractString()
, JSONExtract(..., 'Float64')
returns exactly the same as JSONExtractFloat()
.
Examples:
JSONExtractKeysAndValues(json[, indices_or_keys…], Value_type)
Parses key-value pairs from a JSON where the values are of the given ClickHouse data type.
Example:
JSONExtractKeys
Parses a JSON string and extracts the keys.
Syntax
Arguments
json
— String with valid JSON.a, b, c...
— Comma-separated indices or keys that specify the path to the inner field in a nested JSON object. Each argument can be either a String to get the field by the key or an Integer to get the N-th field (indexed from 1, negative integers count from the end). If not set, the whole JSON is parsed as the top-level object. Optional parameter.
Returned value
Array with the keys of the JSON.
Example
Query:
Result:
JSONExtractRaw(json[, indices_or_keys]…)
Returns a part of JSON as unparsed string.
If the part does not exist or has a wrong type, an empty string will be returned.
Example:
JSONExtractArrayRaw(json[, indices_or_keys…])
Returns an array with elements of JSON array, each represented as unparsed string.
If the part does not exist or isn’t array, an empty array will be returned.
Example:
JSONExtractKeysAndValuesRaw
Extracts raw data from a JSON object.
Syntax
Arguments
json
— String with valid JSON.p, a, t, h
— Comma-separated indices or keys that specify the path to the inner field in a nested JSON object. Each argument can be either a string to get the field by the key or an integer to get the N-th field (indexed from 1, negative integers count from the end). If not set, the whole JSON is parsed as the top-level object. Optional parameter.
Returned values
Array with
('key', 'value')
tuples. Both tuple members are strings.Empty array if the requested object does not exist, or input JSON is invalid.
Type: Array(Tuple(String, String).
Examples
Query:
Result:
Query:
Result:
Query:
Result:
JSON_EXISTS(json, path)
If the value exists in the JSON document, 1
will be returned.
If the value does not exist, 0
will be returned.
Examples:
NOTE
Before version 21.11 the order of arguments was wrong, i.e. JSON_EXISTS(path, json)
JSON_QUERY(json, path)
Parses a JSON and extract a value as JSON array or JSON object.
If the value does not exist, an empty string will be returned.
Example:
Result:
NOTE
Before version 21.11 the order of arguments was wrong, i.e. JSON_QUERY(path, json)
JSON_VALUE(json, path)
Parses a JSON and extract a value as JSON scalar.
If the value does not exist, an empty string will be returned.
Example:
Result:
NOTE
Before version 21.11 the order of arguments was wrong, i.e. JSON_VALUE(path, json)
toJSONString
Serializes a value to its JSON representation. Various data types and nested structures are supported. 64-bit integers or bigger (like UInt64
or Int128
) are enclosed in quotes by default. output_format_json_quote_64bit_integers controls this behavior. Special values NaN
and inf
are replaced with null
. Enable output_format_json_quote_denormals setting to show them. When serializing an Enum value, the function outputs its name.
Syntax
Arguments
value
— Value to serialize. Value may be of any data type.
Returned value
JSON representation of the value.
Type: String.
Example
The first example shows serialization of a Map. The second example shows some special values wrapped into a Tuple.
Query:
Result:
MACHINE LEARNING FUNCTIONS
evalMLMethod
Prediction using fitted regression models uses evalMLMethod
function. See link in linearRegression
.
stochasticLinearRegression
The stochasticLinearRegression aggregate function implements stochastic gradient descent method using linear model and MSE loss function. Uses evalMLMethod
to predict on new data.
stochasticLogisticRegression
The stochasticLogisticRegression aggregate function implements stochastic gradient descent method for binary classification problem. Uses evalMLMethod
to predict on new data.
MAPS
map
Arranges key:value
pairs into Map(key, value) data type.
Syntax
Arguments
key
— The key part of the pair. String, Integer, LowCardinality, FixedString, UUID, Date, DateTime, Date32, Enum.
Returned value
Data structure as
key:value
pairs.
Type: Map(key, value).
Examples
Query:
Result:
Query:
Result:
See Also
Map(key, value) data type
mapAdd
Collect all the keys and sum corresponding values.
Syntax
Arguments
Arguments are maps or tuples of two arrays, where items in the first array represent keys, and the second array contains values for the each key. All key arrays should have same type, and all value arrays should contain items which are promoted to the one type (Int64, UInt64 or Float64). The common promoted type is used as a type for the result array.
Returned value
Example
Query with a tuple:
Result:
Query with Map
type:
Result:
mapSubtract
Collect all the keys and subtract corresponding values.
Syntax
Arguments
Arguments are maps or tuples of two arrays, where items in the first array represent keys, and the second array contains values for the each key. All key arrays should have same type, and all value arrays should contain items which are promote to the one type (Int64, UInt64 or Float64). The common promoted type is used as a type for the result array.
Returned value
Example
Query with a tuple map:
Result:
Query with Map
type:
Result:
mapPopulateSeries
Fills missing keys in the maps (key and value array pair), where keys are integers. Also, it supports specifying the max key, which is used to extend the keys array.
Syntax
Generates a map (a tuple with two arrays or a value of Map
type, depending on the arguments), where keys are a series of numbers, from minimum to maximum keys (or max
argument if it specified) taken from the map with a step size of one, and corresponding values. If the value is not specified for the key, then it uses the default value in the resulting map. For repeated keys, only the first value (in order of appearing) gets associated with the key.
For array arguments the number of elements in keys
and values
must be the same for each row.
Arguments
Arguments are maps or two arrays, where the first array represent keys, and the second array contains values for the each key.
Mapped arrays:
max
— Maximum key value. Optional. Int8, Int16, Int32, Int64, Int128, Int256.
or
map
— Map with integer keys. Map.
Returned value
Example
Query with mapped arrays:
Result:
Query with Map
type:
Result:
mapContains
Determines whether the map
contains the key
parameter.
Syntax
Parameters
map
— Map. Map.key
— Key. Type matches the type of keys ofmap
parameter.
Returned value
1
ifmap
containskey
,0
if not.
Type: UInt8.
Example
Query:
Result:
mapKeys
Returns all keys from the map
parameter.
Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1
the function reads only keys subcolumn instead of reading and processing the whole column data. The query SELECT mapKeys(m) FROM table
transforms to SELECT m.keys FROM table
.
Syntax
Parameters
map
— Map. Map.
Returned value
Array containing all keys from the
map
.
Type: Array.
Example
Query:
Result:
mapValues
Returns all values from the map
parameter.
Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1
the function reads only values subcolumn instead of reading and processing the whole column data. The query SELECT mapValues(m) FROM table
transforms to SELECT m.values FROM table
.
Syntax
Parameters
map
— Map. Map.
Returned value
Array containing all the values from
map
.
Type: Array.
Example
Query:
Result:
MATHEMATICAL
All the functions return a Float64 number. The accuracy of the result is close to the maximum precision possible, but the result might not coincide with the machine representable number nearest to the corresponding real number.
e()
Returns a Float64 number that is close to the number e.
exp(x)
Accepts a numeric argument and returns a Float64 number close to the exponent of the argument.
log(x), ln(x)
Accepts a numeric argument and returns a Float64 number close to the natural logarithm of the argument.
exp2(x)
Accepts a numeric argument and returns a Float64 number close to 2 to the power of x.
log2(x)
Accepts a numeric argument and returns a Float64 number close to the binary logarithm of the argument.
exp10(x)
Accepts a numeric argument and returns a Float64 number close to 10 to the power of x.
log10(x)
Accepts a numeric argument and returns a Float64 number close to the decimal logarithm of the argument.
sqrt(x)
Accepts a numeric argument and returns a Float64 number close to the square root of the argument.
cbrt(x)
Accepts a numeric argument and returns a Float64 number close to the cubic root of the argument.
erf(x)
If ‘x’ is non-negative, then erf(x / σ√2)
is the probability that a random variable having a normal distribution with standard deviation ‘σ’ takes the value that is separated from the expected value by more than ‘x’.
Example (three sigma rule):
erfc(x)
Accepts a numeric argument and returns a Float64 number close to 1 - erf(x), but without loss of precision for large ‘x’ values.
lgamma(x)
The logarithm of the gamma function.
tgamma(x)
Gamma function.
sin(x)
The sine.
cos(x)
The cosine.
tan(x)
The tangent.
asin(x)
The arc sine.
acos(x)
The arc cosine.
atan(x)
The arc tangent.
pow(x, y), power(x, y)
Takes two numeric arguments x and y. Returns a Float64 number close to x to the power of y.
intExp2
Accepts a numeric argument and returns a UInt64 number close to 2 to the power of x.
intExp10
Accepts a numeric argument and returns a UInt64 number close to 10 to the power of x.
cosh(x)
Syntax
Arguments
x
— The angle, in radians. Values from the interval:-∞ < x < +∞
. Float64.
Returned value
Values from the interval:
1 <= cosh(x) < +∞
.
Type: Float64.
Example
Query:
Result:
acosh(x)
Syntax
Arguments
x
— Hyperbolic cosine of angle. Values from the interval:1 <= x < +∞
. Float64.
Returned value
The angle, in radians. Values from the interval:
0 <= acosh(x) < +∞
.
Type: Float64.
Example
Query:
Result:
See Also
sinh(x)
Syntax
Arguments
x
— The angle, in radians. Values from the interval:-∞ < x < +∞
. Float64.
Returned value
Values from the interval:
-∞ < sinh(x) < +∞
.
Type: Float64.
Example
Query:
Result:
asinh(x)
Syntax
Arguments
x
— Hyperbolic sine of angle. Values from the interval:-∞ < x < +∞
. Float64.
Returned value
The angle, in radians. Values from the interval:
-∞ < asinh(x) < +∞
.
Type: Float64.
Example
Query:
Result:
See Also
atanh(x)
Syntax
Arguments
x
— Hyperbolic tangent of angle. Values from the interval:–1 < x < 1
. Float64.
Returned value
The angle, in radians. Values from the interval:
-∞ < atanh(x) < +∞
.
Type: Float64.
Example
Query:
Result:
atan2(y, x)
The function calculates the angle in the Euclidean plane, given in radians, between the positive x axis and the ray to the point (x, y) ≠ (0, 0)
.
Syntax
Arguments
y
— y-coordinate of the point through which the ray passes. Float64.x
— x-coordinate of the point through which the ray passes. Float64.
Returned value
The angle
θ
such that−π < θ ≤ π
, in radians.
Type: Float64.
Example
Query:
Result:
hypot(x, y)
Calculates the length of the hypotenuse of a right-angle triangle. The function avoids problems that occur when squaring very large or very small numbers.
Syntax
Arguments
x
— The first cathetus of a right-angle triangle. Float64.y
— The second cathetus of a right-angle triangle. Float64.
Returned value
The length of the hypotenuse of a right-angle triangle.
Type: Float64.
Example
Query:
Result:
log1p(x)
Calculates log(1+x)
. The function log1p(x)
is more accurate than log(1+x)
for small values of x.
Syntax
Arguments
x
— Values from the interval:-1 < x < +∞
. Float64.
Returned value
Values from the interval:
-∞ < log1p(x) < +∞
.
Type: Float64.
Example
Query:
Result:
See Also
sign(x)
Returns the sign of a real number.
Syntax
Arguments
x
— Values from-∞
to+∞
. Support all numeric types in ClickHouse.
Returned value
-1 for
x < 0
0 for
x = 0
1 for
x > 0
Examples
Sign for the zero value:
Result:
Sign for the positive value:
Result:
Sign for the negative value:
Result:
NULLABLE
isNull
Checks whether the argument is NULL.
Alias: ISNULL
.
Arguments
x
— A value with a non-compound data type.
Returned value
1
ifx
isNULL
.0
ifx
is notNULL
.
Example
Input table
Query
isNotNull
Checks whether the argument is NULL.
Arguments:
x
— A value with a non-compound data type.
Returned value
0
ifx
isNULL
.1
ifx
is notNULL
.
Example
Input table
Query
coalesce
Checks from left to right whether NULL
arguments were passed and returns the first non-NULL
argument.
Arguments:
Any number of parameters of a non-compound type. All parameters must be compatible by data type.
Returned values
The first non-
NULL
argument.NULL
, if all arguments areNULL
.
Example
Consider a list of contacts that may specify multiple ways to contact a customer.
The mail
and phone
fields are of type String, but the icq
field is UInt32
, so it needs to be converted to String
.
Get the first available contact method for the customer from the contact list:
ifNull
Returns an alternative value if the main argument is NULL
.
Arguments:
x
— The value to check forNULL
.alt
— The value that the function returns ifx
isNULL
.
Returned values
The value
x
, ifx
is notNULL
.The value
alt
, ifx
isNULL
.
Example
nullIf
Returns NULL
if the arguments are equal.
Arguments:
x
, y
— Values for comparison. They must be compatible types, or ClickHouse will generate an exception.
Returned values
NULL
, if the arguments are equal.The
x
value, if the arguments are not equal.
Example
assumeNotNull
Results in an equivalent non-Nullable
value for a Nullable type. In case the original value is NULL
the result is undetermined. See also ifNull
and coalesce
functions.
Arguments:
x
— The original value.
Returned values
The original value from the non-
Nullable
type, if it is notNULL
.Implementation specific result if the original value was
NULL
.
Example
Consider the t_null
table.
Apply the assumeNotNull
function to the y
column.
toNullable
Converts the argument type to Nullable
.
Arguments:
x
— The value of any non-compound type.
Returned value
The input value with a
Nullable
type.
Example
OTHERS
hostName()
Returns a string with the name of the host that this function was performed on. For distributed processing, this is the name of the remote server host, if the function is performed on a remote server. If it is executed in the context of a distributed table, then it generates a normal column with values relevant to each shard. Otherwise it produces a constant value.
getMacro
Gets a named value from the macros section of the server configuration.
Syntax
Arguments
name
— Name to retrieve from themacros
section. String.
Returned value
Value of the specified macro.
Type: String.
Example
The example macros
section in the server configuration file:
Query:
Result:
An alternative way to get the same value:
FQDN
Returns the fully qualified domain name.
Syntax
This function is case-insensitive.
Returned value
String with the fully qualified domain name.
Type: String
.
Example
Query:
Result:
basename
Extracts the trailing part of a string after the last slash or backslash. This function if often used to extract the filename from a path.
Arguments
expr
— Expression resulting in a String type value. All the backslashes must be escaped in the resulting value.
Returned Value
A string that contains:
The trailing part of a string after the last slash or backslash.
The original string if there are no slashes or backslashes.
Example
visibleWidth(x)
Calculates the approximate width when outputting values to the console in text format (tab-separated). This function is used by the system for implementing Pretty formats.
NULL
is represented as a string corresponding to NULL
in Pretty
formats.
toTypeName(x)
Returns a string containing the type name of the passed argument.
If NULL
is passed to the function as input, then it returns the Nullable(Nothing)
type, which corresponds to an internal NULL
representation in ClickHouse.
blockSize()
Gets the size of the block. In ClickHouse, queries are always run on blocks (sets of column parts). This function allows getting the size of the block that you called it for.
byteSize
Returns estimation of uncompressed byte size of its arguments in memory.
Syntax
Arguments
argument
— Value.
Returned value
Estimation of byte size of the arguments in memory.
Type: UInt64.
Examples
For String arguments the funtion returns the string length + 9 (terminating zero + length).
Query:
Result:
Query:
Result:
If the function takes multiple arguments, it returns their combined byte size.
Query:
Result:
materialize(x)
Turns a constant into a full column containing just one value. In ClickHouse, full columns and constants are represented differently in memory. Functions work differently for constant arguments and normal arguments (different code is executed), although the result is almost always the same. This function is for debugging this behavior.
ignore(…)
Accepts any arguments, including NULL
. Always returns 0. However, the argument is still evaluated. This can be used for benchmarks.
sleep(seconds)
Sleeps ‘seconds’ seconds on each data block. You can specify an integer or a floating-point number.
sleepEachRow(seconds)
Sleeps ‘seconds’ seconds on each row. You can specify an integer or a floating-point number.
currentDatabase()
Returns the name of the current database. You can use this function in table engine parameters in a CREATE TABLE query where you need to specify the database.
currentUser()
Returns the login of current user. Login of user, that initiated query, will be returned in case distibuted query.
Alias: user()
, USER()
.
Returned values
Login of current user.
Login of user that initiated query in case of disributed query.
Type: String
.
Example
Query:
Result:
isConstant
Checks whether the argument is a constant expression.
A constant expression means an expression whose resulting value is known at the query analysis (i.e. before execution). For example, expressions over literals are constant expressions.
The function is intended for development, debugging and demonstration.
Syntax
Arguments
x
— Expression to check.
Returned values
1
—x
is constant.0
—x
is non-constant.
Type: UInt8.
Examples
Query:
Result:
Query:
Result:
Query:
Result:
isFinite(x)
Accepts Float32 and Float64 and returns UInt8 equal to 1 if the argument is not infinite and not a NaN, otherwise 0.
isInfinite(x)
Accepts Float32 and Float64 and returns UInt8 equal to 1 if the argument is infinite, otherwise 0. Note that 0 is returned for a NaN.
ifNotFinite
Checks whether floating point value is finite.
Syntax
Arguments
Returned value
x
ifx
is finite.y
ifx
is not finite.
Example
Query:
Result:
You can get similar result by using ternary operator: isFinite(x) ? x : y
.
isNaN(x)
Accepts Float32 and Float64 and returns UInt8 equal to 1 if the argument is a NaN, otherwise 0.
hasColumnInTable([‘hostname’[, ‘username’[, ‘password’]],] ‘database’, ‘table’, ‘column’)
Accepts constant strings: database name, table name, and column name. Returns a UInt8 constant expression equal to 1 if there is a column, otherwise 0. If the hostname parameter is set, the test will run on a remote server. The function throws an exception if the table does not exist. For elements in a nested data structure, the function checks for the existence of a column. For the nested data structure itself, the function returns 0.
bar
Allows building a unicode-art diagram.
bar(x, min, max, width)
draws a band with a width proportional to (x - min)
and equal to width
characters when x = max
.
Arguments
x
— Size to display.min, max
— Integer constants. The value must fit inInt64
.width
— Constant, positive integer, can be fractional.
The band is drawn with accuracy to one eighth of a symbol.
Example:
transform
Transforms a value according to the explicitly defined mapping of some elements to other ones. There are two variations of this function:
transform(x, array_from, array_to, default)
x
– What to transform.
array_from
– Constant array of values for converting.
array_to
– Constant array of values to convert the values in ‘from’ to.
default
– Which value to use if ‘x’ is not equal to any of the values in ‘from’.
array_from
and array_to
– Arrays of the same size.
Types:
transform(T, Array(T), Array(U), U) -> U
T
and U
can be numeric, string, or Date or DateTime types. Where the same letter is indicated (T or U), for numeric types these might not be matching types, but types that have a common type. For example, the first argument can have the Int64 type, while the second has the Array(UInt16) type.
If the ‘x’ value is equal to one of the elements in the ‘array_from’ array, it returns the existing element (that is numbered the same) from the ‘array_to’ array. Otherwise, it returns ‘default’. If there are multiple matching elements in ‘array_from’, it returns one of the matches.
Example:
transform(x, array_from, array_to)
Differs from the first variation in that the ‘default’ argument is omitted. If the ‘x’ value is equal to one of the elements in the ‘array_from’ array, it returns the matching element (that is numbered the same) from the ‘array_to’ array. Otherwise, it returns ‘x’.
Types:
transform(T, Array(T), Array(T)) -> T
Example:
formatReadableDecimalSize(x)
Accepts the size (number of bytes). Returns a rounded size with a suffix (KB, MB, etc.) as a string.
Example:
formatReadableSize(x)
Accepts the size (number of bytes). Returns a rounded size with a suffix (KiB, MiB, etc.) as a string.
Example:
formatReadableQuantity(x)
Accepts the number. Returns a rounded number with a suffix (thousand, million, billion, etc.) as a string.
It is useful for reading big numbers by human.
Example:
formatReadableTimeDelta
Accepts the time delta in seconds. Returns a time delta with (year, month, day, hour, minute, second) as a string.
Syntax
Arguments
column
— A column with numeric time delta.maximum_unit
— Optional. Maximum unit to show. Acceptable values seconds, minutes, hours, days, months, years.
Example:
least(a, b)
Returns the smallest value from a and b.
greatest(a, b)
Returns the largest value of a and b.
uptime()
Returns the server’s uptime in seconds. If it is executed in the context of a distributed table, then it generates a normal column with values relevant to each shard. Otherwise it produces a constant value.
version()
Returns the version of the server as a string. If it is executed in the context of a distributed table, then it generates a normal column with values relevant to each shard. Otherwise it produces a constant value.
blockNumber
Returns the sequence number of the data block where the row is located.
rowNumberInBlock
Returns the ordinal number of the row in the data block. Different data blocks are always recalculated.
rowNumberInAllBlocks()
Returns the ordinal number of the row in the data block. This function only considers the affected data blocks.
neighbor
The window function that provides access to a row at a specified offset which comes before or after the current row of a given column.
Syntax
The result of the function depends on the affected data blocks and the order of data in the block.
WARNING
It can reach the neighbor rows only inside the currently processed data block.
The rows order used during the calculation of neighbor
can differ from the order of rows returned to the user. To prevent that you can make a subquery with ORDER BY and call the function from outside the subquery.
Arguments
column
— A column name or scalar expression.offset
— The number of rows forwards or backwards from the current row ofcolumn
. Int64.default_value
— Optional. The value to be returned if offset goes beyond the scope of the block. Type of data blocks affected.
Returned values
Value for
column
inoffset
distance from current row ifoffset
value is not outside block bounds.Default value for
column
ifoffset
value is outside block bounds. Ifdefault_value
is given, then it will be used.
Type: type of data blocks affected or default value type.
Example
Query:
Result:
Query:
Result:
This function can be used to compute year-over-year metric value:
Query:
Result:
runningDifference(x)
Calculates the difference between successive row values in the data block. Returns 0 for the first row and the difference from the previous row for each subsequent row.
WARNING
It can reach the previous row only inside the currently processed data block.
The result of the function depends on the affected data blocks and the order of data in the block.
The rows order used during the calculation of runningDifference
can differ from the order of rows returned to the user. To prevent that you can make a subquery with ORDER BY and call the function from outside the subquery.
Example:
Please note - block size affects the result. With each new block, the runningDifference
state is reset.
runningDifferenceStartingWithFirstValue
Same as for runningDifference, the difference is the value of the first row, returned the value of the first row, and each subsequent row returns the difference from the previous row.
runningConcurrency
Calculates the number of concurrent events. Each event has a start time and an end time. The start time is included in the event, while the end time is excluded. Columns with a start time and an end time must be of the same data type. The function calculates the total number of active (concurrent) events for each event start time.
WARNING
Events must be ordered by the start time in ascending order. If this requirement is violated the function raises an exception. Every data block is processed separately. If events from different data blocks overlap then they can not be processed correctly.
Syntax
Arguments
start
— A column with the start time of events. Date, DateTime, or DateTime64.end
— A column with the end time of events. Date, DateTime, or DateTime64.
Returned values
The number of concurrent events at each event start time.
Type: UInt32
Example
Consider the table:
Query:
Result:
MACNumToString(num)
Accepts a UInt64 number. Interprets it as a MAC address in big endian. Returns a string containing the corresponding MAC address in the format AA:BB:CC:DD:EE:FF (colon-separated numbers in hexadecimal form).
MACStringToNum(s)
The inverse function of MACNumToString. If the MAC address has an invalid format, it returns 0.
MACStringToOUI(s)
Accepts a MAC address in the format AA:BB:CC:DD:EE:FF (colon-separated numbers in hexadecimal form). Returns the first three octets as a UInt64 number. If the MAC address has an invalid format, it returns 0.
getSizeOfEnumType
Returns the number of fields in Enum.
Arguments:
value
— Value of typeEnum
.
Returned values
The number of fields with
Enum
input values.An exception is thrown if the type is not
Enum
.
Example
blockSerializedSize
Returns size on disk (without taking into account compression).
Arguments
value
— Any value.
Returned values
The number of bytes that will be written to disk for block of values (without compression).
Example
Query:
Result:
toColumnTypeName
Returns the name of the class that represents the data type of the column in RAM.
Arguments:
value
— Any type of value.
Returned values
A string with the name of the class that is used for representing the
value
data type in RAM.
Example of the difference betweentoTypeName ' and ' toColumnTypeName
The example shows that the DateTime
data type is stored in memory as Const(UInt32)
.
dumpColumnStructure
Outputs a detailed description of data structures in RAM
Arguments:
value
— Any type of value.
Returned values
A string describing the structure that is used for representing the
value
data type in RAM.
Example
defaultValueOfArgumentType
Outputs the default value for the data type.
Does not include default values for custom columns set by the user.
Arguments:
expression
— Arbitrary type of value or an expression that results in a value of an arbitrary type.
Returned values
0
for numbers.Empty string for strings.
ᴺᵁᴸᴸ
for Nullable.
Example
defaultValueOfTypeName
Outputs the default value for given type name.
Does not include default values for custom columns set by the user.
Arguments:
type
— A string representing a type name.
Returned values
0
for numbers.Empty string for strings.
ᴺᵁᴸᴸ
for Nullable.
Example
indexHint
The function is intended for debugging and introspection purposes. The function ignores it's argument and always returns 1. Arguments are not even evaluated.
But for the purpose of index analysis, the argument of this function is analyzed as if it was present directly without being wrapped inside indexHint
function. This allows to select data in index ranges by the corresponding condition but without further filtering by this condition. The index in ClickHouse is sparse and using indexHint
will yield more data than specifying the same condition directly.
Syntax
Returned value
Type: Uint8.
Example
Here is the example of test data from the table ontime.
Input table:
The table has indexes on the fields (FlightDate, (Year, FlightDate))
.
Create a query, where the index is not used.
Query:
ClickHouse processed the entire table (Processed 4.28 million rows
).
Result:
To apply the index, select a specific date.
Query:
By using the index, ClickHouse processed a significantly smaller number of rows (Processed 32.74 thousand rows
).
Result:
Now wrap the expression k = '2017-09-15'
into indexHint
function.
Query:
ClickHouse used the index in the same way as the previous time (Processed 32.74 thousand rows
). The expression k = '2017-09-15'
was not used when generating the result. In examle the indexHint
function allows to see adjacent dates.
Result:
replicate
Creates an array with a single value.
Used for internal implementation of arrayJoin.
Arguments:
arr
— Original array. ClickHouse creates a new array of the same length as the original and fills it with the valuex
.x
— The value that the resulting array will be filled with.
Returned value
An array filled with the value x
.
Type: Array
.
Example
Query:
Result:
filesystemAvailable
Returns amount of remaining space on the filesystem where the files of the databases located. It is always smaller than total free space (filesystemFree) because some space is reserved for OS.
Syntax
Returned value
The amount of remaining space available in bytes.
Type: UInt64.
Example
Query:
Result:
filesystemFree
Returns total amount of the free space on the filesystem where the files of the databases located. See also filesystemAvailable
Syntax
Returned value
Amount of free space in bytes.
Type: UInt64.
Example
Query:
Result:
filesystemCapacity
Returns the capacity of the filesystem in bytes. For evaluation, the path to the data directory must be configured.
Syntax
Returned value
Capacity information of the filesystem in bytes.
Type: UInt64.
Example
Query:
Result:
initializeAggregation
Calculates result of aggregate function based on single value. It is intended to use this function to initialize aggregate functions with combinator -State. You can create states of aggregate functions and insert them to columns of type AggregateFunction or use initialized aggregates as default values.
Syntax
Arguments
aggregate_function
— Name of the aggregation function to initialize. String.arg
— Arguments of aggregate function.
Returned value(s)
Result of aggregation for every row passed to the function.
The return type is the same as the return type of function, that initializeAgregation
takes as first argument.
Example
Query:
Result:
Query:
Result:
Example with AggregatingMergeTree
table engine and AggregateFunction
column:
See Also
finalizeAggregation
Takes state of aggregate function. Returns result of aggregation (or finalized state when using-State combinator).
Syntax
Arguments
state
— State of aggregation. AggregateFunction.
Returned value(s)
Value/values that was aggregated.
Type: Value of any types that was aggregated.
Examples
Query:
Result:
Query:
Result:
Note that NULL
values are ignored.
Query:
Result:
Combined example:
Query:
Result:
See Also
runningAccumulate
Accumulates states of an aggregate function for each row of a data block.
WARNING
The state is reset for each new data block.
Syntax
Arguments
agg_state
— State of the aggregate function. AggregateFunction.grouping
— Grouping key. Optional. The state of the function is reset if thegrouping
value is changed. It can be any of the supported data types for which the equality operator is defined.
Returned value
Each resulting row contains a result of the aggregate function, accumulated for all the input rows from 0 to the current position.
runningAccumulate
resets states for each new data block or when thegrouping
value changes.
Type depends on the aggregate function used.
Examples
Consider how you can use runningAccumulate
to find the cumulative sum of numbers without and with grouping.
Query:
Result:
The subquery generates sumState
for every number from 0
to 9
. sumState
returns the state of the sum function that contains the sum of a single number.
The whole query does the following:
For the first row,
runningAccumulate
takessumState(0)
and returns0
.For the second row, the function merges
sumState(0)
andsumState(1)
resulting insumState(0 + 1)
, and returns1
as a result.For the third row, the function merges
sumState(0 + 1)
andsumState(2)
resulting insumState(0 + 1 + 2)
, and returns3
as a result.The actions are repeated until the block ends.
The following example shows the groupping
parameter usage:
Query:
Result:
As you can see, runningAccumulate
merges states for each group of rows separately.
joinGet
The function lets you extract data from the table the same way as from a dictionary.
Gets data from Join tables using the specified join key.
Only supports tables created with the ENGINE = Join(ANY, LEFT, <join_keys>)
statement.
Syntax
Arguments
join_storage_table_name
— an identifier indicates where search is performed. The identifier is searched in the default database (see parameterdefault_database
in the config file). To override the default database, use theUSE db_name
or specify the database and the table through the separatordb_name.db_table
, see the example.value_column
— name of the column of the table that contains required data.join_keys
— list of keys.
Returned value
Returns list of values corresponded to list of keys.
If certain does not exist in source table then 0
or null
will be returned based on join_use_nulls setting.
More info about join_use_nulls
in Join operation.
Example
Input table:
Query:
Result:
catboostEvaluate(path_to_model, feature_1, feature_2, …, feature_n)
Evaluate external catboost model. CatBoost is an open-source gradient boosting library developed by Yandex for machine learing. Accepts a path to a catboost model and model arguments (features). Returns Float64.
Prerequisites
Build the catboost evaluation library
Before evaluating catboost models, the libcatboostmodel.<so|dylib>
library must be made available. See CatBoost documentation how to compile it.
Next, specify the path to libcatboostmodel.<so|dylib>
in the clickhouse configuration:
For security and isolation reasons, the model evaluation does not run in the server process but in the clickhouse-library-bridge process. At the first execution of catboostEvaluate()
, the server starts the library bridge process if it is not running already. Both processes communicate using a HTTP interface. By default, port 9012
is used. A different port can be specified as follows - this is useful if port 9012
is already assigned to a different service.
Train a catboost model using libcatboost
See Training and applying models for how to train catboost models from a training data set.
throwIf(x[, message[, error_code]])
Throw an exception if the argument is non zero. message
- is an optional parameter: a constant string providing a custom error message error_code
- is an optional parameter: a constant integer providing a custom error code
To use the error_code
argument, configuration parameter allow_custom_error_code_in_throwif
must be enabled.
identity
Returns the same value that was used as its argument. Used for debugging and testing, allows to cancel using index, and get the query performance of a full scan. When query is analyzed for possible use of index, the analyzer does not look inside identity
functions. Also constant folding is not applied too.
Syntax
Example
Query:
Result:
getSetting
Returns the current value of a custom setting.
Syntax
Parameter
custom_setting
— The setting name. String.
Returned value
The setting current value.
Example
Result
See Also
isDecimalOverflow
Checks whether the Decimal value is out of its (or specified) precision.
Syntax
Arguments
d
— value. Decimal.p
— precision. Optional. If omitted, the initial precision of the first argument is used. Using of this paratemer could be helpful for data extraction to another DBMS or file. UInt8.
Returned values
1
— Decimal value has more digits then it's precision allow,0
— Decimal value satisfies the specified precision.
Example
Query:
Result:
countDigits
Returns number of decimal digits you need to represent the value.
Syntax
Arguments
Returned value
Number of digits.
Type: UInt8.
NOTE
For Decimal
values takes into account their scales: calculates result over underlying integer type which is (value * scale)
. For example: countDigits(42) = 2
, countDigits(42.000) = 5
, countDigits(0.04200) = 4
. I.e. you may check decimal overflow for Decimal64
with countDecimal(x) > 18
. It's a slow variant of isDecimalOverflow.
Example
Query:
Result:
errorCodeToName
Returned value
Variable name for the error code.
Type: LowCardinality(String).
Syntax
Result:
tcpPort
Returns native interface TCP port number listened by this server. If it is executed in the context of a distributed table, then it generates a normal column, otherwise it produces a constant value.
Syntax
Arguments
None.
Returned value
The TCP port number.
Type: UInt16.
Example
Query:
Result:
See Also
RANDOM NUMBER AND STRING
All the functions accept zero arguments or one argument. If an argument is passed, it can be any type, and its value is not used for anything. The only purpose of this argument is to prevent common subexpression elimination, so that two different instances of the same function return different columns with different random numbers.
NOTE
Non-cryptographic generators of pseudo-random numbers are used.
rand, rand32
Returns a pseudo-random UInt32 number, evenly distributed among all UInt32-type numbers.
Uses a linear congruential generator.
rand64
Returns a pseudo-random UInt64 number, evenly distributed among all UInt64-type numbers.
Uses a linear congruential generator.
randCanonical
The function generates pseudo random results with independent and identically distributed uniformly distributed values in [0, 1).
Non-deterministic. Return type is Float64.
randConstant
Produces a constant column with a random value.
Syntax
Arguments
x
— Expression resulting in any of the supported data types. The resulting value is discarded, but the expression itself if used for bypassing common subexpression elimination if the function is called multiple times in one query. Optional parameter.
Returned value
Pseudo-random number.
Type: UInt32.
Example
Query:
Result:
randomString
randomFixedString
randomPrintableASCII
randomStringUTF8
fuzzBits
Syntax
Inverts bits of s
, each with probability prob
.
Arguments
s
-String
orFixedString
prob
- constantFloat32/64
Returned value Fuzzed string with same as s type.
Example
Result:
REPLACING IN STRINGS
NOTE
Functions for searching and other manipulations with strings are described separately.
replaceOne(haystack, pattern, replacement)
Replaces the first occurrence of the substring ‘pattern’ (if it exists) in ‘haystack’ by the ‘replacement’ string. ‘pattern’ and ‘replacement’ must be constants.
replaceAll(haystack, pattern, replacement), replace(haystack, pattern, replacement)
Replaces all occurrences of the substring ‘pattern’ in ‘haystack’ by the ‘replacement’ string.
replaceRegexpOne(haystack, pattern, replacement)
Replaces the first occurrence of the substring matching the regular expression ‘pattern’ in ‘haystack‘ by the ‘replacement‘ string. ‘pattern‘ must be a constant re2 regular expression. ‘replacement’ must be a plain constant string or a constant string containing substitutions \0-\9
. Substitutions \1-\9
correspond to the 1st to 9th capturing group (submatch), substitution \0
corresponds to the entire match. To use a verbatim \
character in the ‘pattern‘ or ‘replacement‘ string, escape it using \
. Also keep in mind that string literals require an extra escaping.
Example 1. Converting ISO dates to American format:
Example 2. Copying a string ten times:
replaceRegexpAll(haystack, pattern, replacement)
Like ‘replaceRegexpOne‘, but replaces all occurrences of the pattern. Example:
As an exception, if a regular expression worked on an empty substring, the replacement is not made more than once. Example:
regexpQuoteMeta(s)
The function adds a backslash before some predefined characters in the string. Predefined characters: \0
, \\
, |
, (
, )
, ^
, $
, .
, [
, ]
, ?
, *
, +
, {
, :
, -
. This implementation slightly differs from re2::RE2::QuoteMeta. It escapes zero byte as \0
instead of \x00
and it escapes only required characters. For more information, see the link: RE2
ROUNDING
floor(x[, N])
Returns the largest round number that is less than or equal to x
. A round number is a multiple of 1/10N, or the nearest number of the appropriate data type if 1 / 10N isn’t exact. ‘N’ is an integer constant, optional parameter. By default it is zero, which means to round to an integer. ‘N’ may be negative.
Examples: floor(123.45, 1) = 123.4, floor(123.45, -1) = 120.
x
is any numeric type. The result is a number of the same type. For integer arguments, it makes sense to round with a negative N
value (for non-negative N
, the function does not do anything). If rounding causes overflow (for example, floor(-128, -1)), an implementation-specific result is returned.
ceil(x[, N]), ceiling(x[, N])
Returns the smallest round number that is greater than or equal to x
. In every other way, it is the same as the floor
function (see above).
trunc(x[, N]), truncate(x[, N])
Returns the round number with largest absolute value that has an absolute value less than or equal to x
‘s. In every other way, it is the same as the ’floor’ function (see above).
round(x[, N])
Rounds a value to a specified number of decimal places.
The function returns the nearest number of the specified order. In case when given number has equal distance to surrounding numbers, the function uses banker’s rounding for float number types and rounds away from zero for the other number types (Decimal).
Arguments
expression
— A number to be rounded. Can be any expression returning the numeric data type.decimal-places
— An integer value.If
decimal-places > 0
then the function rounds the value to the right of the decimal point.If
decimal-places < 0
then the function rounds the value to the left of the decimal point.If
decimal-places = 0
then the function rounds the value to integer. In this case the argument can be omitted.
Returned value:
The rounded number of the same type as the input number.
Examples
Example of use with Float
Example of use with Decimal
Examples of rounding
Rounding to the nearest number.
Banker’s rounding.
See Also
roundBankers
Rounds a number to a specified decimal position.
If the rounding number is halfway between two numbers, the function uses banker’s rounding.
In other cases, the function rounds numbers to the nearest integer.
Using banker’s rounding, you can reduce the effect that rounding numbers has on the results of summing or subtracting these numbers.
For example, sum numbers 1.5, 2.5, 3.5, 4.5 with different rounding:
No rounding: 1.5 + 2.5 + 3.5 + 4.5 = 12.
Banker’s rounding: 2 + 2 + 4 + 4 = 12.
Rounding to the nearest integer: 2 + 3 + 4 + 5 = 14.
Syntax
Arguments
expression
— A number to be rounded. Can be any expression returning the numeric data type.decimal-places
— Decimal places. An integer number.decimal-places > 0
— The function rounds the number to the given position right of the decimal point. Example:roundBankers(3.55, 1) = 3.6
.decimal-places < 0
— The function rounds the number to the given position left of the decimal point. Example:roundBankers(24.55, -1) = 20
.decimal-places = 0
— The function rounds the number to an integer. In this case the argument can be omitted. Example:roundBankers(2.5) = 2
.
Returned value
A value rounded by the banker’s rounding method.
Examples
Example of use
Query:
Result:
Examples of Banker’s rounding
See Also
roundToExp2(num)
Accepts a number. If the number is less than one, it returns 0. Otherwise, it rounds the number down to the nearest (whole non-negative) degree of two.
roundDuration(num)
Accepts a number. If the number is less than one, it returns 0. Otherwise, it rounds the number down to numbers from the set: 1, 10, 30, 60, 120, 180, 240, 300, 600, 1200, 1800, 3600, 7200, 18000, 36000.
roundAge(num)
Accepts a number. If the number is less than 18, it returns 0. Otherwise, it rounds the number down to a number from the set: 18, 25, 35, 45, 55.
roundDown(num, arr)
Accepts a number and rounds it down to an element in the specified array. If the value is less than the lowest bound, the lowest bound is returned.
SEARCHING IN STRINGS
NOTE
Functions for replacing and other manipulations with strings are described separately.
position(haystack, needle), locate(haystack, needle)
Searches for the substring needle
in the string haystack
.
Returns the position (in bytes) of the found substring in the string, starting from 1.
For a case-insensitive search, use the function positionCaseInsensitive.
Syntax
Alias: locate(haystack, needle[, start_pos])
.
NOTE
Syntax of position(needle IN haystack)
provides SQL-compatibility, the function works the same way as to position(haystack, needle)
.
Arguments
haystack
— String, in which substring will to be searched. String.needle
— Substring to be searched. String.start_pos
– Position of the first character in the string to start search. UInt. Optional.
Returned values
Starting position in bytes (counting from 1), if substring was found.
0, if the substring was not found.
Type: Integer
.
Examples
The phrase “Hello, world!” contains a set of bytes representing a single-byte encoded text. The function returns some expected result:
Query:
Result:
The same phrase in Russian contains characters which can’t be represented using a single byte. The function returns some unexpected result (use positionUTF8 function for multi-byte encoded text):
Query:
Result:
Examples for POSITION(needle IN haystack) syntax
Query:
Result:
Query:
Result:
positionCaseInsensitive
The same as position returns the position (in bytes) of the found substring in the string, starting from 1. Use the function for a case-insensitive search.
Works under the assumption that the string contains a set of bytes representing a single-byte encoded text. If this assumption is not met and a character can’t be represented using a single byte, the function does not throw an exception and returns some unexpected result. If character can be represented using two bytes, it will use two bytes and so on.
Syntax
Arguments
haystack
— String, in which substring will to be searched. String.needle
— Substring to be searched. String.start_pos
— Optional parameter, position of the first character in the string to start search. UInt.
Returned values
Starting position in bytes (counting from 1), if substring was found.
0, if the substring was not found.
Type: Integer
.
Example
Query:
Result:
positionUTF8
Returns the position (in Unicode points) of the found substring in the string, starting from 1.
Works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, the function does not throw an exception and returns some unexpected result. If character can be represented using two Unicode points, it will use two and so on.
For a case-insensitive search, use the function positionCaseInsensitiveUTF8.
Syntax
Arguments
haystack
— String, in which substring will to be searched. String.needle
— Substring to be searched. String.start_pos
— Optional parameter, position of the first character in the string to start search. UInt
Returned values
Starting position in Unicode points (counting from 1), if substring was found.
0, if the substring was not found.
Type: Integer
.
Examples
The phrase “Hello, world!” in Russian contains a set of Unicode points representing a single-point encoded text. The function returns some expected result:
Query:
Result:
The phrase “Salut, étudiante!”, where character é
can be represented using a one point (U+00E9
) or two points (U+0065U+0301
) the function can be returned some unexpected result:
Query for the letter é
, which is represented one Unicode point U+00E9
:
Result:
Query for the letter é
, which is represented two Unicode points U+0065U+0301
:
Result:
positionCaseInsensitiveUTF8
The same as positionUTF8, but is case-insensitive. Returns the position (in Unicode points) of the found substring in the string, starting from 1.
Works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, the function does not throw an exception and returns some unexpected result. If character can be represented using two Unicode points, it will use two and so on.
Syntax
Arguments
haystack
— String, in which substring will to be searched. String.needle
— Substring to be searched. String.start_pos
— Optional parameter, position of the first character in the string to start search. UInt
Returned value
Starting position in Unicode points (counting from 1), if substring was found.
0, if the substring was not found.
Type: Integer
.
Example
Query:
Result:
multiSearchAllPositions
The same as position but returns Array
of positions (in bytes) of the found corresponding substrings in the string. Positions are indexed starting from 1.
The search is performed on sequences of bytes without respect to string encoding and collation.
For case-insensitive ASCII search, use the function
multiSearchAllPositionsCaseInsensitive
.For search in UTF-8, use the function multiSearchAllPositionsUTF8.
For case-insensitive UTF-8 search, use the function multiSearchAllPositionsCaseInsensitiveUTF8.
Syntax
Arguments
haystack
— String, in which substring will to be searched. String.needle
— Substring to be searched. String.
Returned values
Array of starting positions in bytes (counting from 1), if the corresponding substring was found and 0 if not found.
Example
Query:
Result:
multiSearchAllPositionsUTF8
See multiSearchAllPositions
.
multiSearchFirstPosition(haystack, [needle1, needle2, …, needlen])
The same as position
but returns the leftmost offset of the string haystack
that is matched to some of the needles.
For a case-insensitive search or/and in UTF-8 format use functions multiSearchFirstPositionCaseInsensitive, multiSearchFirstPositionUTF8, multiSearchFirstPositionCaseInsensitiveUTF8
.
multiSearchFirstIndex(haystack, [needle1, needle2, …, needlen])
Returns the index i
(starting from 1) of the leftmost found needlei in the string haystack
and 0 otherwise.
For a case-insensitive search or/and in UTF-8 format use functions multiSearchFirstIndexCaseInsensitive, multiSearchFirstIndexUTF8, multiSearchFirstIndexCaseInsensitiveUTF8
.
multiSearchAny(haystack, [needle1, needle2, …, needlen])
Returns 1, if at least one string needlei matches the string haystack
and 0 otherwise.
For a case-insensitive search or/and in UTF-8 format use functions multiSearchAnyCaseInsensitive, multiSearchAnyUTF8, multiSearchAnyCaseInsensitiveUTF8
.
NOTE
In all multiSearch*
functions the number of needles should be less than 28 because of implementation specification.
match(haystack, pattern)
Checks whether the string matches the regular expression pattern
in re2
syntax. Re2
has a more limited syntax than Perl regular expressions.
Returns 0 if it does not match, or 1 if it matches.
Matching is based on UTF-8, e.g. .
matches the Unicode code point ¥
which is represented in UTF-8 using two bytes. The regular expression must not contain null bytes. If the haystack or pattern contain a sequence of bytes that are not valid UTF-8, then the behavior is undefined. No automatic Unicode normalization is performed, if you need it you can use the normalizeUTF8*() functions for that.
For patterns to search for substrings in a string, it is better to use LIKE or ‘position’, since they work much faster.
multiMatchAny(haystack, [pattern1, pattern2, …, patternn])
The same as match
, but returns 0 if none of the regular expressions are matched and 1 if any of the patterns matches. It uses hyperscan library. For patterns to search substrings in a string, it is better to use multiSearchAny
since it works much faster.
NOTE
The length of any of the haystack
string must be less than 232 bytes otherwise the exception is thrown. This restriction takes place because of hyperscan API.
multiMatchAnyIndex(haystack, [pattern1, pattern2, …, patternn])
The same as multiMatchAny
, but returns any index that matches the haystack.
multiMatchAllIndices(haystack, [pattern1, pattern2, …, patternn])
The same as multiMatchAny
, but returns the array of all indices that match the haystack in any order.
multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, …, patternn])
The same as multiMatchAny
, but returns 1 if any pattern matches the haystack within a constant edit distance. This function relies on the experimental feature of hyperscan library, and can be slow for some corner cases. The performance depends on the edit distance value and patterns used, but it's always more expensive compared to a non-fuzzy variants.
multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, …, patternn])
The same as multiFuzzyMatchAny
, but returns any index that matches the haystack within a constant edit distance.
multiFuzzyMatchAllIndices(haystack, distance, [pattern1, pattern2, …, patternn])
The same as multiFuzzyMatchAny
, but returns the array of all indices in any order that match the haystack within a constant edit distance.
NOTE
multiFuzzyMatch*
functions do not support UTF-8 regular expressions, and such expressions are treated as bytes because of hyperscan restriction.
NOTE
To turn off all functions that use hyperscan, use setting SET allow_hyperscan = 0;
.
extract(haystack, pattern)
Extracts a fragment of a string using a regular expression. If ‘haystack’ does not match the ‘pattern’ regex, an empty string is returned. If the regex does not contain subpatterns, it takes the fragment that matches the entire regex. Otherwise, it takes the fragment that matches the first subpattern.
extractAll(haystack, pattern)
Extracts all the fragments of a string using a regular expression. If ‘haystack’ does not match the ‘pattern’ regex, an empty string is returned. Returns an array of strings consisting of all matches to the regex. In general, the behavior is the same as the ‘extract’ function (it takes the first subpattern, or the entire expression if there isn’t a subpattern).
extractAllGroupsHorizontal
Matches all groups of the haystack
string using the pattern
regular expression. Returns an array of arrays, where the first array includes all fragments matching the first group, the second array - matching the second group, etc.
NOTE
extractAllGroupsHorizontal
function is slower than extractAllGroupsVertical.
Syntax
Arguments
haystack
— Input string. Type: String.pattern
— Regular expression with re2 syntax. Must contain groups, each group enclosed in parentheses. Ifpattern
contains no groups, an exception is thrown. Type: String.
Returned value
Type: Array.
If haystack
does not match the pattern
regex, an array of empty arrays is returned.
Example
Query:
Result:
See Also
extractAllGroupsVertical
Matches all groups of the haystack
string using the pattern
regular expression. Returns an array of arrays, where each array includes matching fragments from every group. Fragments are grouped in order of appearance in the haystack
.
Syntax
Arguments
haystack
— Input string. Type: String.pattern
— Regular expression with re2 syntax. Must contain groups, each group enclosed in parentheses. Ifpattern
contains no groups, an exception is thrown. Type: String.
Returned value
Type: Array.
If haystack
does not match the pattern
regex, an empty array is returned.
Example
Query:
Result:
See Also
like(haystack, pattern), haystack LIKE pattern operator
Checks whether a string matches a simple regular expression. The regular expression can contain the metasymbols %
and _
.
%
indicates any quantity of any bytes (including zero characters).
_
indicates any one byte.
Use the backslash (\
) for escaping metasymbols. See the note on escaping in the description of the ‘match’ function.
Matching is based on UTF-8, e.g. _
matches the Unicode code point ¥
which is represented in UTF-8 using two bytes. If the haystack or pattern contain a sequence of bytes that are not valid UTF-8, then the behavior is undefined. No automatic Unicode normalization is performed, if you need it you can use the normalizeUTF8*() functions for that.
For regular expressions like %needle%
, the code is more optimal and works as fast as the position
function. For other regular expressions, the code is the same as for the ‘match’ function.
notLike(haystack, pattern), haystack NOT LIKE pattern operator
The same thing as ‘like’, but negative.
ilike
Case insensitive variant of like function. You can use ILIKE
operator instead of the ilike
function.
The function ignores the language, e.g. for Turkish (i/İ), the result might be incorrect.
Syntax
Arguments
haystack
— Input string. String.pattern
— Ifpattern
does not contain percent signs or underscores, then thepattern
only represents the string itself. An underscore (_
) inpattern
stands for (matches) any single character. A percent sign (%
) matches any sequence of zero or more characters.
Some pattern
examples:
Returned values
True, if the string matches
pattern
.False, if the string does not match
pattern
.
Example
Input table:
Query:
Result:
See Also
ngramDistance(haystack, needle)
Calculates the 4-gram distance between haystack
and needle
: counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities. Returns float number from 0 to 1 – the closer to zero, the more strings are similar to each other. If the constant needle
or haystack
is more than 32Kb, throws an exception. If some of the non-constant haystack
or needle
strings are more than 32Kb, the distance is always one.
For case-insensitive search or/and in UTF-8 format use functions ngramDistanceCaseInsensitive, ngramDistanceUTF8, ngramDistanceCaseInsensitiveUTF8
.
ngramSearch(haystack, needle)
Same as ngramDistance
but calculates the non-symmetric difference between needle
and haystack
– the number of n-grams from needle minus the common number of n-grams normalized by the number of needle
n-grams. The closer to one, the more likely needle
is in the haystack
. Can be useful for fuzzy string search.
For case-insensitive search or/and in UTF-8 format use functions ngramSearchCaseInsensitive, ngramSearchUTF8, ngramSearchCaseInsensitiveUTF8
.
NOTE
For UTF-8 case we use 3-gram distance. All these are not perfectly fair n-gram distances. We use 2-byte hashes to hash n-grams and then calculate the (non-)symmetric difference between these hash tables – collisions may occur. With UTF-8 case-insensitive format we do not use fair tolower
function – we zero the 5-th bit (starting from zero) of each codepoint byte and first bit of zeroth byte if bytes more than one – this works for Latin and mostly for all Cyrillic letters.
countSubstrings
Returns the number of substring occurrences.
For a case-insensitive search, use countSubstringsCaseInsensitive or countSubstringsCaseInsensitiveUTF8 functions.
Syntax
Arguments
haystack
— The string to search in. String.needle
— The substring to search for. String.start_pos
– Position of the first character in the string to start search. Optional. UInt.
Returned values
Number of occurrences.
Type: UInt64.
Examples
Query:
Result:
Query:
Result:
Query:
Result:
countSubstringsCaseInsensitive
Returns the number of substring occurrences case-insensitive.
Syntax
Arguments
haystack
— The string to search in. String.needle
— The substring to search for. String.start_pos
— Position of the first character in the string to start search. Optional. UInt.
Returned values
Number of occurrences.
Type: UInt64.
Examples
Query:
Result:
Query:
Result:
Query:
Result:
countSubstringsCaseInsensitiveUTF8
Returns the number of substring occurrences in UTF-8
case-insensitive.
Syntax
Arguments
haystack
— The string to search in. String.needle
— The substring to search for. String.start_pos
— Position of the first character in the string to start search. Optional. UInt.
Returned values
Number of occurrences.
Type: UInt64.
Examples
Query:
Result:
Query:
Result:
countMatches(haystack, pattern)
Returns the number of regular expression matches for a pattern
in a haystack
.
Syntax
Arguments
haystack
— The string to search in. String.pattern
— The regular expression with re2 syntax. String.
Returned value
The number of matches.
Type: UInt64.
Examples
Query:
Result:
Query:
Result:
SPLITTING AND MERGING
splitByChar(separator, s[, max_substrings])
Splits a string into substrings separated by a specified character. It uses a constant string separator
which consists of exactly one character. Returns an array of selected substrings. Empty substrings may be selected if the separator occurs at the beginning or end of the string, or if there are multiple consecutive separators.
Syntax
Arguments
separator
— The separator which should contain exactly one character. String.s
— The string to split. String.max_substrings
— An optionalInt64
defaulting to 0. Whenmax_substrings
> 0, the returned substrings will be no more thanmax_substrings
, otherwise the function will return as many substrings as possible.
Returned value(s)
Returns an array of selected substrings. Empty substrings may be selected when:
A separator occurs at the beginning or end of the string;
There are multiple consecutive separators;
The original string
s
is empty.
Example
splitByString(separator, s[, max_substrings])
Splits a string into substrings separated by a string. It uses a constant string separator
of multiple characters as the separator. If the string separator
is empty, it will split the string s
into an array of single characters.
Syntax
Arguments
separator
— The separator. String.s
— The string to split. String.max_substrings
— An optionalInt64
defaulting to 0. Whenmax_substrings
> 0, the returned substrings will be no more thanmax_substrings
, otherwise the function will return as many substrings as possible.
Returned value(s)
Returns an array of selected substrings. Empty substrings may be selected when:
A non-empty separator occurs at the beginning or end of the string;
There are multiple consecutive non-empty separators;
The original string
s
is empty while the separator is not empty.
Example
arrayStringConcat(arr[, separator])
Concatenates string representations of values listed in the array with the separator. separator
is an optional parameter: a constant string, set to an empty string by default. Returns the string.
alphaTokens(s[, max_substrings]), splitByAlpha(s[, max_substrings])
Selects substrings of consecutive bytes from the ranges a-z and A-Z.Returns an array of substrings.
Syntax
Arguments
s
— The string to split. String.max_substrings
— An optionalInt64
defaulting to 0. Whenmax_substrings
> 0, the returned substrings will be no more thanmax_substrings
, otherwise the function will return as many substrings as possible.
Returned value(s)
Returns an array of selected substrings.
Example
extractAllGroups(text, regexp)
Extracts all groups from non-overlapping substrings matched by a regular expression.
Syntax
Arguments
text
— String or FixedString.regexp
— Regular expression. Constant. String or FixedString.
Returned values
If the function finds at least one matching group, it returns
Array(Array(String))
column, clustered by group_id (1 to N, where N is number of capturing groups inregexp
).If there is no matching group, returns an empty array.
Type: Array.
Example
Query:
Result:
STRINGS
NOTE
Functions for searching and replacing in strings are described separately.
empty
Checks whether the input string is empty.
Syntax
A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.
The function also works for arrays or UUID.
Arguments
x
— Input value. String.
Returned value
Returns
1
for an empty string or0
for a non-empty string.
Type: UInt8.
Example
Query:
Result:
notEmpty
Checks whether the input string is non-empty.
Syntax
A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.
The function also works for arrays or UUID.
Arguments
x
— Input value. String.
Returned value
Returns
1
for a non-empty string or0
for an empty string string.
Type: UInt8.
Example
Query:
Result:
length
Returns the length of a string in bytes (not in characters, and not in code points). The result type is UInt64. The function also works for arrays.
lengthUTF8
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it does not throw an exception). The result type is UInt64.
char_length, CHAR_LENGTH
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it does not throw an exception). The result type is UInt64.
character_length, CHARACTER_LENGTH
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it does not throw an exception). The result type is UInt64.
leftPad
Pads the current string from the left with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similarly to the MySQL LPAD
function.
Syntax
Arguments
string
— Input string that needs to be padded. String.length
— The length of the resulting string. UInt. If the value is less than the input string length, then the input string is returned as-is.pad_string
— The string to pad the input string with. String. Optional. If not specified, then the input string is padded with spaces.
Returned value
The resulting string of the given length.
Type: String.
Example
Query:
Result:
leftPadUTF8
Pads the current string from the left with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similarly to the MySQL LPAD
function. While in the leftPad function the length is measured in bytes, here in the leftPadUTF8
function it is measured in code points.
Syntax
Arguments
string
— Input string that needs to be padded. String.length
— The length of the resulting string. UInt. If the value is less than the input string length, then the input string is returned as-is.pad_string
— The string to pad the input string with. String. Optional. If not specified, then the input string is padded with spaces.
Returned value
The resulting string of the given length.
Type: String.
Example
Query:
Result:
rightPad
Pads the current string from the right with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similarly to the MySQL RPAD
function.
Syntax
Arguments
string
— Input string that needs to be padded. String.length
— The length of the resulting string. UInt. If the value is less than the input string length, then the input string is returned as-is.pad_string
— The string to pad the input string with. String. Optional. If not specified, then the input string is padded with spaces.
Returned value
The resulting string of the given length.
Type: String.
Example
Query:
Result:
rightPadUTF8
Pads the current string from the right with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similarly to the MySQL RPAD
function. While in the rightPad function the length is measured in bytes, here in the rightPadUTF8
function it is measured in code points.
Syntax
Arguments
string
— Input string that needs to be padded. String.length
— The length of the resulting string. UInt. If the value is less than the input string length, then the input string is returned as-is.pad_string
— The string to pad the input string with. String. Optional. If not specified, then the input string is padded with spaces.
Returned value
The resulting string of the given length.
Type: String.
Example
Query:
Result:
lower, lcase
Converts ASCII Latin symbols in a string to lowercase.
upper, ucase
Converts ASCII Latin symbols in a string to uppercase.
lowerUTF8
Converts a string to lowercase, assuming the string contains a set of bytes that make up a UTF-8 encoded text. It does not detect the language. E.g. for Turkish the result might not be exactly correct (i/İ vs. i/I). If the length of the UTF-8 byte sequence is different for upper and lower case of a code point, the result may be incorrect for this code point. If the string contains a sequence of bytes that are not valid UTF-8, then the behavior is undefined.
upperUTF8
Converts a string to uppercase, assuming the string contains a set of bytes that make up a UTF-8 encoded text. It does not detect the language. E.g. for Turkish the result might not be exactly correct (i/İ vs. i/I). If the length of the UTF-8 byte sequence is different for upper and lower case of a code point, the result may be incorrect for this code point. If the string contains a sequence of bytes that are not valid UTF-8, then the behavior is undefined.
isValidUTF8
Returns 1, if the set of bytes is valid UTF-8 encoded, otherwise 0.
toValidUTF8
Replaces invalid UTF-8 characters by the �
(U+FFFD) character. All running in a row invalid characters are collapsed into the one replacement character.
Arguments
input_string
— Any set of bytes represented as the String data type object.
Returned value: Valid UTF-8 string.
Example
repeat
Repeats a string as many times as specified and concatenates the replicated values as a single string.
Alias: REPEAT
.
Syntax
Arguments
Returned value
The single string, which contains the string s
repeated n
times. If n
\< 1, the function returns empty string.
Type: String
.
Example
Query:
Result:
reverse
Reverses the string (as a sequence of bytes).
reverseUTF8
Reverses a sequence of Unicode code points, assuming that the string contains a set of bytes representing a UTF-8 text. Otherwise, it does something else (it does not throw an exception).
format(pattern, s0, s1, …)
Formatting constant pattern with the string listed in the arguments. pattern
is a simplified Python format pattern. Format string contains “replacement fields” surrounded by curly braces {}
. Anything that is not contained in braces is considered literal text, which is copied unchanged to the output. If you need to include a brace character in the literal text, it can be escaped by doubling: {{ '{{' }}
and {{ '}}' }}
. Field names can be numbers (starting from zero) or empty (then they are treated as consequence numbers).
concat
Concatenates the strings listed in the arguments, without a separator.
Syntax
Arguments
Values of type String or FixedString.
Returned values
Returns the String that results from concatenating the arguments.
If any of argument values is NULL
, concat
returns NULL
.
Example
Query:
Result:
concatAssumeInjective
Same as concat, the difference is that you need to ensure that concat(s1, s2, ...) → sn
is injective, it will be used for optimization of GROUP BY.
The function is named “injective” if it always returns different result for different values of arguments. In other words: different arguments never yield identical result.
Syntax
Arguments
Values of type String or FixedString.
Returned values
Returns the String that results from concatenating the arguments.
If any of argument values is NULL
, concatAssumeInjective
returns NULL
.
Example
Input table:
Query:
Result:
substring(s, offset, length), mid(s, offset, length), substr(s, offset, length)
Returns a substring starting with the byte from the ‘offset’ index that is ‘length’ bytes long. Character indexing starts from one (as in standard SQL).
substringUTF8(s, offset, length)
The same as ‘substring’, but for Unicode code points. Works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, it returns some result (it does not throw an exception).
appendTrailingCharIfAbsent(s, c)
If the ‘s’ string is non-empty and does not contain the ‘c’ character at the end, it appends the ‘c’ character to the end.
convertCharset(s, from, to)
Returns the string ‘s’ that was converted from the encoding in ‘from’ to the encoding in ‘to’.
base58Encode(plaintext)
Accepts a String and encodes it using Base58 encoding scheme using "Bitcoin" alphabet.
Syntax
Arguments
plaintext
— String column or constant.
Returned value
A string containing encoded value of 1st argument.
Type: String.
Example
Query:
Result:
base64Encode(s)
Encodes ‘s’ FixedString or String into base64.
Alias: TO_BASE64
.
base64Decode(s)
Decode base64-encoded FixedString or String ‘s’ into original string. In case of failure raises an exception.
Alias: FROM_BASE64
.
tryBase64Decode(s)
Similar to base64Decode, but returns an empty string in case of error.
endsWith(s, suffix)
Returns whether to end with the specified suffix. Returns 1 if the string ends with the specified suffix, otherwise it returns 0.
startsWith(str, prefix)
Returns 1 whether string starts with the specified prefix, otherwise it returns 0.
Returned values
1, if the string starts with the specified prefix.
0, if the string does not start with the specified prefix.
Example
Query:
Result:
trim
Removes all specified characters from the start or end of a string. By default removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string.
Syntax
Arguments
Returned value
A string without leading and (or) trailing specified characters.
Type: String
.
Example
Query:
Result:
trimLeft
Removes all consecutive occurrences of common whitespace (ASCII character 32) from the beginning of a string. It does not remove other kinds of whitespace characters (tab, no-break space, etc.).
Syntax
Alias: ltrim(input_string)
.
Arguments
input_string
— string to trim. String.
Returned value
A string without leading common whitespaces.
Type: String
.
Example
Query:
Result:
trimRight
Removes all consecutive occurrences of common whitespace (ASCII character 32) from the end of a string. It does not remove other kinds of whitespace characters (tab, no-break space, etc.).
Syntax
Alias: rtrim(input_string)
.
Arguments
input_string
— string to trim. String.
Returned value
A string without trailing common whitespaces.
Type: String
.
Example
Query:
Result:
trimBoth
Removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string. It does not remove other kinds of whitespace characters (tab, no-break space, etc.).
Syntax
Alias: trim(input_string)
.
Arguments
input_string
— string to trim. String.
Returned value
A string without leading and trailing common whitespaces.
Type: String
.
Example
Query:
Result:
CRC32(s)
Returns the CRC32 checksum of a string, using CRC-32-IEEE 802.3 polynomial and initial value 0xffffffff
(zlib implementation).
The result type is UInt32.
CRC32IEEE(s)
Returns the CRC32 checksum of a string, using CRC-32-IEEE 802.3 polynomial.
The result type is UInt32.
CRC64(s)
Returns the CRC64 checksum of a string, using CRC-64-ECMA polynomial.
The result type is UInt64.
normalizeQuery
Replaces literals, sequences of literals and complex aliases with placeholders.
Syntax
Arguments
x
— Sequence of characters. String.
Returned value
Sequence of characters with placeholders.
Type: String.
Example
Query:
Result:
normalizedQueryHash
Returns identical 64bit hash values without the values of literals for similar queries. It helps to analyze query log.
Syntax
Arguments
x
— Sequence of characters. String.
Returned value
Hash value.
Type: UInt64.
Example
Query:
Result:
normalizeUTF8NFC
Converts a string to NFC normalized form, assuming the string contains a set of bytes that make up a UTF-8 encoded text.
Syntax
Arguments
words
— Input string that contains UTF-8 encoded text. String.
Returned value
String transformed to NFC normalization form.
Type: String.
Example
Query:
Result:
normalizeUTF8NFD
Converts a string to NFD normalized form, assuming the string contains a set of bytes that make up a UTF-8 encoded text.
Syntax
Arguments
words
— Input string that contains UTF-8 encoded text. String.
Returned value
String transformed to NFD normalization form.
Type: String.
Example
Query:
Result:
normalizeUTF8NFKC
Converts a string to NFKC normalized form, assuming the string contains a set of bytes that make up a UTF-8 encoded text.
Syntax
Arguments
words
— Input string that contains UTF-8 encoded text. String.
Returned value
String transformed to NFKC normalization form.
Type: String.
Example
Query:
Result:
normalizeUTF8NFKD
Converts a string to NFKD normalized form, assuming the string contains a set of bytes that make up a UTF-8 encoded text.
Syntax
Arguments
words
— Input string that contains UTF-8 encoded text. String.
Returned value
String transformed to NFKD normalization form.
Type: String.
Example
Query:
Result:
encodeXMLComponent
Escapes characters to place string into XML text node or attribute.
The following five XML predefined entities will be replaced: <
, &
, >
, "
, '
.
Syntax
Arguments
x
— The sequence of characters. String.
Returned value
The sequence of characters with escape characters.
Type: String.
Example
Query:
Result:
decodeXMLComponent
Replaces XML predefined entities with characters. Predefined entities are "
&
'
>
<
This function also replaces numeric character references with Unicode characters. Both decimal (like ✓
) and hexadecimal (✓
) forms are supported.
Syntax
Arguments
x
— A sequence of characters. String.
Returned value
The sequence of characters after replacement.
Type: String.
Example
Query:
Result:
See Also
extractTextFromHTML
A function to extract text from HTML or XHTML. It does not necessarily 100% conform to any of the HTML, XML or XHTML standards, but the implementation is reasonably accurate and it is fast. The rules are the following:
Comments are skipped. Example:
<!-- test -->
. Comment must end with-->
. Nested comments are not possible. Note: constructions like<!-->
and<!--->
are not valid comments in HTML but they are skipped by other rules.CDATA is pasted verbatim. Note: CDATA is XML/XHTML specific. But it is processed for "best-effort" approach.
script
andstyle
elements are removed with all their content. Note: it is assumed that closing tag cannot appear inside content. For example, in JS string literal has to be escaped like"<\/script>"
. Note: comments and CDATA are possible insidescript
orstyle
- then closing tags are not searched inside CDATA. Example:<script><![CDATA[</script>]]></script>
. But they are still searched inside comments. Sometimes it becomes complicated:<script>var x = "<!--"; </script> var y = "-->"; alert(x + y);</script>
Note:script
andstyle
can be the names of XML namespaces - then they are not treated like usualscript
orstyle
elements. Example:<script:a>Hello</script:a>
. Note: whitespaces are possible after closing tag name:</script >
but not before:< / script>
.Other tags or tag-like elements are skipped without inner content. Example:
<a>.</a>
Note: it is expected that this HTML is illegal:<a test=">"></a>
Note: it also skips something like tags:<>
,<!>
, etc. Note: tag without end is skipped to the end of input:<hello
HTML and XML entities are not decoded. They must be processed by separate function.
Whitespaces in the text are collapsed or inserted by specific rules.
Whitespaces at the beginning and at the end are removed.
Consecutive whitespaces are collapsed.
But if the text is separated by other elements and there is no whitespace, it is inserted.
It may cause unnatural examples:
Hello<b>world</b>
,Hello<!-- -->world
- there is no whitespace in HTML, but the function inserts it. Also consider:Hello<p>world</p>
,Hello<br>world
. This behavior is reasonable for data analysis, e.g. to convert HTML to a bag of words.
Also note that correct handling of whitespaces requires the support of
<pre></pre>
and CSSdisplay
andwhite-space
properties.
Syntax
Arguments
x
— input text. String.
Returned value
Extracted text.
Type: String.
Example
The first example contains several tags and a comment and also shows whitespace processing. The second example shows CDATA
and script
tag processing. In the third example text is extracted from the full HTML response received by the url function.
Query:
Result:
TUPLES
tuple
A function that allows grouping multiple columns. For columns with the types T1, T2, …, it returns a Tuple(T1, T2, …) type tuple containing these columns. There is no cost to execute the function. Tuples are normally used as intermediate values for an argument of IN operators, or for creating a list of formal parameters of lambda functions. Tuples can’t be written to a table.
The function implements the operator (x, y, …)
.
Syntax
tupleElement
A function that allows getting a column from a tuple. ‘N’ is the column index, starting from 1. ‘N’ must be a constant. ‘N’ must be a strict postive integer no greater than the size of the tuple. There is no cost to execute the function.
The function implements the operator x.N
.
Syntax
untuple
Performs syntactic substitution of tuple elements in the call location.
Syntax
You can use the EXCEPT
expression to skip columns as a result of the query.
Arguments
x
— Atuple
function, column, or tuple of elements. Tuple.
Returned value
None.
Examples
Input table:
Example of using a Tuple
-type column as the untuple
function parameter:
Query:
Result:
Note: the names are implementation specific and are subject to change. You should not assume specific names of the columns after application of the untuple
.
Example of using an EXCEPT
expression:
Query:
Result:
See Also
tupleHammingDistance
Returns the Hamming Distance between two tuples of the same size.
Syntax
Arguments
Tuples should have the same type of the elements.
Returned value
The Hamming distance.
Type: The result type is calculed the same way it is for Arithmetic functions, based on the number of elements in the input tuples.
Examples
Query:
Result:
Can be used with MinHash functions for detection of semi-duplicate strings:
Result:
TYPE CONVERSION
Common Issues of Numeric Conversions
When you convert a value from one to another data type, you should remember that in common case, it is an unsafe operation that can lead to a data loss. A data loss can occur if you try to fit value from a larger data type to a smaller data type, or if you convert values between different data types.
ClickHouse has the same behavior as C++ programs.
toInt(8|16|32|64|128|256)
Converts an input value to the Int data type. This function family includes:
toInt8(expr)
— Results in theInt8
data type.toInt16(expr)
— Results in theInt16
data type.toInt32(expr)
— Results in theInt32
data type.toInt64(expr)
— Results in theInt64
data type.toInt128(expr)
— Results in theInt128
data type.toInt256(expr)
— Results in theInt256
data type.
Arguments
expr
— Expression returning a number or a string with the decimal representation of a number. Binary, octal, and hexadecimal representations of numbers are not supported. Leading zeroes are stripped.
Returned value
Integer value in the Int8
, Int16
, Int32
, Int64
, Int128
or Int256
data type.
Functions use rounding towards zero, meaning they truncate fractional digits of numbers.
The behavior of functions for the NaN and Inf arguments is undefined. Remember about numeric conversions issues, when using the functions.
Example
Query:
Result:
toInt(8|16|32|64|128|256)OrZero
It takes an argument of type String and tries to parse it into Int (8 | 16 | 32 | 64 | 128 | 256). If failed, returns 0.
Example
Query:
Result:
toInt(8|16|32|64|128|256)OrNull
It takes an argument of type String and tries to parse it into Int (8 | 16 | 32 | 64 | 128 | 256). If failed, returns NULL.
Example
Query:
Result:
toInt(8|16|32|64|128|256)OrDefault
It takes an argument of type String and tries to parse it into Int (8 | 16 | 32 | 64 | 128 | 256). If failed, returns the default type value.
Example
Query:
Result:
toUInt(8|16|32|64|256)
Converts an input value to the UInt data type. This function family includes:
toUInt8(expr)
— Results in theUInt8
data type.toUInt16(expr)
— Results in theUInt16
data type.toUInt32(expr)
— Results in theUInt32
data type.toUInt64(expr)
— Results in theUInt64
data type.toUInt256(expr)
— Results in theUInt256
data type.
Arguments
expr
— Expression returning a number or a string with the decimal representation of a number. Binary, octal, and hexadecimal representations of numbers are not supported. Leading zeroes are stripped.
Returned value
Integer value in the UInt8
, UInt16
, UInt32
, UInt64
or UInt256
data type.
Functions use rounding towards zero, meaning they truncate fractional digits of numbers.
The behavior of functions for negative arguments and for the NaN and Inf arguments is undefined. If you pass a string with a negative number, for example '-32'
, ClickHouse raises an exception. Remember about numeric conversions issues, when using the functions.
Example
Query:
Result:
toUInt(8|16|32|64|256)OrZero
toUInt(8|16|32|64|256)OrNull
toUInt(8|16|32|64|256)OrDefault
toFloat(32|64)
toFloat(32|64)OrZero
toFloat(32|64)OrNull
toFloat(32|64)OrDefault
toDate
Converts the argument to Date
data type.
If the argument is DateTime
or DateTime64
, it truncates it, leaving the date component of the DateTime:
If the argument is a string, it is parsed as Date or DateTime. If it was parsed as DateTime, the date component is being used:
If the argument is a number and it looks like a UNIX timestamp (is greater than 65535), it is interpreted as a DateTime, then truncated to Date in the current timezone. The timezone argument can be specified as a second argument of the function. The truncation to Date depends on the timezone:
The example above demonstrates how the same UNIX timestamp can be interpreted as different dates in different time zones.
If the argument is a number and it is smaller than 65536, it is interpreted as the number of days since 1970-01-01 (a UNIX day) and converted to Date. It corresponds to the internal numeric representation of the Date
data type. Example:
This conversion does not depend on timezones.
If the argument does not fit in the range of the Date type, it results in an implementation-defined behavior, that can saturate to the maximum supported date or overflow:
The function toDate
can be also written in alternative forms:
Have a nice day working with dates and times.
toDateOrZero
toDateOrNull
toDateOrDefault
toDateTime
toDateTimeOrZero
toDateTimeOrNull
toDateTimeOrDefault
toDate32
Converts the argument to the Date32 data type. If the value is outside the range, toDate32
returns the border values supported by Date32
. If the argument has Date type, borders of Date
are taken into account.
Syntax
Arguments
Returned value
A calendar date.
Type: Date32.
Example
The value is within the range:
The value is outside the range:
With
Date
-type argument:
toDate32OrZero
The same as toDate32 but returns the min value of Date32 if an invalid argument is received.
Example
Query:
Result:
toDate32OrNull
The same as toDate32 but returns NULL
if an invalid argument is received.
Example
Query:
Result:
toDate32OrDefault
Converts the argument to the Date32 data type. If the value is outside the range, toDate32OrDefault
returns the lower border value supported by Date32
. If the argument has Date type, borders of Date
are taken into account. Returns default value if an invalid argument is received.
Example
Query:
Result:
toDateTime64
Converts the argument to the DateTime64 data type.
Syntax
Arguments
scale
- Tick size (precision): 10-precision seconds. Valid range: [ 0 : 9 ].timezone
- Time zone of the specified datetime64 object.
Returned value
A calendar date and time of day, with sub-second precision.
Type: DateTime64.
Example
The value is within the range:
As decimal with precision:
Without the decimal point the value is still treated as Unix Timestamp in seconds:
With
timezone
:
toDecimal(32|64|128|256)
Converts value
to the Decimal data type with precision of S
. The value
can be a number or a string. The S
(scale) parameter specifies the number of decimal places.
toDecimal32(value, S)
toDecimal64(value, S)
toDecimal128(value, S)
toDecimal256(value, S)
toDecimal(32|64|128|256)OrNull
Converts an input string to a Nullable(Decimal(P,S)) data type value. This family of functions includes:
toDecimal32OrNull(expr, S)
— Results inNullable(Decimal32(S))
data type.toDecimal64OrNull(expr, S)
— Results inNullable(Decimal64(S))
data type.toDecimal128OrNull(expr, S)
— Results inNullable(Decimal128(S))
data type.toDecimal256OrNull(expr, S)
— Results inNullable(Decimal256(S))
data type.
These functions should be used instead of toDecimal*()
functions, if you prefer to get a NULL
value instead of an exception in the event of an input value parsing error.
Arguments
expr
— Expression, returns a value in the String data type. ClickHouse expects the textual representation of the decimal number. For example,'1.111'
.S
— Scale, the number of decimal places in the resulting value.
Returned value
A value in the Nullable(Decimal(P,S))
data type. The value contains:
Number with
S
decimal places, if ClickHouse interprets the input string as a number.NULL
, if ClickHouse can’t interpret the input string as a number or if the input number contains more thanS
decimal places.
Examples
Query:
Result:
Query:
Result:
toDecimal(32|64|128|256)OrDefault
Converts an input string to a Decimal(P,S) data type value. This family of functions includes:
toDecimal32OrDefault(expr, S)
— Results inDecimal32(S)
data type.toDecimal64OrDefault(expr, S)
— Results inDecimal64(S)
data type.toDecimal128OrDefault(expr, S)
— Results inDecimal128(S)
data type.toDecimal256OrDefault(expr, S)
— Results inDecimal256(S)
data type.
These functions should be used instead of toDecimal*()
functions, if you prefer to get a default value instead of an exception in the event of an input value parsing error.
Arguments
expr
— Expression, returns a value in the String data type. ClickHouse expects the textual representation of the decimal number. For example,'1.111'
.S
— Scale, the number of decimal places in the resulting value.
Returned value
A value in the Decimal(P,S)
data type. The value contains:
Number with
S
decimal places, if ClickHouse interprets the input string as a number.Default
Decimal(P,S)
data type value, if ClickHouse can’t interpret the input string as a number or if the input number contains more thanS
decimal places.
Examples
Query:
Result:
Query:
Result:
toDecimal(32|64|128|256)OrZero
Converts an input value to the Decimal(P,S) data type. This family of functions includes:
toDecimal32OrZero( expr, S)
— Results inDecimal32(S)
data type.toDecimal64OrZero( expr, S)
— Results inDecimal64(S)
data type.toDecimal128OrZero( expr, S)
— Results inDecimal128(S)
data type.toDecimal256OrZero( expr, S)
— Results inDecimal256(S)
data type.
These functions should be used instead of toDecimal*()
functions, if you prefer to get a 0
value instead of an exception in the event of an input value parsing error.
Arguments
expr
— Expression, returns a value in the String data type. ClickHouse expects the textual representation of the decimal number. For example,'1.111'
.S
— Scale, the number of decimal places in the resulting value.
Returned value
A value in the Nullable(Decimal(P,S))
data type. The value contains:
Number with
S
decimal places, if ClickHouse interprets the input string as a number.0 with
S
decimal places, if ClickHouse can’t interpret the input string as a number or if the input number contains more thanS
decimal places.
Example
Query:
Result:
Query:
Result:
toString
Functions for converting between numbers, strings (but not fixed strings), dates, and dates with times. All these functions accept one argument.
When converting to or from a string, the value is formatted or parsed using the same rules as for the TabSeparated format (and almost all other text formats). If the string can’t be parsed, an exception is thrown and the request is canceled.
When converting dates to numbers or vice versa, the date corresponds to the number of days since the beginning of the Unix epoch. When converting dates with times to numbers or vice versa, the date with time corresponds to the number of seconds since the beginning of the Unix epoch.
The date and date-with-time formats for the toDate/toDateTime functions are defined as follows:
As an exception, if converting from UInt32, Int32, UInt64, or Int64 numeric types to Date, and if the number is greater than or equal to 65536, the number is interpreted as a Unix timestamp (and not as the number of days) and is rounded to the date. This allows support for the common occurrence of writing ‘toDate(unix_timestamp)’, which otherwise would be an error and would require writing the more cumbersome ‘toDate(toDateTime(unix_timestamp))’.
Conversion between a date and a date with time is performed the natural way: by adding a null time or dropping the time.
Conversion between numeric types uses the same rules as assignments between different numeric types in C++.
Additionally, the toString function of the DateTime argument can take a second String argument containing the name of the time zone. Example: Asia/Yekaterinburg
In this case, the time is formatted according to the specified time zone.
Example
Query:
Result:
Also see the toUnixTimestamp
function.
toFixedString(s, N)
Converts a String type argument to a FixedString(N) type (a string with fixed length N). N must be a constant. If the string has fewer bytes than N, it is padded with null bytes to the right. If the string has more bytes than N, an exception is thrown.
toStringCutToZero(s)
Accepts a String or FixedString argument. Returns the String with the content truncated at the first zero byte found.
Example
Query:
Result:
Query:
Result:
reinterpretAsUInt(8|16|32|64)
reinterpretAsInt(8|16|32|64)
reinterpretAsFloat(32|64)
reinterpretAsDate
reinterpretAsDateTime
These functions accept a string and interpret the bytes placed at the beginning of the string as a number in host order (little endian). If the string isn’t long enough, the functions work as if the string is padded with the necessary number of null bytes. If the string is longer than needed, the extra bytes are ignored. A date is interpreted as the number of days since the beginning of the Unix Epoch, and a date with time is interpreted as the number of seconds since the beginning of the Unix Epoch.
reinterpretAsString
This function accepts a number or date or date with time and returns a string containing bytes representing the corresponding value in host order (little endian). Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a string that is one byte long.
reinterpretAsFixedString
This function accepts a number or date or date with time and returns a FixedString containing bytes representing the corresponding value in host order (little endian). Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a FixedString that is one byte long.
reinterpretAsUUID
Accepts 16 bytes string and returns UUID containing bytes representing the corresponding value in network byte order (big-endian). If the string isn't long enough, the function works as if the string is padded with the necessary number of null bytes to the end. If the string is longer than 16 bytes, the extra bytes at the end are ignored.
Syntax
Arguments
fixed_string
— Big-endian byte string. FixedString.
Returned value
The UUID type value. UUID.
Examples
String to UUID.
Query:
Result:
Going back and forth from String to UUID.
Query:
Result:
reinterpret(x, T)
Uses the same source in-memory bytes sequence for x
value and reinterprets it to destination type.
Syntax
Arguments
x
— Any type.type
— Destination type. String.
Returned value
Destination type value.
Examples
Query:
Result:
CAST(x, T)
Converts an input value to the specified data type. Unlike the reinterpret function, CAST
tries to present the same value using the new data type. If the conversion can not be done then an exception is raised. Several syntax variants are supported.
Syntax
Arguments
x
— A value to convert. May be of any type.T
— The name of the target data type. String.t
— The target data type.
Returned value
Converted value.
NOTE
If the input value does not fit the bounds of the target type, the result overflows. For example, CAST(-1, 'UInt8')
returns 255
.
Examples
Query:
Result:
Query:
Result:
Conversion to FixedString(N) only works for arguments of type String or FixedString.
Type conversion to Nullable and back is supported.
Example
Query:
Result:
Query:
Result:
See also
cast_keep_nullable setting
accurateCast(x, T)
Converts x
to the T
data type.
The difference from cast(x, T) is that accurateCast
does not allow overflow of numeric types during cast if type value x
does not fit the bounds of type T
. For example, accurateCast(-1, 'UInt8')
throws an exception.
Example
Query:
Result:
Query:
Result:
accurateCastOrNull(x, T)
Converts input value x
to the specified data type T
. Always returns Nullable type and returns NULL if the casted value is not representable in the target type.
Syntax
Parameters
x
— Input value.T
— The name of the returned data type.
Returned value
The value, converted to the specified data type
T
.
Example
Query:
Result:
Query:
Result:
accurateCastOrDefault(x, T[, default_value])
Converts input value x
to the specified data type T
. Returns default type value or default_value
if specified if the casted value is not representable in the target type.
Syntax
Parameters
x
— Input value.T
— The name of the returned data type.default_value
— Default value of returned data type.
Returned value
The value converted to the specified data type
T
.
Example
Query:
Result:
Query:
Result:
toInterval(Year|Quarter|Month|Week|Day|Hour|Minute|Second)
Converts a Number type argument to an Interval data type.
Syntax
Arguments
number
— Duration of interval. Positive integer number.
Returned values
The value in
Interval
data type.
Example
Query:
Result:
parseDateTimeBestEffort
parseDateTime32BestEffort
Converts a date and time in the String representation to DateTime data type.
The function parses ISO 8601, RFC 1123 - 5.2.14 RFC-822 Date and Time Specification, ClickHouse’s and some other date and time formats.
Syntax
Arguments
time_string
— String containing a date and time to convert. String.time_zone
— Time zone. The function parsestime_string
according to the time zone. String.
Supported non-standard formats
A string containing 9..10 digit unix timestamp.
A string with a date and a time component:
YYYYMMDDhhmmss
,DD/MM/YYYY hh:mm:ss
,DD-MM-YY hh:mm
,YYYY-MM-DD hh:mm:ss
, etc.A string with a date, but no time component:
YYYY
,YYYYMM
,YYYY*MM
,DD/MM/YYYY
,DD-MM-YY
etc.A string with a day and time:
DD
,DD hh
,DD hh:mm
. In this caseYYYY-MM
are substituted as2000-01
.A string that includes the date and time along with time zone offset information:
YYYY-MM-DD hh:mm:ss ±h:mm
, etc. For example,2020-12-12 17:36:00 -5:00
.
For all of the formats with separator the function parses months names expressed by their full name or by the first three letters of a month name. Examples: 24/DEC/18
, 24-Dec-18
, 01-September-2018
.
Returned value
time_string
converted to theDateTime
data type.
Examples
Query:
Result:
Query:
Result:
Query:
Result:
Query:
Result:
Query:
Result:
See Also
parseDateTimeBestEffortUS
This function behaves like parseDateTimeBestEffort for ISO date formats, e.g. YYYY-MM-DD hh:mm:ss
, and other date formats where the month and date components can be unambiguously extracted, e.g. YYYYMMDDhhmmss
, YYYY-MM
, DD hh
, or YYYY-MM-DD hh:mm:ss ±h:mm
. If the month and the date components cannot be unambiguously extracted, e.g. MM/DD/YYYY
, MM-DD-YYYY
, or MM-DD-YY
, it prefers the US date format instead of DD/MM/YYYY
, DD-MM-YYYY
, or DD-MM-YY
. As an exception from the latter, if the month is bigger than 12 and smaller or equal than 31, this function falls back to the behavior of parseDateTimeBestEffort, e.g. 15/08/2020
is parsed as 2020-08-15
.
parseDateTimeBestEffortOrNull
parseDateTime32BestEffortOrNull
Same as for parseDateTimeBestEffort except that it returns NULL
when it encounters a date format that cannot be processed.
parseDateTimeBestEffortOrZero
parseDateTime32BestEffortOrZero
Same as for parseDateTimeBestEffort except that it returns zero date or zero date time when it encounters a date format that cannot be processed.
parseDateTimeBestEffortUSOrNull
Same as parseDateTimeBestEffortUS function except that it returns NULL
when it encounters a date format that cannot be processed.
parseDateTimeBestEffortUSOrZero
Same as parseDateTimeBestEffortUS function except that it returns zero date (1970-01-01
) or zero date with time (1970-01-01 00:00:00
) when it encounters a date format that cannot be processed.
parseDateTime64BestEffort
Same as parseDateTimeBestEffort function but also parse milliseconds and microseconds and returns DateTime data type.
Syntax
Parameters
time_string
— String containing a date or date with time to convert. String.precision
— Required precision.3
— for milliseconds,6
— for microseconds. Default —3
. Optional. UInt8.
Returned value
time_string
converted to the DateTime data type.
Examples
Query:
Result:
parseDateTime64BestEffortUS
Same as for parseDateTime64BestEffort, except that this function prefers US date format (MM/DD/YYYY
etc.) in case of ambiguity.
parseDateTime64BestEffortOrNull
Same as for parseDateTime64BestEffort except that it returns NULL
when it encounters a date format that cannot be processed.
parseDateTime64BestEffortOrZero
Same as for parseDateTime64BestEffort except that it returns zero date or zero date time when it encounters a date format that cannot be processed.
parseDateTime64BestEffortUSOrNull
Same as for parseDateTime64BestEffort, except that this function prefers US date format (MM/DD/YYYY
etc.) in case of ambiguity and returns NULL
when it encounters a date format that cannot be processed.
parseDateTime64BestEffortUSOrZero
Same as for parseDateTime64BestEffort, except that this function prefers US date format (MM/DD/YYYY
etc.) in case of ambiguity and returns zero date or zero date time when it encounters a date format that cannot be processed.
toLowCardinality
Converts input parameter to the LowCardinality version of same data type.
To convert data from the LowCardinality
data type use the CAST function. For example, CAST(x as String)
.
Syntax
Arguments
expr
— Expression resulting in one of the supported data types.
Returned values
Result of
expr
.
Type: LowCardinality(expr_result_type)
Example
Query:
Result:
toUnixTimestamp64Milli
toUnixTimestamp64Micro
toUnixTimestamp64Nano
Converts a DateTime64
to a Int64
value with fixed sub-second precision. Input value is scaled up or down appropriately depending on it precision.
NOTE
The output value is a timestamp in UTC, not in the timezone of DateTime64
.
Syntax
Arguments
value
— DateTime64 value with any precision.
Returned value
value
converted to theInt64
data type.
Examples
Query:
Result:
Query:
Result:
fromUnixTimestamp64Milli
fromUnixTimestamp64Micro
fromUnixTimestamp64Nano
Converts an Int64
to a DateTime64
value with fixed sub-second precision and optional timezone. Input value is scaled up or down appropriately depending on it’s precision. Please note that input value is treated as UTC timestamp, not timestamp at given (or implicit) timezone.
Syntax
Arguments
value
—Int64
value with any precision.timezone
—String
(optional) timezone name of the result.
Returned value
value
converted to theDateTime64
data type.
Example
Query:
Result:
formatRow
Converts arbitrary expressions into a string via given format.
Syntax
Arguments
x
,y
, ... — Expressions.
Returned value
A formatted string. (for text formats it's usually terminated with the new line character).
Example
Query:
Result:
Note: If format contains suffix/prefix, it will be written in each row.
Example
Query:
Result:
Note: Only row-based formats are supported in this function.
formatRowNoNewline
Converts arbitrary expressions into a string via given format. Differs from formatRow in that this function trims the last if any.
Syntax
Arguments
x
,y
, ... — Expressions.
Returned value
A formatted string.
Example
Query:
Result:
URLs
All these functions do not follow the RFC. They are maximally simplified for improved performance.
protocol
Extracts the protocol from a URL.
Examples of typical returned values: http, https, ftp, mailto, tel, magnet…
domain
Extracts the hostname from a URL.
Arguments
url
— URL. Type: String.
The URL can be specified with or without a scheme. Examples:
For these examples, the domain
function returns the following results:
Returned values
Host name. If ClickHouse can parse the input string as a URL.
Empty string. If ClickHouse can’t parse the input string as a URL.
Type: String
.
Example
domainWithoutWWW
Returns the domain and removes no more than one ‘www.’ from the beginning of it, if present.
topLevelDomain
Extracts the the top-level domain from a URL.
Arguments
url
— URL. Type: String.
The URL can be specified with or without a scheme. Examples:
Returned values
Domain name. If ClickHouse can parse the input string as a URL.
Empty string. If ClickHouse cannot parse the input string as a URL.
Type: String
.
Example
firstSignificantSubdomain
Returns the “first significant subdomain”. The first significant subdomain is a second-level domain if it is ‘com’, ‘net’, ‘org’, or ‘co’. Otherwise, it is a third-level domain. For example, firstSignificantSubdomain (‘https://news.clickhouse.com/’) = ‘clickhouse’, firstSignificantSubdomain (‘https://news.clickhouse.com.tr/’) = ‘clickhouse’
. The list of “insignificant” second-level domains and other implementation details may change in the future.
cutToFirstSignificantSubdomain
Returns the part of the domain that includes top-level subdomains up to the “first significant subdomain” (see the explanation above).
For example:
cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/') = 'clickhouse.com.tr'
.cutToFirstSignificantSubdomain('www.tr') = 'tr'
.cutToFirstSignificantSubdomain('tr') = ''
.
cutToFirstSignificantSubdomainWithWWW
Returns the part of the domain that includes top-level subdomains up to the “first significant subdomain”, without stripping "www".
For example:
cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/') = 'clickhouse.com.tr'
.cutToFirstSignificantSubdomain('www.tr') = 'www.tr'
.cutToFirstSignificantSubdomain('tr') = ''
.
cutToFirstSignificantSubdomainCustom
Returns the part of the domain that includes top-level subdomains up to the first significant subdomain. Accepts custom TLD list name.
Can be useful if you need fresh TLD list or you have custom.
Configuration example:
Syntax
Parameters
Returned value
Part of the domain that includes top-level subdomains up to the first significant subdomain.
Type: String.
Example
Query:
Result:
See Also
cutToFirstSignificantSubdomainCustomWithWWW
Returns the part of the domain that includes top-level subdomains up to the first significant subdomain without stripping www
. Accepts custom TLD list name.
Can be useful if you need fresh TLD list or you have custom.
Configuration example:
Syntax
Parameters
Returned value
Part of the domain that includes top-level subdomains up to the first significant subdomain without stripping
www
.
Type: String.
Example
Query:
Result:
See Also
firstSignificantSubdomainCustom
Returns the first significant subdomain. Accepts customs TLD list name.
Can be useful if you need fresh TLD list or you have custom.
Configuration example:
Syntax
Parameters
Returned value
First significant subdomain.
Type: String.
Example
Query:
Result:
See Also
port(URL[, default_port = 0])
Returns the port or default_port
if there is no port in the URL (or in case of validation error).
path
Returns the path. Example: /top/news.html
The path does not include the query string.
pathFull
The same as above, but including query string and fragment. Example: /top/news.html?page=2#comments
queryString
Returns the query string. Example: page=1&lr=213. query-string does not include the initial question mark, as well as # and everything after #.
fragment
Returns the fragment identifier. fragment does not include the initial hash symbol.
queryStringAndFragment
Returns the query string and fragment identifier. Example: page=1#29390.
extractURLParameter(URL, name)
Returns the value of the ‘name’ parameter in the URL, if present. Otherwise, an empty string. If there are many parameters with this name, it returns the first occurrence. This function works under the assumption that the parameter name is encoded in the URL exactly the same way as in the passed argument.
extractURLParameters(URL)
Returns an array of name=value strings corresponding to the URL parameters. The values are not decoded in any way.
extractURLParameterNames(URL)
Returns an array of name strings corresponding to the names of URL parameters. The values are not decoded in any way.
URLHierarchy(URL)
Returns an array containing the URL, truncated at the end by the symbols /,? in the path and query-string. Consecutive separator characters are counted as one. The cut is made in the position after all the consecutive separator characters.
URLPathHierarchy(URL)
The same as above, but without the protocol and host in the result. The / element (root) is not included.
decodeURLComponent(URL)
Returns the decoded URL. Example:
netloc
Extracts network locality (username:password@host:port
) from a URL.
Syntax
Arguments
url
— URL. String.
Returned value
username:password@host:port
.
Type: String
.
Example
Query:
Result:
cutWWW
Removes no more than one ‘www.’ from the beginning of the URL’s domain, if present.
cutQueryString
Removes query string. The question mark is also removed.
cutFragment
Removes the fragment identifier. The number sign is also removed.
cutQueryStringAndFragment
Removes the query string and fragment identifier. The question mark and number sign are also removed.
cutURLParameter(URL, name)
Removes the name
parameter from URL, if present. This function does not encode or decode characters in parameter names, e.g. Client ID
and Client%20ID
are treated as different parameter names.
Syntax
Arguments
url
— URL. String.
Returned value
URL with
name
URL parameter removed.
Type: String
.
Example
Query:
Result:
UUID
The functions for working with UUID are listed below.
generateUUIDv4
Generates the UUID of version 4.
Syntax
Arguments
x
— Expression resulting in any of the supported data types. The resulting value is discarded, but the expression itself if used for bypassing common subexpression elimination if the function is called multiple times in one query. Optional parameter.
Returned value
The UUID type value.
Usage example
This example demonstrates creating a table with the UUID type column and inserting a value into the table.
Usage example if it is needed to generate multiple values in one row
toUUID (x)
Converts String type value to UUID type.
Returned value
The UUID type value.
Usage example
toUUIDOrNull (x)
It takes an argument of type String and tries to parse it into UUID. If failed, returns NULL.
Returned value
The Nullable(UUID) type value.
Usage example
toUUIDOrZero (x)
It takes an argument of type String and tries to parse it into UUID. If failed, returns zero UUID.
Returned value
The UUID type value.
Usage example
UUIDStringToNum
Accepts string
containing 36 characters in the format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
, and returns a FixedString(16) as its binary representation, with its format optionally specified by variant
(Big-endian
by default).
Syntax
Arguments
string
— String of 36 characters or FixedString(36). String.variant
— Integer, representing a variant as specified by RFC4122. 1 =Big-endian
(default), 2 =Microsoft
.
Returned value
FixedString(16)
Usage examples
UUIDNumToString
Accepts binary
containing a binary representation of a UUID, with its format optionally specified by variant
(Big-endian
by default), and returns a string containing 36 characters in text format.
Syntax
Arguments
binary
— FixedString(16) as a binary representation of a UUID.variant
— Integer, representing a variant as specified by RFC4122. 1 =Big-endian
(default), 2 =Microsoft
.
Returned value
String.
Usage example
Last updated