Functions
There are at least* two types of functions - regular functions (they are just called “functions”) and aggregate functions. These are completely different concepts. Regular functions work as if they are applied to each row separately (for each row, the result of the function does not depend on the other rows). Aggregate functions accumulate a set of values from various rows (i.e. they depend on the entire set of rows).
In this section we discuss regular functions. For aggregate functions, see the section “Aggregate functions”.
* - There is a third type of function that the ‘arrayJoin’ function belongs to; table functions can also be mentioned separately.*
ARITHMETIC
For all arithmetic functions, the result type is calculated as the smallest number type that the result fits in, if there is such a type. The minimum is taken simultaneously based on the number of bits, whether it is signed, and whether it is floating-point. If there are not enough bits, the type with the largest number of bits is used.
Example
SELECT toTypeName(0), toTypeName(0 + 0), toTypeName(0 + 0 + 0), toTypeName(0 + 0 + 0 + 0)
┌─toTypeName(0)─┬─toTypeName(plus(0, 0))─┬─toTypeName(plus(plus(0, 0), 0))─┬─toTypeName(plus(plus(plus(0, 0), 0), 0))─┐
│ UInt8 │ UInt16 │ UInt32 │ UInt64 │
└───────────────┴────────────────────────┴─────────────────────────────────┴──────────────────────────────────────────┘
Arithmetic functions work for any pair of types from UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, or Float64.
Overflow is produced the same way as in C++.
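As a small sketch of the result-type rules described above, the query below mixes signed and unsigned operands. The literal values are arbitrary, and the types in the comments are what the rules in this section imply, not output captured from a specific server version.

```sql
SELECT
    toTypeName(toUInt8(1) + toInt8(1))   AS mixed_sign,  -- Int16: signed, one step wider
    toTypeName(toUInt8(1) - toUInt8(2))  AS difference,  -- Int16: minus is always signed
    toTypeName(toUInt64(1) + toInt64(1)) AS widest;      -- Int64: no wider type is available
```

In the last column there are not enough bits for a wider type, so values outside the Int64 range overflow.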
plus(a, b), a + b operator
Calculates the sum of the numbers. You can also add integer numbers with a date or date and time. In the case of a date, adding an integer means adding the corresponding number of days. For a date with time, it means adding the corresponding number of seconds.
Example
"plus(1,2) = 3"
minus(a, b), a - b operator
Calculates the difference. The result is always signed.
You can also subtract integers from a date or date with time. The idea is the same – see above for ‘plus’.
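To illustrate date arithmetic with plus and minus, here is a minimal sketch. The dates are arbitrary sample values; integers are interpreted as days for a date and as seconds for a date with time.

```sql
SELECT
    toDate('2024-01-01') + 7                 AS week_later,  -- 2024-01-08
    toDate('2024-01-01') - 1                 AS day_before,  -- 2023-12-31
    toDateTime('2024-01-01 00:00:00') + 3600 AS hour_later;  -- 2024-01-01 01:00:00
```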
Example
"minus(5,2) = 3"
multiply(a, b), a * b operator
Calculates the product of the numbers.
Example
"multiply(5,2) = 10"
divide(a, b), a / b operator
Calculates the quotient of the numbers. The result type is always a floating-point type. It is not integer division. For integer division, use the ‘intDiv’ function. When dividing by zero you get ‘inf’, ‘-inf’, or ‘nan’.
Example
"divide(50,2) = 2.5e+01"
intDiv(a, b)
Calculates the quotient of the numbers. Performs integer division, rounding toward zero (that is, rounding down by absolute value). An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.
Example
"intDiv(10, -2) = -5"
intDivOrZero(a, b)
Differs from ‘intDiv’ in that it returns zero when dividing by zero or when dividing a minimal negative number by minus one.
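The difference between divide, intDiv, and intDivOrZero can be sketched with one query. The operands are arbitrary sample values chosen to show the zero-divisor behavior.

```sql
SELECT
    7 / 2              AS float_quotient,  -- 3.5 (always floating-point)
    intDiv(7, 2)       AS int_quotient,    -- 3
    1 / 0              AS div_by_zero,     -- inf
    intDivOrZero(7, 0) AS safe_quotient;   -- 0 (intDiv(7, 0) would throw instead)
```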
Example
"intDivOrZero(10, -2) = -5"
modulo(a, b), a % b operator
Calculates the remainder when dividing a by b. The result type is an integer if both inputs are integers. If one of the inputs is a floating-point number, the result is a floating-point number. The remainder is computed like in C++. Truncated division is used for negative numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.
Example
"modulo(10, 3) = 1"
moduloOrZero(a, b)
Differs from modulo in that it returns zero when the divisor is zero.
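Because truncated division is used, the sign of the remainder follows the dividend. A short sketch with arbitrary operands; the commented values follow the C++ remainder rules referenced above.

```sql
SELECT
    modulo(-7, 3)      AS negative_dividend,  -- -1
    modulo(7, -3)      AS negative_divisor,   -- 1
    moduloOrZero(7, 0) AS zero_divisor;       -- 0 (modulo(7, 0) would throw instead)
```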
Example
"moduloOrZero(10, 5) = 0"
negate(a), -a operator
Calculates a number with the reverse sign. The result is always signed.
Example
"negate(20) = -20"
abs(a)
Calculates the absolute value of the number (a). That is, if a < 0, it returns -a. For unsigned types it does not do anything. For signed integer types, it returns an unsigned number.
Example
"abs(-2) = 2"
gcd(a, b)
Returns the greatest common divisor of the numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.
Example
"gcd(27,18) = 9"
lcm(a, b)
Returns the least common multiple of the numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.
Example
"lcm(27,18) = 54"
ARRAY
empty
Checks whether the input array is empty.
Syntax
empty([x])
An array is considered empty if it does not contain any elements.
NOTE
Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1 the function reads only the size0 subcolumn instead of reading and processing the whole array column. The query SELECT empty(arr) FROM TABLE; transforms to SELECT arr.size0 = 0 FROM TABLE;
The function also works for strings or UUID.
Arguments
[x] — Input array. Array.
Returned value
Returns 1 for an empty array or 0 for a non-empty array.
Type: UInt8.
Example
Query:
SELECT empty([]);
Result:
┌─empty(array())─┐
│ 1 │
└────────────────┘
notEmpty
Checks whether the input array is non-empty.
Syntax
notEmpty([x])
An array is considered non-empty if it contains at least one element.
NOTE
Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1 the function reads only the size0 subcolumn instead of reading and processing the whole array column. The query SELECT notEmpty(arr) FROM table transforms to SELECT arr.size0 != 0 FROM TABLE.
The function also works for strings or UUID.
Arguments
[x] — Input array. Array.
Returned value
Returns 1 for a non-empty array or 0 for an empty array.
Type: UInt8.
Example
Query:
SELECT notEmpty([1,2]);
Result:
┌─notEmpty([1, 2])─┐
│ 1 │
└──────────────────┘
length
Returns the number of items in the array. The result type is UInt64. The function also works for strings.
Syntax
length(string)
Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1 the function reads only the size0 subcolumn instead of reading and processing the whole array column. The query SELECT length(arr) FROM table transforms to SELECT arr.size0 FROM TABLE.
Example
length('ABC Corporation')
emptyArrayUInt8, emptyArrayUInt16, emptyArrayUInt32, emptyArrayUInt64
emptyArrayInt8, emptyArrayInt16, emptyArrayInt32, emptyArrayInt64
emptyArrayFloat32, emptyArrayFloat64
emptyArrayDate, emptyArrayDateTime
emptyArrayString
Accepts zero arguments and returns an empty array of the appropriate type.
emptyArrayToSingle
Accepts an empty array and returns a one-element array whose single element is the default value of the element type.
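A minimal sketch of the emptyArray* functions together with emptyArrayToSingle. The element types are picked only for illustration: the default value is 0 for numeric types and an empty string for String.

```sql
SELECT
    emptyArrayUInt32()                     AS empty_arr,   -- []
    emptyArrayToSingle(emptyArrayUInt32()) AS single_num,  -- [0]
    emptyArrayToSingle(emptyArrayString()) AS single_str;  -- ['']
```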
range(end), range([start, ] end [, step])
Returns an array of UInt numbers from start to end - 1 by step.
Syntax
range([start, ] end [, step])
Arguments
start — The first element of the array. Optional, required if step is used. Default value: 0. UInt
end — The number before which the array is constructed. Required. UInt
step — Determines the incremental step between each element in the array. Optional. Default value: 1. UInt
Returned value
Array of UInt numbers from start to end - 1 by step.
Implementation details
All arguments must be positive values: start, end, and step are UInt data types, as are the elements of the returned array.
An exception is thrown if the query results in arrays with a total length greater than the number of elements specified by the function_range_max_elements_in_block setting.
Examples
Query:
SELECT range(5), range(1, 5), range(1, 5, 2);
Result:
┌─range(5)────┬─range(1, 5)─┬─range(1, 5, 2)─┐
│ [0,1,2,3,4] │ [1,2,3,4] │ [1,3] │
└─────────────┴─────────────┴────────────────┘
array(x1, …), operator [x1, …]
Creates an array from the function arguments. The arguments must be constants and have types that have the smallest common type. At least one argument must be passed, because otherwise it isn’t clear which type of array to create. That is, you can’t use this function to create an empty array (to do that, use the ‘emptyArray*’ function described above). Returns an ‘Array(T)’ type result, where ‘T’ is the smallest common type out of the passed arguments.
Example
SELECT array(1,2,3);
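The "smallest common type" rule can be checked with toTypeName. This is a sketch with arbitrary literals; the types in the comments follow from the rule stated above (integer literals start at the narrowest type that holds them).

```sql
SELECT
    array(1, 2, 3)             AS ints,         -- [1,2,3]
    toTypeName(array(1, 2, 3)) AS ints_type,    -- Array(UInt8)
    toTypeName([1, -1])        AS signed_type,  -- Array(Int16)
    toTypeName([1, 2.5])       AS float_type;   -- Array(Float64)
```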
arrayConcat
Combines arrays passed as arguments.
arrayConcat(arrays)
Arguments
arrays – Arbitrary number of arguments of Array type.
Example
SELECT arrayConcat([1, 2], [3, 4], [5, 6]) AS res
┌─res───────────┐
│ [1,2,3,4,5,6] │
└───────────────┘
has(arr, elem)
Checks whether the ‘arr’ array has the ‘elem’ element. Returns 0 if the element is not in the array, or 1 if it is.
NULL is processed as a value.
SELECT has([1, 2, NULL], NULL)
┌─has([1, 2, NULL], NULL)─┐
│ 1 │
└─────────────────────────┘
hasAll
Checks whether one array is a subset of another.
hasAll(set, subset)
Arguments
set – Array of any type with a set of elements.
subset – Array of any type with elements that should be tested to be a subset of set.
Return values
1, if set contains all of the elements from subset.
0, otherwise.
Peculiar properties
An empty array is a subset of any array.
Null is processed as a value.
The order of values in both arrays does not matter.
Examples
SELECT hasAll([], [])
returns 1.
SELECT hasAll([1, Null], [Null])
returns 1.
SELECT hasAll([1.0, 2, 3, 4], [1, 3])
returns 1.
SELECT hasAll(['a', 'b'], ['a'])
returns 1.
SELECT hasAll([1], ['a'])
returns 0.
SELECT hasAll([[1, 2], [3, 4]], [[1, 2], [3, 5]])
returns 0.
hasAny
Checks whether two arrays have at least one element in common.
hasAny(array1, array2)
Arguments
array1 – Array of any type with a set of elements.
array2 – Array of any type with a set of elements.
Return values
1, if array1 and array2 have at least one element in common.
0, otherwise.
Peculiar properties
Null is processed as a value.
The order of values in both arrays does not matter.
Examples
SELECT hasAny([1], [])
returns 0.
SELECT hasAny([Null], [Null, 1])
returns 1.
SELECT hasAny([-128, 1., 512], [1])
returns 1.
SELECT hasAny([[1, 2], [3, 4]], ['a', 'c'])
returns 0.
SELECT hasAny([[1, 2], [3, 4]], [[1, 2], [1, 2]])
returns 1.
hasSubstr
Checks whether all the elements of array2 appear in array1 in the same exact order. Therefore, the function returns 1 if and only if array1 = prefix + array2 + suffix.
hasSubstr(array1, array2)
In other words, the function checks whether all the elements of array2 are contained in array1, like the hasAll function. In addition, it checks that the elements appear in the same order in both array1 and array2.
For example:
hasSubstr([1,2,3,4], [2,3]) returns 1. However, hasSubstr([1,2,3,4], [3,2]) returns 0.
hasSubstr([1,2,3,4], [1,2,3]) returns 1. However, hasSubstr([1,2,3,4], [1,2,4]) returns 0.
Arguments
array1 – Array of any type with a set of elements.
array2 – Array of any type with a set of elements.
Return values
1, if array1 contains array2.
0, otherwise.
Peculiar properties
The function returns 1 if array2 is empty.
Null is processed as a value. In other words, hasSubstr([1, 2, NULL, 3, 4], [2,3]) returns 0. However, hasSubstr([1, 2, NULL, 3, 4], [2,NULL,3]) returns 1.
The order of values in both arrays does matter.
Examples
SELECT hasSubstr([], [])
returns 1.
SELECT hasSubstr([1, Null], [Null])
returns 1.
SELECT hasSubstr([1.0, 2, 3, 4], [1, 3])
returns 0.
SELECT hasSubstr(['a', 'b'], ['a'])
returns 1.
SELECT hasSubstr(['a', 'b' , 'c'], ['a', 'b'])
returns 1.
SELECT hasSubstr(['a', 'b' , 'c'], ['a', 'c'])
returns 0.
SELECT hasSubstr([[1, 2], [3, 4], [5, 6]], [[1, 2], [3, 4]])
returns 1.
indexOf(arr, x)
Returns the index of the first ‘x’ element (starting from 1) if it is in the array, or 0 if it is not.
Example:
SELECT indexOf([1, 3, NULL, NULL], NULL)
┌─indexOf([1, 3, NULL, NULL], NULL)─┐
│ 3 │
└───────────────────────────────────┘
Elements set to NULL are handled as normal values.
arrayCount([func,] arr1, …)
Returns the number of elements for which func(arr1[i], …, arrN[i]) returns something other than 0. If func is not specified, it returns the number of non-zero elements in the array.
Note that arrayCount is a higher-order function. You can pass a lambda function to it as the first argument.
Example
arrayCount(lambda(tuple(x, y), equals(x, y)), [1, 2, 3], [1, 5, 3]) = 2
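The same call can be written with the arrow lambda syntax used elsewhere in this section. A minimal sketch with arbitrary sample arrays:

```sql
SELECT
    arrayCount(x -> x > 1, [1, 2, 3])                 AS greater_than_one,  -- 2
    arrayCount((x, y) -> x = y, [1, 2, 3], [1, 5, 3]) AS equal_pairs;       -- 2
```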
countEqual(arr, x)
Returns the number of elements in the array equal to x. Equivalent to arrayCount(elem -> elem = x, arr).
NULL elements are handled as separate values.
Example
SELECT countEqual([1, 2, NULL, NULL], NULL)
┌─countEqual([1, 2, NULL, NULL], NULL)─┐
│ 2 │
└──────────────────────────────────────┘
arrayEnumerate(arr)
Returns the array [1, 2, 3, …, length(arr)].
This function is normally used with ARRAY JOIN. It allows counting something just once for each array after applying ARRAY JOIN. Example:
SELECT
count() AS Reaches,
countIf(num = 1) AS Hits
FROM test.hits
ARRAY JOIN
GoalsReached,
arrayEnumerate(GoalsReached) AS num
WHERE CounterID = 160656
LIMIT 10
┌─Reaches─┬──Hits─┐
│ 95606 │ 31406 │
└─────────┴───────┘
In this example, Reaches is the number of conversions (the rows received after applying ARRAY JOIN), and Hits is the number of pageviews (rows before ARRAY JOIN). In this particular case, you can get the same result in an easier way:
SELECT
sum(length(GoalsReached)) AS Reaches,
count() AS Hits
FROM test.hits
WHERE (CounterID = 160656) AND notEmpty(GoalsReached)
┌─Reaches─┬──Hits─┐
│ 95606 │ 31406 │
└─────────┴───────┘
This function can also be used in higher-order functions. For example, you can use it to get array indexes for elements that match a condition.
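One way to get the indexes of matching elements is to pass arrayEnumerate(arr) as the first array to arrayFilter and test the values from the second array. This is a sketch: the sample array and the threshold 10 are arbitrary.

```sql
SELECT arrayFilter(
    (i, x) -> x > 10,          -- keep index i when the value x matches
    arrayEnumerate(arr),
    [5, 20, 7, 30] AS arr
) AS matching_indexes;
-- [2,4]
```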
arrayEnumerateUniq(arr, …)
Returns an array the same size as the source array, indicating for each element what its position is among elements with the same value. For example: arrayEnumerateUniq([10, 20, 10, 30]) = [1, 1, 2, 1].
This function is useful when using ARRAY JOIN and aggregation of array elements. Example:
SELECT
Goals.ID AS GoalID,
sum(Sign) AS Reaches,
sumIf(Sign, num = 1) AS Visits
FROM test.visits
ARRAY JOIN
Goals,
arrayEnumerateUniq(Goals.ID) AS num
WHERE CounterID = 160656
GROUP BY GoalID
ORDER BY Reaches DESC
LIMIT 10
┌──GoalID─┬─Reaches─┬─Visits─┐
│ 53225 │ 3214 │ 1097 │
│ 2825062 │ 3188 │ 1097 │
│ 56600 │ 2803 │ 488 │
│ 1989037 │ 2401 │ 365 │
│ 2830064 │ 2396 │ 910 │
│ 1113562 │ 2372 │ 373 │
│ 3270895 │ 2262 │ 812 │
│ 1084657 │ 2262 │ 345 │
│ 56599 │ 2260 │ 799 │
│ 3271094 │ 2256 │ 812 │
└─────────┴─────────┴────────┘
In this example, each goal ID has a calculation of the number of conversions (each element in the Goals nested data structure is a goal that was reached, which we refer to as a conversion) and the number of sessions. Without ARRAY JOIN, we would have counted the number of sessions as sum(Sign). But in this particular case, the rows were multiplied by the nested Goals structure, so in order to count each session one time after this, we apply a condition to the value of the arrayEnumerateUniq(Goals.ID) function.
The arrayEnumerateUniq function can take multiple arrays of the same size as arguments. In this case, uniqueness is considered for tuples of elements in the same positions in all the arrays.
SELECT arrayEnumerateUniq([1, 1, 1, 2, 2, 2], [1, 1, 2, 1, 1, 2]) AS res
┌─res───────────┐
│ [1,2,1,1,2,1] │
└───────────────┘
This is necessary when using ARRAY JOIN with a nested data structure and further aggregation across multiple elements in this structure.
arrayPopBack
Removes the last item from the array.
arrayPopBack(array)
Arguments
array – Array.
Example
SELECT arrayPopBack([1, 2, 3]) AS res;
┌─res───┐
│ [1,2] │
└───────┘
arrayPopFront
Removes the first item from the array.
arrayPopFront(array)
Arguments
array – Array.
Example
SELECT arrayPopFront([1, 2, 3]) AS res;
┌─res───┐
│ [2,3] │
└───────┘
arrayPushBack
Adds one item to the end of the array.
arrayPushBack(array, single_value)
Arguments
array – Array.
single_value – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the single_value type to the data type of the array. For more information about the types of data in ClickHouse, see “Data types”. Can be NULL. The function adds a NULL element to an array, and the type of array elements converts to Nullable.
Example
SELECT arrayPushBack(['a'], 'b') AS res;
┌─res───────┐
│ ['a','b'] │
└───────────┘
arrayPushFront
Adds one element to the beginning of the array.
arrayPushFront(array, single_value)
Arguments
array – Array.
single_value – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the single_value type to the data type of the array. For more information about the types of data in ClickHouse, see “Data types”. Can be NULL. The function adds a NULL element to an array, and the type of array elements converts to Nullable.
Example
SELECT arrayPushFront(['b'], 'a') AS res;
┌─res───────┐
│ ['a','b'] │
└───────────┘
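The NULL case described for arrayPushBack and arrayPushFront can be sketched as follows. The input array is arbitrary, and the commented type is what the Nullable conversion described above implies.

```sql
SELECT
    arrayPushBack([1, 2], NULL)             AS res,       -- [1,2,NULL]
    toTypeName(arrayPushBack([1, 2], NULL)) AS res_type;  -- Array(Nullable(UInt8))
```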
arrayResize
Changes the length of the array.
arrayResize(array, size[, extender])
Arguments:
array — Array.
size — Required length of the array.
If size is less than the original size of the array, the array is truncated from the right.
If size is larger than the initial size of the array, the array is extended to the right with extender values or default values for the data type of the array items.
extender — Value for extending an array. Can be NULL.
Returned value:
An array of length size.
Examples of calls
SELECT arrayResize([1], 3);
┌─arrayResize([1], 3)─┐
│ [1,0,0] │
└─────────────────────┘
SELECT arrayResize([1], 3, NULL);
┌─arrayResize([1], 3, NULL)─┐
│ [1,NULL,NULL] │
└───────────────────────────┘
arraySlice
Returns a slice of the array.
arraySlice(array, offset[, length])
Arguments
array – Array of data.
offset – Offset from the edge of the array. A positive value indicates an offset from the left, and a negative value indicates an offset from the right. Numbering of the array items begins with 1.
length – The length of the required slice. If you specify a negative value, the function returns an open slice [offset, array_length - length]. If you omit the value, the function returns the slice [offset, the_end_of_array].
Example
SELECT arraySlice([1, 2, NULL, 4, 5], 2, 3) AS res;
┌─res────────┐
│ [2,NULL,4] │
└────────────┘
Array elements set to NULL are handled as normal values.
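Negative offsets and lengths are easier to see on a small array. A sketch with an arbitrary five-element array; the commented results follow the argument rules described above.

```sql
SELECT
    arraySlice([1, 2, 3, 4, 5], 2)     AS from_second,   -- [2,3,4,5]
    arraySlice([1, 2, 3, 4, 5], -2)    AS last_two,      -- [4,5]
    arraySlice([1, 2, 3, 4, 5], 2, -1) AS without_last;  -- [2,3,4]
```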
arraySort([func,] arr, …)
Sorts the elements of the arr array in ascending order. If the func function is specified, the sorting order is determined by the result of func applied to the elements of the array. If func accepts multiple arguments, the arraySort function is passed several arrays that the arguments of func correspond to. Detailed examples are shown at the end of the arraySort description.
Example of integer values sorting:
SELECT arraySort([1, 3, 3, 0]);
┌─arraySort([1, 3, 3, 0])─┐
│ [0,1,3,3] │
└─────────────────────────┘
Example of string values sorting:
SELECT arraySort(['hello', 'world', '!']);
┌─arraySort(['hello', 'world', '!'])─┐
│ ['!','hello','world'] │
└────────────────────────────────────┘
Consider the following sorting order for the NULL, NaN and Inf values:
SELECT arraySort([1, nan, 2, NULL, 3, nan, -4, NULL, inf, -inf]);
┌─arraySort([1, nan, 2, NULL, 3, nan, -4, NULL, inf, -inf])─┐
│ [-inf,-4,1,2,3,inf,nan,nan,NULL,NULL] │
└───────────────────────────────────────────────────────────┘
-Inf values are first in the array.
NULL values are last in the array.
NaN values are right before NULL.
Inf values are right before NaN.
Note that arraySort is a higher-order function. You can pass a lambda function to it as the first argument. In this case, the sorting order is determined by the result of the lambda function applied to the elements of the array.
Let’s consider the following example:
SELECT arraySort((x) -> -x, [1, 2, 3]) as res;
┌─res─────┐
│ [3,2,1] │
└─────────┘
For each element of the source array, the lambda function returns the sorting key, that is, [1 –> -1, 2 –> -2, 3 –> -3]. Since the arraySort function sorts the keys in ascending order, the result is [3, 2, 1]. Thus, the (x) -> -x lambda function produces a descending sort order.
The lambda function can accept multiple arguments. In this case, you need to pass the arraySort function several arrays of identical length that the arguments of the lambda function correspond to. The resulting array consists of elements from the first input array; elements from the next input array(s) specify the sorting keys. For example:
SELECT arraySort((x, y) -> y, ['hello', 'world'], [2, 1]) as res;
┌─res────────────────┐
│ ['world', 'hello'] │
└────────────────────┘
Here, the elements that are passed in the second array ([2, 1]) define a sorting key for the corresponding element from the source array (['hello', 'world']), that is, ['hello' –> 2, 'world' –> 1]. Since the lambda function does not use x, the actual values of the source array do not affect the order in the result. So, 'hello' will be the second element in the result, and 'world' will be the first.
Other examples are shown below.
SELECT arraySort((x, y) -> y, [0, 1, 2], ['c', 'b', 'a']) as res;
┌─res─────┐
│ [2,1,0] │
└─────────┘
SELECT arraySort((x, y) -> -y, [0, 1, 2], [1, 2, 3]) as res;
┌─res─────┐
│ [2,1,0] │
└─────────┘
NOTE
To improve sorting efficiency, the Schwartzian transform is used.
arrayReverseSort([func,] arr, …)
Sorts the elements of the arr array in descending order. If the func function is specified, arr is sorted according to the result of func applied to the elements of the array, and then the sorted array is reversed. If func accepts multiple arguments, the arrayReverseSort function is passed several arrays that the arguments of func correspond to. Detailed examples are shown at the end of the arrayReverseSort description.
Example of integer values sorting:
SELECT arrayReverseSort([1, 3, 3, 0]);
┌─arrayReverseSort([1, 3, 3, 0])─┐
│ [3,3,1,0] │
└────────────────────────────────┘
Example of string values sorting:
SELECT arrayReverseSort(['hello', 'world', '!']);
┌─arrayReverseSort(['hello', 'world', '!'])─┐
│ ['world','hello','!'] │
└───────────────────────────────────────────┘
Consider the following sorting order for the NULL, NaN and Inf values:
SELECT arrayReverseSort([1, nan, 2, NULL, 3, nan, -4, NULL, inf, -inf]) as res;
┌─res───────────────────────────────────┐
│ [inf,3,2,1,-4,-inf,nan,nan,NULL,NULL] │
└───────────────────────────────────────┘
Inf values are first in the array.
NULL values are last in the array.
NaN values are right before NULL.
-Inf values are right before NaN.
Note that arrayReverseSort is a higher-order function. You can pass a lambda function to it as the first argument. An example is shown below.
SELECT arrayReverseSort((x) -> -x, [1, 2, 3]) as res;
┌─res─────┐
│ [1,2,3] │
└─────────┘
The array is sorted in the following way:
First, the source array ([1, 2, 3]) is sorted according to the result of the lambda function applied to the elements of the array. The result is the array [3, 2, 1].
Then the array obtained in the previous step is reversed. So, the final result is [1, 2, 3].
The lambda function can accept multiple arguments. In this case, you need to pass the arrayReverseSort function several arrays of identical length that the arguments of the lambda function correspond to. The resulting array consists of elements from the first input array; elements from the next input array(s) specify the sorting keys. For example:
SELECT arrayReverseSort((x, y) -> y, ['hello', 'world'], [2, 1]) as res;
┌─res───────────────┐
│ ['hello','world'] │
└───────────────────┘
In this example, the array is sorted in the following way:
First, the source array (['hello', 'world']) is sorted according to the result of the lambda function applied to the elements of the arrays. The elements that are passed in the second array ([2, 1]) define the sorting keys for the corresponding elements from the source array. The result is the array ['world', 'hello'].
Then the array sorted in the previous step is reversed. So, the final result is ['hello', 'world'].
Other examples are shown below.
SELECT arrayReverseSort((x, y) -> y, [4, 3, 5], ['a', 'b', 'c']) AS res;
┌─res─────┐
│ [5,3,4] │
└─────────┘
SELECT arrayReverseSort((x, y) -> -y, [4, 3, 5], [1, 2, 3]) AS res;
┌─res─────┐
│ [4,3,5] │
└─────────┘
arrayUniq(arr, …)
If one argument is passed, it counts the number of different elements in the array. If multiple arguments are passed, it counts the number of different tuples of elements at corresponding positions in multiple arrays.
If you want to get a list of unique items in an array, you can use arrayReduce('groupUniqArray', arr).
Example
SELECT arrayUniq([2, 3]) AS res;
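The one-argument and multi-argument forms can be compared side by side. A sketch with arbitrary sample arrays; in the second call the tuples (1,'a'), (2,'b'), and (1,'c') are all different.

```sql
SELECT
    arrayUniq([1, 2, 1])                     AS distinct_values,  -- 2
    arrayUniq([1, 2, 1], ['a', 'b', 'c'])    AS distinct_pairs,   -- 3
    arrayReduce('groupUniqArray', [1, 2, 1]) AS unique_items;     -- contains 1 and 2; order is not guaranteed
```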
arrayJoin(arr)
A special function. See the section “ArrayJoin function”.
arrayDifference
Calculates the difference between adjacent array elements. Returns an array where the first element is 0, the second is the difference a[1] - a[0], etc. The type of elements in the resulting array is determined by the type inference rules for subtraction (e.g. UInt8 - UInt8 = Int16).
Syntax
arrayDifference(array)
Arguments
array – Array.
Returned values
Returns an array of differences between adjacent elements.
Example
Query:
SELECT arrayDifference([1, 2, 3, 4]);
Result:
┌─arrayDifference([1, 2, 3, 4])─┐
│ [0,1,1,1] │
└───────────────────────────────┘
Example of the overflow due to result type Int64:
Query:
SELECT arrayDifference([0, 10000000000000000000]);
Result:
┌─arrayDifference([0, 10000000000000000000])─┐
│ [0,-8446744073709551616] │
└────────────────────────────────────────────┘
arrayDistinct
Takes an array, returns an array containing the distinct elements only.
Syntax
arrayDistinct(array)
Arguments
array – Array.
Returned values
Returns an array containing the distinct elements.
Example
Query:
SELECT arrayDistinct([1, 2, 2, 3, 1]);
Result:
┌─arrayDistinct([1, 2, 2, 3, 1])─┐
│ [1,2,3] │
└────────────────────────────────┘
arrayEnumerateDense(arr)
Returns an array of the same size as the source array, indicating where each element first appears in the source array.
Example
SELECT arrayEnumerateDense([10, 20, 10, 30])
┌─arrayEnumerateDense([10, 20, 10, 30])─┐
│ [1,2,1,3] │
└───────────────────────────────────────┘
arrayIntersect(arr)
Takes multiple arrays, returns an array with elements that are present in all source arrays.
Example
SELECT
arrayIntersect([1, 2], [1, 3], [2, 3]) AS no_intersect,
arrayIntersect([1, 2], [1, 3], [1, 4]) AS intersect
┌─no_intersect─┬─intersect─┐
│ [] │ [1] │
└──────────────┴───────────┘
arrayReduce
Applies an aggregate function to array elements and returns its result. The name of the aggregation function is passed as a string in single quotes, for example 'max' or 'sum'. When using parametric aggregate functions, the parameter is indicated after the function name in parentheses, for example 'uniqUpTo(6)'.
Syntax
arrayReduce(agg_func, arr1, arr2, ..., arrN)
Arguments
agg_func — The name of an aggregate function, which should be a constant string.
arr — Any number of array type columns as the parameters of the aggregation function.
Returned value
Example
Query:
SELECT arrayReduce('max', [1, 2, 3]);
Result:
┌─arrayReduce('max', [1, 2, 3])─┐
│ 3 │
└───────────────────────────────┘
If an aggregate function takes multiple arguments, then this function must be applied to multiple arrays of the same size.
Query:
SELECT arrayReduce('maxIf', [3, 5], [1, 0]);
Result:
┌─arrayReduce('maxIf', [3, 5], [1, 0])─┐
│ 3 │
└──────────────────────────────────────┘
Example with a parametric aggregate function:
Query:
SELECT arrayReduce('uniqUpTo(3)', [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
Result:
┌─arrayReduce('uniqUpTo(3)', [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])─┐
│ 4 │
└─────────────────────────────────────────────────────────────┘
arrayReduceInRanges
Applies an aggregate function to array elements in given ranges and returns an array containing the result corresponding to each range. The function returns the same result as multiple calls of arrayReduce(agg_func, arraySlice(arr1, index, length), ...).
Syntax
arrayReduceInRanges(agg_func, ranges, arr1, arr2, ..., arrN)
Arguments
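agg_func — The name of an aggregate function, which should be a constant string.
ranges — The ranges to aggregate: an array of tuples, each containing the index and the length of a range.
arr — Any number of Array type columns as the parameters of the aggregation function.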
Returned value
Array containing results of the aggregate function over specified ranges.
Type: Array.
Example
Query:
SELECT arrayReduceInRanges(
'sum',
[(1, 5), (2, 3), (3, 4), (4, 4)],
[1000000, 200000, 30000, 4000, 500, 60, 7]
) AS res
Result:
┌─res─────────────────────────┐
│ [1234500,234000,34560,4567] │
└─────────────────────────────┘
arrayReverse(arr)
Returns an array of the same size as the original array containing the elements in reverse order.
Example:
SELECT arrayReverse([1, 2, 3])
┌─arrayReverse([1, 2, 3])─┐
│ [3,2,1] │
└─────────────────────────┘
reverse(arr)
Synonym for “arrayReverse”
arrayFlatten
Converts an array of arrays to a flat array.
Function:
Applies to any depth of nested arrays.
Does not change arrays that are already flat.
The flattened array contains all the elements from all source arrays.
Syntax
flatten(array_of_arrays)
Alias: flatten.
Arguments
array_of_arrays
— Array of arrays. For example,[[1,2,3], [4,5]]
.
Examples
SELECT flatten([[[1]], [[2], [3]]]);
┌─flatten(array(array([1]), array([2], [3])))─┐
│ [1,2,3] │
└─────────────────────────────────────────────┘
arrayCompact
Removes consecutive duplicate elements from an array. The order of result values is determined by the order in the source array.
Syntax
arrayCompact(arr)
Arguments
arr
— The array to inspect.
Returned value
The array without consecutive duplicates.
Type: Array.
Example
Query:
SELECT arrayCompact([1, 1, nan, nan, 2, 3, 3, 3]);
Result:
┌─arrayCompact([1, 1, nan, nan, 2, 3, 3, 3])─┐
│ [1,nan,nan,2,3] │
└────────────────────────────────────────────┘
arrayZip
Combines multiple arrays into a single array. The resulting array contains the corresponding elements of the source arrays grouped into tuples in the listed order of arguments.
Syntax
arrayZip(arr1, arr2, ..., arrN)
Arguments
arrN — Array.
The function can take any number of arrays of different types. All the input arrays must be of equal size.
Returned value
Array with elements from the source arrays grouped into tuples. Data types in the tuple are the same as types of the input arrays and in the same order as arrays are passed.
Type: Array.
Example
Query:
SELECT arrayZip(['a', 'b', 'c'], [5, 2, 1]);
Result:
┌─arrayZip(['a', 'b', 'c'], [5, 2, 1])─┐
│ [('a',5),('b',2),('c',1)] │
└──────────────────────────────────────┘
arrayAUC
Calculates AUC (Area Under the Curve), a concept in machine learning. For details, see https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve.
Syntax
arrayAUC(arr_scores, arr_labels)
Arguments
arr_scores — Scores that the prediction model gives.
arr_labels — Labels of samples, usually 1 for a positive sample and 0 for a negative sample.
Returned value
Returns AUC value with type Float64.
Example
Query:
select arrayAUC([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]);
Result:
┌─arrayAUC([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])─┐
│ 0.75 │
└───────────────────────────────────────────────┘
arrayMap(func, arr1, …)
Returns an array obtained from the original arrays by applying func(arr1[i], …, arrN[i]) to each element. Arrays arr1 … arrN must have the same number of elements.
Examples
SELECT arrayMap(x -> (x + 2), [1, 2, 3]) as res;
┌─res─────┐
│ [3,4,5] │
└─────────┘
The following example shows how to create a tuple of elements from different arrays:
SELECT arrayMap((x, y) -> (x, y), [1, 2, 3], [4, 5, 6]) AS res
┌─res─────────────────┐
│ [(1,4),(2,5),(3,6)] │
└─────────────────────┘
Note that arrayMap is a higher-order function. You must pass a lambda function to it as the first argument, and it can’t be omitted.
arrayFilter(func, arr1, …)
Returns an array containing only the elements in arr1 for which func(arr1[i], …, arrN[i]) returns something other than 0.
Examples
SELECT arrayFilter(x -> x LIKE '%World%', ['Hello', 'abc World']) AS res
┌─res───────────┐
│ ['abc World'] │
└───────────────┘
SELECT
arrayFilter(
(i, x) -> x LIKE '%World%',
arrayEnumerate(arr),
['Hello', 'abc World'] AS arr)
AS res
┌─res─┐
│ [2] │
└─────┘
Note that arrayFilter is a higher-order function. You must pass a lambda function to it as the first argument; it can’t be omitted.
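The element-wise behaviour of arrayMap and arrayFilter described above can be modelled in Python. This is only a sketch of the documented semantics: the names array_map, array_filter and array_enumerate are hypothetical Python helpers, and the LIKE '%World%' condition is approximated with a substring test.

```python
def array_map(func, *arrays):
    """Apply func to the i-th elements of all arrays, for every i."""
    if len({len(a) for a in arrays}) > 1:
        raise ValueError("all arrays must have the same number of elements")
    return [func(*items) for items in zip(*arrays)]

def array_filter(func, *arrays):
    """Keep the elements of the first array for which func is non-zero."""
    if len({len(a) for a in arrays}) > 1:
        raise ValueError("all arrays must have the same number of elements")
    return [items[0] for items in zip(*arrays) if func(*items)]

def array_enumerate(arr):
    """1-based positions, matching the [2] result in the example."""
    return list(range(1, len(arr) + 1))
```

Note that the filter always returns elements of the first array, which is why the second example passes the positions first and the strings second.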
arrayFill(func, arr1, …)
Scans through arr1 from the first element to the last element and replaces arr1[i] with arr1[i - 1] if func(arr1[i], …, arrN[i]) returns 0. The first element of arr1 is never replaced.
Examples
SELECT arrayFill(x -> not isNull(x), [1, null, 3, 11, 12, null, null, 5, 6, 14, null, null]) AS res
┌─res──────────────────────────────┐
│ [1,1,3,11,12,12,12,5,6,14,14,14] │
└──────────────────────────────────┘
Note that arrayFill is a higher-order function. You must pass a lambda function to it as the first argument; it can’t be omitted.
arrayReverseFill(func, arr1, …)
Scans through arr1 from the last element to the first element and replaces arr1[i] with arr1[i + 1] if func(arr1[i], …, arrN[i]) returns 0. The last element of arr1 is never replaced.
Examples:
SELECT arrayReverseFill(x -> not isNull(x), [1, null, 3, 11, 12, null, null, 5, 6, 14, null, null]) AS res
┌─res────────────────────────────────┐
│ [1,3,3,11,12,5,5,5,6,14,NULL,NULL] │
└────────────────────────────────────┘
Note that arrayReverseFill is a higher-order function. You must pass a lambda function to it as the first argument; it can’t be omitted.
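The two fill directions can be sketched in Python as follows. The sketch assumes that a replaced value is carried forward into later replacements, which is what the runs 12,12,12 and 5,5,5 in the example results suggest; the helper names are hypothetical and NULL is modelled as None.

```python
def array_fill(func, *arrays):
    """Forward fill: a rejected element takes the value of its left neighbour."""
    result = list(arrays[0])
    for i in range(1, len(result)):            # the first element is never replaced
        if not func(*(a[i] for a in arrays)):  # func sees the original values
            result[i] = result[i - 1]
    return result

def array_reverse_fill(func, *arrays):
    """Backward fill: a rejected element takes the value of its right neighbour."""
    result = list(arrays[0])
    for i in range(len(result) - 2, -1, -1):   # the last element is never replaced
        if not func(*(a[i] for a in arrays)):
            result[i] = result[i + 1]
    return result
```

With the input of the examples, the trailing NULL values survive the backward fill because there is nothing to their right to copy from.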
arraySplit(func, arr1, …)
Splits arr1 into multiple arrays. When func(arr1[i], …, arrN[i]) returns something other than 0, the array is split on the left-hand side of the element. The array is not split before the first element.
Examples:
SELECT arraySplit((x, y) -> y, [1, 2, 3, 4, 5], [1, 0, 0, 1, 0]) AS res
┌─res─────────────┐
│ [[1,2,3],[4,5]] │
└─────────────────┘
Note that arraySplit is a higher-order function. You must pass a lambda function to it as the first argument; it can’t be omitted.
arrayReverseSplit(func, arr1, …)
Splits arr1 into multiple arrays. When func(arr1[i], …, arrN[i]) returns something other than 0, the array is split on the right-hand side of the element. The array is not split after the last element.
Examples:
SELECT arrayReverseSplit((x, y) -> y, [1, 2, 3, 4, 5], [1, 0, 0, 1, 0]) AS res
┌─res───────────────┐
│ [[1],[2,3,4],[5]] │
└───────────────────┘
Note that arrayReverseSplit is a higher-order function. You must pass a lambda function to it as the first argument; it can’t be omitted.
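The difference between splitting on the left and on the right of a marked element is easiest to see side by side. The following Python sketch models the rules stated above; array_split and array_reverse_split are hypothetical helper names, not ClickHouse APIs.

```python
def array_split(func, *arrays):
    """Start a new chunk before every element for which func is non-zero."""
    chunks = []
    for i, items in enumerate(zip(*arrays)):
        if i == 0 or func(*items):   # never split before the first element
            chunks.append([])
        chunks[-1].append(items[0])
    return chunks

def array_reverse_split(func, *arrays):
    """Close the current chunk after every element for which func is non-zero."""
    chunks = []
    start_new = True
    for items in zip(*arrays):
        if start_new:
            chunks.append([])
        chunks[-1].append(items[0])
        start_new = bool(func(*items))  # a mark on the last element has no effect
    return chunks
```

Both functions receive the same marker array [1, 0, 0, 1, 0] in the examples, yet produce different groupings because the cut falls on opposite sides of the marked elements.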
arrayExists([func,] arr1, …)
Returns 1 if there is at least one element in arr1 for which func(arr1[i], …, arrN[i]) returns something other than 0. Otherwise, it returns 0.
Note that arrayExists is a higher-order function. You can pass a lambda function to it as the first argument.
Example
SELECT arrayExists((x,y)->x==y,[1,2,3],[4,5,6]);
arrayAll([func,] arr1, …)
Returns 1 if func(arr1[i], …, arrN[i]) returns something other than 0 for all the elements in arrays. Otherwise, it returns 0.
Note that arrayAll is a higher-order function. You can pass a lambda function to it as the first argument.
Example
SELECT arrayAll((x,y)->x==y,[1,2,3],[4,5,6]);
arrayFirst(func, arr1, …)
Returns the first element in the arr1 array for which func(arr1[i], …, arrN[i]) returns something other than 0.
Note that arrayFirst is a higher-order function. You must pass a lambda function to it as the first argument; it can’t be omitted.
Example
SELECT arrayFirst(x -> x LIKE '%World%', ['Hello World', 'abc World']) AS res
arrayFirstIndex(func, arr1, …)
Returns the index of the first element in the arr1 array for which func(arr1[i], …, arrN[i]) returns something other than 0.
Note that arrayFirstIndex is a higher-order function. You must pass a lambda function to it as the first argument; it can’t be omitted.
Example
SELECT arrayFirstIndex(x -> x LIKE '%World%', ['Hello World', 'abc World']) AS res
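As a rough illustration of the four predicate-based functions above, here is a Python model. The helper names are hypothetical. Two simplifications are assumptions of the sketch: func is always required (the text allows it to be omitted for arrayExists and arrayAll), and the not-found case, which the text does not describe, returns None as a placeholder. Indexes are assumed to be 1-based.

```python
def array_exists(func, *arrays):
    """1 if func is non-zero for at least one position, else 0."""
    return int(any(func(*items) for items in zip(*arrays)))

def array_all(func, *arrays):
    """1 if func is non-zero for every position, else 0."""
    return int(all(func(*items) for items in zip(*arrays)))

def array_first_index(func, *arrays):
    """1-based index of the first position where func is non-zero."""
    for index, items in enumerate(zip(*arrays), start=1):
        if func(*items):
            return index
    return None  # placeholder: the not-found result is not covered by the text

def array_first(func, *arrays):
    """Element of the first array at the first matching position."""
    index = array_first_index(func, *arrays)
    return None if index is None else arrays[0][index - 1]
```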
arrayMin
Returns the minimum of elements in the source array.
If the func function is specified, returns the minimum of the elements converted by this function.
Note that arrayMin is a higher-order function. You can pass a lambda function to it as the first argument.
Syntax
arrayMin([func,] arr)
Arguments
func – Function. Expression.
arr – Array. Array.
Returned value
The minimum of function values (or the array minimum).
Type: if func is specified, matches the func return value type; otherwise matches the type of the array elements.
Examples
Query:
SELECT arrayMin([1, 2, 4]) AS res;
Result:
┌─res─┐
│ 1 │
└─────┘
Query:
SELECT arrayMin(x -> (-x), [1, 2, 4]) AS res;
Result:
┌─res─┐
│ -4 │
└─────┘
arrayMax
Returns the maximum of elements in the source array.
If the func function is specified, returns the maximum of the elements converted by this function.
Note that arrayMax is a higher-order function. You can pass a lambda function to it as the first argument.
Syntax
arrayMax([func,] arr)
Arguments
func – Function. Expression.
arr – Array. Array.
Returned value
The maximum of function values (or the array maximum).
Type: if func is specified, matches the func return value type; otherwise matches the type of the array elements.
Examples
Query:
SELECT arrayMax([1, 2, 4]) AS res;
Result:
┌─res─┐
│ 4 │
└─────┘
Query:
SELECT arrayMax(x -> (-x), [1, 2, 4]) AS res;
Result:
┌─res─┐
│ -1 │
└─────┘
arraySum
Returns the sum of elements in the source array.
If the func function is specified, returns the sum of the elements converted by this function.
Note that arraySum is a higher-order function. You can pass a lambda function to it as the first argument.
Syntax
arraySum([func,] arr)
Arguments
func – Function. Expression.
arr – Array. Array.
Returned value
The sum of the function values (or the array sum).
Type: Decimal128 for decimal numbers in the source array (or for converted values, if func is specified), Float64 for floating-point numbers, UInt64 for unsigned numeric values, and Int64 for signed numeric values.
Examples
Query:
SELECT arraySum([2, 3]) AS res;
Result:
┌─res─┐
│ 5 │
└─────┘
Query:
SELECT arraySum(x -> x*x, [2, 3]) AS res;
Result:
┌─res─┐
│ 13 │
└─────┘
arrayAvg
Returns the average of elements in the source array.
If the func function is specified, returns the average of the elements converted by this function.
Note that arrayAvg is a higher-order function. You can pass a lambda function to it as the first argument.
Syntax
arrayAvg([func,] arr)
Arguments
func – Function. Expression.
arr – Array. Array.
Returned value
The average of function values (or the array average).
Type: Float64.
Examples
Query:
SELECT arrayAvg([1, 2, 4]) AS res;
Result:
┌────────────────res─┐
│ 2.3333333333333335 │
└────────────────────┘
Query:
SELECT arrayAvg(x -> (x * x), [2, 4]) AS res;
Result:
┌─res─┐
│ 10 │
└─────┘
arrayCumSum([func,] arr1, …)
Returns an array of partial sums of elements in the source array (a running sum). If the func function is specified, then the values of the array elements are converted by func(arr1[i], …, arrN[i]) before summing.
Example:
SELECT arrayCumSum([1, 1, 1, 1]) AS res
┌─res──────────┐
│ [1, 2, 3, 4] │
└──────────────┘
Note that arrayCumSum is a higher-order function. You can pass a lambda function to it as the first argument.
arrayCumSumNonNegative(arr)
Same as arrayCumSum, returns an array of partial sums of elements in the source array (a running sum). Unlike arrayCumSum, when a partial sum becomes less than zero, it is replaced with zero, and the subsequent calculation continues from zero. For example:
SELECT arrayCumSumNonNegative([1, 1, -4, 1]) AS res
┌─res───────┐
│ [1,2,0,1] │
└───────────┘
Note that arrayCumSumNonNegative is a higher-order function. You can pass a lambda function to it as the first argument.
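The clamping rule is easier to follow next to the plain running sum. The Python sketch below models both for a single array without a lambda; the function names are hypothetical.

```python
def array_cum_sum(arr):
    """Plain running sum."""
    total, result = 0, []
    for value in arr:
        total += value
        result.append(total)
    return result

def array_cum_sum_non_negative(arr):
    """Running sum that is reset to zero whenever it would go negative."""
    total, result = 0, []
    for value in arr:
        total = max(total + value, 0)  # later elements continue from the clamped value
        result.append(total)
    return result
```

For [1, 1, -4, 1] the plain running sum is [1, 2, -2, -1], while the clamped version restarts from zero at the third element and yields [1, 2, 0, 1].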
arrayProduct
Multiplies elements of an array.
Syntax
arrayProduct(arr)
Arguments
arr
— Array of numeric values.
Returned value
A product of array's elements.
Type: Float64.
Examples
Query:
SELECT arrayProduct([1,2,3,4,5,6]) as res;
Result:
┌─res───┐
│ 720 │
└───────┘
Query:
SELECT arrayProduct([toDecimal64(1,8), toDecimal64(2,8), toDecimal64(3,8)]) as res, toTypeName(res);
Return value type is always Float64. Result:
┌─res─┬─toTypeName(arrayProduct(array(toDecimal64(1, 8), toDecimal64(2, 8), toDecimal64(3, 8))))─┐
│ 6 │ Float64 │
└─────┴──────────────────────────────────────────────────────────────────────────────────────────┘
BIT
Bit functions work for any pair of types from UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, or Float64. Some functions support String and FixedString types.
The result type is an integer with bits equal to the maximum bits of its arguments. If at least one of the arguments is signed, the result is a signed number. If an argument is a floating-point number, it is cast to Int64.
bitAnd(a, b)
Returns a bitwise 'AND' of two numbers.
bitOr(a, b)
bitXor(a, b)
bitNot(a)
bitShiftLeft(a, b)
Shifts the binary representation of a value to the left by a specified number of bit positions.
A FixedString or a String is treated as a single multibyte value.
Bits of a FixedString value are lost as they are shifted out. In contrast, a String value is extended with additional bytes, so no bits are lost.
Syntax
bitShiftLeft(a, b)
Arguments
a – A value to shift. Integer types, String or FixedString.
b – The number of shift positions. Unsigned integer types, 64 bit types or less are allowed.
Returned value
Shifted value.
The type of the returned value is the same as the type of the input value.
Example
In the following queries bin and hex functions are used to show bits of shifted values.
SELECT 99 AS a, bin(a), bitShiftLeft(a, 2) AS a_shifted, bin(a_shifted);
SELECT 'abc' AS a, hex(a), bitShiftLeft(a, 4) AS a_shifted, hex(a_shifted);
SELECT toFixedString('abc', 3) AS a, hex(a), bitShiftLeft(a, 4) AS a_shifted, hex(a_shifted);
Result:
┌──a─┬─bin(99)──┬─a_shifted─┬─bin(bitShiftLeft(99, 2))─┐
│ 99 │ 01100011 │ 140 │ 10001100 │
└────┴──────────┴───────────┴──────────────────────────┘
┌─a───┬─hex('abc')─┬─a_shifted─┬─hex(bitShiftLeft('abc', 4))─┐
│ abc │ 616263 │ &0 │ 06162630 │
└─────┴────────────┴───────────┴─────────────────────────────┘
┌─a───┬─hex(toFixedString('abc', 3))─┬─a_shifted─┬─hex(bitShiftLeft(toFixedString('abc', 3), 4))─┐
│ abc │ 616263 │ &0 │ 162630 │
└─────┴──────────────────────────────┴───────────┴───────────────────────────────────────────────┘
bitShiftRight(a, b)
Shifts the binary representation of a value to the right by a specified number of bit positions.
A FixedString or a String is treated as a single multibyte value. Note that the length of a String value is reduced as bits are shifted out.
Syntax
bitShiftRight(a, b)
Arguments
a – A value to shift. Integer types, String or FixedString.
b – The number of shift positions. Unsigned integer types, 64 bit types or less are allowed.
Returned value
Shifted value.
The type of the returned value is the same as the type of the input value.
Example
Query:
SELECT 101 AS a, bin(a), bitShiftRight(a, 2) AS a_shifted, bin(a_shifted);
SELECT 'abc' AS a, hex(a), bitShiftRight(a, 12) AS a_shifted, hex(a_shifted);
SELECT toFixedString('abc', 3) AS a, hex(a), bitShiftRight(a, 12) AS a_shifted, hex(a_shifted);
Result:
┌───a─┬─bin(101)─┬─a_shifted─┬─bin(bitShiftRight(101, 2))─┐
│ 101 │ 01100101 │ 25 │ 00011001 │
└─────┴──────────┴───────────┴────────────────────────────┘
┌─a───┬─hex('abc')─┬─a_shifted─┬─hex(bitShiftRight('abc', 12))─┐
│ abc │ 616263 │ │ 0616 │
└─────┴────────────┴───────────┴───────────────────────────────┘
┌─a───┬─hex(toFixedString('abc', 3))─┬─a_shifted─┬─hex(bitShiftRight(toFixedString('abc', 3), 12))─┐
│ abc │ 616263 │ │ 000616 │
└─────┴──────────────────────────────┴───────────┴─────────────────────────────────────────────────┘
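The String and FixedString results for both shift directions can be reproduced by treating the bytes as one big-endian integer. The Python sketch below does this. The helper names are hypothetical, and the rule for how many bytes a String gains or loses (one byte per started or completed group of 8 shifted bits) is an assumption chosen to match the example output, not a statement about the ClickHouse implementation.

```python
def shift_left_fixed(data: bytes, bits: int) -> bytes:
    """FixedString model: width stays the same, high bits are lost."""
    width = len(data)
    value = (int.from_bytes(data, "big") << bits) & ((1 << (8 * width)) - 1)
    return value.to_bytes(width, "big")

def shift_left_string(data: bytes, bits: int) -> bytes:
    """String model: the value grows, so no bits are lost."""
    width = len(data) + (bits + 7) // 8   # assumption: one extra byte per started 8 bits
    return (int.from_bytes(data, "big") << bits).to_bytes(width, "big")

def shift_right_fixed(data: bytes, bits: int) -> bytes:
    """FixedString model: width stays the same, zeros enter from the left."""
    return (int.from_bytes(data, "big") >> bits).to_bytes(len(data), "big")

def shift_right_string(data: bytes, bits: int) -> bytes:
    """String model: the value shrinks as whole bytes are shifted out."""
    width = max(len(data) - bits // 8, 0)  # assumption: one byte less per full 8 bits
    return (int.from_bytes(data, "big") >> bits).to_bytes(width, "big")
```

Calling .hex() on the returned bytes gives the same digits as the hex(a_shifted) columns in the examples, for instance shift_left_string(b'abc', 4).hex().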
bitRotateLeft(a, b)
bitRotateRight(a, b)
bitTest
Takes any integer, converts it into binary form, and returns the value of the bit at the specified position. Positions are counted from 0, from right to left.
Syntax
SELECT bitTest(number, index)
Arguments
number – Integer number.
index – Position of bit.
Returned values
Returns the value of the bit at the specified position.
Type: UInt8.
Example
For example, the number 43 in base-2 (binary) numeral system is 101011.
Query:
SELECT bitTest(43, 1);
Result:
┌─bitTest(43, 1)─┐
│ 1 │
└────────────────┘
Another example:
Query:
SELECT bitTest(43, 2);
Result:
┌─bitTest(43, 2)─┐
│ 0 │
└────────────────┘
bitTestAll
Returns the result of the logical conjunction (AND operator) of all bits at the given positions. Positions are counted from 0, from right to left.
The conjunction for bitwise operations:
0 AND 0 = 0
0 AND 1 = 0
1 AND 0 = 0
1 AND 1 = 1
Syntax
SELECT bitTestAll(number, index1, index2, index3, index4, ...)
Arguments
number – Integer number.
index1, index2, index3, index4 – Positions of bit. For example, the conjunction for the set of positions (index1, index2, index3, index4) is true if and only if all of its positions are true (index1 ⋀ index2 ⋀ index3 ⋀ index4).
Returned values
Returns the result of the logical conjunction.
Type: UInt8.
Example
For example, the number 43 in base-2 (binary) numeral system is 101011.
Query:
SELECT bitTestAll(43, 0, 1, 3, 5);
Result:
┌─bitTestAll(43, 0, 1, 3, 5)─┐
│ 1 │
└────────────────────────────┘
Another example:
Query:
SELECT bitTestAll(43, 0, 1, 3, 5, 2);
Result:
┌─bitTestAll(43, 0, 1, 3, 5, 2)─┐
│ 0 │
└───────────────────────────────┘
bitTestAny
Returns the result of the logical disjunction (OR operator) of all bits at the given positions. Positions are counted from 0, from right to left.
The disjunction for bitwise operations:
0 OR 0 = 0
0 OR 1 = 1
1 OR 0 = 1
1 OR 1 = 1
Syntax
SELECT bitTestAny(number, index1, index2, index3, index4, ...)
Arguments
number – Integer number.
index1, index2, index3, index4 – Positions of bit.
Returned values
Returns the result of the logical disjunction.
Type: UInt8.
Example
For example, the number 43 in base-2 (binary) numeral system is 101011.
Query:
SELECT bitTestAny(43, 0, 2);
Result:
┌─bitTestAny(43, 0, 2)─┐
│ 1 │
└──────────────────────┘
Another example:
Query:
SELECT bitTestAny(43, 4, 2);
Result:
┌─bitTestAny(43, 4, 2)─┐
│ 0 │
└──────────────────────┘
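The three bit-testing functions reduce to a shift and a mask. The following Python sketch models them for non-negative integers; the helper names are hypothetical.

```python
def bit_test(number, index):
    """Value (0 or 1) of the bit at position index, counted from the right."""
    return (number >> index) & 1

def bit_test_all(number, *indexes):
    """1 only if every listed bit is set."""
    return int(all(bit_test(number, i) for i in indexes))

def bit_test_any(number, *indexes):
    """1 if at least one listed bit is set."""
    return int(any(bit_test(number, i) for i in indexes))
```

For 43 (binary 101011) the set bits are at positions 0, 1, 3 and 5, which explains every result in the examples above.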
bitCount
Calculates the number of bits set to one in the binary representation of a number.
Syntax
bitCount(x)
Arguments
x
— Integer or floating-point number. The function uses the value representation in memory. It allows supporting floating-point numbers.
Returned value
Number of bits set to one in the input number.
The function does not convert the input value to a larger type (sign extension). So, for example, bitCount(toUInt8(-1)) = 8.
Type: UInt8.
Example
Take for example the number 333. Its binary representation: 0000000101001101.
Query:
SELECT bitCount(333);
Result:
┌─bitCount(333)─┐
│ 5 │
└───────────────┘
bitHammingDistance
Returns the Hamming distance between the bit representations of two integer values. Can be used with SimHash functions to detect semi-duplicate strings. The smaller the distance, the more likely the strings are the same.
Syntax
bitHammingDistance(int1, int2)
Arguments
int1 – First integer value.
int2 – Second integer value.
Returned value
The Hamming distance.
Type: UInt8.
Examples
Query:
SELECT bitHammingDistance(111, 121);
Result:
┌─bitHammingDistance(111, 121)─┐
│ 3 │
└──────────────────────────────┘
With SimHash:
SELECT bitHammingDistance(ngramSimHash('cat ate rat'), ngramSimHash('rat ate cat'));
Result:
┌─bitHammingDistance(ngramSimHash('cat ate rat'), ngramSimHash('rat ate cat'))─┐
│ 5 │
└──────────────────────────────────────────────────────────────────────────────┘
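Both bitCount and bitHammingDistance come down to counting set bits. The Python sketch below models them for integers. The width parameter is an assumption of the sketch that stands in for the fixed-size ClickHouse type, and the helper names are hypothetical; the SimHash example is not modelled.

```python
def bit_count(value, width):
    """Number of one bits in the width-bit representation of value."""
    return bin(value & ((1 << width) - 1)).count("1")

def bit_hamming_distance(a, b):
    """Number of bit positions in which two non-negative integers differ."""
    return bin(a ^ b).count("1")
```

Masking to the type width is what makes bit_count(-1, 8) equal to 8, the same value the text gives for bitCount(toUInt8(-1)).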
BITMAP
Bitmap functions operate on bitmap objects: they return a new bitmap or a cardinality computed with set operations such as and, or, xor, and not.
There are two ways to construct a bitmap object. One is the aggregate function groupBitmap with the -State suffix; the other is to build it from an array object. A bitmap object can also be converted to an array object.
In actual storage, bitmap objects are wrapped in a data structure around RoaringBitmap. When the cardinality is less than or equal to 32, a Set object is used. When the cardinality is greater than 32, a RoaringBitmap object is used. That is why storing a low-cardinality set is faster.
For more information on RoaringBitmap, see: CRoaring.
bitmapBuild
Build a bitmap from unsigned integer array.
bitmapBuild(array)
Arguments
array
– Unsigned integer array.
Example
SELECT bitmapBuild([1, 2, 3, 4, 5]) AS res, toTypeName(res);
┌─res─┬─toTypeName(bitmapBuild([1, 2, 3, 4, 5]))─────┐
│ │ AggregateFunction(groupBitmap, UInt8) │
└─────┴──────────────────────────────────────────────┘
bitmapToArray
Convert bitmap to integer array.
bitmapToArray(bitmap)
Arguments
bitmap
– Bitmap object.
Example
SELECT bitmapToArray(bitmapBuild([1, 2, 3, 4, 5])) AS res;
┌─res─────────┐
│ [1,2,3,4,5] │
└─────────────┘
bitmapSubsetInRange
Returns the subset in the specified range (range_end is not included).
bitmapSubsetInRange(bitmap, range_start, range_end)
Arguments
bitmap – Bitmap object.
range_start – Range start point. Type: UInt32.
range_end – Range end point (excluded). Type: UInt32.
Example
SELECT bitmapToArray(bitmapSubsetInRange(bitmapBuild([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,100,200,500]), toUInt32(30), toUInt32(200))) AS res;
┌─res───────────────┐
│ [30,31,32,33,100] │
└───────────────────┘
bitmapSubsetLimit
Creates a subset of the bitmap that contains at most cardinality_limit elements, taken starting from range_start.
Syntax
bitmapSubsetLimit(bitmap, range_start, cardinality_limit)
Arguments
bitmap – Bitmap object.
range_start – The subset starting point. Type: UInt32.
cardinality_limit – The subset cardinality upper limit. Type: UInt32.
Returned value
The subset.
Type: Bitmap object.
Example
Query:
SELECT bitmapToArray(bitmapSubsetLimit(bitmapBuild([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,100,200,500]), toUInt32(30), toUInt32(200))) AS res;
Result:
┌─res───────────────────────┐
│ [30,31,32,33,100,200,500] │
└───────────────────────────┘
subBitmap
Returns the bitmap elements, starting from the offset position. The number of returned elements is limited by the cardinality_limit parameter. It is an analog of the substring string function, but for bitmaps.
Syntax
subBitmap(bitmap, offset, cardinality_limit)
Arguments
bitmap – The bitmap. Type: Bitmap object.
offset – The position of the first element of the subset. Type: UInt32.
cardinality_limit – The maximum number of elements in the subset. Type: UInt32.
Returned value
The subset.
Type: Bitmap object.
Example
Query:
SELECT bitmapToArray(subBitmap(bitmapBuild([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,100,200,500]), toUInt32(10), toUInt32(10))) AS res;
Result:
┌─res─────────────────────────────┐
│ [10,11,12,13,14,15,16,17,18,19] │
└─────────────────────────────────┘
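The three subset functions differ only in how the window is chosen: by value range, by starting value plus a count, or by position plus a count. The Python sketch below models a bitmap as a collection of integers and returns sorted lists; the helper names are hypothetical and the zero-based offset is inferred from the subBitmap example.

```python
def bitmap_subset_in_range(bitmap, range_start, range_end):
    """Values v with range_start <= v < range_end."""
    return sorted(v for v in bitmap if range_start <= v < range_end)

def bitmap_subset_limit(bitmap, range_start, cardinality_limit):
    """At most cardinality_limit values, starting from the value range_start."""
    return sorted(v for v in bitmap if v >= range_start)[:cardinality_limit]

def sub_bitmap(bitmap, offset, cardinality_limit):
    """At most cardinality_limit values, starting from position offset."""
    return sorted(bitmap)[offset:offset + cardinality_limit]
```

With the 37-element bitmap used in the examples, the same pair of arguments (30, 200) selects five elements as a value range but seven elements as a start value with a limit of 200.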
bitmapContains
Checks whether the bitmap contains an element.
bitmapContains(haystack, needle)
Arguments
haystack – Bitmap object, where the function searches.
needle – Value that the function searches. Type: UInt32.
Returned values
0 – If haystack does not contain needle.
1 – If haystack contains needle.
Type: UInt8.
Example
SELECT bitmapContains(bitmapBuild([1,5,7,9]), toUInt32(9)) AS res;
┌─res─┐
│ 1 │
└─────┘
bitmapHasAny
Checks whether two bitmaps have at least one element in common.
bitmapHasAny(bitmap1, bitmap2)
If you are sure that bitmap2 contains strictly one element, consider using the bitmapContains function. It works more efficiently.
Arguments
bitmap* – Bitmap object.
Return values
1, if bitmap1 and bitmap2 have at least one element in common.
0, otherwise.
Example
SELECT bitmapHasAny(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
┌─res─┐
│ 1 │
└─────┘
bitmapHasAll
Analogous to hasAll(array, array): returns 1 if the first bitmap contains all the elements of the second one, 0 otherwise. If the second argument is an empty bitmap, returns 1.
bitmapHasAll(bitmap,bitmap)
Arguments
bitmap
– Bitmap object.
Example
SELECT bitmapHasAll(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
┌─res─┐
│ 0 │
└─────┘
bitmapCardinality
Returns the bitmap cardinality as a value of type UInt64.
bitmapCardinality(bitmap)
Arguments
bitmap
– Bitmap object.
Example
SELECT bitmapCardinality(bitmapBuild([1, 2, 3, 4, 5])) AS res;
┌─res─┐
│ 5 │
└─────┘
bitmapMin
Returns the smallest value in the set as a value of type UInt64, or UINT32_MAX if the set is empty.
bitmapMin(bitmap)
Arguments
bitmap
– Bitmap object.
Example
SELECT bitmapMin(bitmapBuild([1, 2, 3, 4, 5])) AS res;
┌─res─┐
│ 1 │
└─────┘
bitmapMax
Returns the greatest value in the set as a value of type UInt64, or 0 if the set is empty.
bitmapMax(bitmap)
Arguments
bitmap
– Bitmap object.
Example
SELECT bitmapMax(bitmapBuild([1, 2, 3, 4, 5])) AS res;
┌─res─┐
│ 5 │
└─────┘
bitmapTransform
Transforms an array of values in a bitmap into another array of values; the result is a new bitmap.
bitmapTransform(bitmap, from_array, to_array)
Arguments
bitmap – Bitmap object.
from_array – UInt32 array. For idx in range [0, from_array.size()), if bitmap contains from_array[idx], then replace it with to_array[idx]. Note that the result depends on array ordering if there are common elements between from_array and to_array.
to_array – UInt32 array; its size must be the same as that of from_array.
Example
SELECT bitmapToArray(bitmapTransform(bitmapBuild([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), cast([5,999,2] as Array(UInt32)), cast([2,888,20] as Array(UInt32)))) AS res;
┌─res───────────────────┐
│ [1,3,4,6,7,8,9,10,20] │
└───────────────────────┘
bitmapAnd
Computes the logical AND of two bitmaps; the result is a new bitmap.
bitmapAnd(bitmap,bitmap)
Arguments
bitmap
– Bitmap object.
Example
SELECT bitmapToArray(bitmapAnd(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]))) AS res;
┌─res─┐
│ [3] │
└─────┘
bitmapOr
Computes the logical OR of two bitmaps; the result is a new bitmap.
bitmapOr(bitmap,bitmap)
Arguments
bitmap
– Bitmap object.
Example
SELECT bitmapToArray(bitmapOr(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]))) AS res;
┌─res─────────┐
│ [1,2,3,4,5] │
└─────────────┘
bitmapXor
Computes the logical XOR of two bitmaps; the result is a new bitmap.
bitmapXor(bitmap,bitmap)
Arguments
bitmap
– Bitmap object.
Example
SELECT bitmapToArray(bitmapXor(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]))) AS res;
┌─res───────┐
│ [1,2,4,5] │
└───────────┘
bitmapAndnot
Computes the logical AND NOT of two bitmaps (the elements of the first bitmap that are not in the second); the result is a new bitmap.
bitmapAndnot(bitmap,bitmap)
Arguments
bitmap
– Bitmap object.
Example
SELECT bitmapToArray(bitmapAndnot(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]))) AS res;
┌─res───┐
│ [1,2] │
└───────┘
bitmapAndCardinality
Computes the logical AND of two bitmaps and returns the cardinality of the result as a value of type UInt64.
bitmapAndCardinality(bitmap,bitmap)
Arguments
bitmap
– Bitmap object.
Example
SELECT bitmapAndCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
┌─res─┐
│ 1 │
└─────┘
bitmapOrCardinality
Computes the logical OR of two bitmaps and returns the cardinality of the result as a value of type UInt64.
bitmapOrCardinality(bitmap,bitmap)
Arguments
bitmap
– Bitmap object.
Example
SELECT bitmapOrCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
┌─res─┐
│ 5 │
└─────┘
bitmapXorCardinality
Computes the logical XOR of two bitmaps and returns the cardinality of the result as a value of type UInt64.
bitmapXorCardinality(bitmap,bitmap)
Arguments
bitmap
– Bitmap object.
Example
SELECT bitmapXorCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
┌─res─┐
│ 4 │
└─────┘
bitmapAndnotCardinality
Computes the logical AND NOT of two bitmaps and returns the cardinality of the result as a value of type UInt64.
bitmapAndnotCardinality(bitmap,bitmap)
Arguments
bitmap
– Bitmap object.
Example
SELECT bitmapAndnotCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
┌─res─┐
│ 2 │
└─────┘
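The bitmap operations in this group correspond to ordinary set algebra. The Python sketch below uses built-in sets as a stand-in for bitmap objects, so it only illustrates the results, not the RoaringBitmap storage; the helper names are hypothetical.

```python
def bitmap_and(a, b):
    return sorted(set(a) & set(b))       # intersection

def bitmap_or(a, b):
    return sorted(set(a) | set(b))       # union

def bitmap_xor(a, b):
    return sorted(set(a) ^ set(b))       # symmetric difference

def bitmap_andnot(a, b):
    return sorted(set(a) - set(b))       # elements of a that are not in b

def bitmap_has_any(a, b):
    return int(bool(set(a) & set(b)))    # 1 if the bitmaps share an element

def bitmap_has_all(a, b):
    return int(set(b) <= set(a))         # 1 if a contains every element of b
```

The *Cardinality variants return the size of the corresponding result, for example len(bitmap_xor(a, b)) for bitmapXorCardinality.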
CONDITIONAL
if
Controls conditional branching. Unlike most systems, ClickHouse always evaluates both expressions then and else.
Syntax
if(cond, then, else)
If the condition cond evaluates to a non-zero value, returns the result of the expression then, and the result of the expression else, if present, is skipped. If cond is zero or NULL, the result of the then expression is skipped and the result of the else expression, if present, is returned.
You can use the short_circuit_function_evaluation setting to calculate the if function according to a short scheme. If this setting is enabled, the then expression is evaluated only on rows where cond is true, and the else expression only on rows where cond is false. For example, an exception about division by zero is not thrown when executing the query SELECT if(number = 0, 0, intDiv(42, number)) FROM numbers(10), because intDiv(42, number) is evaluated only for numbers that do not satisfy the condition number = 0.
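The difference between evaluating both branches and the short scheme can be illustrated in Python, where function arguments are also evaluated before the call. In this sketch, if_eager receives already computed values, while if_short_circuit receives callables and runs only the selected one. The names are hypothetical, and integer division with // stands in for intDiv on non-negative numbers.

```python
def if_eager(cond, then_value, else_value):
    """Both branch values were computed by the caller before this call."""
    return then_value if cond else else_value

def if_short_circuit(cond, then_branch, else_branch):
    """Only the selected branch is computed."""
    return then_branch() if cond else else_branch()

def safe_division(limit=10):
    """Mirrors if(number = 0, 0, intDiv(42, number)) over numbers(limit)."""
    return [
        if_short_circuit(n == 0, lambda: 0, lambda: 42 // n)
        for n in range(limit)
    ]
```

Calling if_eager(n == 0, 0, 42 // n) with n = 0 raises ZeroDivisionError, because 42 // n is computed even though its result would be discarded.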
Arguments
cond – The condition for evaluation that can be zero or not. The type is UInt8, Nullable(UInt8) or NULL.
then – The expression to return if the condition is met.
else – The expression to return if the condition is not met.
Returned values
The function executes the then and else expressions and returns the result of one of them, depending on whether the condition cond ended up being zero or not.
Example
Query:
SELECT if(1, plus(2, 2), plus(2, 6));
Result:
┌─plus(2, 2)─┐
│ 4 │
└────────────┘
Query:
SELECT if(0, plus(2, 2), plus(2, 6));
Result:
┌─plus(2, 6)─┐
│ 8 │
└────────────┘
then and else must have the lowest common type.
Example:
Take this LEFT_RIGHT table:
SELECT *
FROM LEFT_RIGHT
┌─left─┬─right─┐
│ ᴺᵁᴸᴸ │ 4 │
│ 1 │ 3 │
│ 2 │ 2 │
│ 3 │ 1 │
│ 4 │ ᴺᵁᴸᴸ │
└──────┴───────┘
The following query compares left and right values:
SELECT
left,
right,
if(left < right, 'left is smaller than right', 'right is smaller or equal than left') AS is_smaller
FROM LEFT_RIGHT
WHERE isNotNull(left) AND isNotNull(right)
┌─left─┬─right─┬─is_smaller──────────────────────────┐
│ 1 │ 3 │ left is smaller than right │
│ 2 │ 2 │ right is smaller or equal than left │
│ 3 │ 1 │ right is smaller or equal than left │
└──────┴───────┴─────────────────────────────────────┘
Note: NULL values are not used in this example. See the NULL values in conditionals section.
Ternary Operator
It works the same as the if function.
Syntax: cond ? then : else
Returns then if cond evaluates to true (greater than zero), otherwise returns else.
cond must be of type UInt8, and then and else must have the lowest common type.
then and else can be NULL.
See also
multiIf
Allows you to write the CASE operator more compactly in the query.
Syntax
multiIf(cond_1, then_1, cond_2, then_2, ..., else)
You can use the short_circuit_function_evaluation setting to calculate the multiIf function according to a short scheme. If this setting is enabled, the then_i expression is evaluated only on rows where ((NOT cond_1) AND (NOT cond_2) AND ... AND (NOT cond_{i-1}) AND cond_i) is true, and cond_i is evaluated only on rows where ((NOT cond_1) AND (NOT cond_2) AND ... AND (NOT cond_{i-1})) is true. For example, an exception about division by zero is not thrown when executing the query SELECT multiIf(number = 2, intDiv(1, number), number = 5) FROM numbers(10).
Arguments
cond_N – The condition for the function to return then_N.
then_N – The result of the function when executed.
else – The result of the function if none of the conditions is met.
The function accepts 2N+1 parameters.
Returned values
The function returns one of the values then_N or else, depending on the conditions cond_N.
Example
Again using the LEFT_RIGHT table.
SELECT
left,
right,
multiIf(left < right, 'left is smaller', left > right, 'left is greater', left = right, 'Both equal', 'Null value') AS result
FROM LEFT_RIGHT
┌─left─┬─right─┬─result──────────┐
│ ᴺᵁᴸᴸ │ 4 │ Null value │
│ 1 │ 3 │ left is smaller │
│ 2 │ 2 │ Both equal │
│ 3 │ 1 │ left is greater │
│ 4 │ ᴺᵁᴸᴸ │ Null value │
└──────┴───────┴─────────────────┘
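The rows with NULL fall through to the else value because every comparison involving NULL is itself NULL, and a NULL condition is not met. The Python sketch below models this with None. The helpers sql_compare, multi_if and classify are hypothetical names, and the model evaluates all arguments up front, that is, without the short scheme.

```python
import operator

def sql_compare(op, a, b):
    """Comparison that yields None (NULL) when either side is None."""
    if a is None or b is None:
        return None
    return int(op(a, b))

def multi_if(*args):
    """multiIf(cond_1, then_1, ..., cond_N, then_N, else) with 2N+1 arguments."""
    if len(args) < 3 or len(args) % 2 == 0:
        raise ValueError("expected 2N+1 arguments")
    for i in range(0, len(args) - 1, 2):
        if args[i]:              # 0 and None (NULL) both count as not met
            return args[i + 1]
    return args[-1]

def classify(left, right):
    return multi_if(
        sql_compare(operator.lt, left, right), "left is smaller",
        sql_compare(operator.gt, left, right), "left is greater",
        sql_compare(operator.eq, left, right), "Both equal",
        "Null value",
    )
```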
Case
DATES AND TIMES
Support for time zones.
All functions for working with the date and time that have a logical use for the time zone can accept a second optional time zone argument. Example: Asia/Yekaterinburg. In this case, they use the specified time zone instead of the local (default) one.
SELECT
toDateTime('2016-06-15 23:00:00') AS time,
toDate(time) AS date_local,
toDate(time, 'Asia/Yekaterinburg') AS date_yekat,
toString(time, 'US/Samoa') AS time_samoa
┌────────────────time─┬─date_local─┬─date_yekat─┬─time_samoa──────────┐
│ 2016-06-15 23:00:00 │ 2016-06-15 │ 2016-06-16 │ 2016-06-15 09:00:00 │
└─────────────────────┴────────────┴────────────┴─────────────────────┘
timeZone
Returns the timezone of the server. If it is executed in the context of a distributed table, then it generates a normal column with values relevant to each shard. Otherwise it produces a constant value.
Syntax
timeZone()
Alias: timezone.
Returned value
Timezone.
Type: String.
toTimeZone
Converts time or date and time to the specified time zone. The time zone is an attribute of the Date and DateTime data types. The internal value (number of seconds) of the table field or of the resultset's column does not change; the column's type changes and its string representation changes accordingly.
Syntax
toTimeZone(value, timezone)
Alias: toTimezone.
Arguments
value – Time or date and time. DateTime64.
timezone – Timezone for the returned value. String. This argument is a constant, because toTimezone changes the timezone of a column (timezone is an attribute of DateTime* types).
Returned value
Date and time.
Type: DateTime.
Example
Query:
SELECT toDateTime('2019-01-01 00:00:00', 'UTC') AS time_utc,
toTypeName(time_utc) AS type_utc,
toInt32(time_utc) AS int32utc,
toTimeZone(time_utc, 'Asia/Yekaterinburg') AS time_yekat,
toTypeName(time_yekat) AS type_yekat,
toInt32(time_yekat) AS int32yekat,
toTimeZone(time_utc, 'US/Samoa') AS time_samoa,
toTypeName(time_samoa) AS type_samoa,
toInt32(time_samoa) AS int32samoa
FORMAT Vertical;
Result:
Row 1:
──────
time_utc: 2019-01-01 00:00:00
type_utc: DateTime('UTC')
int32utc: 1546300800
time_yekat: 2019-01-01 05:00:00
type_yekat: DateTime('Asia/Yekaterinburg')
int32yekat: 1546300800
time_samoa: 2018-12-31 13:00:00
type_samoa: DateTime('US/Samoa')
int32samoa: 1546300800
toTimeZone(time_utc, 'Asia/Yekaterinburg') changes the DateTime('UTC') type to DateTime('Asia/Yekaterinburg'). The value (Unix timestamp) 1546300800 stays the same, but the string representation (the result of the toString() function) changes from time_utc: 2019-01-01 00:00:00 to time_yekat: 2019-01-01 05:00:00.
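The same idea, one instant rendered in several time zones, can be checked with Python's datetime module. To keep the sketch self-contained it uses fixed UTC offsets (+5 hours for Asia/Yekaterinburg, -11 hours for US/Samoa on that date) instead of the IANA database; these offsets are assumptions that hold for the example date only.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
YEKATERINBURG = timezone(timedelta(hours=5))   # assumption: fixed UTC+5
SAMOA = timezone(timedelta(hours=-11))         # assumption: fixed UTC-11

time_utc = datetime(2019, 1, 1, 0, 0, 0, tzinfo=UTC)
time_yekat = time_utc.astimezone(YEKATERINBURG)  # same instant, new zone attribute
time_samoa = time_utc.astimezone(SAMOA)

def render(moment):
    """String representation comparable to the Result block above."""
    return moment.strftime("%Y-%m-%d %H:%M:%S")
```

All three objects compare equal and share the timestamp 1546300800; only render() differs between them.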
timeZoneOf
Returns the timezone name of DateTime or DateTime64 data types.
Syntax
timeZoneOf(value)
Alias: timezoneOf.
Arguments
value
— Date and time. DateTime or DateTime64.
Returned value
Timezone name.
Type: String.
Example
Query:
SELECT timezoneOf(now());
Result:
┌─timezoneOf(now())─┐
│ Etc/UTC │
└───────────────────┘
timeZoneOffset
Returns a timezone offset in seconds from UTC. The function takes into account daylight saving time and historical timezone changes at the specified date and time. IANA timezone database is used to calculate the offset.
Syntax
timeZoneOffset(value)
Alias: timezoneOffset.
Arguments
value
— Date and time. DateTime or DateTime64.
Returned value
Offset from UTC in seconds.
Type: Int32.
Example
Query:
SELECT toDateTime('2021-04-21 10:20:30', 'America/New_York') AS Time, toTypeName(Time) AS Type,
timeZoneOffset(Time) AS Offset_in_seconds, (Offset_in_seconds / 3600) AS Offset_in_hours;
Result:
┌────────────────Time─┬─Type─────────────────────────┬─Offset_in_seconds─┬─Offset_in_hours─┐
│ 2021-04-21 10:20:30 │ DateTime('America/New_York') │ -14400 │ -4 │
└─────────────────────┴──────────────────────────────┴───────────────────┴─────────────────┘
toYear
Converts a date or date with time to a UInt16 number containing the year number (AD).
Alias: YEAR.
toQuarter
Converts a date or date with time to a UInt8 number containing the quarter number.
Alias: QUARTER.
toMonth
Converts a date or date with time to a UInt8 number containing the month number (1-12).
Alias: MONTH.
toDayOfYear
Converts a date or date with time to a UInt16 number containing the number of the day of the year (1-366).
Alias: DAYOFYEAR.
toDayOfMonth
Converts a date or date with time to a UInt8 number containing the number of the day of the month (1-31).
Aliases: DAYOFMONTH, DAY.
toDayOfWeek
Converts a date or date with time to a UInt8 number containing the number of the day of the week (Monday is 1, and Sunday is 7).
Alias: DAYOFWEEK
.
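As a rough cross-check of the ranges listed for toYear through toDayOfWeek, the same components can be computed with Python's standard datetime module. This is only an illustrative sketch, not ClickHouse code; the helper name date_parts is hypothetical.

```python
from datetime import date

def date_parts(d: date) -> dict:
    """Return the calendar components described by toYear ... toDayOfWeek."""
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,      # 1-4
        "month": d.month,                       # 1-12
        "day_of_year": d.timetuple().tm_yday,   # 1-366
        "day_of_month": d.day,                  # 1-31
        "day_of_week": d.isoweekday(),          # Monday is 1, Sunday is 7
    }

print(date_parts(date(2021, 4, 21)))
```

For 2021-04-21 (a Wednesday) the sketch yields quarter 2, day of year 111, and day of week 3.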
toHour
Converts a date with time to a UInt8 number containing the number of the hour in 24-hour time (0-23). This function assumes that if clocks are moved ahead, it is by one hour and occurs at 2 a.m., and if clocks are moved back, it is by one hour and occurs at 3 a.m. (which is not always true; even in Moscow the clocks were changed twice at a different time).
Alias: HOUR.
toMinute
Converts a date with time to a UInt8 number containing the number of the minute of the hour (0-59).
Alias: MINUTE.
toSecond
Converts a date with time to a UInt8 number containing the number of the second in the minute (0-59). Leap seconds are not accounted for.
Alias: SECOND.
toUnixTimestamp
For a DateTime argument: converts the value to a UInt32 number, the Unix timestamp (https://en.wikipedia.org/wiki/Unix_time). For a String argument: parses the input string as a date with time according to the timezone (optional second argument; the server timezone is used by default) and returns the corresponding Unix timestamp.
Syntax
toUnixTimestamp(datetime)
toUnixTimestamp(str, [timezone])
Returned value
Returns the unix timestamp.
Type: UInt32.
Example
Query:
SELECT toUnixTimestamp('2017-11-05 08:07:47', 'Asia/Tokyo') AS unix_timestamp
Result:
┌─unix_timestamp─┐
│ 1509836867 │
└────────────────┘
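The arithmetic behind this example can be reproduced outside ClickHouse. The following Python sketch is an illustration only: it models Asia/Tokyo as a fixed UTC+9 offset (an assumption that holds because Japan does not observe daylight saving time), and the helper name to_unix_timestamp is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Tokyo is modeled as a fixed UTC+9 offset (no DST).
TOKYO = timezone(timedelta(hours=9))

def to_unix_timestamp(text: str, tz: timezone) -> int:
    """Interpret 'YYYY-MM-DD hh:mm:ss' as local time in tz and return Unix seconds."""
    local = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return int(local.timestamp())

print(to_unix_timestamp("2017-11-05 08:07:47", TOKYO))  # 1509836867
```

The same string interpreted in a different timezone gives a different timestamp, which is why the optional timezone argument matters for String input.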
NOTE
The return type of the toStartOf*, toLastDayOfMonth, toMonday, and timeSlot functions described below is determined by the configuration parameter enable_extended_results_for_datetime_functions, which is 0 by default.
Behavior for
- enable_extended_results_for_datetime_functions = 0: Functions toStartOfYear, toStartOfISOYear, toStartOfQuarter, toStartOfMonth, toStartOfWeek, toLastDayOfMonth, toMonday return Date or DateTime. Functions toStartOfDay, toStartOfHour, toStartOfFifteenMinutes, toStartOfTenMinutes, toStartOfFiveMinutes, toStartOfMinute, timeSlot return DateTime. Though these functions can take values of the extended types Date32 and DateTime64 as an argument, passing them a time outside the normal range (year 1970 to 2149 for Date / 2106 for DateTime) will produce wrong results.
- enable_extended_results_for_datetime_functions = 1:
  - Functions toStartOfYear, toStartOfISOYear, toStartOfQuarter, toStartOfMonth, toStartOfWeek, toLastDayOfMonth, toMonday return Date or DateTime if their argument is a Date or DateTime, and they return Date32 or DateTime64 if their argument is a Date32 or DateTime64.
  - Functions toStartOfDay, toStartOfHour, toStartOfFifteenMinutes, toStartOfTenMinutes, toStartOfFiveMinutes, toStartOfMinute, timeSlot return DateTime if their argument is a Date or DateTime, and they return DateTime64 if their argument is a Date32 or DateTime64.
toStartOfYear
Rounds down a date or date with time to the first day of the year. Returns the date.
toStartOfISOYear
Rounds down a date or date with time to the first day of ISO year. Returns the date.
toStartOfQuarter
Rounds down a date or date with time to the first day of the quarter. The first day of the quarter is either 1 January, 1 April, 1 July, or 1 October. Returns the date.
toStartOfMonth
Rounds down a date or date with time to the first day of the month. Returns the date.
NOTE
The behavior of parsing incorrect dates is implementation specific. ClickHouse may return zero date, throw an exception or do “natural” overflow.
If toLastDayOfMonth is called with an argument of type Date greater than 2149-05-31, the result will be calculated from the argument 2149-05-31 instead.
toMonday
Rounds down a date or date with time to the nearest Monday. Returns the date.
toStartOfWeek(t[,mode])
Rounds down a date or date with time to the nearest Sunday or Monday by mode. Returns the date. The mode argument works exactly like the mode argument to toWeek(). For the single-argument syntax, a mode value of 0 is used.
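The calendar rounding performed by toStartOfYear, toStartOfQuarter, toStartOfMonth, toMonday and toStartOfWeek can be sketched in Python as follows. This models only the rounding rule on plain dates; the function names are hypothetical, and to_sunday corresponds to the Sunday-based case of toStartOfWeek.

```python
from datetime import date, timedelta

def start_of_year(d: date) -> date:
    return d.replace(month=1, day=1)

def start_of_quarter(d: date) -> date:
    # Quarters start on 1 January, 1 April, 1 July, or 1 October.
    return d.replace(month=3 * ((d.month - 1) // 3) + 1, day=1)

def start_of_month(d: date) -> date:
    return d.replace(day=1)

def to_monday(d: date) -> date:
    # weekday() is 0 for Monday, so this steps back to the latest Monday.
    return d - timedelta(days=d.weekday())

def to_sunday(d: date) -> date:
    # Steps back to the latest Sunday (a Sunday maps to itself).
    return d - timedelta(days=(d.weekday() + 1) % 7)

d = date(2018, 8, 15)  # a Wednesday
print(start_of_year(d), start_of_quarter(d), start_of_month(d), to_monday(d), to_sunday(d))
```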
toStartOfDay
Rounds down a date with time to the start of the day.
toStartOfHour
Rounds down a date with time to the start of the hour.
toStartOfMinute
Rounds down a date with time to the start of the minute.
toStartOfSecond
Truncates sub-seconds.
Syntax
toStartOfSecond(value, [timezone])
Arguments
value — Date and time. DateTime64.
Returned value
Input value without sub-seconds.
Type: DateTime64.
Examples
Query without timezone:
WITH toDateTime64('2020-01-01 10:20:30.999', 3) AS dt64
SELECT toStartOfSecond(dt64);
Result:
┌───toStartOfSecond(dt64)─┐
│ 2020-01-01 10:20:30.000 │
└─────────────────────────┘
Query with timezone:
WITH toDateTime64('2020-01-01 10:20:30.999', 3) AS dt64
SELECT toStartOfSecond(dt64, 'Asia/Istanbul');
Result:
┌─toStartOfSecond(dt64, 'Asia/Istanbul')─┐
│ 2020-01-01 13:20:30.000 │
└────────────────────────────────────────┘
See also
Timezone server configuration parameter.
toStartOfFiveMinutes
Rounds down a date with time to the start of the five-minute interval.
toStartOfTenMinutes
Rounds down a date with time to the start of the ten-minute interval.
toStartOfFifteenMinutes
Rounds down the date with time to the start of the fifteen-minute interval.
toStartOfInterval(time_or_data, INTERVAL x unit [, time_zone])
This is a generalization of other functions named toStartOf*. For example, toStartOfInterval(t, INTERVAL 1 year) returns the same as toStartOfYear(t), toStartOfInterval(t, INTERVAL 1 month) returns the same as toStartOfMonth(t), toStartOfInterval(t, INTERVAL 1 day) returns the same as toStartOfDay(t), toStartOfInterval(t, INTERVAL 15 minute) returns the same as toStartOfFifteenMinutes(t), etc.
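For minute-based intervals, the rounding can be pictured as flooring the minute-of-day to a multiple of the interval length. The Python sketch below is an assumption-laden illustration (the name floor_to_minutes is hypothetical, the interval is assumed to divide a day evenly, and timezones are ignored); it is not how ClickHouse implements the function.

```python
from datetime import datetime

def floor_to_minutes(dt: datetime, minutes: int) -> datetime:
    """Round dt down to the start of its `minutes`-long interval within the day."""
    minute_of_day = dt.hour * 60 + dt.minute
    floored = minute_of_day - minute_of_day % minutes
    return dt.replace(hour=floored // 60, minute=floored % 60, second=0, microsecond=0)

t = datetime(2020, 1, 1, 10, 20, 30)
print(floor_to_minutes(t, 5))   # 2020-01-01 10:20:00
print(floor_to_minutes(t, 15))  # 2020-01-01 10:15:00
print(floor_to_minutes(t, 60))  # 2020-01-01 10:00:00
```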
toTime
Converts a date with time to a certain fixed date, while preserving the time.
toRelativeYearNum
Converts a date with time or date to the number of the year, starting from a certain fixed point in the past.
toRelativeQuarterNum
Converts a date with time or date to the number of the quarter, starting from a certain fixed point in the past.
toRelativeMonthNum
Converts a date with time or date to the number of the month, starting from a certain fixed point in the past.
toRelativeWeekNum
Converts a date with time or date to the number of the week, starting from a certain fixed point in the past.
toRelativeDayNum
Converts a date with time or date to the number of the day, starting from a certain fixed point in the past.
toRelativeHourNum
Converts a date with time or date to the number of the hour, starting from a certain fixed point in the past.
toRelativeMinuteNum
Converts a date with time or date to the number of the minute, starting from a certain fixed point in the past.
toRelativeSecondNum
Converts a date with time or date to the number of the second, starting from a certain fixed point in the past.
toISOYear
Converts a date or date with time to a UInt16 number containing the ISO Year number.
toISOWeek
Converts a date or date with time to a UInt8 number containing the ISO Week number.
toWeek(date[,mode])
This function returns the week number for date or datetime. The two-argument form of toWeek() enables you to specify whether the week starts on Sunday or Monday and whether the return value should be in the range from 0 to 53 or from 1 to 53. If the mode argument is omitted, the default mode is 0. toISOWeek() is a compatibility function that is equivalent to toWeek(date,3). The following table describes how the mode argument works.
Mode | First day of week | Range | Week 1 is the first week ...
0 | Sunday | 0-53 | with a Sunday in this year
1 | Monday | 0-53 | with 4 or more days this year
2 | Sunday | 1-53 | with a Sunday in this year
3 | Monday | 1-53 | with 4 or more days this year
4 | Sunday | 0-53 | with 4 or more days this year
5 | Monday | 0-53 | with a Monday in this year
6 | Sunday | 1-53 | with 4 or more days this year
7 | Monday | 1-53 | with a Monday in this year
8 | Sunday | 1-53 | contains January 1
9 | Monday | 1-53 | contains January 1
For mode values with a meaning of “with 4 or more days this year,” weeks are numbered according to ISO 8601:1988:
If the week containing January 1 has 4 or more days in the new year, it is week 1.
Otherwise, it is the last week of the previous year, and the next week is week 1.
For mode values with a meaning of “contains January 1”, the week that contains January 1 is week 1. It does not matter how many days of the new year the week contains, even if it contains only one day.
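The difference between the “4 or more days” rule and the “contains January 1” rule is easiest to see on a date near New Year. The Python sketch below models mode 3 with the standard isocalendar() method and mode 9 with a direct reading of the rule stated above. Both function names are hypothetical, and the code is a model of the described rules rather than the ClickHouse implementation.

```python
from datetime import date, timedelta

def year_week_iso(d: date) -> tuple:
    """Mode 3: weeks start on Monday; week 1 has 4 or more days in the new year."""
    iso_year, iso_week, _ = d.isocalendar()
    return iso_year, iso_week

def year_week_contains_jan1(d: date) -> tuple:
    """Mode 9: weeks start on Monday; the week containing January 1 is week 1."""
    monday = d - timedelta(days=d.weekday())
    sunday = monday + timedelta(days=6)
    if sunday.year != monday.year:
        # The week straddles New Year, so it contains January 1 of the later year.
        return sunday.year, 1
    jan1 = date(d.year, 1, 1)
    first_monday = jan1 - timedelta(days=jan1.weekday())
    return d.year, (monday - first_monday).days // 7 + 1

d = date(2016, 12, 27)
print(year_week_iso(d))            # (2016, 52)
print(year_week_contains_jan1(d))  # (2017, 1)
```

The week of 2016-12-26 to 2017-01-01 has only one day in 2017, so the ISO rule assigns it to 2016, while the “contains January 1” rule makes it week 1 of 2017.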
toWeek(date[, mode][, Timezone])
Arguments
date – Date or DateTime.
mode – Optional parameter. Range of values is [0,9], default is 0.
Timezone – Optional parameter. It behaves like any other conversion function.
Example
SELECT toDate('2016-12-27') AS date, toWeek(date) AS week0, toWeek(date,1) AS week1, toWeek(date,9) AS week9;
┌───────date─┬─week0─┬─week1─┬─week9─┐
│ 2016-12-27 │ 52 │ 52 │ 1 │
└────────────┴───────┴───────┴───────┘
toYearWeek(date[,mode])
Returns year and week for a date. The year in the result may be different from the year in the date argument for the first and the last week of the year.
The mode argument works exactly like the mode argument to toWeek(). For the single-argument syntax, a mode value of 0 is used.
toISOYear() is a compatibility function that is equivalent to intDiv(toYearWeek(date,3),100).
Example
SELECT toDate('2016-12-27') AS date, toYearWeek(date) AS yearWeek0, toYearWeek(date,1) AS yearWeek1, toYearWeek(date,9) AS yearWeek9;
┌───────date─┬─yearWeek0─┬─yearWeek1─┬─yearWeek9─┐
│ 2016-12-27 │ 201652 │ 201652 │ 201701 │
└────────────┴───────────┴───────────┴───────────┘
date_trunc
Truncates date and time data to the specified part of date.
Syntax
date_trunc(unit, value[, timezone])
Alias: dateTrunc.
Arguments
unit — The type of interval to truncate the result. String Literal. Possible values: second, minute, hour, day, week, month, quarter, year.
value — Date and time. DateTime or DateTime64.
timezone — Timezone name for the returned value (optional). If not specified, the function uses the timezone of the value parameter. String.
Returned value
Value, truncated to the specified part of date.
Type: DateTime.
Example
Query without timezone:
SELECT now(), date_trunc('hour', now());
Result:
┌───────────────now()─┬─date_trunc('hour', now())─┐
│ 2020-09-28 10:40:45 │ 2020-09-28 10:00:00 │
└─────────────────────┴───────────────────────────┘
Query with the specified timezone:
SELECT now(), date_trunc('hour', now(), 'Asia/Istanbul');
Result:
┌───────────────now()─┬─date_trunc('hour', now(), 'Asia/Istanbul')─┐
│ 2020-09-28 10:46:26 │ 2020-09-28 13:00:00 │
└─────────────────────┴────────────────────────────────────────────┘
date_add
Adds the time interval or date interval to the provided date or date with time.
Syntax
date_add(unit, value, date)
Aliases: dateAdd, DATE_ADD.
Arguments
unit — The type of interval to add. String. Possible values: second, minute, hour, day, week, month, quarter, year.
value — Value of interval to add. Int.
Returned value
Date or date with time obtained by adding value, expressed in unit, to date.
Example
Query:
SELECT date_add(YEAR, 3, toDate('2018-01-01'));
Result:
┌─plus(toDate('2018-01-01'), toIntervalYear(3))─┐
│ 2021-01-01 │
└───────────────────────────────────────────────┘
date_diff
Returns the difference between two dates or dates with time values. The difference is calculated using relative units, e.g. the difference between 2022-01-01 and 2021-12-29 is 3 days for day unit (see toRelativeDayNum), 1 month for month unit (see toRelativeMonthNum), 1 year for year unit (see toRelativeYearNum).
Syntax
date_diff('unit', startdate, enddate, [timezone])
Aliases: dateDiff, DATE_DIFF.
Arguments
unit — The type of interval for result. String. Possible values: second, minute, hour, day, week, month, quarter, year.
startdate — The first time value to subtract (the subtrahend). Date, Date32, DateTime or DateTime64.
enddate — The second time value to subtract from (the minuend). Date, Date32, DateTime or DateTime64.
timezone — Timezone name (optional). If specified, it is applied to both startdate and enddate. If not specified, timezones of startdate and enddate are used. If they are not the same, the result is unspecified. String.
Returned value
Difference between enddate and startdate expressed in unit.
Type: Int.
Example
Query:
SELECT dateDiff('hour', toDateTime('2018-01-01 22:00:00'), toDateTime('2018-01-02 23:00:00'));
Result:
┌─dateDiff('hour', toDateTime('2018-01-01 22:00:00'), toDateTime('2018-01-02 23:00:00'))─┐
│ 25 │
└────────────────────────────────────────────────────────────────────────────────────────┘
Query:
SELECT
toDate('2022-01-01') AS e,
toDate('2021-12-29') AS s,
dateDiff('day', s, e) AS day_diff,
dateDiff('month', s, e) AS month__diff,
dateDiff('year', s, e) AS year_diff;
Result:
┌──────────e─┬──────────s─┬─day_diff─┬─month__diff─┬─year_diff─┐
│ 2022-01-01 │ 2021-12-29 │ 3 │ 1 │ 1 │
└────────────┴────────────┴──────────┴─────────────┴───────────┘
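The relative-unit behavior shown in this result can be modeled in a few lines of Python: each date is first mapped to a unit number (day number, month number, year number), and the numbers are then subtracted. This is a sketch of the rule as described in this section, with a hypothetical function name and only three units covered.

```python
from datetime import date

def date_diff(unit: str, start: date, end: date) -> int:
    """Difference in relative units: boundaries crossed, not elapsed time."""
    if unit == "day":
        return end.toordinal() - start.toordinal()
    if unit == "month":
        return (end.year * 12 + end.month) - (start.year * 12 + start.month)
    if unit == "year":
        return end.year - start.year
    raise ValueError(f"unit not covered by this sketch: {unit}")

s, e = date(2021, 12, 29), date(2022, 1, 1)
print(date_diff("day", s, e), date_diff("month", s, e), date_diff("year", s, e))  # 3 1 1
```

Only three days separate the two dates, yet the month and year differences are both 1 because one month boundary and one year boundary lie between them.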
date_sub
Subtracts the time interval or date interval from the provided date or date with time.
Syntax
date_sub(unit, value, date)
Aliases: dateSub, DATE_SUB.
Arguments
unit — The type of interval to subtract. String. Possible values: second, minute, hour, day, week, month, quarter, year.
value — Value of interval to subtract. Int.
Returned value
Date or date with time obtained by subtracting value, expressed in unit, from date.
Example
Query:
SELECT date_sub(YEAR, 3, toDate('2018-01-01'));
Result:
┌─minus(toDate('2018-01-01'), toIntervalYear(3))─┐
│ 2015-01-01 │
└────────────────────────────────────────────────┘
timestamp_add
Adds the specified time interval to the provided date or date with time.
Syntax
timestamp_add(date, INTERVAL value unit)
Aliases: timeStampAdd, TIMESTAMP_ADD.
Arguments
value — Value of interval to add. Int.
unit — The type of interval to add. String. Possible values: second, minute, hour, day, week, month, quarter, year.
Returned value
Date or date with time with the specified value expressed in unit added to date.
Example
Query:
select timestamp_add(toDate('2018-01-01'), INTERVAL 3 MONTH);
Result:
┌─plus(toDate('2018-01-01'), toIntervalMonth(3))─┐
│ 2018-04-01 │
└────────────────────────────────────────────────┘
timestamp_sub
Subtracts the time interval from the provided date or date with time.
Syntax
timestamp_sub(unit, value, date)
Aliases: timeStampSub, TIMESTAMP_SUB.
Arguments
unit — The type of interval to subtract. String. Possible values: second, minute, hour, day, week, month, quarter, year.
value — Value of interval to subtract. Int.
Returned value
Date or date with time obtained by subtracting value, expressed in unit, from date.
Example
Query:
select timestamp_sub(MONTH, 5, toDateTime('2018-12-18 01:02:03'));
Result:
┌─minus(toDateTime('2018-12-18 01:02:03'), toIntervalMonth(5))─┐
│ 2018-07-18 01:02:03 │
└──────────────────────────────────────────────────────────────┘
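The month and year arithmetic used in the date_add, date_sub, timestamp_add and timestamp_sub examples can be sketched in Python. The helper add_months is hypothetical. Clamping the day to the last day of the target month is an assumption made here so that the function is total; this section does not say how ClickHouse treats such cases, and none of the examples above depend on it.

```python
import calendar
from datetime import date, datetime

def add_months(value, months: int):
    """Shift a date or datetime by a number of calendar months (negative to subtract)."""
    index = value.year * 12 + (value.month - 1) + months
    year, month0 = divmod(index, 12)
    # Assumption: clamp the day if the target month is shorter (e.g. 31 -> 30).
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return value.replace(year=year, month=month0 + 1, day=min(value.day, last_day))

print(add_months(date(2018, 1, 1), 3))                  # 2018-04-01
print(add_months(datetime(2018, 12, 18, 1, 2, 3), -5))  # 2018-07-18 01:02:03
print(add_months(date(2018, 1, 1), 3 * 12))             # 2021-01-01
```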
now
Returns the current date and time at the moment of query analysis. The function is a constant expression.
Syntax
now([timezone])
Arguments
timezone — Timezone name for the returned value (optional). String.
Returned value
Current date and time.
Type: DateTime.
Example
Query without timezone:
SELECT now();
Result:
┌───────────────now()─┐
│ 2020-10-17 07:42:09 │
└─────────────────────┘
Query with the specified timezone:
SELECT now('Asia/Istanbul');
Result:
┌─now('Asia/Istanbul')─┐
│ 2020-10-17 10:42:23 │
└──────────────────────┘
now64
Returns the current date and time with sub-second precision at the moment of query analysis. The function is a constant expression.
Syntax
now64([scale], [timezone])
Arguments
scale - Tick size (precision): 10^(-precision) seconds. Valid range: [0 : 9]. Typically used values: 3 (default, milliseconds), 6 (microseconds), 9 (nanoseconds).
timezone — Timezone name for the returned value (optional). String.
Returned value
Current date and time with sub-second precision.
Type: DateTime64.
Example
SELECT now64(), now64(9, 'Asia/Istanbul');
Result:
┌─────────────────now64()─┬─────now64(9, 'Asia/Istanbul')─┐
│ 2022-08-21 19:34:26.196 │ 2022-08-21 22:34:26.196542766 │
└─────────────────────────┴───────────────────────────────┘
today
Accepts zero arguments and returns the current date at one of the moments of query analysis. The same as ‘toDate(now())’.
yesterday
Accepts zero arguments and returns yesterday’s date at one of the moments of query analysis. The same as ‘today() - 1’.
timeSlot
Rounds the time down to the start of the half-hour interval.
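On a Unix timestamp, half-hour rounding amounts to dropping the remainder modulo 1800 seconds. The Python sketch below assumes that the rounding is downward and that the timezone offset is a whole number of half hours; the function name time_slot is hypothetical.

```python
HALF_HOUR = 1800  # seconds

def time_slot(unix_seconds: int) -> int:
    """Round a Unix timestamp down to the start of its half-hour slot."""
    return unix_seconds - unix_seconds % HALF_HOUR

print(time_slot(1509836867))  # 1509836400
```

Timestamps that fall into the same half hour map to the same slot value, which makes the result usable as a grouping key.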
toYYYYMM
Converts a date or date with time to a UInt32 number containing the year and month number (YYYY * 100 + MM).
toYYYYMMDD
Converts a date or date with time to a UInt32 number containing the year, month, and day number (YYYY * 10000 + MM * 100 + DD).
toYYYYMMDDhhmmss
Converts a date or date with time to a UInt64 number containing the year, month, day, hour, minute, and second number (YYYY * 10000000000 + MM * 100000000 + DD * 1000000 + hh * 10000 + mm * 100 + ss).
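The three packing formulas can be checked with a short Python sketch. The function names are hypothetical; the arithmetic is taken directly from the formulas above.

```python
from datetime import datetime

def to_yyyymm(dt: datetime) -> int:
    return dt.year * 100 + dt.month

def to_yyyymmdd(dt: datetime) -> int:
    return dt.year * 10000 + dt.month * 100 + dt.day

def to_yyyymmddhhmmss(dt: datetime) -> int:
    return (dt.year * 10000000000 + dt.month * 100000000 + dt.day * 1000000
            + dt.hour * 10000 + dt.minute * 100 + dt.second)

dt = datetime(2018, 1, 2, 22, 33, 44)
print(to_yyyymm(dt), to_yyyymmdd(dt), to_yyyymmddhhmmss(dt))
# 201801 20180102 20180102223344
```

Because every component occupies a fixed number of decimal digits, the packed numbers sort in the same order as the dates they encode.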
addYears, addMonths, addWeeks, addDays, addHours, addMinutes, addSeconds, addQuarters
These functions add an interval to a Date/DateTime and return the resulting Date/DateTime. For example:
WITH
toDate('2018-01-01') AS date,
toDateTime('2018-01-01 00:00:00') AS date_time
SELECT
addYears(date, 1) AS add_years_with_date,
addYears(date_time, 1) AS add_years_with_date_time
┌─add_years_with_date─┬─add_years_with_date_time─┐
│ 2019-01-01 │ 2019-01-01 00:00:00 │
└─────────────────────┴──────────────────────────┘
subtractYears, subtractMonths, subtractWeeks, subtractDays, subtractHours, subtractMinutes, subtractSeconds, subtractQuarters
These functions subtract an interval from a Date/DateTime and return the resulting Date/DateTime. For example:
WITH
toDate('2019-01-01') AS date,
toDateTime('2019-01-01 00:00:00') AS date_time
SELECT
subtractYears(date, 1) AS subtract_years_with_date,
subtractYears(date_time, 1) AS subtract_years_with_date_time
┌─subtract_years_with_date─┬─subtract_years_with_date_time─┐
│ 2018-01-01 │ 2018-01-01 00:00:00 │
└──────────────────────────┴───────────────────────────────┘
formatDateTime
Formats a Time according to the given Format string. Format is a constant expression, so you cannot have multiple formats for a single result column.
formatDateTime uses MySQL datetime format style, refer to https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-format.
Syntax
formatDateTime(Time, Format[, Timezone])
Returned value(s)
Returns time and date values according to the determined format.
Replacement fields
Using replacement fields, you can define a pattern for the resulting string. The “Example” column shows the formatting result for 2018-01-02 22:33:44.
Placeholder | Description | Example
%C | year divided by 100 and truncated to integer (00-99) | 20
%d | day of the month, zero-padded (01-31) | 02
%D | Short MM/DD/YY date, equivalent to %m/%d/%y | 01/02/18
%e | day of the month, space-padded ( 1-31) | 2
%f | fractional second from the fractional part of DateTime64 | 1234560
%F | short YYYY-MM-DD date, equivalent to %Y-%m-%d | 2018-01-02
%G | four-digit year format for ISO week number, calculated from the week-based year defined by the ISO 8601 standard, normally useful only with %V | 2018
%g | two-digit year format, aligned to ISO 8601, abbreviated from four-digit notation | 18
%H | hour in 24h format (00-23) | 22
%I | hour in 12h format (01-12) | 10
%j | day of the year (001-366) | 002
%m | month as a decimal number (01-12) | 01
%M | minute (00-59) | 33
%n | new-line character ('\n') |
%p | AM or PM designation | PM
%Q | Quarter (1-4) | 1
%R | 24-hour HH:MM time, equivalent to %H:%M | 22:33
%S | second (00-59) | 44
%t | horizontal-tab character ('\t') |
%T | ISO 8601 time format (HH:MM:SS), equivalent to %H:%M:%S | 22:33:44
%u | ISO 8601 weekday as number with Monday as 1 (1-7) | 2
%V | ISO 8601 week number (01-53) | 01
%w | weekday as a decimal number with Sunday as 0 (0-6) | 2
%y | Year, last two digits (00-99) | 18
%Y | Year | 2018
%z | Time offset from UTC as +HHMM or -HHMM | -0500
%% | a % sign | %
Example
Query:
SELECT formatDateTime(toDate('2010-01-04'), '%g')
Result:
┌─formatDateTime(toDate('2010-01-04'), '%g')─┐
│ 10 │
└────────────────────────────────────────────┘
Query:
SELECT formatDateTime(toDateTime64('2010-01-04 12:34:56.123456', 7), '%f')
Result:
┌─formatDateTime(toDateTime64('2010-01-04 12:34:56.123456', 7), '%f')─┐
│ 1234560 │
└─────────────────────────────────────────────────────────────────────┘
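Several of the replacement fields (%d, %H, %I, %j, %m, %M, %S, %y, %Y, %%) carry the same meaning in Python's datetime.strftime, so part of the “Example” column for 2018-01-02 22:33:44 can be reproduced outside ClickHouse. This sketch relies on that overlap only; Python's formatter is not the ClickHouse one, and other fields such as %f or %Q behave differently or are missing there.

```python
from datetime import datetime

sample = datetime(2018, 1, 2, 22, 33, 44)

def show(fmt: str) -> str:
    """Format the sample timestamp with a strftime pattern."""
    return sample.strftime(fmt)

for field in ("%d", "%H", "%I", "%j", "%m", "%M", "%S", "%y", "%Y"):
    print(field, show(field))

print(show("%Y-%m-%d %H:%M:%S"))  # 2018-01-02 22:33:44
```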
See Also
formatDateTimeInJodaSyntax
Similar to formatDateTime, except that it formats datetime in Joda style instead of MySQL style. Refer to https://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html.
Replacement fields
Using replacement fields, you can define a pattern for the resulting string.
Placeholder | Description | Presentation | Examples
G | era | text | AD
C | century of era (>=0) | number | 20
Y | year of era (>=0) | year | 1996
x | weekyear (not supported yet) | year | 1996
w | week of weekyear (not supported yet) | number | 27
e | day of week | number | 2
E | day of week | text | Tuesday; Tue
y | year | year | 1996
D | day of year | number | 189
M | month of year | month | July; Jul; 07
d | day of month | number | 10
a | halfday of day | text | PM
K | hour of halfday (0~11) | number | 0
h | clockhour of halfday (1~12) | number | 12
H | hour of day (0~23) | number | 0
k | clockhour of day (1~24) | number | 24
m | minute of hour | number | 30
s | second of minute | number | 55
S | fraction of second (not supported yet) | number | 978
z | time zone (short name not supported yet) | text | Pacific Standard Time; PST
Z | time zone offset/id (not supported yet) | zone | -0800; -08:00; America/Los_Angeles
' | escape for text | delimiter |
'' | single quote | literal | '
Example
Query:
SELECT formatDateTimeInJodaSyntax(toDateTime('2010-01-04 12:34:56'), 'yyyy-MM-dd HH:mm:ss')
Result:
┌─formatDateTimeInJodaSyntax(toDateTime('2010-01-04 12:34:56'), 'yyyy-MM-dd HH:mm:ss')─┐
│ 2010-01-04 12:34:56 │
└─────────────────────────────────────────────────────────────────────────────────────────┘
dateName
Returns specified part of date.
Syntax
dateName(date_part, date)
Arguments
date_part — Date part. Possible values: 'year', 'quarter', 'month', 'week', 'dayofyear', 'day', 'weekday', 'hour', 'minute', 'second'. String.
date — Date. Date, Date32, DateTime or DateTime64.
timezone — Timezone. Optional. String.
Returned value
The specified part of date.
Type: String
Example
Query:
WITH toDateTime('2021-04-14 11:22:33') AS date_value
SELECT
dateName('year', date_value),
dateName('month', date_value),
dateName('day', date_value);
Result:
┌─dateName('year', date_value)─┬─dateName('month', date_value)─┬─dateName('day', date_value)─┐
│ 2021 │ April │ 14 │
└──────────────────────────────┴───────────────────────────────┴─────────────────────────────┘
FROM_UNIXTIME
Converts a Unix timestamp to a calendar date and a time of day. When there is only a single argument of Integer type, it acts in the same way as toDateTime and returns a DateTime value.
FROM_UNIXTIME uses MySQL datetime format style, refer to https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-format.
Alias: fromUnixTimestamp.
Example:
Query:
SELECT FROM_UNIXTIME(423543535);
Result:
┌─FROM_UNIXTIME(423543535)─┐
│ 1983-06-04 10:58:55 │
└──────────────────────────┘
When there are two or three arguments (the first an Integer, Date, Date32, DateTime or DateTime64, the second a constant format string, and the third an optional constant time zone string), it acts in the same way as formatDateTime and returns a String value.
For example:
SELECT FROM_UNIXTIME(1234334543, '%Y-%m-%d %R:%S') AS DateTime;
┌─DateTime────────────┐
│ 2009-02-11 14:42:23 │
└─────────────────────┘
See Also
fromUnixTimestampInJodaSyntax
Similar to FROM_UNIXTIME, except that it formats time in Joda style instead of MySQL style. Refer to https://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html.
Example
Query:
SELECT fromUnixTimestampInJodaSyntax(1669804872, 'yyyy-MM-dd HH:mm:ss', 'UTC');
Result:
┌─fromUnixTimestampInJodaSyntax(1669804872, 'yyyy-MM-dd HH:mm:ss', 'UTC')─┐
│ 2022-11-30 10:41:12 │
└────────────────────────────────────────────────────────────────────────────┘
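The conversion in this example can be reproduced with Python's standard library by interpreting the timestamp in UTC and formatting it. The sketch uses strftime placeholders rather than Joda patterns, and the helper name from_unix_timestamp_utc is hypothetical.

```python
from datetime import datetime, timezone

def from_unix_timestamp_utc(ts: int, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Format a Unix timestamp as a UTC date and time string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(fmt)

print(from_unix_timestamp_utc(1669804872))  # 2022-11-30 10:41:12
```

Passing the timezone explicitly, as the query above does with 'UTC', keeps the result independent of the server or machine timezone.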
toModifiedJulianDay
Converts a Proleptic Gregorian calendar date in text form YYYY-MM-DD to a Modified Julian Day number in Int32. This function supports dates from 0000-01-01 to 9999-12-31. It raises an exception if the argument cannot be parsed as a date, or the date is invalid.
Syntax
toModifiedJulianDay(date)
Arguments
date — Date in text form. String or FixedString.
Returned value
Modified Julian Day number.
Type: Int32.
Example
Query:
SELECT toModifiedJulianDay('2020-01-01');
Result:
┌─toModifiedJulianDay('2020-01-01')─┐
│ 58849 │
└───────────────────────────────────┘
toModifiedJulianDayOrNull
Similar to toModifiedJulianDay(), but instead of raising exceptions it returns NULL.
Syntax
toModifiedJulianDayOrNull(date)
Arguments
date — Date in text form. String or FixedString.
Returned value
Modified Julian Day number.
Type: Nullable(Int32).
Example
Query:
SELECT toModifiedJulianDayOrNull('2020-01-01');
Result:
┌─toModifiedJulianDayOrNull('2020-01-01')─┐
│ 58849 │
└─────────────────────────────────────────┘
fromModifiedJulianDay
Converts a Modified Julian Day number to a Proleptic Gregorian calendar date in text form YYYY-MM-DD. This function supports day numbers from -678941 to 2973119 (which represent 0000-01-01 and 9999-12-31 respectively). It raises an exception if the day number is outside of the supported range.
Syntax
fromModifiedJulianDay(day)
Arguments
day — Modified Julian Day number. Any integral types.
Returned value
Date in text form.
Type: String
Example
Query:
SELECT fromModifiedJulianDay(58849);
Result:
┌─fromModifiedJulianDay(58849)─┐
│ 2020-01-01 │
└──────────────────────────────┘
fromModifiedJulianDayOrNull
Similar to fromModifiedJulianDay(), but instead of raising exceptions it returns NULL.
Syntax
fromModifiedJulianDayOrNull(day)
Arguments
day — Modified Julian Day number. Any integral types.
Returned value
Date in text form.
Type: Nullable(String)
Example
Query:
SELECT fromModifiedJulianDayOrNull(58849);
Result:
┌─fromModifiedJulianDayOrNull(58849)─┐
│ 2020-01-01 │
└────────────────────────────────────┘
DICTIONARIES
For information on connecting and configuring dictionaries, see Dictionaries.
dictGet, dictGetOrDefault, dictGetOrNull
Retrieves values from a dictionary.
dictGet('dict_name', attr_names, id_expr)
dictGetOrDefault('dict_name', attr_names, id_expr, default_value_expr)
dictGetOrNull('dict_name', attr_name, id_expr)
Arguments
dict_name — Name of the dictionary. String literal.
attr_names — Name of the column of the dictionary, String literal, or tuple of column names, Tuple(String literal).
id_expr — Key value. Expression returning dictionary key-type value or Tuple-type value depending on the dictionary configuration.
default_value_expr — Values returned if the dictionary does not contain a row with the id_expr key. Expression or Tuple(Expression), returning the value (or values) in the data types configured for the attr_names attribute.
Returned value
If ClickHouse parses the attribute successfully in the attribute’s data type, functions return the value of the dictionary attribute that corresponds to id_expr.
If there is no key corresponding to id_expr in the dictionary, then:
- dictGet returns the content of the <null_value> element specified for the attribute in the dictionary configuration.
- dictGetOrDefault returns the value passed as the default_value_expr parameter.
- dictGetOrNull returns NULL if the key was not found in the dictionary.
ClickHouse throws an exception if it cannot parse the value of the attribute or the value does not match the attribute data type.
Example for simple key dictionary
Create a text file ext-dict-test.csv containing the following:
1,1
2,2
The first column is id, the second column is c1.
Configure the dictionary:
<clickhouse>
    <dictionary>
        <name>ext-dict-test</name>
        <source>
            <file>
                <path>/path-to/ext-dict-test.csv</path>
                <format>CSV</format>
            </file>
        </source>
        <layout>
            <flat />
        </layout>
        <structure>
            <id>
                <name>id</name>
            </id>
            <attribute>
                <name>c1</name>
                <type>UInt32</type>
                <null_value></null_value>
            </attribute>
        </structure>
        <lifetime>0</lifetime>
    </dictionary>
</clickhouse>
Perform the query:
SELECT
dictGetOrDefault('ext-dict-test', 'c1', number + 1, toUInt32(number * 10)) AS val,
toTypeName(val) AS type
FROM system.numbers
LIMIT 3;
┌─val─┬─type───┐
│ 1 │ UInt32 │
│ 2 │ UInt32 │
│ 20 │ UInt32 │
└─────┴────────┘
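The missing-key rules of the three functions can be modeled with an ordinary Python dict. This is a sketch of the lookup semantics only: the function names are hypothetical, and treating the empty <null_value> of the UInt32 attribute as 0 is an assumption.

```python
# Rows of ext-dict-test.csv: id -> c1
source = {1: 1, 2: 2}

# Assumption: an empty <null_value> for a UInt32 attribute behaves as 0.
NULL_VALUE = 0

def dict_get(d: dict, key: int) -> int:
    return d.get(key, NULL_VALUE)

def dict_get_or_default(d: dict, key: int, default: int) -> int:
    return d.get(key, default)

def dict_get_or_null(d: dict, key: int):
    return d.get(key)  # None plays the role of NULL

# Mirrors the query above: key is number + 1, default is number * 10.
print([dict_get_or_default(source, n + 1, n * 10) for n in range(3)])  # [1, 2, 20]
```

The third row of the query result is 20 because key 3 is absent from the dictionary, so the default expression toUInt32(number * 10) is returned instead.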
Example for complex key dictionary
Create a text file ext-dict-mult.csv containing the following:
1,1,'1'
2,2,'2'
3,3,'3'
The first column is id, the second is c1, the third is c2.
Configure the dictionary:
<clickhouse>
    <dictionary>
        <name>ext-dict-mult</name>
        <source>
            <file>
                <path>/path-to/ext-dict-mult.csv</path>
                <format>CSV</format>
            </file>
        </source>
        <layout>
            <flat />
        </layout>
        <structure>
            <id>
                <name>id</name>
            </id>
            <attribute>
                <name>c1</name>
                <type>UInt32</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>c2</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
        </structure>
        <lifetime>0</lifetime>
    </dictionary>
</clickhouse>
Perform the query:
SELECT
dictGet('ext-dict-mult', ('c1','c2'), number + 1) AS val,
toTypeName(val) AS type
FROM system.numbers
LIMIT 3;
┌─val─────┬─type──────────────────┐
│ (1,'1') │ Tuple(UInt8, String) │
│ (2,'2') │ Tuple(UInt8, String) │
│ (3,'3') │ Tuple(UInt8, String) │
└─────────┴───────────────────────┘
Example for range key dictionary
Input table:
CREATE TABLE range_key_dictionary_source_table
(
key UInt64,
start_date Date,
end_date Date,
value String,
value_nullable Nullable(String)
)
ENGINE = TinyLog();
INSERT INTO range_key_dictionary_source_table VALUES(1, toDate('2019-05-20'), toDate('2019-05-20'), 'First', 'First');
INSERT INTO range_key_dictionary_source_table VALUES(2, toDate('2019-05-20'), toDate('2019-05-20'), 'Second', NULL);
INSERT INTO range_key_dictionary_source_table VALUES(3, toDate('2019-05-20'), toDate('2019-05-20'), 'Third', 'Third');
Create the dictionary:
CREATE DICTIONARY range_key_dictionary
(
key UInt64,
start_date Date,
end_date Date,
value String,
value_nullable Nullable(String)
)
PRIMARY KEY key
SOURCE(CLICKHOUSE(HOST 'localhost' PORT tcpPort() TABLE 'range_key_dictionary_source_table'))
LIFETIME(MIN 1 MAX 1000)
LAYOUT(RANGE_HASHED())
RANGE(MIN start_date MAX end_date);
Perform the query:
SELECT
(number, toDate('2019-05-20')),
dictHas('range_key_dictionary', number, toDate('2019-05-20')),
dictGetOrNull('range_key_dictionary', 'value', number, toDate('2019-05-20')),
dictGetOrNull('range_key_dictionary', 'value_nullable', number, toDate('2019-05-20')),
dictGetOrNull('range_key_dictionary', ('value', 'value_nullable'), number, toDate('2019-05-20'))
FROM system.numbers LIMIT 5 FORMAT TabSeparated;
Result:
(0,'2019-05-20') 0 \N \N (NULL,NULL)
(1,'2019-05-20') 1 First First ('First','First')
(2,'2019-05-20') 1 Second \N ('Second',NULL)
(3,'2019-05-20') 1 Third Third ('Third','Third')
(4,'2019-05-20') 0 \N \N (NULL,NULL)
See Also
dictHas
Checks whether a key is present in a dictionary.
dictHas('dict_name', id_expr)
Arguments
dict_name — Name of the dictionary. String literal.
id_expr — Key value. Expression returning dictionary key-type value or Tuple-type value depending on the dictionary configuration.
Returned value
0, if there is no key.
1, if there is a key.
Type: UInt8.
dictGetHierarchy
Creates an array, containing all the parents of a key in the hierarchical dictionary.
Syntax
dictGetHierarchy('dict_name', key)
Arguments
dict_name — Name of the dictionary. String literal.
key — Key value. Expression returning a UInt64-type value.
Returned value
Parents for the key.
Type: Array(UInt64).
dictIsIn
Checks the ancestor of a key through the whole hierarchical chain in the dictionary.
dictIsIn('dict_name', child_id_expr, ancestor_id_expr)
Arguments
dict_name — Name of the dictionary. String literal.
child_id_expr — Key to be checked. Expression returning a UInt64-type value.
ancestor_id_expr — Alleged ancestor of the child_id_expr key. Expression returning a UInt64-type value.
Returned value
0, if child_id_expr is not a child of ancestor_id_expr.
1, if child_id_expr is a child of ancestor_id_expr or if child_id_expr is an ancestor_id_expr.
Type: UInt8.
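The ancestor check can be pictured as a walk up the parent chain. In the Python sketch below, the hierarchy is a hypothetical mapping from key to parent key, and using 0 to mean "no parent" is an assumption made for the illustration. The function returns 1 both when the child equals the ancestor and when the ancestor appears further up the chain, matching the return values listed above.

```python
# Hypothetical hierarchy: key -> parent key; 0 means "no parent" (assumption).
PARENTS = {1: 0, 2: 1, 3: 2, 4: 1}

def dict_is_in(parents: dict, child: int, ancestor: int) -> int:
    """Return 1 if ancestor is child itself or one of its ancestors, else 0."""
    seen = set()  # guards against cycles in malformed data
    current = child
    while current != 0 and current not in seen:
        if current == ancestor:
            return 1
        seen.add(current)
        current = parents.get(current, 0)
    return 0

print(dict_is_in(PARENTS, 3, 1))  # 1
print(dict_is_in(PARENTS, 3, 3))  # 1
print(dict_is_in(PARENTS, 1, 3))  # 0
```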
Other Functions
ClickHouse supports specialized functions that convert dictionary attribute values to a specific data type regardless of the dictionary configuration.
Functions:
- dictGetInt8, dictGetInt16, dictGetInt32, dictGetInt64
- dictGetUInt8, dictGetUInt16, dictGetUInt32, dictGetUInt64
- dictGetFloat32, dictGetFloat64
- dictGetDate
- dictGetDateTime
- dictGetUUID
- dictGetString
All these functions have the OrDefault modification. For example, dictGetDateOrDefault.
Syntax:
dictGet[Type]('dict_name', 'attr_name', id_expr)
dictGet[Type]OrDefault('dict_name', 'attr_name', id_expr, default_value_expr)
Arguments
dict_name — Name of the dictionary. String literal.
attr_name — Name of the column of the dictionary. String literal.
id_expr — Key value. Expression returning a UInt64 or Tuple-type value depending on the dictionary configuration.
default_value_expr — Value returned if the dictionary does not contain a row with the id_expr key. Expression returning the value in the data type configured for the attr_name attribute.
Returned value
If ClickHouse parses the attribute successfully in the attribute’s data type, functions return the value of the dictionary attribute that corresponds to id_expr.
If there is no requested id_expr in the dictionary then:
dictGet[Type] returns the content of the <null_value> element specified for the attribute in the dictionary configuration.
dictGet[Type]OrDefault returns the value passed as the default_value_expr parameter.
ClickHouse throws an exception if it cannot parse the value of the attribute or the value does not match the attribute data type.
ENCODING
char
Returns a string whose length equals the number of passed arguments, where each byte has the value of the corresponding argument. Accepts multiple arguments of numeric types. If the value of an argument is out of range of the UInt8 data type, it is converted to UInt8 with possible rounding and overflow.
Syntax
char(number_1, [number_2, ..., number_n]);
Arguments
Returned value
A string of the given bytes.
Type: String.
Example
Query:
SELECT char(104.1, 101, 108.9, 108.9, 111) AS hello;
Result:
┌─hello─┐
│ hello │
└───────┘
You can construct a string of arbitrary encoding by passing the corresponding bytes. Here is an example for UTF-8:
Query:
SELECT char(0xD0, 0xBF, 0xD1, 0x80, 0xD0, 0xB8, 0xD0, 0xB2, 0xD0, 0xB5, 0xD1, 0x82) AS hello;
Result:
┌─hello──┐
│ привет │
└────────┘
Query:
SELECT char(0xE4, 0xBD, 0xA0, 0xE5, 0xA5, 0xBD) AS hello;
Result:
┌─hello─┐
│ 你好 │
└───────┘
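The byte-building behavior of char can be modelled in a few lines of Python. This is a sketch under assumptions: fractional values are truncated toward zero (which matches 108.9 producing the letter l above), and out-of-range values wrap modulo 256. The name char_sketch is hypothetical.

```python
def char_sketch(*numbers):
    # Truncate each value toward zero, then wrap it into the UInt8 range (assumed).
    return bytes(int(n) % 256 for n in numbers)


hello = char_sketch(104.1, 101, 108.9, 108.9, 111)
utf8 = char_sketch(0xD0, 0xBF, 0xD1, 0x80, 0xD0, 0xB8,
                   0xD0, 0xB2, 0xD0, 0xB5, 0xD1, 0x82)
print(hello.decode("ascii"))  # hello
print(utf8.decode("utf-8"))   # привет
```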
hex
Returns a string containing the argument’s hexadecimal representation.
Alias: HEX.
Syntax
hex(arg)
The function uses uppercase letters A-F and does not use any prefixes (like 0x) or suffixes (like h).
For integer arguments, it prints hex digits (“nibbles”) from the most significant to least significant (big-endian or “human-readable” order). It starts with the most significant non-zero byte (leading zero bytes are omitted) but always prints both digits of every byte even if the leading digit is zero.
Values of type Date and DateTime are formatted as corresponding integers (the number of days since Epoch for Date and the value of Unix Timestamp for DateTime).
For String and FixedString, all bytes are simply encoded as two hexadecimal numbers. Zero bytes are not omitted.
Values of Float and Decimal types are encoded as their representation in memory. As we support little-endian architecture, they are encoded in little-endian. Zero leading/trailing bytes are not omitted.
Values of UUID type are encoded as a string in big-endian order.
Arguments
Returned value
A string with the hexadecimal representation of the argument.
Type: String.
Examples
Query:
SELECT hex(1);
Result:
01
Query:
SELECT hex(toFloat32(number)) AS hex_presentation FROM numbers(15, 2);
Result:
┌─hex_presentation─┐
│ 00007041 │
│ 00008041 │
└──────────────────┘
Query:
SELECT hex(toFloat64(number)) AS hex_presentation FROM numbers(15, 2);
Result:
┌─hex_presentation─┐
│ 0000000000002E40 │
│ 0000000000003040 │
└──────────────────┘
Query:
SELECT lower(hex(toUUID('61f0c404-5cb3-11e7-907b-a6006ad3dba0'))) as uuid_hex
Result:
┌─uuid_hex─────────────────────────┐
│ 61f0c4045cb311e7907ba6006ad3dba0 │
└──────────────────────────────────┘
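The integer and floating-point rules above can be reproduced with Python's standard library. This is an illustrative model only; the helper names are hypothetical, and hex_uint assumes a non-negative integer.

```python
import struct


def hex_uint(n):
    # Big-endian bytes, leading zero bytes dropped, two digits for every byte.
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big").hex().upper()


def hex_float32(x):
    # In-memory representation of a 32-bit float, little-endian.
    return struct.pack("<f", x).hex().upper()


def hex_float64(x):
    # In-memory representation of a 64-bit float, little-endian.
    return struct.pack("<d", x).hex().upper()


print(hex_uint(1))        # 01
print(hex_uint(4095))     # 0FFF
print(hex_float32(15.0))  # 00007041
print(hex_float64(16.0))  # 0000000000003040
```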
unhex
Performs the opposite operation of hex. It interprets each pair of hexadecimal digits (in the argument) as a number and converts it to the byte represented by the number. The return value is a binary string (BLOB).
If you want to convert the result to a number, you can use the reverse and reinterpretAs<Type> functions.
NOTE
If unhex is invoked from within the clickhouse-client, binary strings display using UTF-8.
Alias: UNHEX.
Syntax
unhex(arg)
Arguments
arg — A string containing any number of hexadecimal digits. Type: String, FixedString.
Supports both uppercase and lowercase letters A-F. The number of hexadecimal digits does not have to be even. If it is odd, the last digit is interpreted as the least significant half of the 00-0F byte. If the argument string contains anything other than hexadecimal digits, some implementation-defined result is returned (an exception isn’t thrown). For a numeric argument the inverse of hex(N) is not performed by unhex().
Returned value
A binary string (BLOB).
Type: String.
Example
Query:
SELECT unhex('303132'), UNHEX('4D7953514C');
Result:
┌─unhex('303132')─┬─unhex('4D7953514C')─┐
│ 012 │ MySQL │
└─────────────────┴─────────────────────┘
Query:
SELECT reinterpretAsUInt64(reverse(unhex('FFF'))) AS num;
Result:
┌──num─┐
│ 4095 │
└──────┘
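A rough Python model of the two examples above follows. It assumes that an odd number of digits is handled by padding with a leading zero, which is consistent with unhex('FFF') yielding 4095. Unlike unhex, bytes.fromhex raises ValueError on non-hexadecimal input. The name unhex_sketch is hypothetical.

```python
def unhex_sketch(digits):
    # Assumption: an odd digit count is padded with a leading zero.
    if len(digits) % 2:
        digits = "0" + digits
    return bytes.fromhex(digits)


print(unhex_sketch("303132"))      # b'012'
print(unhex_sketch("4D7953514C"))  # b'MySQL'

# reverse() followed by reinterpretAsUInt64 reads the bytes as a little-endian integer.
data = unhex_sketch("FFF")
print(int.from_bytes(data[::-1], "little"))  # 4095
```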
bitmaskToList(num)
Accepts an integer. Returns a string containing the list of powers of two that total the source number when summed. They are comma-separated without spaces in text format, in ascending order.
bitmaskToArray(num)
Accepts an integer. Returns an array of UInt64 numbers containing the list of powers of two that total the source number when summed. Numbers in the array are in ascending order.
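The decomposition into powers of two can be sketched in Python as follows. The function names mirror the ones above but are only illustrative; the sketch assumes a non-negative integer.

```python
def bitmask_to_array(num):
    # Collect the value of every set bit, lowest bit first (ascending order).
    return [1 << i for i in range(num.bit_length()) if (num >> i) & 1]


def bitmask_to_list(num):
    # Same powers of two, comma-separated without spaces.
    return ",".join(str(power) for power in bitmask_to_array(num))


print(bitmask_to_array(50))  # [2, 16, 32]
print(bitmask_to_list(50))   # 2,16,32
```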
ENCRYPTION
These functions implement encryption and decryption of data with the AES (Advanced Encryption Standard) algorithm.
Key length depends on encryption mode. It is 16, 24, and 32 bytes long for -128-, -192-, and -256- modes respectively.
Initialization vector length is always 16 bytes (bytes in excess of 16 are ignored).
Note that these functions work slowly in ClickHouse versions prior to 21.1.
encrypt
This function encrypts data using these modes:
aes-128-ecb, aes-192-ecb, aes-256-ecb
aes-128-cbc, aes-192-cbc, aes-256-cbc
aes-128-ofb, aes-192-ofb, aes-256-ofb
aes-128-gcm, aes-192-gcm, aes-256-gcm
aes-128-ctr, aes-192-ctr, aes-256-ctr
Syntax
encrypt('mode', 'plaintext', 'key' [, iv, aad])
Arguments
mode — Encryption mode. String.
plaintext — Text that needs to be encrypted. String.
key — Encryption key. String.
iv — Initialization vector. Required for -gcm modes, optional for others. String.
aad — Additional authenticated data. It isn't encrypted, but it affects decryption. Works only in -gcm modes, for others would throw an exception. String.
Returned value
Ciphertext binary string. String.
Examples
Create this table:
Query:
CREATE TABLE encryption_test
(
`comment` String,
`secret` String
)
ENGINE = Memory;
Insert some data. Avoid storing the keys or IVs in the database, as this undermines the whole concept of encryption; storing 'hints' is also unsafe and is done here only for illustrative purposes:
Query:
INSERT INTO encryption_test VALUES('aes-256-ofb no IV', encrypt('aes-256-ofb', 'Secret', '12345678910121314151617181920212')),\
('aes-256-ofb no IV, different key', encrypt('aes-256-ofb', 'Secret', 'keykeykeykeykeykeykeykeykeykeyke')),\
('aes-256-ofb with IV', encrypt('aes-256-ofb', 'Secret', '12345678910121314151617181920212', 'iviviviviviviviv')),\
('aes-256-cbc no IV', encrypt('aes-256-cbc', 'Secret', '12345678910121314151617181920212'));
Query:
SELECT comment, hex(secret) FROM encryption_test;
Result:
┌─comment──────────────────────────┬─hex(secret)──────────────────────┐
│ aes-256-ofb no IV │ B4972BDC4459 │
│ aes-256-ofb no IV, different key │ 2FF57C092DC9 │
│ aes-256-ofb with IV │ 5E6CB398F653 │
│ aes-256-cbc no IV │ 1BC0629A92450D9E73A00E7D02CF4142 │
└──────────────────────────────────┴──────────────────────────────────┘
Example with -gcm:
Query:
INSERT INTO encryption_test VALUES('aes-256-gcm', encrypt('aes-256-gcm', 'Secret', '12345678910121314151617181920212', 'iviviviviviviviv')), \
('aes-256-gcm with AAD', encrypt('aes-256-gcm', 'Secret', '12345678910121314151617181920212', 'iviviviviviviviv', 'aad'));
SELECT comment, hex(secret) FROM encryption_test WHERE comment LIKE '%gcm%';
Result:
┌─comment──────────────┬─hex(secret)──────────────────────────────────┐
│ aes-256-gcm │ A8A3CCBC6426CFEEB60E4EAE03D3E94204C1B09E0254 │
│ aes-256-gcm with AAD │ A8A3CCBC6426D9A1017A0A932322F1852260A4AD6837 │
└──────────────────────┴──────────────────────────────────────────────┘
aes_encrypt_mysql
Compatible with MySQL encryption; the resulting ciphertext can be decrypted with the AES_DECRYPT function.
Produces the same ciphertext as encrypt on equal inputs. But when key or iv is longer than it should normally be, aes_encrypt_mysql sticks to what MySQL's aes_encrypt does: it 'folds' key and ignores excess bits of iv.
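As an illustration of what 'folding' a key means, the Python sketch below XORs the bytes of an over-long key cyclically into a buffer of the required length. This reflects our understanding of MySQL's key preparation and is an assumption, not a statement about ClickHouse's code; fold_key is a hypothetical name.

```python
def fold_key(key: bytes, size: int) -> bytes:
    # XOR every key byte into a zero-filled buffer of the required size,
    # wrapping around when the key is longer than the buffer.
    folded = bytearray(size)
    for i, byte in enumerate(key):
        folded[i % size] ^= byte
    return bytes(folded)


# A 33-byte key folded to the 32 bytes expected by the -256- modes:
# only the first byte changes, because the 33rd byte wraps onto it.
key = b"123456789101213141516171819202122"
folded = fold_key(key, 32)
print(len(folded), folded[1:] == key[1:32])  # 32 True
```

In this sketch a key of exactly the required length is returned unchanged, and a shorter key is implicitly zero-padded.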
Supported encryption modes:
aes-128-ecb, aes-192-ecb, aes-256-ecb
aes-128-cbc, aes-192-cbc, aes-256-cbc
aes-128-ofb, aes-192-ofb, aes-256-ofb
Syntax
aes_encrypt_mysql('mode', 'plaintext', 'key' [, iv])
Arguments
mode — Encryption mode. String.
plaintext — Text that needs to be encrypted. String.
key — Encryption key. If key is longer than required by mode, MySQL-specific key folding is performed. String.
iv — Initialization vector. Optional, only first 16 bytes are taken into account. String.
Returned value
Ciphertext binary string. String.
Examples
Given equal input, encrypt and aes_encrypt_mysql produce the same ciphertext:
Query:
SELECT encrypt('aes-256-ofb', 'Secret', '12345678910121314151617181920212', 'iviviviviviviviv') = aes_encrypt_mysql('aes-256-ofb', 'Secret', '12345678910121314151617181920212', 'iviviviviviviviv') AS ciphertexts_equal;
Result:
┌─ciphertexts_equal─┐
│ 1 │
└───────────────────┘
But encrypt fails when key or iv is longer than expected:
Query:
SELECT encrypt('aes-256-ofb', 'Secret', '123456789101213141516171819202122', 'iviviviviviviviv123');
Result:
Received exception from server (version 22.6.1):
Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: Invalid key size: 33 expected 32: While processing encrypt('aes-256-ofb', 'Secret', '123456789101213141516171819202122', 'iviviviviviviviv123').
While aes_encrypt_mysql produces MySQL-compatible output:
Query:
SELECT hex(aes_encrypt_mysql('aes-256-ofb', 'Secret', '123456789101213141516171819202122', 'iviviviviviviviv123')) AS ciphertext;
Result:
┌─ciphertext───┐
│ 24E9E4966469 │
└──────────────┘
Notice how supplying an even longer IV produces the same result:
Query:
SELECT hex(aes_encrypt_mysql('aes-256-ofb', 'Secret', '123456789101213141516171819202122', 'iviviviviviviviv123456')) AS ciphertext
Result:
┌─ciphertext───┐
│ 24E9E4966469 │
└──────────────┘
This is binary equal to what MySQL produces on the same inputs:
mysql> SET block_encryption_mode='aes-256-ofb';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT aes_encrypt('Secret', '123456789101213141516171819202122', 'iviviviviviviviv123456') as ciphertext;
+------------------------+
| ciphertext |
+------------------------+
| 0x24E9E4966469 |
+------------------------+
1 row in set (0.00 sec)
decrypt
This function decrypts ciphertext into a plaintext using these modes:
aes-128-ecb, aes-192-ecb, aes-256-ecb
aes-128-cbc, aes-192-cbc, aes-256-cbc
aes-128-ofb, aes-192-ofb, aes-256-ofb
aes-128-gcm, aes-192-gcm, aes-256-gcm
aes-128-ctr, aes-192-ctr, aes-256-ctr
Syntax
decrypt('mode', 'ciphertext', 'key' [, iv, aad])
Arguments
mode — Decryption mode. String.
ciphertext — Encrypted text that needs to be decrypted. String.
key — Decryption key. String.
iv — Initialization vector. Required for -gcm modes, optional for others. String.
aad — Additional authenticated data. Won't decrypt if this value is incorrect. Works only in -gcm modes, for others would throw an exception. String.
Returned value
Decrypted String. String.
Examples
Re-using table from encrypt.
Query:
SELECT comment, hex(secret) FROM encryption_test;
Result:
┌─comment──────────────┬─hex(secret)──────────────────────────────────┐
│ aes-256-gcm │ A8A3CCBC6426CFEEB60E4EAE03D3E94204C1B09E0254 │
│ aes-256-gcm with AAD │ A8A3CCBC6426D9A1017A0A932322F1852260A4AD6837 │
└──────────────────────┴──────────────────────────────────────────────┘
┌─comment──────────────────────────┬─hex(secret)──────────────────────┐
│ aes-256-ofb no IV │ B4972BDC4459 │
│ aes-256-ofb no IV, different key │ 2FF57C092DC9 │
│ aes-256-ofb with IV │ 5E6CB398F653 │
│ aes-256-cbc no IV │ 1BC0629A92450D9E73A00E7D02CF4142 │
└──────────────────────────────────┴──────────────────────────────────┘
Now let's try to decrypt all that data.
Query:
SELECT comment, decrypt('aes-256-cfb128', secret, '12345678910121314151617181920212') as plaintext FROM encryption_test
Result:
┌─comment──────────────┬─plaintext──┐
│ aes-256-gcm │ OQ�E
�t�7T�\���\� │
│ aes-256-gcm with AAD │ OQ�E
�\��si����;�o�� │
└──────────────────────┴────────────┘
┌─comment──────────────────────────┬─plaintext─┐
│ aes-256-ofb no IV │ Secret │
│ aes-256-ofb no IV, different key │ �4�
� │
│ aes-256-ofb with IV │ ���6�~ │
│aes-256-cbc no IV │ �2*4�h3c�4w��@
└──────────────────────────────────┴───────────┘
Notice how only a portion of the data was properly decrypted, and the rest is gibberish since either mode, key, or iv was different upon encryption.
aes_decrypt_mysql
Compatible with MySQL encryption; decrypts data encrypted with the AES_ENCRYPT function.
Produces the same plaintext as decrypt on equal inputs. But when key or iv is longer than it should normally be, aes_decrypt_mysql sticks to what MySQL's aes_decrypt does: it 'folds' key and ignores excess bits of iv.
Supported decryption modes:
aes-128-ecb, aes-192-ecb, aes-256-ecb
aes-128-cbc, aes-192-cbc, aes-256-cbc
aes-128-cfb128
aes-128-ofb, aes-192-ofb, aes-256-ofb
Syntax
aes_decrypt_mysql('mode', 'ciphertext', 'key' [, iv])
Arguments
mode — Decryption mode. String.
ciphertext — Encrypted text that needs to be decrypted. String.
key — Decryption key. String.
iv — Initialization vector. Optional. String.
Returned value
Decrypted String. String.
Examples
Let's decrypt data we've previously encrypted with MySQL:
mysql> SET block_encryption_mode='aes-256-ofb';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT aes_encrypt('Secret', '123456789101213141516171819202122', 'iviviviviviviviv123456') as ciphertext;
+------------------------+
| ciphertext |
+------------------------+
| 0x24E9E4966469 |
+------------------------+
1 row in set (0.00 sec)
Query:
SELECT aes_decrypt_mysql('aes-256-ofb', unhex('24E9E4966469'), '123456789101213141516171819202122', 'iviviviviviviviv123456') AS plaintext
Result:
┌─plaintext─┐
│ Secret │
└───────────┘
FILE
file
Reads a file as a String. The file content is not parsed, so any information is read as one string and placed into the specified column.
Syntax
file(path[, default])
Arguments
path — The relative path to the file from user_files_path. The path supports the following wildcards: *, ?, {abc,def} and {N..M}, where N, M are numbers and 'abc', 'def' are strings.
Example
Inserting data from files a.txt and b.txt into a table as strings:
Query:
INSERT INTO table SELECT file('a.txt'), file('b.txt');
GEOGRAPHICAL COORDINATES
greatCircleDistance
Calculates the distance between two points on the Earth’s surface using the great-circle formula.
greatCircleDistance(lon1Deg, lat1Deg, lon2Deg, lat2Deg)
Input parameters
lon1Deg — Longitude of the first point in degrees. Range: [-180°, 180°].
lat1Deg — Latitude of the first point in degrees. Range: [-90°, 90°].
lon2Deg — Longitude of the second point in degrees. Range: [-180°, 180°].
lat2Deg — Latitude of the second point in degrees. Range: [-90°, 90°].
Positive values correspond to North latitude and East longitude, and negative values correspond to South latitude and West longitude.
Returned value
The distance between two points on the Earth’s surface, in meters.
Generates an exception when the input parameter values fall outside of the range.
Example
SELECT greatCircleDistance(55.755831, 37.617673, -55.755831, -37.617673)
┌─greatCircleDistance(55.755831, 37.617673, -55.755831, -37.617673)─┐
│ 14132374.194975413 │
└───────────────────────────────────────────────────────────────────┘
geoDistance
Similar to greatCircleDistance but calculates the distance on the WGS-84 ellipsoid instead of a sphere. This is a more precise approximation of the Earth Geoid. The performance is the same as for greatCircleDistance (no performance drawback). It is recommended to use geoDistance to calculate the distances on Earth.
Technical note: for close enough points we calculate the distance using planar approximation with the metric on the tangent plane at the midpoint of the coordinates.
greatCircleAngle
Calculates the central angle between two points on the Earth’s surface using the great-circle formula.
greatCircleAngle(lon1Deg, lat1Deg, lon2Deg, lat2Deg)
Input parameters
lon1Deg — Longitude of the first point in degrees.
lat1Deg — Latitude of the first point in degrees.
lon2Deg — Longitude of the second point in degrees.
lat2Deg — Latitude of the second point in degrees.
Returned value
The central angle between two points in degrees.
Example
SELECT greatCircleAngle(0, 0, 45, 0) AS arc
┌─arc─┐
│ 45 │
└─────┘
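The relation between the central angle and the great-circle distance can be sketched with the haversine formula. This is a minimal Python model, not ClickHouse's implementation: the Earth radius constant below is an assumed mean value, so distances will differ slightly from what greatCircleDistance returns.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius, in meters


def great_circle_angle(lon1_deg, lat1_deg, lon2_deg, lat2_deg):
    # Haversine formula for the central angle between two points, in degrees.
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlam = math.radians(lon2_deg - lon1_deg)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def great_circle_distance(lon1_deg, lat1_deg, lon2_deg, lat2_deg):
    # Arc length = central angle (in radians) times the sphere radius.
    angle = great_circle_angle(lon1_deg, lat1_deg, lon2_deg, lat2_deg)
    return math.radians(angle) * EARTH_RADIUS_M


print(round(great_circle_angle(0, 0, 45, 0), 6))  # 45.0
```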
pointInEllipses
Checks whether the point belongs to at least one of the ellipses. Coordinates are geometric in the Cartesian coordinate system.
pointInEllipses(x, y, x₀, y₀, a₀, b₀,...,xₙ, yₙ, aₙ, bₙ)
Input parameters
x, y — Coordinates of a point on the plane.
xᵢ, yᵢ — Coordinates of the center of the i-th ellipse.
aᵢ, bᵢ — Axes of the i-th ellipse in units of x, y coordinates.
The number of input parameters must be 2+4⋅n, where n is the number of ellipses.
Returned values
1 if the point is inside at least one of the ellipses; 0 if it is not.
Example
SELECT pointInEllipses(10., 10., 10., 9.1, 1., 0.9999)
┌─pointInEllipses(10., 10., 10., 9.1, 1., 0.9999)─┐
│ 1 │
└─────────────────────────────────────────────────┘
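The membership test can be written out directly from the ellipse equation. The Python sketch below assumes that aᵢ and bᵢ are semi-axes aligned with the coordinate axes and that boundary points count as inside; both are assumptions for illustration.

```python
def point_in_ellipses(x, y, *params):
    # params holds groups of four values: center x, center y, semi-axis a, semi-axis b.
    if not params or len(params) % 4 != 0:
        raise ValueError("expected 2 + 4*n arguments")
    for i in range(0, len(params), 4):
        cx, cy, a, b = params[i:i + 4]
        if ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0:
            return 1
    return 0


print(point_in_ellipses(10., 10., 10., 9.1, 1., 0.9999))  # 1
```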
pointInPolygon
Checks whether the point belongs to the polygon on the plane.
pointInPolygon((x, y), [(a, b), (c, d) ...], ...)
Input values
(x, y) — Coordinates of a point on the plane. Data type — Tuple — A tuple of two numbers.
[(a, b), (c, d) ...] — Polygon vertices. Data type — Array. Each vertex is represented by a pair of coordinates (a, b). Vertices should be specified in a clockwise or counterclockwise order. The minimum number of vertices is 3. The polygon must be constant.
The function also supports polygons with holes (cut out sections). In this case, add polygons that define the cut out sections using additional arguments of the function. The function does not support non-simply-connected polygons.
Returned values
1 if the point is inside the polygon, 0 if it is not. If the point is on the polygon boundary, the function may return either 0 or 1.
Example
SELECT pointInPolygon((3., 3.), [(6, 0), (8, 4), (5, 8), (0, 2)]) AS res
┌─res─┐
│ 1 │
└─────┘
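One common way to implement such a test is ray casting: count how many polygon edges a horizontal ray from the point crosses. The Python sketch below uses that technique for a simple polygon without holes; it is not necessarily the algorithm ClickHouse uses.

```python
def point_in_polygon(point, polygon):
    # Ray casting: toggle `inside` each time a ray going in +x direction crosses an edge.
    x, y = point
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        if (yi > y) != (yj > y):  # the edge straddles the horizontal line through the point
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return 1 if inside else 0


print(point_in_polygon((3., 3.), [(6, 0), (8, 4), (5, 8), (0, 2)]))  # 1
```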
GEOHASH
Geohash is a geocode system that subdivides the Earth’s surface into grid-shaped buckets and encodes each cell into a short string of letters and digits. It is a hierarchical data structure, so the longer the geohash string, the more precise the geographic location.
If you need to manually convert geographic coordinates to geohash strings, you can use geohash.org.
geohashEncode
Encodes latitude and longitude as a geohash-string.
geohashEncode(longitude, latitude, [precision])
Input values
longitude - longitude part of the coordinate you want to encode. Floating in range [-180°, 180°].
latitude - latitude part of the coordinate you want to encode. Floating in range [-90°, 90°].
precision - Optional, length of the resulting encoded string, defaults to 12. Integer in range [1, 12]. Any value less than 1 or greater than 12 is silently converted to 12.
Returned values
Alphanumeric String of the encoded coordinate (a modified version of the base32-encoding alphabet is used).
Example
SELECT geohashEncode(-5.60302734375, 42.593994140625, 0) AS res;
┌─res──────────┐
│ ezs42d000000 │
└──────────────┘
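The encoding can be sketched as the standard geohash bisection: longitude and latitude bits are interleaved, starting with longitude, and every five bits become one base32 character. The Python code below is an illustrative model of that public algorithm, with the precision clamping described above; it is not ClickHouse's source.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(longitude, latitude, precision=12):
    if not 1 <= precision <= 12:
        precision = 12  # out-of-range precision silently becomes 12
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, value = (lon_range, longitude) if use_lon else (lat_range, latitude)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits form one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


print(geohash_encode(-5.60302734375, 42.593994140625, 5))  # ezs42
print(geohash_encode(-5.60302734375, 42.593994140625, 0))  # ezs42d000000
```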
geohashDecode
Decodes any geohash-encoded string into longitude and latitude.
Input values
encoded string - geohash-encoded string.
Returned values
(longitude, latitude) - 2-tuple of Float64 values of longitude and latitude.
Example
SELECT geohashDecode('ezs42') AS res;
┌─res─────────────────────────────┐
│ (-5.60302734375,42.60498046875) │
└─────────────────────────────────┘
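Decoding reverses the bisection: each bit narrows either the longitude or the latitude interval, and the result is the center of the final cell. The following Python sketch models the public geohash algorithm and is only an illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_decode(encoded):
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True  # bits alternate, starting with longitude
    for ch in encoded:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    # Return the center of the final cell as (longitude, latitude).
    return ((lon_range[0] + lon_range[1]) / 2, (lat_range[0] + lat_range[1]) / 2)


print(geohash_decode("ezs42"))  # (-5.60302734375, 42.60498046875)
```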
geohashesInBox
Returns an array of geohash-encoded strings of the given precision that fall inside or intersect the boundaries of the given box; basically a 2D grid flattened into an array.
Syntax
geohashesInBox(longitude_min, latitude_min, longitude_max, latitude_max, precision)
Arguments
longitude_min — Minimum longitude. Range: [-180°, 180°]. Type: Float.
latitude_min — Minimum latitude. Range: [-90°, 90°]. Type: Float.
longitude_max — Maximum longitude. Range: [-180°, 180°]. Type: Float.
latitude_max — Maximum latitude. Range: [-90°, 90°]. Type: Float.
precision — Geohash precision. Range: [1, 12]. Type: UInt8.
NOTE
All coordinate parameters must be of the same type: either Float32 or Float64.
Returned values
Array of precision-long strings of geohash-boxes covering the provided area. You should not rely on the order of items.
[] - Empty array if minimum latitude and longitude values aren’t less than corresponding maximum values.
NOTE
Function throws an exception if resulting array is over 10’000’000 items long.
Example
Query:
SELECT geohashesInBox(24.48, 40.56, 24.785, 40.81, 4) AS thasos;
Result:
┌─thasos──────────────────────────────────────┐
│ ['sx1q','sx1r','sx32','sx1w','sx1x','sx38'] │
└─────────────────────────────────────────────┘
H3 INDEXES
H3 is a geographical indexing system where the Earth’s surface is divided into a grid of even hexagonal cells. This system is hierarchical, i.e. each hexagon on the top level ("parent") can be split into seven even but smaller ones ("children"), and so on.
The level of the hierarchy is called resolution and can receive a value from 0 to 15, where 0 is the base level with the largest and coarsest cells.
A latitude and longitude pair can be transformed to a 64-bit H3 index, identifying a grid cell.
The H3 index is used primarily for bucketing locations and other geospatial manipulations.
The full description of the H3 system is available at the Uber Engineering site.
h3IsValid
Verifies whether the number is a valid H3 index.
Syntax
h3IsValid(h3index)
Parameter
h3index — Hexagon index number. Type: UInt64.
Returned values
1 — The number is a valid H3 index.
0 — The number is not a valid H3 index.
Type: UInt8.
Example
Query:
SELECT h3IsValid(630814730351855103) AS h3IsValid;
Result:
┌─h3IsValid─┐
│ 1 │
└───────────┘
h3GetResolution
Returns the resolution of the given H3 index.
Syntax
h3GetResolution(h3index)
Parameter
h3index — Hexagon index number. Type: UInt64.
Returned values
Index resolution. Range: [0, 15].
If the index is not valid, the function returns a random value. Use h3IsValid to verify the index.
Type: UInt8.
Example
Query:
SELECT h3GetResolution(639821929606596015) AS resolution;
Result:
┌─resolution─┐
│ 14 │
└────────────┘
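For intuition, the resolution can be read straight from the 64-bit index. The Python sketch below assumes the published H3 bit layout, in which bits 52 to 55 hold the resolution and bits 59 to 62 hold the index mode; the function names are hypothetical and no validity check is performed.

```python
def h3_resolution(h3index):
    # Assumed H3 layout: the resolution occupies bits 52..55.
    return (h3index >> 52) & 0xF


def h3_mode(h3index):
    # Assumed H3 layout: the index mode occupies bits 59..62 (1 for a cell index).
    return (h3index >> 59) & 0xF


print(h3_resolution(639821929606596015))  # 14
print(h3_mode(639821929606596015))        # 1
```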
h3EdgeAngle
Calculates the average length of the H3 hexagon edge in degrees.
Syntax
h3EdgeAngle(resolution)
Parameter
resolution — Index resolution. Type: UInt8. Range: [0, 15].
Returned values
Example
Query:
SELECT h3EdgeAngle(10) AS edgeAngle;
Result:
┌───────h3EdgeAngle(10)─┐
│ 0.0005927224846720883 │
└───────────────────────┘
h3EdgeLengthM
Calculates the average length of the H3 hexagon edge in meters.
Syntax
h3EdgeLengthM(resolution)
Parameter
resolution — Index resolution. Type: UInt8. Range: [0, 15].
Returned values
Example
Query:
SELECT h3EdgeLengthM(15) AS edgeLengthM;
Result:
┌─edgeLengthM─┐
│ 0.509713273 │
└─────────────┘
geoToH3
Returns the H3 index of the point (lon, lat) at the specified resolution.
Syntax
geoToH3(lon, lat, resolution)
Arguments
lon — Longitude. Type: Float64.
lat — Latitude. Type: Float64.
resolution — Index resolution. Range: [0, 15]. Type: UInt8.
Returned values
Hexagon index number.
0 in case of error.
Type: UInt64.
Example
Query:
SELECT geoToH3(37.79506683, 55.71290588, 15) AS h3Index;
Result:
┌────────────h3Index─┐
│ 644325524701193974 │
└────────────────────┘
h3kRing
Lists all the H3 hexagons within the radius of k from the given hexagon, in random order.
Syntax
h3kRing(h3index, k)
Arguments
Returned values
Array of H3 indexes.
Example
Query:
SELECT arrayJoin(h3kRing(644325529233966508, 1)) AS h3index;
Result:
┌────────────h3index─┐
│ 644325529233966508 │
│ 644325529233966497 │
│ 644325529233966510 │
│ 644325529233966504 │
│ 644325529233966509 │
│ 644325529233966355 │
│ 644325529233966354 │
└────────────────────┘
h3GetBaseCell
Returns the base cell number of the H3 index.
Syntax
h3GetBaseCell(index)
Parameter
index — Hexagon index number. Type: UInt64.
Returned value
Hexagon base cell number.
Type: UInt8.
Example
Query:
SELECT h3GetBaseCell(612916788725809151) AS basecell;
Result:
┌─basecell─┐
│ 12 │
└──────────┘
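The base cell number is also stored directly in the index. Assuming the published H3 bit layout, where bits 45 to 51 hold the base cell, a Python sketch (hypothetical function name, no validity check) looks like this:

```python
def h3_base_cell(h3index):
    # Assumed H3 layout: the base cell number occupies the 7 bits 45..51.
    return (h3index >> 45) & 0x7F


print(h3_base_cell(612916788725809151))  # 12
```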
h3HexAreaM2
Returns average hexagon area in square meters at the given resolution.
Syntax
h3HexAreaM2(resolution)
Parameter
resolution — Index resolution. Range: [0, 15]. Type: UInt8.
Returned value
Area in square meters.
Type: Float64.
Example
Query:
SELECT h3HexAreaM2(13) AS area;
Result:
┌─area─┐
│ 43.9 │
└──────┘
h3IndexesAreNeighbors
Returns whether or not the provided H3 indexes are neighbors.
Syntax
h3IndexesAreNeighbors(index1, index2)
Arguments
Returned value
1 — Indexes are neighbours.
0 — Indexes are not neighbours.
Type: UInt8.
Example
Query:
SELECT h3IndexesAreNeighbors(617420388351344639, 617420388352655359) AS n;
Result:
┌─n─┐
│ 1 │
└───┘
h3ToChildren
Returns an array of child indexes for the given H3 index.
Syntax
h3ToChildren(index, resolution)
Arguments
index — Hexagon index number. Type: UInt64.
resolution — Index resolution. Range: [0, 15]. Type: UInt8.
Returned values
Array of the child H3-indexes.
Example
Query:
SELECT h3ToChildren(599405990164561919, 6) AS children;
Result:
┌─children───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ [603909588852408319,603909588986626047,603909589120843775,603909589255061503,603909589389279231,603909589523496959,603909589657714687] │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
h3ToParent
Returns the parent (coarser) index containing the given H3 index.
Syntax
h3ToParent(index, resolution)
Arguments
index — Hexagon index number. Type: UInt64.
resolution — Index resolution. Range: [0, 15]. Type: UInt8.
Returned value
Parent H3 index.
Type: UInt64.
Example
Query:
SELECT h3ToParent(599405990164561919, 3) AS parent;
Result:
┌─────────────parent─┐
│ 590398848891879423 │
└────────────────────┘
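The parent and child relations can be illustrated with bit manipulation on the index. This Python sketch assumes the published H3 layout (resolution in bits 52 to 55, one 3-bit digit per resolution level, unused digits set to 7) and a hexagonal cell; pentagons and validity checks are ignored, and the function names are hypothetical.

```python
def h3_to_parent(h3index, parent_res):
    child_res = (h3index >> 52) & 0xF
    if parent_res > child_res:
        raise ValueError("parent resolution must not be finer than the index resolution")
    # Write the new resolution, then mark all finer digits as unused (7).
    result = (h3index & ~(0xF << 52)) | (parent_res << 52)
    for level in range(parent_res + 1, child_res + 1):
        result |= 0x7 << ((15 - level) * 3)
    return result


def h3_direct_children(h3index):
    # Children one level finer: bump the resolution and fill in the next digit (0..6).
    child_res = ((h3index >> 52) & 0xF) + 1
    shift = (15 - child_res) * 3
    base = (h3index & ~(0xF << 52) & ~(0x7 << shift)) | (child_res << 52)
    return [base | (digit << shift) for digit in range(7)]


print(h3_to_parent(599405990164561919, 3))           # 590398848891879423
print(h3_direct_children(599405990164561919)[0])     # 603909588852408319
```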
h3ToString
Converts the H3Index representation of the index to the string representation.
h3ToString(index)
Parameter
index — Hexagon index number. Type: UInt64.
Returned value
String representation of the H3 index.
Type: String.
Example
Query:
SELECT h3ToString(617420388352917503) AS h3_string;
Result:
┌─h3_string───────┐
│ 89184926cdbffff │
└─────────────────┘
stringToH3
Converts the string representation to the H3Index (UInt64) representation.
Syntax
stringToH3(index_str)
Parameter
index_str — String representation of the H3 index. Type: String.
Returned value
Hexagon index number. Returns 0 on error. Type: UInt64.
Example
Query:
SELECT stringToH3('89184926cc3ffff') AS index;
Result:
┌──────────────index─┐
│ 617420388351344639 │
└────────────────────┘
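Judging by the two examples above, the string form is simply the lowercase hexadecimal notation of the 64-bit index. A Python sketch of both directions follows; the function names are hypothetical and no validity check is performed.

```python
def h3_to_string(h3index):
    # Lowercase hexadecimal, without prefix or leading zeros.
    return format(h3index, "x")


def string_to_h3(index_str):
    return int(index_str, 16)


print(h3_to_string(617420388352917503))  # 89184926cdbffff
print(string_to_h3("89184926cc3ffff"))   # 617420388351344639
```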
HASH
Hash functions can be used for the deterministic pseudo-random shuffling of elements.
Simhash is a hash function, which returns close hash values for close (similar) arguments.
halfMD5
Interprets all the input parameters as strings and calculates the MD5 hash value for each of them. Then combines hashes, takes the first 8 bytes of the hash of the resulting string, and interprets them as UInt64 in big-endian byte order.
halfMD5(par1, ...)
The function is relatively slow (5 million short strings per second per processor core). Consider using the sipHash64 function instead.
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated value of the hash function may be the same for the same values even if the types of the arguments differ (integers of different size, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
Returned Value
A UInt64 data type hash value.
Example
SELECT halfMD5(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS halfMD5hash, toTypeName(halfMD5hash) AS type;
┌────────halfMD5hash─┬─type───┐
│ 186182704141653334 │ UInt64 │
└────────────────────┴────────┘
MD5
Calculates the MD5 from a string and returns the resulting set of bytes as FixedString(16). If you do not need MD5 in particular, but you need a decent cryptographic 128-bit hash, use the ‘sipHash128’ function instead. If you want to get the same result as output by the md5sum utility, use lower(hex(MD5(s))).
sipHash64
Produces a 64-bit SipHash hash value.
sipHash64(par1,...)
This is a cryptographic hash function. It works at least three times faster than the MD5 function.
Function interprets all the input parameters as strings and calculates the hash value for each of them. Then combines hashes by the following algorithm:
After hashing all the input parameters, the function gets the array of hashes.
Function takes the first and the second elements and calculates a hash for the array of them.
Then the function takes the hash value, calculated at the previous step, and the third element of the initial hash array, and calculates a hash for the array of them.
The previous step is repeated for all the remaining elements of the initial hash array.
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated value of the hash function may be the same for the same values even if the types of the arguments differ (integers of different size, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
Returned Value
A UInt64 data type hash value.
Example
SELECT sipHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS SipHash, toTypeName(SipHash) AS type;
┌──────────────SipHash─┬─type───┐
│ 13726873534472839665 │ UInt64 │
└──────────────────────┴────────┘
sipHash128
Produces a 128-bit SipHash hash value. Differs from sipHash64 in that the final xor-folding state is done up to 128 bits.
Syntax
sipHash128(par1,...)
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated value of the hash function may be the same for the same values even if the types of the arguments differ (integers of different size, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
Returned value
A 128-bit SipHash hash value.
Type: FixedString(16).
Example
Query:
SELECT hex(sipHash128('foo', '\x01', 3));
Result:
┌─hex(sipHash128('foo', '', 3))────┐
│ 9DE516A64A414D4B1B609415E4523F24 │
└──────────────────────────────────┘
cityHash64
Produces a 64-bit CityHash hash value.
cityHash64(par1,...)
This is a fast non-cryptographic hash function. It uses the CityHash algorithm for string parameters and implementation-specific fast non-cryptographic hash function for parameters with other data types. The function uses the CityHash combinator to get the final results.
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated value of the hash function may be the same for the same values even if the types of the arguments differ (integers of different size, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
Returned Value
A UInt64 data type hash value.
Examples
Call example:
SELECT cityHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS CityHash, toTypeName(CityHash) AS type;
┌─────────────CityHash─┬─type───┐
│ 12072650598913549138 │ UInt64 │
└──────────────────────┴────────┘
The following example shows how to compute a checksum of the entire table that does not depend on row order:
SELECT groupBitXor(cityHash64(*)) FROM table
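The checksum above is insensitive to row order because XOR is commutative and associative. A minimal Python sketch of that property follows. It assumes a stand-in 64-bit row hash (BLAKE2b truncated to 8 bytes) in place of cityHash64, so it does not reproduce ClickHouse's values; the names row_hash and table_checksum are hypothetical.

```python
import hashlib
from functools import reduce


def row_hash(row):
    """Stand-in 64-bit row hash (NOT CityHash): BLAKE2b truncated to 8 bytes."""
    digest = hashlib.blake2b(repr(row).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def table_checksum(rows):
    """XOR all row hashes together, similar in spirit to groupBitXor over a row hash."""
    return reduce(lambda acc, row: acc ^ row_hash(row), rows, 0)


rows = [(1, "a"), (2, "b"), (3, "c")]
# The same rows in a different order give the same checksum.
print(table_checksum(rows) == table_checksum(list(reversed(rows))))  # True
```

One consequence of XOR folding is that identical rows cancel in pairs, so a checksum built this way cannot detect an even number of copies of the same row.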
intHash32
Calculates a 32-bit hash code from any type of integer. This is a relatively fast non-cryptographic hash function of average quality for numbers.
intHash64
Calculates a 64-bit hash code from any type of integer. It works faster than intHash32. Average quality.
SHA1, SHA224, SHA256, SHA512
Calculates SHA-1, SHA-224, SHA-256, SHA-512 hash from a string and returns the resulting set of bytes as FixedString.
Syntax
SHA1('s')
...
SHA512('s')
The function works fairly slowly (SHA-1 processes about 5 million short strings per second per processor core, while SHA-224 and SHA-256 process about 2.2 million). We recommend using this function only when you need a specific hash function and cannot choose a different one. Even in these cases, we recommend applying the function offline and pre-calculating values when inserting them into the table, instead of applying it in SELECT queries.
Arguments
s — Input string for SHA hash calculation. String.
Returned value
SHA hash as a hex-unencoded FixedString. SHA-1 returns FixedString(20), SHA-224 returns FixedString(28), SHA-256 returns FixedString(32), and SHA-512 returns FixedString(64).
Type: FixedString.
Example
Use the hex function to represent the result as a hex-encoded string.
Query:
SELECT hex(SHA1('abc'));
Result:
┌─hex(SHA1('abc'))─────────────────────────┐
│ A9993E364706816ABA3E25717850C26C9CD0D89D │
└──────────────────────────────────────────┘
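To cross-check values outside the database, the same digests can be computed with Python's standard hashlib module. This is only a sketch for comparison; it shows the raw digest lengths in bytes (which correspond to the FixedString sizes listed above) and the hex encoding of the SHA-1 digest of 'abc'.

```python
import hashlib

# Raw digest sizes in bytes for each algorithm.
sizes = {
    name: len(hashlib.new(name, b"abc").digest())
    for name in ("sha1", "sha224", "sha256", "sha512")
}
print(sizes)  # {'sha1': 20, 'sha224': 28, 'sha256': 32, 'sha512': 64}

# Hex-encode the raw bytes, upper case, to compare with hex(SHA1('abc')).
sha1_hex = hashlib.sha1(b"abc").hexdigest().upper()
print(sha1_hex)  # A9993E364706816ABA3E25717850C26C9CD0D89D
```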
URLHash(url[, N])
A fast, decent-quality non-cryptographic hash function for a string obtained from a URL using some type of normalization.
URLHash(s) – Calculates a hash from a string without one of the trailing symbols /, ? or # at the end, if present.
URLHash(s, N) – Calculates a hash from a string up to the N level in the URL hierarchy, without one of the trailing symbols /, ? or # at the end, if present. Levels are the same as in URLHierarchy.
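The normalization step described for URLHash(s) can be sketched in Python as follows. This covers only the removal of a single trailing symbol, read literally from the description above; it does not compute the hash itself or the N-level truncation, and the helper name strip_trailing_symbol is hypothetical.

```python
def strip_trailing_symbol(url: str) -> str:
    """Drop one trailing '/', '?' or '#', if present (illustrative helper only)."""
    if url and url[-1] in "/?#":
        return url[:-1]
    return url


print(strip_trailing_symbol("https://example.com/path/"))  # https://example.com/path
print(strip_trailing_symbol("https://example.com/path"))   # https://example.com/path
```

Under this reading, two URLs that differ only by one such trailing symbol normalize to the same string and would therefore hash to the same value.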
farmFingerprint64
farmHash64
Produces a 64-bit FarmHash or Fingerprint value. farmFingerprint64 is preferred for a stable and portable value.
farmFingerprint64(par1, ...)
farmHash64(par1, ...)
These functions use the Fingerprint64 and Hash64 methods, respectively, from all available methods.
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for the same values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
Returned Value
A UInt64 data type hash value.
Example
SELECT farmHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS FarmHash, toTypeName(FarmHash) AS type;
┌─────────────FarmHash─┬─type───┐
│ 17790458267262532859 │ UInt64 │
└──────────────────────┴────────┘
javaHash
Calculates JavaHash from a string, Byte, Short, Integer, Long. This hash function is neither fast nor of good quality. The only reason to use it is when this algorithm is already used in another system and you have to calculate exactly the same result.
Note that Java only supports hashing signed integers, so to hash an unsigned integer you must first cast it to the appropriate signed ClickHouse type.
Syntax
SELECT javaHash('')
Returned value
An Int32 data type hash value.
Example
Query:
SELECT javaHash(toInt32(123));
Result:
┌─javaHash(toInt32(123))─┐
│ 123 │
└────────────────────────┘
Query:
SELECT javaHash('Hello, world!');
Result:
┌─javaHash('Hello, world!')─┐
│ -1880044555 │
└───────────────────────────┘
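For comparison with another system, the string case can be sketched in Python following Java's documented String.hashCode rule (h = 31 * h + c over UTF-16 code units, with 32-bit signed overflow). This is a sketch of the Java rule, not of ClickHouse's implementation; it agrees with the ASCII example above, and behavior for non-ASCII input in ClickHouse is not covered here. The function name java_string_hash is hypothetical.

```python
def java_string_hash(s: str) -> int:
    """Java String.hashCode: h = 31*h + code_unit, wrapped to a signed 32-bit int."""
    h = 0
    data = s.encode("utf-16-be")  # two bytes per UTF-16 code unit
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        h = (31 * h + unit) & 0xFFFFFFFF  # emulate 32-bit overflow
    # Reinterpret the unsigned 32-bit value as signed.
    return h - 0x100000000 if h >= 0x80000000 else h


print(java_string_hash("Hello, world!"))  # -1880044555
print(java_string_hash("test"))           # 3556498
```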
javaHashUTF16LE
Calculates JavaHash from a string, assuming it contains bytes representing a string in UTF-16LE encoding.
Syntax
javaHashUTF16LE(stringUtf16le)
Arguments
stringUtf16le — a string in UTF-16LE encoding.
Returned value
An Int32 data type hash value.
Example
Correct query with UTF-16LE encoded string.
Query:
SELECT javaHashUTF16LE(convertCharset('test', 'utf-8', 'utf-16le'));
Result:
┌─javaHashUTF16LE(convertCharset('test', 'utf-8', 'utf-16le'))─┐
│ 3556498 │
└──────────────────────────────────────────────────────────────┘
hiveHash
Calculates HiveHash from a string.
SELECT hiveHash('')
This is just JavaHash with the sign bit zeroed out. This function is used in Apache Hive for versions before 3.0. This hash function is neither fast nor of good quality. The only reason to use it is when this algorithm is already used in another system and you have to calculate exactly the same result.
Returned value
An Int32 data type hash value.
Type: Int32.
Example
Query:
SELECT hiveHash('Hello, world!');
Result:
┌─hiveHash('Hello, world!')─┐
│ 267439093 │
└───────────────────────────┘
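The relation "JavaHash with the sign bit zeroed out" can be sketched in Python as below. The sketch assumes ASCII input, where each byte equals one Java character; it is an illustration of the rule stated above rather than ClickHouse's implementation, and the name hive_hash is hypothetical.

```python
def hive_hash(s: str) -> int:
    """JavaHash of an ASCII string with the sign bit cleared (illustrative sketch)."""
    h = 0
    for byte in s.encode("ascii"):
        h = (31 * h + byte) & 0xFFFFFFFF  # 32-bit wraparound, as in Java
    return h & 0x7FFFFFFF  # clear the sign bit


print(hive_hash("Hello, world!"))  # 267439093
```

For inputs whose JavaHash is already non-negative, clearing the sign bit changes nothing, so both functions return the same number.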
metroHash64
Produces a 64-bit MetroHash hash value.
metroHash64(par1, ...)
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for the same values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
Returned Value
A UInt64 data type hash value.
Example
SELECT metroHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS MetroHash, toTypeName(MetroHash) AS type;
┌────────────MetroHash─┬─type───┐
│ 14235658766382344533 │ UInt64 │
└──────────────────────┴────────┘
jumpConsistentHash
Calculates JumpConsistentHash from a UInt64. Accepts two arguments: a UInt64-type key and the number of buckets. Returns Int32. For more information, see JumpConsistentHash.
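The published jump consistent hash algorithm (Lamping and Veach) is short enough to sketch in Python. This is a transcription of the reference algorithm under the assumption that the key is treated as an unsigned 64-bit integer; whether its outputs match ClickHouse's function value for value has not been verified here.

```python
def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit key to a bucket in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= 0xFFFFFFFFFFFFFFFF  # treat the key as unsigned 64-bit
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step used as a pseudo-random generator.
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


# Growing from 10 to 11 buckets either keeps a key in place or moves it to bucket 10.
moved = sum(
    jump_consistent_hash(k, 10) != jump_consistent_hash(k, 11) for k in range(1000)
)
print(moved)
```

The useful property is that adding a bucket relocates only the keys that land in the new bucket, which keeps data movement small when the bucket count grows.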
murmurHash2_32, murmurHash2_64
Produces a MurmurHash2 hash value.
murmurHash2_32(par1, ...)
murmurHash2_64(par1, ...)
Arguments
Both functions take a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for the same values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
Returned Value
The murmurHash2_32 function returns a hash value of the UInt32 data type.
The murmurHash2_64 function returns a hash value of the UInt64 data type.
Example
SELECT murmurHash2_64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS MurmurHash2, toTypeName(MurmurHash2) AS type;
┌──────────MurmurHash2─┬─type───┐
│ 11832096901709403633 │ UInt64 │
└──────────────────────┴────────┘
gccMurmurHash
Calculates a 64-bit MurmurHash2 hash value using the same hash seed as gcc. It is portable between Clang and GCC builds.
Syntax
gccMurmurHash(par1, ...)
Arguments
par1, ... — A variable number of parameters that can be any of the supported data types.
Returned value
Calculated hash value.
Type: UInt64.
Example
Query:
SELECT
gccMurmurHash(1, 2, 3) AS res1,
gccMurmurHash(('a', [1, 2, 3], 4, (4, ['foo', 'bar'], 1, (1, 2)))) AS res2
Result:
┌─────────────────res1─┬────────────────res2─┐
│ 12384823029245979431 │ 1188926775431157506 │
└──────────────────────┴─────────────────────┘
murmurHash3_32, murmurHash3_64
Produces a MurmurHash3 hash value.
murmurHash3_32(par1, ...)
murmurHash3_64(par1, ...)
Arguments
Both functions take a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for the same values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
Returned Value
The murmurHash3_32 function returns a UInt32 data type hash value.
The murmurHash3_64 function returns a UInt64 data type hash value.
Example
SELECT murmurHash3_32(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS MurmurHash3, toTypeName(MurmurHash3) AS type;
┌─MurmurHash3─┬─type───┐
│ 2152717 │ UInt32 │
└─────────────┴────────┘
murmurHash3_128
Produces a 128-bit MurmurHash3 hash value.
Syntax
murmurHash3_128(expr)
Arguments
expr — A list of expressions. String.
Returned value
A 128-bit MurmurHash3 hash value.
Type: FixedString(16).
Example
Query:
SELECT hex(murmurHash3_128('foo', 'foo', 'foo'));
Result:
┌─hex(murmurHash3_128('foo', 'foo', 'foo'))─┐
│ F8F7AD9B6CD4CF117A71E277E2EC2931 │
└───────────────────────────────────────────┘
xxHash32, xxHash64
Calculates xxHash from a string. It comes in two flavors, 32 and 64 bits.
SELECT xxHash32('')
OR
SELECT xxHash64('')
Returned value
A UInt32 or UInt64 data type hash value.
Type: UInt32 for xxHash32 and UInt64 for xxHash64.
Example
Query:
SELECT xxHash32('Hello, world!');
Result:
┌─xxHash32('Hello, world!')─┐
│ 834093149 │
└───────────────────────────┘
ngramSimHash
Splits an ASCII string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
ngramSimHash(string[, ngramsize])
Arguments
string — String. String.
ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
SELECT ngramSimHash('ClickHouse') AS Hash;
Result:
┌───────Hash─┐
│ 1627567969 │
└────────────┘
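The general idea behind an n-gram simhash and its comparison by Hamming distance can be sketched in Python. This is a generic simhash under stated assumptions: it uses BLAKE2b truncated to 64 bits as the per-n-gram hash and a per-bit majority vote, so it will not reproduce the values returned by ngramSimHash. The function names are hypothetical.

```python
import hashlib


def ngrams(text: str, n: int = 3):
    """All overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def simhash64(text: str, n: int = 3) -> int:
    """Generic 64-bit simhash: per-bit majority vote over the n-gram hashes."""
    votes = [0] * 64
    for gram in ngrams(text, n):
        digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "little")
        for bit in range(64):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if votes[bit] > 0)


def bit_hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


same = bit_hamming_distance(simhash64("ClickHouse"), simhash64("ClickHouse"))
near = bit_hamming_distance(simhash64("ClickHouse"), simhash64("ClickHouze"))
print(same, near)  # same is 0; near depends on the stand-in hash
```

Because each output bit is a vote across all n-grams, strings that share most of their n-grams tend to agree on most bits, which is why a small Hamming distance suggests near-duplicates.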
ngramSimHashCaseInsensitive
Splits an ASCII string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
ngramSimHashCaseInsensitive(string[, ngramsize])
Arguments
string — String. String.
ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
SELECT ngramSimHashCaseInsensitive('ClickHouse') AS Hash;
Result:
┌──────Hash─┐
│ 562180645 │
└───────────┘
ngramSimHashUTF8
Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
ngramSimHashUTF8(string[, ngramsize])
Arguments
string — String. String.
ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
SELECT ngramSimHashUTF8('ClickHouse') AS Hash;
Result:
┌───────Hash─┐
│ 1628157797 │
└────────────┘
ngramSimHashCaseInsensitiveUTF8
Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
ngramSimHashCaseInsensitiveUTF8(string[, ngramsize])
Arguments
string — String. String.
ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
SELECT ngramSimHashCaseInsensitiveUTF8('ClickHouse') AS Hash;
Result:
┌───────Hash─┐
│ 1636742693 │
└────────────┘
wordShingleSimHash
Splits an ASCII string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
wordShingleSimHash(string[, shinglesize])
Arguments
string — String. String.
shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
SELECT wordShingleSimHash('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Hash;
Result:
┌───────Hash─┐
│ 2328277067 │
└────────────┘
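To make the term "word shingle" concrete, here is a small Python sketch that splits text into overlapping groups of shinglesize words. The tokenization rule (runs of word characters, so punctuation and hyphens act as separators) is an assumption made for illustration and may differ from how ClickHouse splits words; the helper name word_shingles is hypothetical.

```python
import re


def word_shingles(text: str, shinglesize: int = 3):
    """Overlapping groups of `shinglesize` consecutive words."""
    words = re.findall(r"\w+", text)  # assumed tokenization: runs of word characters
    return [
        " ".join(words[i:i + shinglesize])
        for i in range(len(words) - shinglesize + 1)
    ]


print(word_shingles("a column-oriented database management system"))
# ['a column oriented', 'column oriented database',
#  'oriented database management', 'database management system']
```

Each shingle is then hashed as a unit, so two texts are compared by the word sequences they share rather than by individual characters.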
wordShingleSimHashCaseInsensitive
Splits an ASCII string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
wordShingleSimHashCaseInsensitive(string[, shinglesize])
Arguments
string — String. String.
shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
SELECT wordShingleSimHashCaseInsensitive('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Hash;
Result:
┌───────Hash─┐
│ 2194812424 │
└────────────┘
wordShingleSimHashUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
wordShingleSimHashUTF8(string[, shinglesize])
Arguments
string — String. String.
shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
SELECT wordShingleSimHashUTF8('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Hash;
Result:
┌───────Hash─┐
│ 2328277067 │
└────────────┘
wordShingleSimHashCaseInsensitiveUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
wordShingleSimHashCaseInsensitiveUTF8(string[, shinglesize])
Arguments
string — String. String.
shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
Hash value.
Type: UInt64.
Example
Query:
SELECT wordShingleSimHashCaseInsensitiveUTF8('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Hash;
Result:
┌───────Hash─┐
│ 2194812424 │
└────────────┘
ngramMinHash
Splits an ASCII string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we consider those strings to be the same.
Syntax
ngramMinHash(string[, ngramsize, hashnum])
Arguments
string — String. String.
ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
SELECT ngramMinHash('ClickHouse') AS Tuple;
Result:
┌─Tuple──────────────────────────────────────┐
│ (18333312859352735453,9054248444481805918) │
└────────────────────────────────────────────┘
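The min/max selection described above can be sketched in Python. This is a generic illustration under stated assumptions: the per-n-gram hash is BLAKE2b truncated to 64 bits, and the way the selected hashes are folded into one value (the combine helper) is chosen only for the example. It does not reproduce ngramMinHash output, and all function names are hypothetical.

```python
import hashlib
import heapq


def hash64(data: bytes) -> int:
    """Stand-in 64-bit hash (not the one ClickHouse uses)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def combine(values) -> int:
    """Fold the selected hashes into one 64-bit value (assumed combiner)."""
    return hash64(b"".join(v.to_bytes(8, "little") for v in values))


def ngram_min_hash(text: str, ngramsize: int = 3, hashnum: int = 6):
    """Return (hash of the hashnum smallest, hash of the hashnum largest) n-gram hashes."""
    hashes = {
        hash64(text[i:i + ngramsize].encode("utf-8"))
        for i in range(len(text) - ngramsize + 1)
    }
    smallest = heapq.nsmallest(hashnum, hashes)
    largest = heapq.nlargest(hashnum, hashes)
    return combine(smallest), combine(largest)


# Both strings contain exactly the n-grams 'abc', 'bca', 'cab', so the results match.
print(ngram_min_hash("abcabc") == ngram_min_hash("abcabcabc"))  # True
```

Since only the extreme hashes contribute, two strings that share those particular n-grams produce an equal component even if the rest of the text differs, which is the basis for treating a matching component as a sign of similarity.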
ngramMinHashCaseInsensitive
Splits an ASCII string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we consider those strings to be the same.
Syntax
ngramMinHashCaseInsensitive(string[, ngramsize, hashnum])
Arguments
string — String. String.
ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
SELECT ngramMinHashCaseInsensitive('ClickHouse') AS Tuple;
Result:
┌─Tuple──────────────────────────────────────┐
│ (2106263556442004574,13203602793651726206) │
└────────────────────────────────────────────┘
ngramMinHashUTF8
Splits a UTF-8 string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we consider those strings to be the same.
Syntax
ngramMinHashUTF8(string[, ngramsize, hashnum])
Arguments
string — String. String.
ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
SELECT ngramMinHashUTF8('ClickHouse') AS Tuple;
Result:
┌─Tuple──────────────────────────────────────┐
│ (18333312859352735453,6742163577938632877) │
└────────────────────────────────────────────┘
ngramMinHashCaseInsensitiveUTF8
Splits a UTF-8 string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we consider those strings to be the same.
Syntax
ngramMinHashCaseInsensitiveUTF8(string[, ngramsize, hashnum])
Arguments
string — String. String.
ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
SELECT ngramMinHashCaseInsensitiveUTF8('ClickHouse') AS Tuple;
Result:
┌─Tuple───────────────────────────────────────┐
│ (12493625717655877135,13203602793651726206) │
└─────────────────────────────────────────────┘
ngramMinHashArg
Splits an ASCII string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHash function with the same input. Is case sensitive.
Syntax
ngramMinHashArg(string[, ngramsize, hashnum])
Arguments
string — String. String.
ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two tuples with hashnum n-grams each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
SELECT ngramMinHashArg('ClickHouse') AS Tuple;
Result:
┌─Tuple─────────────────────────────────────────────────────────────────────────┐
│ (('ous','ick','lic','Hou','kHo','use'),('Hou','lic','ick','ous','ckH','Cli')) │
└───────────────────────────────────────────────────────────────────────────────┘
ngramMinHashArgCaseInsensitive
Splits an ASCII string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashCaseInsensitive function with the same input. Is case insensitive.
Syntax
ngramMinHashArgCaseInsensitive(string[, ngramsize, hashnum])
Arguments
string — String. String.
ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two tuples with hashnum n-grams each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
SELECT ngramMinHashArgCaseInsensitive('ClickHouse') AS Tuple;
Result:
┌─Tuple─────────────────────────────────────────────────────────────────────────┐
│ (('ous','ick','lic','kHo','use','Cli'),('kHo','lic','ick','ous','ckH','Hou')) │
└───────────────────────────────────────────────────────────────────────────────┘
ngramMinHashArgUTF8
Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashUTF8 function with the same input. Is case sensitive.
Syntax
ngramMinHashArgUTF8(string[, ngramsize, hashnum])
Arguments
string — String. String.
ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two tuples with hashnum n-grams each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
SELECT ngramMinHashArgUTF8('ClickHouse') AS Tuple;
Result:
┌─Tuple─────────────────────────────────────────────────────────────────────────┐
│ (('ous','ick','lic','Hou','kHo','use'),('kHo','Hou','lic','ick','ous','ckH')) │
└───────────────────────────────────────────────────────────────────────────────┘
ngramMinHashArgCaseInsensitiveUTF8
Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashCaseInsensitiveUTF8 function with the same input. Is case insensitive.
Syntax
ngramMinHashArgCaseInsensitiveUTF8(string[, ngramsize, hashnum])
Arguments
string — String. String.
ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two tuples with hashnum n-grams each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
SELECT ngramMinHashArgCaseInsensitiveUTF8('ClickHouse') AS Tuple;
Result:
┌─Tuple─────────────────────────────────────────────────────────────────────────┐
│ (('ckH','ous','ick','lic','kHo','use'),('kHo','lic','ick','ous','ckH','Hou')) │
└───────────────────────────────────────────────────────────────────────────────┘
wordShingleMinHash
Splits an ASCII string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we consider those strings to be the same.
Syntax
wordShingleMinHash(string[, shinglesize, hashnum])
Arguments
string — String. String.
shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
SELECT wordShingleMinHash('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Tuple;
Result:
┌─Tuple──────────────────────────────────────┐
│ (16452112859864147620,5844417301642981317) │
└────────────────────────────────────────────┘
wordShingleMinHashCaseInsensitive
Splits an ASCII string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we consider those strings to be the same.
Syntax
wordShingleMinHashCaseInsensitive(string[, shinglesize, hashnum])
Arguments
string — String. String.
shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
SELECT wordShingleMinHashCaseInsensitive('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Tuple;
Result:
┌─Tuple─────────────────────────────────────┐
│ (3065874883688416519,1634050779997673240) │
└───────────────────────────────────────────┘
wordShingleMinHashUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we consider those strings to be the same.
Syntax
wordShingleMinHashUTF8(string[, shinglesize, hashnum])
Arguments
string — String. String.
shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
SELECT wordShingleMinHashUTF8('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Tuple;
Result:
┌─Tuple──────────────────────────────────────┐
│ (16452112859864147620,5844417301642981317) │
└────────────────────────────────────────────┘
wordShingleMinHashCaseInsensitiveUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, we consider those strings to be the same.
Syntax
wordShingleMinHashCaseInsensitiveUTF8(string[, shinglesize, hashnum])
Arguments
string — String. String.
shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two hashes — the minimum and the maximum.
Example
Query:
SELECT wordShingleMinHashCaseInsensitiveUTF8('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).') AS Tuple;
Result:
┌─Tuple─────────────────────────────────────┐
│ (3065874883688416519,1634050779997673240) │
└───────────────────────────────────────────┘
wordShingleMinHashArg
Splits an ASCII string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHash function with the same input. Is case sensitive.
Syntax
wordShingleMinHashArg(string[, shinglesize, hashnum])
Arguments
string — String. String.
shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two tuples with hashnum word shingles each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
SELECT wordShingleMinHashArg('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).', 1, 3) AS Tuple;
Result:
┌─Tuple─────────────────────────────────────────────────────────────────┐
│ (('OLAP','database','analytical'),('online','oriented','processing')) │
└───────────────────────────────────────────────────────────────────────┘
wordShingleMinHashArgCaseInsensitive
Splits an ASCII string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashCaseInsensitive function with the same input. Is case insensitive.
Syntax
wordShingleMinHashArgCaseInsensitive(string[, shinglesize, hashnum])
Arguments
string — String. String.
shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two tuples with hashnum word shingles each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
SELECT wordShingleMinHashArgCaseInsensitive('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).', 1, 3) AS Tuple;
Result:
┌─Tuple──────────────────────────────────────────────────────────────────┐
│ (('queries','database','analytical'),('oriented','processing','DBMS')) │
└────────────────────────────────────────────────────────────────────────┘
wordShingleMinHashArgUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize
words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashUTF8 function with the same input. Is case sensitive.
Syntax
wordShingleMinHashArgUTF8(string[, shinglesize, hashnum])
Arguments
string — String. String.
shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two tuples with hashnum word shingles each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
SELECT wordShingleMinHashArgUTF8('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).', 1, 3) AS Tuple;
Result:
┌─Tuple─────────────────────────────────────────────────────────────────┐
│ (('OLAP','database','analytical'),('online','oriented','processing')) │
└───────────────────────────────────────────────────────────────────────┘
wordShingleMinHashArgCaseInsensitiveUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashCaseInsensitiveUTF8 function with the same input. Is case insensitive.
Syntax
wordShingleMinHashArgCaseInsensitiveUTF8(string[, shinglesize, hashnum])
Arguments
string — String. String.
shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Tuple with two tuples with hashnum word shingles each.
Type: Tuple(Tuple(String), Tuple(String)).
Example
Query:
SELECT wordShingleMinHashArgCaseInsensitiveUTF8('ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).', 1, 3) AS Tuple;
Result:
┌─Tuple──────────────────────────────────────────────────────────────────┐
│ (('queries','database','analytical'),('oriented','processing','DBMS')) │
└────────────────────────────────────────────────────────────────────────┘
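As a rough illustration of the splitting step shared by the wordShingleMinHashArg* functions, the Python sketch below builds word shingles of a given size. The helper name word_shingles and the word-splitting rule (runs of alphanumeric characters) are assumptions made for this sketch; ClickHouse's own tokenization and its hash-based selection of shingles may differ and are not reproduced here.

```python
import re


def word_shingles(text, shinglesize=3):
    """Return every run of `shinglesize` consecutive words as a tuple."""
    # Assumption: a "word" is a maximal run of alphanumeric characters.
    words = re.findall(r"\w+", text)
    # Slide a window of `shinglesize` words over the word list.
    return [
        tuple(words[i:i + shinglesize])
        for i in range(len(words) - shinglesize + 1)
    ]
```

With shinglesize set to 1, as in the examples above, every shingle is a single word, which is why the returned tuples contain individual words.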
INTROSPECTION FUNCTIONS
You can use functions described in this chapter to introspect ELF and DWARF for query profiling.
WARNING
These functions are slow and may raise security concerns.
For proper operation of introspection functions:
Install the clickhouse-common-static-dbg package.
Set the allow_introspection_functions setting to 1.
For security reasons, introspection functions are disabled by default.
ClickHouse saves profiler reports to the trace_log system table. Make sure the table and profiler are configured properly.
addressToLine
Converts a virtual memory address inside the ClickHouse server process to the filename and the line number in the ClickHouse source code.
If you use official ClickHouse packages, you need to install the clickhouse-common-static-dbg package.
Syntax
addressToLine(address_of_binary_instruction)
Arguments
address_of_binary_instruction (UInt64) — Address of instruction in a running process.
Returned value
Source code filename and the line number in this file delimited by colon.
For example, `/build/obj-x86_64-linux-gnu/../src/Common/ThreadPool.cpp:199`, where `199` is a line number.
Name of a binary, if the function couldn’t find the debug information.
Empty string, if the address is not valid.
Type: String.
Example
Enabling introspection functions:
SET allow_introspection_functions=1;
Selecting the first row from the trace_log system table:
SELECT * FROM system.trace_log LIMIT 1 \G;
Row 1:
──────
event_date: 2019-11-19
event_time: 2019-11-19 18:57:23
revision: 54429
timer_type: Real
thread_number: 48
query_id: 421b6855-1858-45a5-8f37-f383409d6d72
trace: [140658411141617,94784174532828,94784076370703,94784076372094,94784076361020,94784175007680,140658411116251,140658403895439]
The trace field contains the stack trace at the moment of sampling.
Getting the source code filename and the line number for a single address:
SELECT addressToLine(94784076370703) \G;
Row 1:
──────
addressToLine(94784076370703): /build/obj-x86_64-linux-gnu/../src/Common/ThreadPool.cpp:199
Applying the function to the whole stack trace:
SELECT
arrayStringConcat(arrayMap(x -> addressToLine(x), trace), '\n') AS trace_source_code_lines
FROM system.trace_log
LIMIT 1
\G
The arrayMap function applies the addressToLine function to each element of the trace array. The result appears in the trace_source_code_lines column of the output.
Row 1:
──────
trace_source_code_lines: /lib/x86_64-linux-gnu/libpthread-2.27.so
/usr/lib/debug/usr/bin/clickhouse
/build/obj-x86_64-linux-gnu/../src/Common/ThreadPool.cpp:199
/build/obj-x86_64-linux-gnu/../src/Common/ThreadPool.h:155
/usr/include/c++/9/bits/atomic_base.h:551
/usr/lib/debug/usr/bin/clickhouse
/lib/x86_64-linux-gnu/libpthread-2.27.so
/build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:97
addressToLineWithInlines
Similar to addressToLine, but returns an Array with all inline functions, and is much slower as a result.
If you use official ClickHouse packages, you need to install the clickhouse-common-static-dbg package.
Syntax
addressToLineWithInlines(address_of_binary_instruction)
Arguments
address_of_binary_instruction (UInt64) — Address of instruction in a running process.
Returned value
Array whose first element is the source code filename and the line number in this file, delimited by a colon. Starting from the second element, the source code filename, line number, and function name of each inline function are listed.
Array with a single element, the name of a binary, if the function couldn’t find the debug information.
Empty array, if the address is not valid.
Type: Array(String).
Example
Enabling introspection functions:
SET allow_introspection_functions=1;
Applying the function to an address:
SELECT addressToLineWithInlines(531055181::UInt64);
┌─addressToLineWithInlines(CAST('531055181', 'UInt64'))────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ ['./src/Functions/addressToLineWithInlines.cpp:98','./build_normal_debug/./src/Functions/addressToLineWithInlines.cpp:176:DB::(anonymous namespace)::FunctionAddressToLineWithInlines::implCached(unsigned long) const'] │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Applying the function to the whole stack trace:
SELECT
ta, addressToLineWithInlines(arrayJoin(trace) as ta)
FROM system.trace_log
WHERE
query_id = '5e173544-2020-45de-b645-5deebe2aae54';
The arrayJoin function splits the array into rows.
┌────────ta─┬─addressToLineWithInlines(arrayJoin(trace))───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 365497529 │ ['./build_normal_debug/./contrib/libcxx/include/string_view:252'] │
│ 365593602 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:191'] │
│ 365593866 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0'] │
│ 365592528 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0'] │
│ 365591003 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:477'] │
│ 365590479 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:442'] │
│ 365590600 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:457'] │
│ 365598941 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0'] │
│ 365607098 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0'] │
│ 365590571 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:451'] │
│ 365598941 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0'] │
│ 365607098 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0'] │
│ 365590571 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:451'] │
│ 365598941 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0'] │
│ 365607098 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0'] │
│ 365590571 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:451'] │
│ 365598941 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:0'] │
│ 365597289 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:807'] │
│ 365599840 │ ['./build_normal_debug/./src/Common/Dwarf.cpp:1118'] │
│ 531058145 │ ['./build_normal_debug/./src/Functions/addressToLineWithInlines.cpp:152'] │
│ 531055181 │ ['./src/Functions/addressToLineWithInlines.cpp:98','./build_normal_debug/./src/Functions/addressToLineWithInlines.cpp:176:DB::(anonymous namespace)::FunctionAddressToLineWithInlines::implCached(unsigned long) const'] │
│ 422333613 │ ['./build_normal_debug/./src/Functions/IFunctionAdaptors.h:21'] │
│ 586866022 │ ['./build_normal_debug/./src/Functions/IFunction.cpp:216'] │
│ 586869053 │ ['./build_normal_debug/./src/Functions/IFunction.cpp:264'] │
│ 586873237 │ ['./build_normal_debug/./src/Functions/IFunction.cpp:334'] │
│ 597901620 │ ['./build_normal_debug/./src/Interpreters/ExpressionActions.cpp:601'] │
│ 597898534 │ ['./build_normal_debug/./src/Interpreters/ExpressionActions.cpp:718'] │
│ 630442912 │ ['./build_normal_debug/./src/Processors/Transforms/ExpressionTransform.cpp:23'] │
│ 546354050 │ ['./build_normal_debug/./src/Processors/ISimpleTransform.h:38'] │
│ 626026993 │ ['./build_normal_debug/./src/Processors/ISimpleTransform.cpp:89'] │
│ 626294022 │ ['./build_normal_debug/./src/Processors/Executors/ExecutionThreadContext.cpp:45'] │
│ 626293730 │ ['./build_normal_debug/./src/Processors/Executors/ExecutionThreadContext.cpp:63'] │
│ 626169525 │ ['./build_normal_debug/./src/Processors/Executors/PipelineExecutor.cpp:213'] │
│ 626170308 │ ['./build_normal_debug/./src/Processors/Executors/PipelineExecutor.cpp:178'] │
│ 626166348 │ ['./build_normal_debug/./src/Processors/Executors/PipelineExecutor.cpp:329'] │
│ 626163461 │ ['./build_normal_debug/./src/Processors/Executors/PipelineExecutor.cpp:84'] │
│ 626323536 │ ['./build_normal_debug/./src/Processors/Executors/PullingAsyncPipelineExecutor.cpp:85'] │
│ 626323277 │ ['./build_normal_debug/./src/Processors/Executors/PullingAsyncPipelineExecutor.cpp:112'] │
│ 626323133 │ ['./build_normal_debug/./contrib/libcxx/include/type_traits:3682'] │
│ 626323041 │ ['./build_normal_debug/./contrib/libcxx/include/tuple:1415'] │
└───────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
addressToSymbol
Converts a virtual memory address inside the ClickHouse server process to a symbol from ClickHouse object files.
Syntax
addressToSymbol(address_of_binary_instruction)
Arguments
address_of_binary_instruction (UInt64) — Address of instruction in a running process.
Returned value
Symbol from ClickHouse object files.
Empty string, if the address is not valid.
Type: String.
Example
Enabling introspection functions:
SET allow_introspection_functions=1;
Selecting the first row from the trace_log system table:
SELECT * FROM system.trace_log LIMIT 1 \G;
Row 1:
──────
event_date: 2019-11-20
event_time: 2019-11-20 16:57:59
revision: 54429
timer_type: Real
thread_number: 48
query_id: 724028bf-f550-45aa-910d-2af6212b94ac
trace: [94138803686098,94138815010911,94138815096522,94138815101224,94138815102091,94138814222988,94138806823642,94138814457211,94138806823642,94138814457211,94138806823642,94138806795179,94138806796144,94138753770094,94138753771646,94138753760572,94138852407232,140399185266395,140399178045583]
The trace field contains the stack trace at the moment of sampling.
Getting a symbol for a single address:
SELECT addressToSymbol(94138803686098) \G;
Row 1:
──────
addressToSymbol(94138803686098): _ZNK2DB24IAggregateFunctionHelperINS_20AggregateFunctionSumImmNS_24AggregateFunctionSumDataImEEEEE19addBatchSinglePlaceEmPcPPKNS_7IColumnEPNS_5ArenaE
Applying the function to the whole stack trace:
SELECT
arrayStringConcat(arrayMap(x -> addressToSymbol(x), trace), '\n') AS trace_symbols
FROM system.trace_log
LIMIT 1
\G
The arrayMap function applies the addressToSymbol function to each element of the trace array. The result appears in the trace_symbols column of the output.
Row 1:
──────
trace_symbols: _ZNK2DB24IAggregateFunctionHelperINS_20AggregateFunctionSumImmNS_24AggregateFunctionSumDataImEEEEE19addBatchSinglePlaceEmPcPPKNS_7IColumnEPNS_5ArenaE
_ZNK2DB10Aggregator21executeWithoutKeyImplERPcmPNS0_28AggregateFunctionInstructionEPNS_5ArenaE
_ZN2DB10Aggregator14executeOnBlockESt6vectorIN3COWINS_7IColumnEE13immutable_ptrIS3_EESaIS6_EEmRNS_22AggregatedDataVariantsERS1_IPKS3_SaISC_EERS1_ISE_SaISE_EERb
_ZN2DB10Aggregator14executeOnBlockERKNS_5BlockERNS_22AggregatedDataVariantsERSt6vectorIPKNS_7IColumnESaIS9_EERS6_ISB_SaISB_EERb
_ZN2DB10Aggregator7executeERKSt10shared_ptrINS_17IBlockInputStreamEERNS_22AggregatedDataVariantsE
_ZN2DB27AggregatingBlockInputStream8readImplEv
_ZN2DB17IBlockInputStream4readEv
_ZN2DB26ExpressionBlockInputStream8readImplEv
_ZN2DB17IBlockInputStream4readEv
_ZN2DB26ExpressionBlockInputStream8readImplEv
_ZN2DB17IBlockInputStream4readEv
_ZN2DB28AsynchronousBlockInputStream9calculateEv
_ZNSt17_Function_handlerIFvvEZN2DB28AsynchronousBlockInputStream4nextEvEUlvE_E9_M_invokeERKSt9_Any_data
_ZN14ThreadPoolImplI20ThreadFromGlobalPoolE6workerESt14_List_iteratorIS0_E
_ZZN20ThreadFromGlobalPoolC4IZN14ThreadPoolImplIS_E12scheduleImplIvEET_St8functionIFvvEEiSt8optionalImEEUlvE1_JEEEOS4_DpOT0_ENKUlvE_clEv
_ZN14ThreadPoolImplISt6threadE6workerESt14_List_iteratorIS0_E
execute_native_thread_routine
start_thread
clone
demangle
Converts a symbol that you can get using the addressToSymbol function to the C++ function name.
Syntax
demangle(symbol)
Arguments
symbol (String) — Symbol from an object file.
Returned value
Name of the C++ function.
Empty string if a symbol is not valid.
Type: String.
Example
Enabling introspection functions:
SET allow_introspection_functions=1;
Selecting the first row from the trace_log system table:
SELECT * FROM system.trace_log LIMIT 1 \G;
Row 1:
──────
event_date: 2019-11-20
event_time: 2019-11-20 16:57:59
revision: 54429
timer_type: Real
thread_number: 48
query_id: 724028bf-f550-45aa-910d-2af6212b94ac
trace: [94138803686098,94138815010911,94138815096522,94138815101224,94138815102091,94138814222988,94138806823642,94138814457211,94138806823642,94138814457211,94138806823642,94138806795179,94138806796144,94138753770094,94138753771646,94138753760572,94138852407232,140399185266395,140399178045583]
The trace field contains the stack trace at the moment of sampling.
Getting a function name for a single address:
SELECT demangle(addressToSymbol(94138803686098)) \G;
Row 1:
──────
demangle(addressToSymbol(94138803686098)): DB::IAggregateFunctionHelper<DB::AggregateFunctionSum<unsigned long, unsigned long, DB::AggregateFunctionSumData<unsigned long> > >::addBatchSinglePlace(unsigned long, char*, DB::IColumn const**, DB::Arena*) const
Applying the function to the whole stack trace:
SELECT
arrayStringConcat(arrayMap(x -> demangle(addressToSymbol(x)), trace), '\n') AS trace_functions
FROM system.trace_log
LIMIT 1
\G
The arrayMap function applies addressToSymbol followed by demangle to each element of the trace array. The result appears in the trace_functions column of the output.
Row 1:
──────
trace_functions: DB::IAggregateFunctionHelper<DB::AggregateFunctionSum<unsigned long, unsigned long, DB::AggregateFunctionSumData<unsigned long> > >::addBatchSinglePlace(unsigned long, char*, DB::IColumn const**, DB::Arena*) const
DB::Aggregator::executeWithoutKeyImpl(char*&, unsigned long, DB::Aggregator::AggregateFunctionInstruction*, DB::Arena*) const
DB::Aggregator::executeOnBlock(std::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn> > >, unsigned long, DB::AggregatedDataVariants&, std::vector<DB::IColumn const*, std::allocator<DB::IColumn const*> >&, std::vector<std::vector<DB::IColumn const*, std::allocator<DB::IColumn const*> >, std::allocator<std::vector<DB::IColumn const*, std::allocator<DB::IColumn const*> > > >&, bool&)
DB::Aggregator::executeOnBlock(DB::Block const&, DB::AggregatedDataVariants&, std::vector<DB::IColumn const*, std::allocator<DB::IColumn const*> >&, std::vector<std::vector<DB::IColumn const*, std::allocator<DB::IColumn const*> >, std::allocator<std::vector<DB::IColumn const*, std::allocator<DB::IColumn const*> > > >&, bool&)
DB::Aggregator::execute(std::shared_ptr<DB::IBlockInputStream> const&, DB::AggregatedDataVariants&)
DB::AggregatingBlockInputStream::readImpl()
DB::IBlockInputStream::read()
DB::ExpressionBlockInputStream::readImpl()
DB::IBlockInputStream::read()
DB::ExpressionBlockInputStream::readImpl()
DB::IBlockInputStream::read()
DB::AsynchronousBlockInputStream::calculate()
std::_Function_handler<void (), DB::AsynchronousBlockInputStream::next()::{lambda()#1}>::_M_invoke(std::_Any_data const&)
ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::_List_iterator<ThreadFromGlobalPool>)
ThreadFromGlobalPool::ThreadFromGlobalPool<ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::function<void ()>, int, std::optional<unsigned long>)::{lambda()#3}>(ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::function<void ()>, int, std::optional<unsigned long>)::{lambda()#3}&&)::{lambda()#1}::operator()() const
ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>)
execute_native_thread_routine
start_thread
clone
tid
Returns the id of the thread in which the current Block is processed.
Syntax
tid()
Returned value
Current thread id. UInt64.
Example
Query:
SELECT tid();
Result:
┌─tid()─┐
│ 3878 │
└───────┘
logTrace
Emits a trace log message to the server log for each Block.
Syntax
logTrace('message')
Arguments
message — Message that is emitted to server log. String.
Returned value
Always returns 0.
Example
Query:
SELECT logTrace('logTrace message');
Result:
┌─logTrace('logTrace message')─┐
│ 0 │
└──────────────────────────────┘
IP ADDRESSES
IPv4NumToString(num)
Takes a UInt32 number. Interprets it as an IPv4 address in big endian. Returns a string containing the corresponding IPv4 address in the format A.B.C.d (dot-separated numbers in decimal form).
Alias: INET_NTOA.
IPv4StringToNum(s)
The reverse function of IPv4NumToString. If the IPv4 address has an invalid format, it throws an exception.
Alias: INET_ATON.
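The conversion performed by IPv4NumToString and IPv4StringToNum can be illustrated outside ClickHouse with Python's standard ipaddress module. This is only a sketch of the semantics described above: the helper names ipv4_num_to_string and ipv4_string_to_num are made up for the example, and the set of inputs Python accepts may differ slightly from what ClickHouse accepts.

```python
import ipaddress


def ipv4_num_to_string(num):
    """Interpret a 32-bit unsigned integer as a big-endian IPv4 address."""
    return str(ipaddress.IPv4Address(num))


def ipv4_string_to_num(s):
    """Reverse conversion; raises ValueError if the text is not a valid IPv4 address."""
    return int(ipaddress.IPv4Address(s))
```

For instance, ipv4_string_to_num('171.225.130.45') yields 0xABE1822D, the same bytes that the hex() output in the toIPv4 example shows.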
IPv4NumToStringClassC(num)
Similar to IPv4NumToString, but using xxx instead of the last octet.
Example:
SELECT
IPv4NumToStringClassC(ClientIP) AS k,
count() AS c
FROM test.hits
GROUP BY k
ORDER BY c DESC
LIMIT 10
┌─k──────────────┬─────c─┐
│ 83.149.9.xxx │ 26238 │
│ 217.118.81.xxx │ 26074 │
│ 213.87.129.xxx │ 25481 │
│ 83.149.8.xxx │ 24984 │
│ 217.118.83.xxx │ 22797 │
│ 78.25.120.xxx │ 22354 │
│ 213.87.131.xxx │ 21285 │
│ 78.25.121.xxx │ 20887 │
│ 188.162.65.xxx │ 19694 │
│ 83.149.48.xxx │ 17406 │
└────────────────┴───────┘
Since using ‘xxx’ is highly unusual, this may be changed in the future. We recommend that you do not rely on the exact format of this fragment.
IPv6NumToString(x)
Accepts a FixedString(16) value containing the IPv6 address in binary format. Returns a string containing this address in text format. IPv6-mapped IPv4 addresses are output in the format ::ffff:111.222.33.44.
Alias: INET6_NTOA.
Examples:
SELECT IPv6NumToString(toFixedString(unhex('2A0206B8000000000000000000000011'), 16)) AS addr;
┌─addr─────────┐
│ 2a02:6b8::11 │
└──────────────┘
SELECT
IPv6NumToString(ClientIP6 AS k),
count() AS c
FROM hits_all
WHERE EventDate = today() AND substring(ClientIP6, 1, 12) != unhex('00000000000000000000FFFF')
GROUP BY k
ORDER BY c DESC
LIMIT 10
┌─IPv6NumToString(ClientIP6)──────────────┬─────c─┐
│ 2a02:2168:aaa:bbbb::2 │ 24695 │
│ 2a02:2698:abcd:abcd:abcd:abcd:8888:5555 │ 22408 │
│ 2a02:6b8:0:fff::ff │ 16389 │
│ 2a01:4f8:111:6666::2 │ 16016 │
│ 2a02:2168:888:222::1 │ 15896 │
│ 2a01:7e00::ffff:ffff:ffff:222 │ 14774 │
│ 2a02:8109:eee:ee:eeee:eeee:eeee:eeee │ 14443 │
│ 2a02:810b:8888:888:8888:8888:8888:8888 │ 14345 │
│ 2a02:6b8:0:444:4444:4444:4444:4444 │ 14279 │
│ 2a01:7e00::ffff:ffff:ffff:ffff │ 13880 │
└─────────────────────────────────────────┴───────┘
SELECT
IPv6NumToString(ClientIP6 AS k),
count() AS c
FROM hits_all
WHERE EventDate = today()
GROUP BY k
ORDER BY c DESC
LIMIT 10
┌─IPv6NumToString(ClientIP6)─┬──────c─┐
│ ::ffff:94.26.111.111 │ 747440 │
│ ::ffff:37.143.222.4 │ 529483 │
│ ::ffff:5.166.111.99 │ 317707 │
│ ::ffff:46.38.11.77 │ 263086 │
│ ::ffff:79.105.111.111 │ 186611 │
│ ::ffff:93.92.111.88 │ 176773 │
│ ::ffff:84.53.111.33 │ 158709 │
│ ::ffff:217.118.11.22 │ 154004 │
│ ::ffff:217.118.11.33 │ 148449 │
│ ::ffff:217.118.11.44 │ 148243 │
└────────────────────────────┴────────┘
IPv6StringToNum
The reverse function of IPv6NumToString. If the IPv6 address has an invalid format, it throws an exception.
If the input string contains a valid IPv4 address, returns its IPv6 equivalent. Hexadecimal digits can be uppercase or lowercase.
Alias: INET6_ATON.
Syntax
IPv6StringToNum(string)
Argument
string — IP address. String.
Returned value
IPv6 address in binary format.
Type: FixedString(16).
Example
Query:
SELECT addr, cutIPv6(IPv6StringToNum(addr), 0, 0) FROM (SELECT ['notaddress', '127.0.0.1', '1111::ffff'] AS addr) ARRAY JOIN addr;
Result:
┌─addr───────┬─cutIPv6(IPv6StringToNum(addr), 0, 0)─┐
│ notaddress │ :: │
│ 127.0.0.1 │ ::ffff:127.0.0.1 │
│ 1111::ffff │ 1111::ffff │
└────────────┴──────────────────────────────────────┘
IPv4ToIPv6(x)
Takes a UInt32 number. Interprets it as an IPv4 address in big endian. Returns a FixedString(16) value containing the IPv6 address in binary format. Examples:
SELECT IPv6NumToString(IPv4ToIPv6(IPv4StringToNum('192.168.0.1'))) AS addr;
┌─addr───────────────┐
│ ::ffff:192.168.0.1 │
└────────────────────┘
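The ::ffff: prefix in the result above is the standard IPv4-mapped IPv6 layout. A minimal Python sketch of that byte layout follows; the helper name ipv4_to_ipv6 is hypothetical and the function only mirrors the description above, not ClickHouse's implementation.

```python
def ipv4_to_ipv6(num):
    """Build the 16-byte IPv4-mapped IPv6 form of a 32-bit IPv4 number."""
    # 10 zero bytes, then 0xFFFF, then the 4 address bytes in big-endian order.
    return b"\x00" * 10 + b"\xff\xff" + num.to_bytes(4, "big")
```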
cutIPv6(x, bytesToCutForIPv6, bytesToCutForIPv4)
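Accepts a FixedString(16) value containing the IPv6 address in binary format. Returns a string containing the address in text format with the specified number of trailing bytes zeroed out: bytesToCutForIPv4 applies to IPv4-mapped addresses and bytesToCutForIPv6 to all others. For example: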
WITH
IPv6StringToNum('2001:0DB8:AC10:FE01:FEED:BABE:CAFE:F00D') AS ipv6,
IPv4ToIPv6(IPv4StringToNum('192.168.0.1')) AS ipv4
SELECT
cutIPv6(ipv6, 2, 0),
cutIPv6(ipv4, 0, 2)
┌─cutIPv6(ipv6, 2, 0)─────────────────┬─cutIPv6(ipv4, 0, 2)─┐
│ 2001:db8:ac10:fe01:feed:babe:cafe:0 │ ::ffff:192.168.0.0 │
└─────────────────────────────────────┴─────────────────────┘
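The behaviour shown in this example can be sketched in Python as follows. The helper name cut_ipv6, and the way an IPv4-mapped address is detected (by its ::ffff: prefix), are assumptions of the sketch inferred from the example output; the function returns an ipaddress.IPv6Address object rather than text.

```python
import ipaddress

# Leading 12 bytes of an IPv4-mapped IPv6 address (::ffff:a.b.c.d).
IPV4_MAPPED_PREFIX = b"\x00" * 10 + b"\xff\xff"


def cut_ipv6(addr, bytes_to_cut_for_ipv6, bytes_to_cut_for_ipv4):
    """Zero out trailing bytes of a 16-byte binary IPv6 address."""
    # Assumption: IPv4-mapped addresses use the IPv4 byte count, others the IPv6 one.
    if addr.startswith(IPV4_MAPPED_PREFIX):
        n = bytes_to_cut_for_ipv4
    else:
        n = bytes_to_cut_for_ipv6
    # Keep the leading bytes and replace the last n bytes with zeros.
    return ipaddress.IPv6Address(addr[:16 - n] + b"\x00" * n)
```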
IPv4CIDRToRange(ipv4, Cidr)
Accepts an IPv4 address and a UInt8 value containing the CIDR prefix length. Returns a tuple of two IPv4 values containing the lower and upper bounds of the subnet range.
SELECT IPv4CIDRToRange(toIPv4('192.168.5.2'), 16);
┌─IPv4CIDRToRange(toIPv4('192.168.5.2'), 16)─┐
│ ('192.168.0.0','192.168.255.255') │
└────────────────────────────────────────────┘
IPv6CIDRToRange(ipv6, Cidr)
Accepts an IPv6 address and a UInt8 value containing the CIDR prefix length. Returns a tuple of two IPv6 values containing the lower and upper bounds of the subnet range.
SELECT IPv6CIDRToRange(toIPv6('2001:0db8:0000:85a3:0000:0000:ac1f:8001'), 32);
┌─IPv6CIDRToRange(toIPv6('2001:0db8:0000:85a3:0000:0000:ac1f:8001'), 32)─┐
│ ('2001:db8::','2001:db8:ffff:ffff:ffff:ffff:ffff:ffff') │
└────────────────────────────────────────────────────────────────────────┘
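The range computation behind IPv4CIDRToRange and IPv6CIDRToRange can be reproduced with Python's ipaddress module. The helper name cidr_to_range is hypothetical, and this is a sketch of the arithmetic only, assuming the lower bound is the network address and the upper bound is the last address of the subnet, as the two examples above suggest.

```python
import ipaddress


def cidr_to_range(address, prefix_len):
    """Return (lowest, highest) address of the subnet containing `address`."""
    # strict=False allows host bits to be set in `address`, as in the examples.
    net = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
    return (str(net.network_address), str(net.broadcast_address))
```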
toIPv4(string)
An alias to IPv4StringToNum() that takes a string form of an IPv4 address and returns a value of the IPv4 type, which is binary equal to the value returned by IPv4StringToNum().
WITH
'171.225.130.45' as IPv4_string
SELECT
toTypeName(IPv4StringToNum(IPv4_string)),
toTypeName(toIPv4(IPv4_string))
┌─toTypeName(IPv4StringToNum(IPv4_string))─┬─toTypeName(toIPv4(IPv4_string))─┐
│ UInt32 │ IPv4 │
└──────────────────────────────────────────┴─────────────────────────────────┘
WITH
'171.225.130.45' as IPv4_string
SELECT
hex(IPv4StringToNum(IPv4_string)),
hex(toIPv4(IPv4_string))
┌─hex(IPv4StringToNum(IPv4_string))─┬─hex(toIPv4(IPv4_string))─┐
│ ABE1822D │ ABE1822D │
└───────────────────────────────────┴──────────────────────────┘
toIPv6
Converts a string form of IPv6 address to IPv6 type. If the IPv6 address has an invalid format, returns an empty value. Similar to IPv6StringToNum function, which converts IPv6 address to binary format.
If the input string contains a valid IPv4 address, then the IPv6 equivalent of the IPv4 address is returned.
Syntax
toIPv6(string)
Argument
string — IP address. String.
Returned value
IP address.
Type: IPv6.
Examples
Query:
WITH '2001:438:ffff::407d:1bc1' AS IPv6_string
SELECT
hex(IPv6StringToNum(IPv6_string)),
hex(toIPv6(IPv6_string));
Result:
┌─hex(IPv6StringToNum(IPv6_string))─┬─hex(toIPv6(IPv6_string))─────────┐
│ 20010438FFFF000000000000407D1BC1 │ 20010438FFFF000000000000407D1BC1 │
└───────────────────────────────────┴──────────────────────────────────┘
Query:
SELECT toIPv6('127.0.0.1');
Result:
┌─toIPv6('127.0.0.1')─┐
│ ::ffff:127.0.0.1 │
└─────────────────────┘
isIPv4String
Determines whether the input string is an IPv4 address or not. If string is an IPv6 address, returns 0.
Syntax
isIPv4String(string)
Arguments
string — IP address. String.
Returned value
1 if string is an IPv4 address, 0 otherwise.
Type: UInt8.
Examples
Query:
SELECT addr, isIPv4String(addr) FROM ( SELECT ['0.0.0.0', '127.0.0.1', '::ffff:127.0.0.1'] AS addr ) ARRAY JOIN addr;
Result:
┌─addr─────────────┬─isIPv4String(addr)─┐
│ 0.0.0.0 │ 1 │
│ 127.0.0.1 │ 1 │
│ ::ffff:127.0.0.1 │ 0 │
└──────────────────┴────────────────────┘
isIPv6String
Determines whether the input string is an IPv6 address or not. If string is an IPv4 address, returns 0.
Syntax
isIPv6String(string)
Arguments
string — IP address. String.
Returned value
1 if string is an IPv6 address, 0 otherwise.
Type: UInt8.
Examples
Query:
SELECT addr, isIPv6String(addr) FROM ( SELECT ['::', '1111::ffff', '::ffff:127.0.0.1', '127.0.0.1'] AS addr ) ARRAY JOIN addr;
Result:
┌─addr─────────────┬─isIPv6String(addr)─┐
│ :: │ 1 │
│ 1111::ffff │ 1 │
│ ::ffff:127.0.0.1 │ 1 │
│ 127.0.0.1 │ 0 │
└──────────────────┴────────────────────┘
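The two checks above can be approximated in Python with the ipaddress module. The helper names is_ipv4_string and is_ipv6_string are made up for this sketch, and Python's parser is only assumed to agree with ClickHouse on the inputs shown in the examples; edge cases (such as octets with leading zeros) may be treated differently.

```python
import ipaddress


def is_ipv4_string(s):
    """Return 1 if `s` parses as an IPv4 address, 0 otherwise."""
    try:
        ipaddress.IPv4Address(s)
        return 1
    except ValueError:
        return 0


def is_ipv6_string(s):
    """Return 1 if `s` parses as an IPv6 address, 0 otherwise."""
    try:
        ipaddress.IPv6Address(s)
        return 1
    except ValueError:
        return 0
```

Note that an IPv4-mapped address such as ::ffff:127.0.0.1 counts as IPv6 text, not IPv4 text, in both the tables above and this sketch.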
isIPAddressInRange
Determines if an IP address is contained in a network represented in the CIDR notation. Returns 1 if true, or 0 otherwise.
Syntax
isIPAddressInRange(address, prefix)
This function accepts both IPv4 and IPv6 addresses (and networks) represented as strings. It returns 0 if the IP version of the address and the CIDR don't match.
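Arguments
address — An IPv4 or IPv6 address. String.
prefix — An IPv4 or IPv6 network prefix in CIDR notation. String.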
Returned value
1 or 0.
Type: UInt8.
Example
Query:
SELECT isIPAddressInRange('127.0.0.1', '127.0.0.0/8');
Result:
┌─isIPAddressInRange('127.0.0.1', '127.0.0.0/8')─┐
│ 1 │
└────────────────────────────────────────────────┘
Query:
SELECT isIPAddressInRange('127.0.0.1', 'ffff::/16');
Result:
┌─isIPAddressInRange('127.0.0.1', 'ffff::/16')─┐
│ 0 │
└──────────────────────────────────────────────┘
Query:
SELECT isIPAddressInRange('::ffff:192.168.0.1', '::ffff:192.168.0.4/128');
Result:
┌─isIPAddressInRange('::ffff:192.168.0.1', '::ffff:192.168.0.4/128')─┐
│ 0 │
└────────────────────────────────────────────────────────────────────┘
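The three results above follow from ordinary CIDR membership plus the version-mismatch rule. A Python sketch of that logic, using the standard ipaddress module, is given below; the helper name is_ip_address_in_range is chosen to mirror the ClickHouse function but the code is an illustration, not its implementation.

```python
import ipaddress


def is_ip_address_in_range(address, prefix):
    """Return 1 if `address` lies inside the CIDR network `prefix`, else 0."""
    addr = ipaddress.ip_address(address)
    # strict=False tolerates host bits set in the network part of `prefix`.
    net = ipaddress.ip_network(prefix, strict=False)
    # An IPv4 address never matches an IPv6 network, and vice versa.
    if addr.version != net.version:
        return 0
    return int(addr in net)
```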
JSON
ClickHouse has special functions for working with JSON. All the JSON functions are based on strong assumptions about what the JSON can be, but they try to do as little as possible to get the job done.
The following assumptions are made:
The field name (function argument) must be a constant.
The field name is somehow canonically encoded in JSON. For example: visitParamHas('{"abc":"def"}', 'abc') = 1, but visitParamHas('{"\\u0061\\u0062\\u0063":"def"}', 'abc') = 0
Fields are searched for on any nesting level, indiscriminately. If there are multiple matching fields, the first occurrence is used.
The JSON does not have space characters outside of string literals.
visitParamHas(params, name)
Checks whether there is a field with the name name.
Alias: simpleJSONHas.
visitParamExtractUInt(params, name)
Parses UInt64 from the value of the field named name. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns 0.
Alias: simpleJSONExtractUInt.
visitParamExtractInt(params, name)
The same as visitParamExtractUInt, but for Int64.
Alias: simpleJSONExtractInt.
visitParamExtractFloat(params, name)
The same as visitParamExtractUInt, but for Float64.
Alias: simpleJSONExtractFloat.
visitParamExtractBool(params, name)
Parses a true/false value. The result is UInt8.
Alias: simpleJSONExtractBool.
visitParamExtractRaw(params, name)
Returns the value of a field, including separators.
Alias: simpleJSONExtractRaw.
Examples:
visitParamExtractRaw('{"abc":"\\n\\u0000"}', 'abc') = '"\\n\\u0000"';
visitParamExtractRaw('{"abc":{"def":[1,2,3]}}', 'abc') = '{"def":[1,2,3]}';
visitParamExtractString(params, name)
Parses the string in double quotes. The value is unescaped. If unescaping failed, it returns an empty string.
Alias: simpleJSONExtractString.
Examples:
visitParamExtractString('{"abc":"\\n\\u0000"}', 'abc') = '\n\0';
visitParamExtractString('{"abc":"\\u263a"}', 'abc') = '☺';
visitParamExtractString('{"abc":"\\u263"}', 'abc') = '';
visitParamExtractString('{"abc":"hello}', 'abc') = '';
There is currently no support for code points in the format \uXXXX\uYYYY that are not from the basic multilingual plane (they are converted to CESU-8 instead of UTF-8).
The following functions are based on simdjson and are designed for more complex JSON parsing requirements. The second assumption listed above (the field name is canonically encoded) still applies.
isValidJSON(json)
Checks that the passed string is valid JSON.
Examples:
SELECT isValidJSON('{"a": "hello", "b": [-100, 200.0, 300]}') = 1
SELECT isValidJSON('not a json') = 0
JSONHas(json[, indices_or_keys]…)
If the value exists in the JSON document, 1 will be returned.
If the value does not exist, 0 will be returned.
Examples:
SELECT JSONHas('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 1
SELECT JSONHas('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 4) = 0
indices_or_keys is a list of zero or more arguments, each of which can be either a string or an integer.
String = access object member by key.
Positive integer = access the n-th member/key from the beginning.
Negative integer = access the n-th member/key from the end.
Minimum index of the element is 1. Thus the element 0 does not exist.
You may use integers to access both JSON arrays and JSON objects.
So, for example:
SELECT JSONExtractKey('{"a": "hello", "b": [-100, 200.0, 300]}', 1) = 'a'
SELECT JSONExtractKey('{"a": "hello", "b": [-100, 200.0, 300]}', 2) = 'b'
SELECT JSONExtractKey('{"a": "hello", "b": [-100, 200.0, 300]}', -1) = 'b'
SELECT JSONExtractKey('{"a": "hello", "b": [-100, 200.0, 300]}', -2) = 'a'
SELECT JSONExtractString('{"a": "hello", "b": [-100, 200.0, 300]}', 1) = 'hello'
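The indices_or_keys rules above (string keys, 1-based positive indices, negative indices counted from the end, integers usable on both arrays and objects) can be sketched in Python. The helper names json_get and json_has are hypothetical, and the sketch relies on Python's json module preserving the order of object members; it illustrates the path rules only, not ClickHouse's parser.

```python
import json


def json_get(doc, *indices_or_keys):
    """Follow a path of keys and 1-based indices; raise LookupError if absent."""
    value = json.loads(doc)
    for key in indices_or_keys:
        if isinstance(key, str):
            # String: access an object member by key.
            if not isinstance(value, dict):
                raise LookupError(key)
            value = value[key]  # KeyError (a LookupError) if the key is missing
        else:
            # Integer: n-th member of an array or an object, in document order.
            if isinstance(value, dict):
                members = list(value.values())
            elif isinstance(value, list):
                members = value
            else:
                raise LookupError(key)
            if key == 0:
                # The minimum index is 1, so element 0 does not exist.
                raise LookupError(key)
            # Positive indices are 1-based; negative ones count from the end.
            value = members[key - 1 if key > 0 else key]  # IndexError if out of range
    return value


def json_has(doc, *indices_or_keys):
    """Return 1 if the path exists in the document, 0 otherwise."""
    try:
        json_get(doc, *indices_or_keys)
        return 1
    except LookupError:
        return 0
```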
JSONLength(json[, indices_or_keys]…)
Returns the length of a JSON array or a JSON object.
If the value does not exist or has a wrong type, 0 will be returned.
Examples:
SELECT JSONLength('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 3
SELECT JSONLength('{"a": "hello", "b": [-100, 200.0, 300]}') = 2
JSONType(json[, indices_or_keys]…)
Returns the type of a JSON value.
If the value does not exist, Null will be returned.
Examples:
SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}') = 'Object'
SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') = 'String'
SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 'Array'
JSONExtractUInt(json[, indices_or_keys]…)
JSONExtractInt(json[, indices_or_keys]…)
JSONExtractFloat(json[, indices_or_keys]…)
JSONExtractBool(json[, indices_or_keys]…)
Parses JSON and extracts a value. These functions are similar to the visitParam functions.
If the value does not exist or has a wrong type, 0 will be returned.
Examples:
SELECT JSONExtractInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 1) = -100
SELECT JSONExtractFloat('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 2) = 200.0
SELECT JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', -1) = 300
JSONExtractString(json[, indices_or_keys]…)
Parses JSON and extracts a string. This function is similar to the visitParamExtractString function.
If the value does not exist or has a wrong type, an empty string will be returned.
The value is unescaped. If unescaping failed, it returns an empty string.
Examples:
SELECT JSONExtractString('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') = 'hello'
SELECT JSONExtractString('{"abc":"\\n\\u0000"}', 'abc') = '\n\0'
SELECT JSONExtractString('{"abc":"\\u263a"}', 'abc') = '☺'
SELECT JSONExtractString('{"abc":"\\u263"}', 'abc') = ''
SELECT JSONExtractString('{"abc":"hello}', 'abc') = ''
JSONExtract(json[, indices_or_keys…], Return_type)
Parses JSON and extracts a value of the given ClickHouse data type.
This is a generalization of the previous JSONExtract<type> functions. This means JSONExtract(..., 'String') returns exactly the same as JSONExtractString(), and JSONExtract(..., 'Float64') returns exactly the same as JSONExtractFloat().
Examples:
SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'Tuple(String, Array(Float64))') = ('hello',[-100,200,300])
SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'Tuple(b Array(Float64), a String)') = ([-100,200,300],'hello')
SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 'Array(Nullable(Int8))') = [-100, NULL, NULL]
SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 4, 'Nullable(Int64)') = NULL
SELECT JSONExtract('{"passed": true}', 'passed', 'UInt8') = 1
SELECT JSONExtract('{"day": "Thursday"}', 'day', 'Enum8(\'Sunday\' = 0, \'Monday\' = 1, \'Tuesday\' = 2, \'Wednesday\' = 3, \'Thursday\' = 4, \'Friday\' = 5, \'Saturday\' = 6)') = 'Thursday'
SELECT JSONExtract('{"day": 5}', 'day', 'Enum8(\'Sunday\' = 0, \'Monday\' = 1, \'Tuesday\' = 2, \'Wednesday\' = 3, \'Thursday\' = 4, \'Friday\' = 5, \'Saturday\' = 6)') = 'Friday'
JSONExtractKeysAndValues(json[, indices_or_keys…], Value_type)
Parses key-value pairs from a JSON where the values are of the given ClickHouse data type.
Example:
SELECT JSONExtractKeysAndValues('{"x": {"a": 5, "b": 7, "c": 11}}', 'x', 'Int8') = [('a',5),('b',7),('c',11)];
JSONExtractKeys
Parses a JSON string and extracts the keys.
Syntax
JSONExtractKeys(json[, a, b, c...])
Arguments
json – String with valid JSON.
a, b, c... – Comma-separated indices or keys that specify the path to the inner field in a nested JSON object. Each argument can be either a String to get the field by the key or an Integer to get the N-th field (indexed from 1, negative integers count from the end). If not set, the whole JSON is parsed as the top-level object. Optional parameter.
Returned value
Array with the keys of the JSON.
Example
Query:
SELECT JSONExtractKeys('{"a": "hello", "b": [-100, 200.0, 300]}');
Result:
┌─JSONExtractKeys('{"a": "hello", "b": [-100, 200.0, 300]}')─┐
│ ['a','b'] │
└────────────────────────────────────────────────────────────┘
JSONExtractRaw(json[, indices_or_keys]…)
Returns a part of the JSON as an unparsed string.
If the part does not exist or has a wrong type, an empty string will be returned.
Example:
SELECT JSONExtractRaw('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = '[-100, 200.0, 300]';
JSONExtractArrayRaw(json[, indices_or_keys…])
Returns an array with the elements of a JSON array, each represented as an unparsed string.
If the part does not exist or is not an array, an empty array will be returned.
Example:
SELECT JSONExtractArrayRaw('{"a": "hello", "b": [-100, 200.0, "hello"]}', 'b') = ['-100', '200.0', '"hello"'];
JSONExtractKeysAndValuesRaw
Extracts raw data from a JSON object.
Syntax
JSONExtractKeysAndValuesRaw(json[, p, a, t, h])
Arguments
json – String with valid JSON.
p, a, t, h – Comma-separated indices or keys that specify the path to the inner field in a nested JSON object. Each argument can be either a string to get the field by the key or an integer to get the N-th field (indexed from 1, negative integers count from the end). If not set, the whole JSON is parsed as the top-level object. Optional parameter.
Returned values
Array with ('key', 'value') tuples. Both tuple members are strings.
Empty array if the requested object does not exist, or the input JSON is invalid.
Type: Array(Tuple(String, String)).
Examples
Query:
SELECT JSONExtractKeysAndValuesRaw('{"a": [-100, 200.0], "b":{"c": {"d": "hello", "f": "world"}}}');
Result:
┌─JSONExtractKeysAndValuesRaw('{"a": [-100, 200.0], "b":{"c": {"d": "hello", "f": "world"}}}')─┐
│ [('a','[-100,200]'),('b','{"c":{"d":"hello","f":"world"}}')] │
└──────────────────────────────────────────────────────────────────────────────────────────────┘
Query:
SELECT JSONExtractKeysAndValuesRaw('{"a": [-100, 200.0], "b":{"c": {"d": "hello", "f": "world"}}}', 'b');
Result:
┌─JSONExtractKeysAndValuesRaw('{"a": [-100, 200.0], "b":{"c": {"d": "hello", "f": "world"}}}', 'b')─┐
│ [('c','{"d":"hello","f":"world"}')] │
└───────────────────────────────────────────────────────────────────────────────────────────────────┘
Query:
SELECT JSONExtractKeysAndValuesRaw('{"a": [-100, 200.0], "b":{"c": {"d": "hello", "f": "world"}}}', -1, 'c');
Result:
┌─JSONExtractKeysAndValuesRaw('{"a": [-100, 200.0], "b":{"c": {"d": "hello", "f": "world"}}}', -1, 'c')─┐
│ [('d','"hello"'),('f','"world"')] │
└───────────────────────────────────────────────────────────────────────────────────────────────────────┘
JSON_EXISTS(json, path)
If the value exists in the JSON document, 1 will be returned.
If the value does not exist, 0 will be returned.
Examples:
SELECT JSON_EXISTS('{"hello":1}', '$.hello');
SELECT JSON_EXISTS('{"hello":{"world":1}}', '$.hello.world');
SELECT JSON_EXISTS('{"hello":["world"]}', '$.hello[*]');
SELECT JSON_EXISTS('{"hello":["world"]}', '$.hello[0]');
NOTE
Before version 21.11 the order of arguments was wrong, i.e. JSON_EXISTS(path, json)
JSON_QUERY(json, path)
Parses JSON and extracts a value as a JSON array or JSON object.
If the value does not exist, an empty string will be returned.
Example:
SELECT JSON_QUERY('{"hello":"world"}', '$.hello');
SELECT JSON_QUERY('{"array":[[0, 1, 2, 3, 4, 5], [0, -1, -2, -3, -4, -5]]}', '$.array[*][0 to 2, 4]');
SELECT JSON_QUERY('{"hello":2}', '$.hello');
SELECT toTypeName(JSON_QUERY('{"hello":2}', '$.hello'));
Result:
["world"]
[0, 1, 4, 0, -1, -4]
[2]
String
NOTE
Before version 21.11 the order of arguments was wrong, i.e. JSON_QUERY(path, json)
JSON_VALUE(json, path)
Parses JSON and extracts a value as a JSON scalar.
If the value does not exist, an empty string will be returned.
Example:
SELECT JSON_VALUE('{"hello":"world"}', '$.hello');
SELECT JSON_VALUE('{"array":[[0, 1, 2, 3, 4, 5], [0, -1, -2, -3, -4, -5]]}', '$.array[*][0 to 2, 4]');
SELECT JSON_VALUE('{"hello":2}', '$.hello');
SELECT toTypeName(JSON_VALUE('{"hello":2}', '$.hello'));
Result:
world
0
2
String
NOTE
Before version 21.11 the order of arguments was wrong, i.e. JSON_VALUE(path, json)
toJSONString
Serializes a value to its JSON representation. Various data types and nested structures are supported. 64-bit integers or bigger (like UInt64 or Int128) are enclosed in quotes by default; the output_format_json_quote_64bit_integers setting controls this behavior. Special values NaN and inf are replaced with null. Enable the output_format_json_quote_denormals setting to show them. When serializing an Enum value, the function outputs its name.
Syntax
toJSONString(value)
Arguments
value – Value to serialize. Value may be of any data type.
Returned value
JSON representation of the value.
Type: String.
Example
The first example shows serialization of a Map. The second example shows some special values wrapped into a Tuple.
Query:
SELECT toJSONString(map('key1', 1, 'key2', 2));
SELECT toJSONString(tuple(1.25, NULL, NaN, +inf, -inf, [])) SETTINGS output_format_json_quote_denormals = 1;
Result:
{"key1":1,"key2":2}
[1.25,null,"nan","inf","-inf",[]]
MACHINE LEARNING FUNCTIONS
evalMLMethod
Prediction with fitted regression models uses the evalMLMethod function. See link in linearRegression.
stochasticLinearRegression
The stochasticLinearRegression aggregate function implements the stochastic gradient descent method using a linear model and the MSE loss function. It uses evalMLMethod to predict on new data.
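For readers unfamiliar with the technique, the following is a generic Python sketch of stochastic gradient descent for a one-feature linear model with squared-error loss. It is not the ClickHouse implementation; the function names sgd_linear_regression and predict, the learning rate, the epoch count and the sample data are all assumptions chosen for illustration.

```python
import random

def sgd_linear_regression(xs, ys, learning_rate=0.02, epochs=2000, seed=0):
    """Fit y ~ w * x + b by per-sample gradient steps on the squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    samples = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(samples)                     # visit the samples in random order
        for x, y in samples:
            error = (w * x + b) - y              # prediction minus target
            w -= learning_rate * 2 * error * x   # gradient of error^2 with respect to w
            b -= learning_rate * 2 * error       # gradient of error^2 with respect to b
    return w, b

def predict(model, x):
    """Apply a fitted (w, b) pair to a new input."""
    w, b = model
    return w * x + b

model = sgd_linear_regression([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])  # data from y = 2x + 1
print(predict(model, 10))  # close to 21
```

The split between fitting and predicting mirrors the split described above: the aggregate function produces the fitted state, and a separate function applies it to new data.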
stochasticLogisticRegression
The stochasticLogisticRegression aggregate function implements the stochastic gradient descent method for a binary classification problem. It uses evalMLMethod to predict on new data.
MAPS
map
Arranges key:value pairs into the Map(key, value) data type.
Syntax
map(key1, value1[, key2, value2, ...])
Arguments
key – The key part of the pair. String, Integer, LowCardinality, FixedString, UUID, Date, DateTime, Date32, Enum.
Returned value
Data structure as key:value pairs.
Type: Map(key, value).
Examples
Query:
SELECT map('key1', number, 'key2', number * 2) FROM numbers(3);
Result:
┌─map('key1', number, 'key2', multiply(number, 2))─┐
│ {'key1':0,'key2':0} │
│ {'key1':1,'key2':2} │
│ {'key1':2,'key2':4} │
└──────────────────────────────────────────────────┘
Query:
CREATE TABLE table_map (a Map(String, UInt64)) ENGINE = MergeTree() ORDER BY a;
INSERT INTO table_map SELECT map('key1', number, 'key2', number * 2) FROM numbers(3);
SELECT a['key2'] FROM table_map;
Result:
┌─arrayElement(a, 'key2')─┐
│ 0 │
│ 2 │
│ 4 │
└─────────────────────────┘
See Also
Map(key, value) data type
mapAdd
Collect all the keys and sum corresponding values.
Syntax
mapAdd(arg1, arg2 [, ...])
Arguments
Arguments are maps or tuples of two arrays, where items in the first array represent keys, and the second array contains the values for each key. All key arrays should have the same type, and all value arrays should contain items which are promoted to one type (Int64, UInt64 or Float64). The common promoted type is used as the type of the result array.
Returned value
Example
Query with a tuple:
SELECT mapAdd(([toUInt8(1), 2], [1, 1]), ([toUInt8(1), 2], [1, 1])) as res, toTypeName(res) as type;
Result:
┌─res───────────┬─type───────────────────────────────┐
│ ([1,2],[2,2]) │ Tuple(Array(UInt8), Array(UInt64)) │
└───────────────┴────────────────────────────────────┘
Query with Map type:
SELECT mapAdd(map(1,1), map(1,1));
Result:
┌─mapAdd(map(1, 1), map(1, 1))─┐
│ {1:2} │
└──────────────────────────────┘
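The tuple form of mapAdd can be modelled in a few lines of Python. This is a sketch of the semantics shown in the examples, not ClickHouse code; the name map_add is hypothetical, and returning the keys in sorted order is an assumption that matches the example output. Type promotion is not modelled.

```python
from collections import defaultdict

def map_add(*pairs):
    """Sum values per key across (keys, values) pairs; keys come back sorted."""
    totals = defaultdict(int)
    for keys, values in pairs:
        for key, value in zip(keys, values):
            totals[key] += value
    ordered = sorted(totals)
    return ordered, [totals[key] for key in ordered]

print(map_add(([1, 2], [1, 1]), ([1, 2], [1, 1])))  # ([1, 2], [2, 2])
```

Keys that appear in only one argument keep their value unchanged, because nothing is added to them.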
mapSubtract
Collect all the keys and subtract corresponding values.
Syntax
mapSubtract(Tuple(Array, Array), Tuple(Array, Array) [, ...])
Arguments
Arguments are maps or tuples of two arrays, where items in the first array represent keys, and the second array contains the values for each key. All key arrays should have the same type, and all value arrays should contain items which are promoted to one type (Int64, UInt64 or Float64). The common promoted type is used as the type of the result array.
Returned value
Example
Query with a tuple:
SELECT mapSubtract(([toUInt8(1), 2], [toInt32(1), 1]), ([toUInt8(1), 2], [toInt32(2), 1])) as res, toTypeName(res) as type;
Result:
┌─res────────────┬─type──────────────────────────────┐
│ ([1,2],[-1,0]) │ Tuple(Array(UInt8), Array(Int64)) │
└────────────────┴───────────────────────────────────┘
Query with Map type:
SELECT mapSubtract(map(1,1), map(1,1));
Result:
┌─mapSubtract(map(1, 1), map(1, 1))─┐
│ {1:0} │
└───────────────────────────────────┘
mapPopulateSeries
Fills missing keys in the maps (key and value array pair), where keys are integers. Also, it supports specifying the max key, which is used to extend the keys array.
Syntax
mapPopulateSeries(keys, values[, max])
mapPopulateSeries(map[, max])
Generates a map (a tuple with two arrays or a value of Map type, depending on the arguments), where the keys are a series of numbers from the minimum to the maximum key (or the max argument, if it is specified) taken from the map with a step size of one, together with the corresponding values. If no value is specified for a key, the default value is used in the resulting map. For repeated keys, only the first value (in order of appearance) is associated with the key.
For array arguments, the number of elements in keys and values must be the same for each row.
Arguments
Arguments are maps or two arrays, where the first array represents keys, and the second array contains the values for each key.
Mapped arrays:
max – Maximum key value. Optional. Int8, Int16, Int32, Int64, Int128, Int256.
or
map – Map with integer keys. Map.
Returned value
Example
Query with mapped arrays:
SELECT mapPopulateSeries([1,2,4], [11,22,44], 5) AS res, toTypeName(res) AS type;
Result:
┌─res──────────────────────────┬─type──────────────────────────────┐
│ ([1,2,3,4,5],[11,22,0,44,0]) │ Tuple(Array(UInt8), Array(UInt8)) │
└──────────────────────────────┴───────────────────────────────────┘
Query with Map type:
SELECT mapPopulateSeries(map(1, 10, 5, 20), 6);
Result:
┌─mapPopulateSeries(map(1, 10, 5, 20), 6)─┐
│ {1:10,2:0,3:0,4:0,5:20,6:0} │
└─────────────────────────────────────────┘
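The filling rule can be written out as a short Python model. This is a sketch of the behavior described above, not ClickHouse code; the name map_populate_series, the default of 0 and the requirement that keys is non-empty are assumptions for illustration.

```python
def map_populate_series(keys, values, max_key=None, default=0):
    """Fill every integer key from min(keys) to the upper bound with a value."""
    first_seen = {}
    for key, value in zip(keys, values):
        first_seen.setdefault(key, value)  # for repeated keys the first value wins
    upper = max(keys) if max_key is None else max_key
    series = list(range(min(keys), upper + 1))
    return series, [first_seen.get(key, default) for key in series]

print(map_populate_series([1, 2, 4], [11, 22, 44], 5))
# ([1, 2, 3, 4, 5], [11, 22, 0, 44, 0])
```

Passing the keys and values of a map as the two lists reproduces the Map form as well: keys 1 and 5 with a maximum of 6 yield the keys 1 to 6, with zeros for the missing ones.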
mapContains
Determines whether the map contains the key parameter.
Syntax
mapContains(map, key)
Parameters
map – Map. Map.
key – Key. Type matches the type of keys of the map parameter.
Returned value
1 if map contains key, 0 if not.
Type: UInt8.
Example
Query:
CREATE TABLE test (a Map(String,String)) ENGINE = Memory;
INSERT INTO test VALUES ({'name':'eleven','age':'11'}), ({'number':'twelve','position':'6.0'});
SELECT mapContains(a, 'name') FROM test;
Result:
┌─mapContains(a, 'name')─┐
│ 1 │
│ 0 │
└────────────────────────┘
mapKeys
Returns all keys from the map parameter.
Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1 the function reads only the keys subcolumn instead of reading and processing the whole column data. The query SELECT mapKeys(m) FROM table transforms to SELECT m.keys FROM table.
Syntax
mapKeys(map)
Parameters
map – Map. Map.
Returned value
Array containing all keys from the map.
Type: Array.
Example
Query:
CREATE TABLE test (a Map(String,String)) ENGINE = Memory;
INSERT INTO test VALUES ({'name':'eleven','age':'11'}), ({'number':'twelve','position':'6.0'});
SELECT mapKeys(a) FROM test;
Result:
┌─mapKeys(a)────────────┐
│ ['name','age'] │
│ ['number','position'] │
└───────────────────────┘
mapValues
Returns all values from the map parameter.
Can be optimized by enabling the optimize_functions_to_subcolumns setting. With optimize_functions_to_subcolumns = 1 the function reads only the values subcolumn instead of reading and processing the whole column data. The query SELECT mapValues(m) FROM table transforms to SELECT m.values FROM table.
Syntax
mapValues(map)
Parameters
map – Map. Map.
Returned value
Array containing all the values from map.
Type: Array.
Example
Query:
CREATE TABLE test (a Map(String,String)) ENGINE = Memory;
INSERT INTO test VALUES ({'name':'eleven','age':'11'}), ({'number':'twelve','position':'6.0'});
SELECT mapValues(a) FROM test;
Result:
┌─mapValues(a)─────┐
│ ['eleven','11'] │
│ ['twelve','6.0'] │
└──────────────────┘
MATHEMATICAL
All the functions return a Float64 number. The accuracy of the result is close to the maximum precision possible, but the result might not coincide with the machine representable number nearest to the corresponding real number.
e()
Returns a Float64 number that is close to the number e.
exp(x)
Accepts a numeric argument and returns a Float64 number close to the exponent of the argument.
log(x), ln(x)
Accepts a numeric argument and returns a Float64 number close to the natural logarithm of the argument.
exp2(x)
Accepts a numeric argument and returns a Float64 number close to 2 to the power of x.
log2(x)
Accepts a numeric argument and returns a Float64 number close to the binary logarithm of the argument.
exp10(x)
Accepts a numeric argument and returns a Float64 number close to 10 to the power of x.
log10(x)
Accepts a numeric argument and returns a Float64 number close to the decimal logarithm of the argument.
sqrt(x)
Accepts a numeric argument and returns a Float64 number close to the square root of the argument.
cbrt(x)
Accepts a numeric argument and returns a Float64 number close to the cubic root of the argument.
erf(x)
If ‘x’ is non-negative, then erf(x / (σ√2)) is the probability that a random variable having a normal distribution with standard deviation ‘σ’ takes a value that is separated from the expected value by less than ‘x’.
Example (three sigma rule):
SELECT erf(3 / sqrt(2));
┌─erf(divide(3, sqrt(2)))─┐
│ 0.9973002039367398 │
└─────────────────────────┘
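The three-sigma figure can be cross-checked independently. The Python sketch below computes the probability that a normal variable lies within x of its mean in two ways: through erf and through the cumulative distribution function. The helper name prob_within and the choice of σ = 2 are assumptions for illustration.

```python
import math
from statistics import NormalDist

def prob_within(x, sigma):
    """P(|X - mean| < x) for a normal variable with standard deviation sigma."""
    return math.erf(x / (sigma * math.sqrt(2)))

sigma = 2.0
dist = NormalDist(mu=0.0, sigma=sigma)
via_cdf = dist.cdf(3 * sigma) - dist.cdf(-3 * sigma)  # same probability from the CDF
print(prob_within(3 * sigma, sigma), via_cdf)         # both are about 0.9973
```

Because x is scaled by σ inside erf, the result for x = 3σ does not depend on the value of σ.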
erfc(x)
Accepts a numeric argument and returns a Float64 number close to 1 - erf(x), but without loss of precision for large ‘x’ values.
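The precision loss is easy to observe in any language with IEEE 754 doubles. The Python sketch below uses the standard math module as a stand-in for the ClickHouse functions; the argument 10 is an arbitrary large value chosen for illustration.

```python
import math

x = 10.0
naive = 1.0 - math.erf(x)  # erf(10) rounds to exactly 1.0, so the difference is 0.0
direct = math.erfc(x)      # computed directly, stays a tiny positive number
print(naive, direct)
```

The subtraction loses the entire result because the true value is far smaller than the spacing between doubles near 1.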
lgamma(x)
The logarithm of the gamma function.
tgamma(x)
Gamma function.
sin(x)
The sine.
cos(x)
The cosine.
tan(x)
The tangent.
asin(x)
The arc sine.
acos(x)
The arc cosine.
atan(x)
The arc tangent.
pow(x, y), power(x, y)
Takes two numeric arguments x and y. Returns a Float64 number close to x to the power of y.
intExp2
Accepts a numeric argument and returns a UInt64 number close to 2 to the power of x.
intExp10
Accepts a numeric argument and returns a UInt64 number close to 10 to the power of x.
cosh(x)
Syntax
cosh(x)
Arguments
x – The angle, in radians. Values from the interval: -∞ < x < +∞. Float64.
Returned value
Values from the interval: 1 <= cosh(x) < +∞.
Type: Float64.
Example
Query:
SELECT cosh(0);
Result:
┌─cosh(0)──┐
│ 1 │
└──────────┘
acosh(x)
Syntax
acosh(x)
Arguments
x – Hyperbolic cosine of angle. Values from the interval: 1 <= x < +∞. Float64.
Returned value
The angle, in radians. Values from the interval: 0 <= acosh(x) < +∞.
Type: Float64.
Example
Query:
SELECT acosh(1);
Result:
┌─acosh(1)─┐
│ 0 │
└──────────┘
sinh(x)
Syntax
sinh(x)
Arguments
x – The angle, in radians. Values from the interval: -∞ < x < +∞. Float64.
Returned value
Values from the interval: -∞ < sinh(x) < +∞.
Type: Float64.
Example
Query:
SELECT sinh(0);
Result:
┌─sinh(0)──┐
│ 0 │
└──────────┘
asinh(x)
Syntax
asinh(x)
Arguments
x – Hyperbolic sine of angle. Values from the interval: -∞ < x < +∞. Float64.
Returned value
The angle, in radians. Values from the interval: -∞ < asinh(x) < +∞.
Type: Float64.
Example
Query:
SELECT asinh(0);
Result:
┌─asinh(0)─┐
│ 0 │
└──────────┘
atanh(x)
Syntax
atanh(x)
Arguments
x – Hyperbolic tangent of angle. Values from the interval: -1 < x < 1. Float64.
Returned value
The angle, in radians. Values from the interval: -∞ < atanh(x) < +∞.
Type: Float64.
Example
Query:
SELECT atanh(0);
Result:
┌─atanh(0)─┐
│ 0 │
└──────────┘
atan2(y, x)
The function calculates the angle in the Euclidean plane, given in radians, between the positive x axis and the ray to the point (x, y) ≠ (0, 0).
Syntax
atan2(y, x)
Arguments
y – y-coordinate of the point through which the ray passes. Float64.
x – x-coordinate of the point through which the ray passes. Float64.
Returned value
The angle θ such that −π < θ ≤ π, in radians.
Type: Float64.
Example
Query:
SELECT atan2(1, 1);
Result:
┌────────atan2(1, 1)─┐
│ 0.7853981633974483 │
└────────────────────┘
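Taking both coordinates, rather than their ratio, is what lets atan2 cover the full range of angles. The Python sketch below uses the standard math module, whose atan2 takes its arguments in the same (y, x) order, to compare two points with the same ratio y / x in opposite quadrants.

```python
import math

# Same ratio y / x, different quadrants: atan alone cannot tell them apart.
print(math.atan(1 / 1), math.atan(-1 / -1))  # identical angles
print(math.atan2(1, 1), math.atan2(-1, -1))  # pi/4 and -3*pi/4
```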
hypot(x, y)
Calculates the length of the hypotenuse of a right-angle triangle. The function avoids problems that occur when squaring very large or very small numbers.
Syntax
hypot(x, y)
Arguments
x – The first cathetus of a right-angle triangle. Float64.
y – The second cathetus of a right-angle triangle. Float64.
Returned value
The length of the hypotenuse of a right-angle triangle.
Type: Float64.
Example
Query:
SELECT hypot(1, 1);
Result:
┌────────hypot(1, 1)─┐
│ 1.4142135623730951 │
└────────────────────┘
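The problem with squaring very large numbers can be shown with ordinary double-precision arithmetic. The Python sketch below uses math.hypot as a stand-in for the ClickHouse function; the inputs of 1e200 are arbitrary values chosen so that their squares exceed the Float64 range.

```python
import math

x = y = 1e200
naive = math.sqrt(x * x + y * y)  # x * x overflows to inf, so the result is inf
safe = math.hypot(x, y)           # about 1.414e200, no overflow
print(naive, safe)
```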
log1p(x)
Calculates log(1+x). The function log1p(x) is more accurate than log(1+x) for small values of x.
Syntax
log1p(x)
Arguments
x – Values from the interval: -1 < x < +∞. Float64.
Returned value
Values from the interval: -∞ < log1p(x) < +∞.
Type: Float64.
Example
Query:
SELECT log1p(0);
Result:
┌─log1p(0)─┐
│ 0 │
└──────────┘
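The accuracy difference for small x comes from rounding in the addition 1 + x, before the logarithm is taken. The Python sketch below uses the standard math module as a stand-in; the value 1e-20 is an arbitrary small argument chosen for illustration.

```python
import math

x = 1e-20
print(math.log(1 + x))  # 0.0, because 1 + 1e-20 rounds to exactly 1.0
print(math.log1p(x))    # 1e-20, the small argument is preserved
```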
sign(x)
Returns the sign of a real number.
Syntax
sign(x)
Arguments
x – Values from -∞ to +∞. Supports all numeric types in ClickHouse.
Returned value
-1 for x < 0
0 for x = 0
1 for x > 0
Examples
Sign for the zero value:
SELECT sign(0);
Result:
┌─sign(0)─┐
│ 0 │
└─────────┘
Sign for the positive value:
SELECT sign(1);
Result:
┌─sign(1)─┐
│ 1 │
└─────────┘
Sign for the negative value:
SELECT sign(-1);
Result:
┌─sign(-1)─┐
│ -1 │
└──────────┘
NULLABLE
isNull
Checks whether the argument is NULL.
isNull(x)
Alias: ISNULL.
Arguments
x – A value with a non-compound data type.
Returned value
1 if x is NULL.
0 if x is not NULL.
Example
Input table
┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 2 │ 3 │
└───┴──────┘
Query
SELECT x FROM t_null WHERE isNull(y);
┌─x─┐
│ 1 │
└───┘
isNotNull
Checks whether the argument is not NULL.
isNotNull(x)
Arguments:
x – A value with a non-compound data type.
Returned value
0 if x is NULL.
1 if x is not NULL.
Example
Input table
┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 2 │ 3 │
└───┴──────┘
Query
SELECT x FROM t_null WHERE isNotNull(y);
┌─x─┐
│ 2 │
└───┘
coalesce
Checks from left to right whether NULL arguments were passed and returns the first non-NULL argument.
coalesce(x,...)
Arguments:
Any number of parameters of a non-compound type. All parameters must be compatible by data type.
Returned values
The first non-NULL argument.
NULL, if all arguments are NULL.
Example
Consider a list of contacts that may specify multiple ways to contact a customer.
┌─name─────┬─mail─┬─phone─────┬──icq─┐
│ client 1 │ ᴺᵁᴸᴸ │ 123-45-67 │ 123 │
│ client 2 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │
└──────────┴──────┴───────────┴──────┘
The mail and phone fields are of type String, but the icq field is UInt32, so it needs to be converted to String.
Get the first available contact method for the customer from the contact list:
SELECT name, coalesce(mail, phone, CAST(icq,'Nullable(String)')) FROM aBook;
┌─name─────┬─coalesce(mail, phone, CAST(icq, 'Nullable(String)'))─┐
│ client 1 │ 123-45-67 │
│ client 2 │ ᴺᵁᴸᴸ │
└──────────┴──────────────────────────────────────────────────────┘
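The left-to-right rule can be modelled with None standing in for NULL. The Python sketch below is not ClickHouse code; the helper name coalesce mirrors the SQL function, and the contact rows are copied from the table above.

```python
def coalesce(*args):
    """Return the first argument that is not None, or None if there is none."""
    for value in args:
        if value is not None:
            return value
    return None

contacts = [
    ("client 1", None, "123-45-67", 123),
    ("client 2", None, None, None),
]
for name, mail, phone, icq in contacts:
    icq_text = None if icq is None else str(icq)  # mirrors the CAST to a nullable string
    print(name, coalesce(mail, phone, icq_text))
```

Only None is skipped: a value such as 0 or an empty string still counts as present, which corresponds to the difference between NULL and a default value.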
ifNull
Returns an alternative value if the main argument is NULL.
ifNull(x,alt)
Arguments:
x – The value to check for NULL.
alt – The value that the function returns if x is NULL.
Returned values
The value x, if x is not NULL.
The value alt, if x is NULL.
Example
SELECT ifNull('a', 'b');
┌─ifNull('a', 'b')─┐
│ a │
└──────────────────┘
SELECT ifNull(NULL, 'b');
┌─ifNull(NULL, 'b')─┐
│ b │
└───────────────────┘
nullIf
Returns NULL if the arguments are equal.
nullIf(x, y)
Arguments:
x, y – Values for comparison. They must be compatible types, or ClickHouse will generate an exception.
Returned values
NULL, if the arguments are equal.
The x value, if the arguments are not equal.
Example
SELECT nullIf(1, 1);
┌─nullIf(1, 1)─┐
│ ᴺᵁᴸᴸ │
└──────────────┘
SELECT nullIf(1, 2);
┌─nullIf(1, 2)─┐
│ 1 │
└──────────────┘
assumeNotNull
Results in an equivalent non-Nullable value for a Nullable type. In case the original value is NULL the result is undetermined. See also the ifNull and coalesce functions.
assumeNotNull(x)
Arguments:
x – The original value.
Returned values
The original value from the non-Nullable type, if it is not NULL.
Implementation specific result if the original value was NULL.
Example
Consider the t_null table.
SHOW CREATE TABLE t_null;
┌─statement─────────────────────────────────────────────────────────────────┐
│ CREATE TABLE default.t_null ( x Int8, y Nullable(Int8)) ENGINE = TinyLog │
└───────────────────────────────────────────────────────────────────────────┘
┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 2 │ 3 │
└───┴──────┘
Apply the assumeNotNull function to the y column.
SELECT assumeNotNull(y) FROM t_null;
┌─assumeNotNull(y)─┐
│ 0 │
│ 3 │
└──────────────────┘
SELECT toTypeName(assumeNotNull(y)) FROM t_null;
┌─toTypeName(assumeNotNull(y))─┐
│ Int8 │
│ Int8 │
└──────────────────────────────┘
toNullable
Converts the argument type to Nullable.
toNullable(x)
Arguments:
x – The value of any non-compound type.
Returned value
The input value with a Nullable type.
Example
SELECT toTypeName(10);
┌─toTypeName(10)─┐
│ UInt8 │
└────────────────┘
SELECT toTypeName(toNullable(10));
┌─toTypeName(toNullable(10))─┐
│ Nullable(UInt8) │
└────────────────────────────┘
OTHERS
hostName()
Returns a string with the name of the host that this function was performed on. For distributed processing, this is the name of the remote server host, if the function is performed on a remote server. If it is executed in the context of a distributed table, then it generates a normal column with values relevant to each shard. Otherwise it produces a constant value.
getMacro
Gets a named value from the macros section of the server configuration.
Syntax
getMacro(name);
Arguments
name – Name to retrieve from the macros section. String.
Returned value
Value of the specified macro.
Type: String.
Example
The example macros section in the server configuration file:
<macros>
<test>Value</test>
</macros>
Query:
SELECT getMacro('test');
Result:
┌─getMacro('test')─┐
│ Value │
└──────────────────┘
An alternative way to get the same value:
SELECT * FROM system.macros
WHERE macro = 'test';
┌─macro─┬─substitution─┐
│ test │ Value │
└───────┴──────────────┘
FQDN
Returns the fully qualified domain name.
Syntax
fqdn();
This function is case-insensitive.
Returned value
String with the fully qualified domain name.
Type: String.
Example
Query:
SELECT FQDN();
Result:
┌─FQDN()──────────────────────────┐
│ clickhouse.ru-central1.internal │
└─────────────────────────────────┘
basename
Extracts the trailing part of a string after the last slash or backslash. This function is often used to extract the filename from a path.
basename( expr )
Arguments
expr – Expression resulting in a String type value. All the backslashes must be escaped in the resulting value.
Returned Value
A string that contains:
The trailing part of a string after the last slash or backslash.
If the input string contains a path ending with slash or backslash, for example, `/` or `c:\`, the function returns an empty string.
The original string if there are no slashes or backslashes.
Example
SELECT 'some/long/path/to/file' AS a, basename(a)
┌─a──────────────────────┬─basename('some/long/path/to/file')─┐
│ some/long/path/to/file │ file                               │
└────────────────────────┴────────────────────────────────────┘
SELECT 'some\\long\\path\\to\\file' AS a, basename(a)
┌─a──────────────────────┬─basename('some\\long\\path\\to\\file')─┐
│ some\long\path\to\file │ file │
└────────────────────────┴────────────────────────────────────────┘
SELECT 'some-file-name' AS a, basename(a)
┌─a──────────────┬─basename('some-file-name')─┐
│ some-file-name │ some-file-name │
└────────────────┴────────────────────────────┘
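The three return cases listed above can be reproduced with a short Python model. This is a sketch of the described behavior, not ClickHouse code; the helper name basename mirrors the SQL function, and the regular expression is one possible way to split on either separator.

```python
import re

def basename(path: str) -> str:
    """Return the part of the string after the last slash or backslash."""
    return re.split(r"[/\\]", path)[-1]

print(basename("some/long/path/to/file"))   # file
print(basename(r"some\long\path\to\file"))  # file
print(basename("some-file-name"))           # some-file-name
print(repr(basename("c:\\")))               # ''
```

Splitting on both separators and taking the last piece covers all three cases at once: a trailing separator leaves an empty last piece, and a string without separators is returned whole.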
visibleWidth(x)
Calculates the approximate width when outputting values to the console in text format (tab-separated). This function is used by the system for implementing Pretty formats.
NULL is represented as a string corresponding to NULL in Pretty formats.
SELECT visibleWidth(NULL)
┌─visibleWidth(NULL)─┐
│ 4 │
└────────────────────┘
toTypeName(x)
Returns a string containing the type name of the passed argument.
If NULL is passed to the function as input, then it returns the Nullable(Nothing) type, which corresponds to an internal NULL representation in ClickHouse.
blockSize()
Gets the size of the block. In ClickHouse, queries are always run on blocks (sets of column parts). This function allows getting the size of the block that you called it for.
byteSize
Returns estimation of uncompressed byte size of its arguments in memory.
Syntax
byteSize(argument [, ...])
Arguments
argument – Value.
Returned value
Estimation of byte size of the arguments in memory.
Type: UInt64.
Examples
For String arguments the function returns the string length + 9 (terminating zero + length).
Query:
SELECT byteSize('string');
Result:
┌─byteSize('string')─┐
│ 15 │
└────────────────────┘
Query:
CREATE TABLE test
(
`key` Int32,
`u8` UInt8,
`u16` UInt16,
`u32` UInt32,
`u64` UInt64,
`i8` Int8,
`i16` Int16,
`i32` Int32,
`i64` Int64,
`f32` Float32,
`f64` Float64
)
ENGINE = MergeTree
ORDER BY key;
INSERT INTO test VALUES(1, 8, 16, 32, 64, -8, -16, -32, -64, 32.32, 64.64);
SELECT key, byteSize(u8) AS `byteSize(UInt8)`, byteSize(u16) AS `byteSize(UInt16)`, byteSize(u32) AS `byteSize(UInt32)`, byteSize(u64) AS `byteSize(UInt64)`, byteSize(i8) AS `byteSize(Int8)`, byteSize(i16) AS `byteSize(Int16)`, byteSize(i32) AS `byteSize(Int32)`, byteSize(i64) AS `byteSize(Int64)`, byteSize(f32) AS `byteSize(Float32)`, byteSize(f64) AS `byteSize(Float64)` FROM test ORDER BY key ASC FORMAT Vertical;
Result:
Row 1:
──────
key: 1
byteSize(UInt8): 1
byteSize(UInt16): 2
byteSize(UInt32): 4
byteSize(UInt64): 8
byteSize(Int8): 1
byteSize(Int16): 2
byteSize(Int32): 4
byteSize(Int64): 8
byteSize(Float32): 4
byteSize(Float64): 8
If the function takes multiple arguments, it returns their combined byte size.
Query:
SELECT byteSize(NULL, 1, 0.3, '');
Result:
┌─byteSize(NULL, 1, 0.3, '')─┐
│ 19 │
└────────────────────────────┘
materialize(x)
Turns a constant into a full column containing just one value. In ClickHouse, full columns and constants are represented differently in memory. Functions work differently for constant arguments and normal arguments (different code is executed), although the result is almost always the same. This function is for debugging this behavior.
ignore(…)
Accepts any arguments, including NULL. Always returns 0. However, the argument is still evaluated. This can be used for benchmarks.
sleep(seconds)
Sleeps ‘seconds’ seconds on each data block. You can specify an integer or a floating-point number.
sleepEachRow(seconds)
Sleeps ‘seconds’ seconds on each row. You can specify an integer or a floating-point number.
currentDatabase()
Returns the name of the current database. You can use this function in table engine parameters in a CREATE TABLE query where you need to specify the database.
currentUser()
Returns the login of the current user. In the case of a distributed query, the login of the user that initiated the query is returned.
SELECT currentUser();
Alias: user(), USER().
Returned values
Login of the current user.
Login of the user that initiated the query, in the case of a distributed query.
Type: String.
Example
Query:
SELECT currentUser();
Result:
┌─currentUser()─┐
│ default │
└───────────────┘
isConstant
Checks whether the argument is a constant expression.
A constant expression is an expression whose resulting value is known at query analysis (i.e. before execution). For example, expressions over literals are constant expressions.
The function is intended for development, debugging and demonstration.
Syntax
isConstant(x)
Arguments
x – Expression to check.
Returned values
1 – x is constant.
0 – x is non-constant.
Type: UInt8.
Examples
Query:
SELECT isConstant(x + 1) FROM (SELECT 43 AS x)
Result:
┌─isConstant(plus(x, 1))─┐
│ 1 │
└────────────────────────┘
Query:
WITH 3.14 AS pi SELECT isConstant(cos(pi))
Result:
┌─isConstant(cos(pi))─┐
│ 1 │
└─────────────────────┘
Query:
SELECT isConstant(number) FROM numbers(1)
Result:
┌─isConstant(number)─┐
│ 0 │
└────────────────────┘
isFinite(x)
Accepts Float32 and Float64 and returns UInt8 equal to 1 if the argument is not infinite and not a NaN, otherwise 0.
isInfinite(x)
Accepts Float32 and Float64 and returns UInt8 equal to 1 if the argument is infinite, otherwise 0. Note that 0 is returned for a NaN.
ifNotFinite
Checks whether floating point value is finite.
Syntax
ifNotFinite(x,y)
Arguments
Returned value
x if x is finite.
y if x is not finite.
Example
Query:
SELECT 1/0 as infimum, ifNotFinite(infimum,42)
Result:
┌─infimum─┬─ifNotFinite(divide(1, 0), 42)─┐
│ inf │ 42 │
└─────────┴───────────────────────────────┘
You can get a similar result by using the ternary operator: isFinite(x) ? x : y.
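The selection rule can be modelled in Python as a conditional on finiteness. This is a sketch, not ClickHouse code; the helper name if_not_finite is hypothetical, and treating NaN as not finite follows the definition of isFinite given above.

```python
import math

def if_not_finite(x: float, y: float) -> float:
    """Return x when it is finite; fall back to y for inf, -inf and NaN."""
    return x if math.isfinite(x) else y

print(if_not_finite(math.inf, 42))  # 42
print(if_not_finite(1.5, 42))       # 1.5
```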
isNaN(x)
Accepts Float32 and Float64 and returns UInt8 equal to 1 if the argument is a NaN, otherwise 0.
hasColumnInTable(['hostname'[, 'username'[, 'password']],] 'database', 'table', 'column')
Accepts constant strings: database name, table name, and column name. Returns a UInt8 constant expression equal to 1 if there is a column, otherwise 0. If the hostname parameter is set, the test will run on a remote server. The function throws an exception if the table does not exist. For elements in a nested data structure, the function checks for the existence of a column. For the nested data structure itself, the function returns 0.
bar
Allows building a unicode-art diagram.
bar(x, min, max, width) draws a band with a width proportional to (x - min) and equal to width characters when x = max.
Arguments
x
— Size to display.min, max
— Integer constants. The value must fit inInt64
.width
— Constant, positive integer, can be fractional.
The band is drawn with accuracy to one eighth of a symbol.
Example:
SELECT
toHour(EventTime) AS h,
count() AS c,
bar(c, 0, 600000, 20) AS bar
FROM test.hits
GROUP BY h
ORDER BY h ASC
┌──h─┬──────c─┬─bar────────────────┐
│ 0 │ 292907 │ █████████▋ │
│ 1 │ 180563 │ ██████ │
│ 2 │ 114861 │ ███▋ │
│ 3 │ 85069 │ ██▋ │
│ 4 │ 68543 │ ██▎ │
│ 5 │ 78116 │ ██▌ │
│ 6 │ 113474 │ ███▋ │
│ 7 │ 170678 │ █████▋ │
│ 8 │ 278380 │ █████████▎ │
│ 9 │ 391053 │ █████████████ │
│ 10 │ 457681 │ ███████████████▎ │
│ 11 │ 493667 │ ████████████████▍ │
│ 12 │ 509641 │ ████████████████▊ │
│ 13 │ 522947 │ █████████████████▍ │
│ 14 │ 539954 │ █████████████████▊ │
│ 15 │ 528460 │ █████████████████▌ │
│ 16 │ 539201 │ █████████████████▊ │
│ 17 │ 523539 │ █████████████████▍ │
│ 18 │ 506467 │ ████████████████▊ │
│ 19 │ 520915 │ █████████████████▎ │
│ 20 │ 521665 │ █████████████████▍ │
│ 21 │ 542078 │ ██████████████████ │
│ 22 │ 493642 │ ████████████████▍ │
│ 23 │ 400397 │ █████████████▎ │
└────┴────────┴────────────────────┘
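The example above depends on the test.hits table. A self-contained sketch of the same idea, using the numbers table function and illustrative aliases, might look like this:

```sql
-- The band is expected to grow linearly from empty (x = 0) to 20 characters (x = 10).
SELECT
    number AS x,
    bar(x, 0, 10, 20) AS band
FROM numbers(11)
```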
transform
Transforms a value according to the explicitly defined mapping of some elements to other ones. There are two variations of this function:
transform(x, array_from, array_to, default)
x: What to transform.
array_from: Constant array of values to convert.
array_to: Constant array of values to convert the values in ‘array_from’ to.
default: Which value to use if ‘x’ is not equal to any of the values in ‘array_from’.
array_from and array_to must be arrays of the same size.
Types:
transform(T, Array(T), Array(U), U) -> U
T and U can be numeric, string, or Date or DateTime types. Where the same letter is indicated (T or U), for numeric types these might not be matching types, but types that have a common type. For example, the first argument can have the Int64 type, while the second has the Array(UInt16) type.
If the ‘x’ value is equal to one of the elements in the ‘array_from’ array, the function returns the element with the same index from the ‘array_to’ array. Otherwise, it returns ‘default’. If there are multiple matching elements in ‘array_from’, it returns one of the matches.
Example:
SELECT
transform(SearchEngineID, [2, 3], ['Yandex', 'Google'], 'Other') AS title,
count() AS c
FROM test.hits
WHERE SearchEngineID != 0
GROUP BY title
ORDER BY c DESC
┌─title─────┬──────c─┐
│ Yandex │ 498635 │
│ Google │ 229872 │
│ Other │ 104472 │
└───────────┴────────┘
transform(x, array_from, array_to)
Differs from the first variation in that the ‘default’ argument is omitted. If the ‘x’ value is equal to one of the elements in the ‘array_from’ array, the function returns the element with the same index from the ‘array_to’ array. Otherwise, it returns ‘x’ unchanged.
Types:
transform(T, Array(T), Array(T)) -> T
Example:
SELECT
transform(domain(Referer), ['yandex.ru', 'google.ru', 'vkontakte.ru'], ['www.yandex', 'example.com', 'vk.com']) AS s,
count() AS c
FROM test.hits
GROUP BY domain(Referer)
ORDER BY count() DESC
LIMIT 10
┌─s──────────────┬───────c─┐
│ │ 2906259 │
│ www.yandex │ 867767 │
│ ███████.ru │ 313599 │
│ mail.yandex.ru │ 107147 │
│ ██████.ru │ 100355 │
│ █████████.ru │ 65040 │
│ news.yandex.ru │ 64515 │
│ ██████.net │ 59141 │
│ example.com │ 57316 │
└────────────────┴─────────┘
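Both variations can be compared on the same input without any external table. This is a sketch with illustrative aliases; it relies on the rule above that numeric types only need a common type:

```sql
-- with_default maps 1 and 3 to strings and everything else to 'other'.
-- without_default maps 1 and 3 to 10 and 30 and leaves other values as they are.
SELECT
    number AS x,
    transform(x, [1, 3], ['one', 'three'], 'other') AS with_default,
    transform(x, [1, 3], [10, 30]) AS without_default
FROM numbers(5)
```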
formatReadableDecimalSize(x)
Accepts the size (number of bytes). Returns a rounded size with a suffix (KB, MB, etc.) as a string.
Example:
SELECT
arrayJoin([1, 1024, 1024*1024, 192851925]) AS filesize_bytes,
formatReadableDecimalSize(filesize_bytes) AS filesize
┌─filesize_bytes─┬─filesize───┐
│ 1 │ 1.00 B │
│ 1024 │ 1.02 KB │
│ 1048576 │ 1.05 MB │
│ 192851925 │ 192.85 MB │
└────────────────┴────────────┘
formatReadableSize(x)
Accepts the size (number of bytes). Returns a rounded size with a suffix (KiB, MiB, etc.) as a string.
Example:
SELECT
arrayJoin([1, 1024, 1024*1024, 192851925]) AS filesize_bytes,
formatReadableSize(filesize_bytes) AS filesize
┌─filesize_bytes─┬─filesize───┐
│ 1 │ 1.00 B │
│ 1024 │ 1.00 KiB │
│ 1048576 │ 1.00 MiB │
│ 192851925 │ 183.92 MiB │
└────────────────┴────────────┘
formatReadableQuantity(x)
Accepts a number. Returns a rounded number with a suffix (thousand, million, billion, etc.) as a string.
It is useful for making large numbers easier for humans to read.
Example:
SELECT
arrayJoin([1024, 1234 * 1000, (4567 * 1000) * 1000, 98765432101234]) AS number,
formatReadableQuantity(number) AS number_for_humans
┌─────────number─┬─number_for_humans─┐
│ 1024 │ 1.02 thousand │
│ 1234000 │ 1.23 million │
│ 4567000000 │ 4.57 billion │
│ 98765432101234 │ 98.77 trillion │
└────────────────┴───────────────────┘
formatReadableTimeDelta
Accepts a time delta in seconds. Returns the time delta as a string expressed in years, months, days, hours, minutes and seconds.
Syntax
formatReadableTimeDelta(column[, maximum_unit])
Arguments
column: A column with a numeric time delta.
maximum_unit: Optional. Maximum unit to show. Acceptable values: seconds, minutes, hours, days, months, years.
Example:
SELECT
arrayJoin([100, 12345, 432546534]) AS elapsed,
formatReadableTimeDelta(elapsed) AS time_delta
┌────elapsed─┬─time_delta ─────────────────────────────────────────────────────┐
│ 100 │ 1 minute and 40 seconds │
│ 12345 │ 3 hours, 25 minutes and 45 seconds │
│ 432546534 │ 13 years, 8 months, 17 days, 7 hours, 48 minutes and 54 seconds │
└────────────┴─────────────────────────────────────────────────────────────────┘
SELECT
arrayJoin([100, 12345, 432546534]) AS elapsed,
formatReadableTimeDelta(elapsed, 'minutes') AS time_delta
┌────elapsed─┬─time_delta ─────────────────────────────────────────────────────┐
│ 100 │ 1 minute and 40 seconds │
│ 12345 │ 205 minutes and 45 seconds │
│ 432546534 │ 7209108 minutes and 54 seconds │
└────────────┴─────────────────────────────────────────────────────────────────┘
least(a, b)
Returns the smaller of a and b.
greatest(a, b)
Returns the larger of a and b.
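A common use of the pair is limiting a value from above or below. A minimal sketch with illustrative aliases:

```sql
-- capped_at_3 never exceeds 3; at_least_2 never drops below 2.
SELECT
    number AS x,
    least(x, 3) AS capped_at_3,
    greatest(x, 2) AS at_least_2
FROM numbers(5)
```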
uptime()
Returns the server’s uptime in seconds. If it is executed in the context of a distributed table, then it generates a normal column with values relevant to each shard. Otherwise it produces a constant value.
version()
Returns the version of the server as a string. If it is executed in the context of a distributed table, then it generates a normal column with values relevant to each shard. Otherwise it produces a constant value.
blockNumber
Returns the sequence number of the data block where the row is located.
rowNumberInBlock
Returns the ordinal number of the row in the data block. The numbering restarts for each data block.
rowNumberInAllBlocks()
Returns the ordinal number of the row across all processed data blocks. This function only considers the affected data blocks.
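The three block-related functions are easiest to compare when the block size is made artificially small. The sketch below assumes that max_block_size can be set per query; the value 3 and the aliases are arbitrary choices for illustration:

```sql
-- With small blocks, block and row_in_block are expected to restart or change
-- every few rows, while row_overall keeps counting.
SELECT
    number,
    blockNumber() AS block,
    rowNumberInBlock() AS row_in_block,
    rowNumberInAllBlocks() AS row_overall
FROM
(
    SELECT number FROM system.numbers LIMIT 10
)
SETTINGS max_block_size = 3
```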
neighbor
A window function that provides access to the row at a specified offset before or after the current row of a given column.
Syntax
neighbor(column, offset[, default_value])
The result of the function depends on the affected data blocks and the order of data in the block.
WARNING
It can reach the neighbor rows only inside the currently processed data block.
The order of rows used during the calculation of neighbor can differ from the order of rows returned to the user. To prevent that, you can make a subquery with ORDER BY and call the function from outside the subquery.
Arguments
column: A column name or scalar expression.
offset: The number of rows forwards or backwards from the current row of column. Int64.
default_value: Optional. The value to be returned if offset goes beyond the scope of the block. Type of data blocks affected.
Returned values
Value for column at offset distance from the current row, if the offset value is not outside block bounds.
Default value for column, if the offset value is outside block bounds. If default_value is given, then it will be used.
Type: type of data blocks affected or default value type.
Example
Query:
SELECT number, neighbor(number, 2) FROM system.numbers LIMIT 10;
Result:
┌─number─┬─neighbor(number, 2)─┐
│ 0 │ 2 │
│ 1 │ 3 │
│ 2 │ 4 │
│ 3 │ 5 │
│ 4 │ 6 │
│ 5 │ 7 │
│ 6 │ 8 │
│ 7 │ 9 │
│ 8 │ 0 │
│ 9 │ 0 │
└────────┴─────────────────────┘
Query:
SELECT number, neighbor(number, 2, 999) FROM system.numbers LIMIT 10;
Result:
┌─number─┬─neighbor(number, 2, 999)─┐
│ 0 │ 2 │
│ 1 │ 3 │
│ 2 │ 4 │
│ 3 │ 5 │
│ 4 │ 6 │
│ 5 │ 7 │
│ 6 │ 8 │
│ 7 │ 9 │
│ 8 │ 999 │
│ 9 │ 999 │
└────────┴──────────────────────────┘
This function can be used to compute a year-over-year metric value:
Query:
WITH toDate('2018-01-01') AS start_date
SELECT
toStartOfMonth(start_date + (number * 32)) AS month,
toInt32(month) % 100 AS money,
neighbor(money, -12) AS prev_year,
round(prev_year / money, 2) AS year_over_year
FROM numbers(16)
Result:
┌──────month─┬─money─┬─prev_year─┬─year_over_year─┐
│ 2018-01-01 │ 32 │ 0 │ 0 │
│ 2018-02-01 │ 63 │ 0 │ 0 │
│ 2018-03-01 │ 91 │ 0 │ 0 │
│ 2018-04-01 │ 22 │ 0 │ 0 │
│ 2018-05-01 │ 52 │ 0 │ 0 │
│ 2018-06-01 │ 83 │ 0 │ 0 │
│ 2018-07-01 │ 13 │ 0 │ 0 │
│ 2018-08-01 │ 44 │ 0 │ 0 │
│ 2018-09-01 │ 75 │ 0 │ 0 │
│ 2018-10-01 │ 5 │ 0 │ 0 │
│ 2018-11-01 │ 36 │ 0 │ 0 │
│ 2018-12-01 │ 66 │ 0 │ 0 │
│ 2019-01-01 │ 97 │ 32 │ 0.33 │
│ 2019-02-01 │ 28 │ 63 │ 2.25 │
│ 2019-03-01 │ 56 │ 91 │ 1.62 │
│ 2019-04-01 │ 87 │ 22 │ 0.25 │
└────────────┴───────┴───────────┴────────────────┘
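The advice in the warning, sorting in a subquery and calling neighbor outside it, can be sketched as follows. The alias previous and the default value 999 are illustrative:

```sql
-- The inner ORDER BY fixes the row order before neighbor is evaluated.
-- With offset -1, each row reads the value of the row before it;
-- the first row has no predecessor and is expected to get the default 999.
SELECT
    number,
    neighbor(number, -1, 999) AS previous
FROM
(
    SELECT number FROM numbers(5) ORDER BY number DESC
)
```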
runningDifference(x)
Calculates the difference between successive row values in the data block. Returns 0 for the first row and the difference from the previous row for each subsequent row.
WARNING
It can reach the previous row only inside the currently processed data block.
The result of the function depends on the affected data blocks and the order of data in the block.
The order of rows used during the calculation of runningDifference can differ from the order of rows returned to the user. To prevent that, you can make a subquery with ORDER BY and call the function from outside the subquery.
Example:
SELECT
EventID,
EventTime,
runningDifference(EventTime) AS delta
FROM
(
SELECT
EventID,
EventTime
FROM events
WHERE EventDate = '2016-11-24'
ORDER BY EventTime ASC
LIMIT 5
)
┌─EventID─┬───────────EventTime─┬─delta─┐
│ 1106 │ 2016-11-24 00:00:04 │ 0 │
│ 1107 │ 2016-11-24 00:00:05 │ 1 │
│ 1108 │ 2016-11-24 00:00:05 │ 0 │
│ 1109 │ 2016-11-24 00:00:09 │ 4 │
│ 1110 │ 2016-11-24 00:00:10 │ 1 │
└─────────┴─────────────────────┴───────┘
Please note that block size affects the result. With each new block, the runningDifference state is reset.
SELECT
number,
runningDifference(number + 1) AS diff
FROM numbers(100000)
WHERE diff != 1
┌─number─┬─diff─┐
│ 0 │ 0 │
└────────┴──────┘
┌─number─┬─diff─┐
│ 65536 │ 0 │
└────────┴──────┘
set max_block_size=100000 -- default value is 65536!
SELECT
number,
runningDifference(number + 1) AS diff
FROM numbers(100000)
WHERE diff != 1
┌─number─┬─diff─┐
│ 0 │ 0 │
└────────┴──────┘
runningDifferenceStartingWithFirstValue
Same as runningDifference, except that the first row returns the value of the first row itself, and each subsequent row returns the difference from the previous row.
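The difference between the two functions shows up only in the first row. A small sketch with illustrative aliases:

```sql
-- x takes the values 10, 11, 12, 13.
-- diff is expected to start with 0, diff_from_first with 10;
-- every later row is expected to be 1 in both columns.
SELECT
    number + 10 AS x,
    runningDifference(x) AS diff,
    runningDifferenceStartingWithFirstValue(x) AS diff_from_first
FROM numbers(4)
```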
runningConcurrency
Calculates the number of concurrent events. Each event has a start time and an end time. The start time is included in the event, while the end time is excluded. Columns with a start time and an end time must be of the same data type. The function calculates the total number of active (concurrent) events for each event start time.
WARNING
Events must be ordered by the start time in ascending order. If this requirement is violated, the function raises an exception. Every data block is processed separately. If events from different data blocks overlap, they cannot be processed correctly.
Syntax
runningConcurrency(start, end)
Arguments
start: A column with the start time of events. Date, DateTime, or DateTime64.
end: A column with the end time of events. Date, DateTime, or DateTime64.
Returned values
The number of concurrent events at each event start time.
Type: UInt32
Example
Consider the table:
┌──────start─┬────────end─┐
│ 2021-03-03 │ 2021-03-11 │
│ 2021-03-06 │ 2021-03-12 │
│ 2021-03-07 │ 2021-03-08 │
│ 2021-03-11 │ 2021-03-12 │
└────────────┴────────────┘
Query:
SELECT start, runningConcurrency(start, end) FROM example_table;
Result:
┌──────start─┬─runningConcurrency(start, end)─┐
│ 2021-03-03 │ 1 │
│ 2021-03-06 │ 2 │
│ 2021-03-07 │ 3 │
│ 2021-03-11 │ 2 │
└────────────┴────────────────────────────────┘
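The example does not show how example_table was created. One possible setup is sketched below; the Memory engine and the backquoted column names are assumptions made for illustration, not taken from the original:

```sql
-- Hypothetical setup for example_table; any engine that keeps the rows
-- ordered by start time within one block would do.
CREATE TABLE example_table
(
    `start` Date,
    `end` Date
)
ENGINE = Memory;

INSERT INTO example_table VALUES
    ('2021-03-03', '2021-03-11'),
    ('2021-03-06', '2021-03-12'),
    ('2021-03-07', '2021-03-08'),
    ('2021-03-11', '2021-03-12');

SELECT `start`, runningConcurrency(`start`, `end`) FROM example_table;
```

In the result above, the last row is 2 rather than 3 because the first event ends on 2021-03-11 and the end time is excluded.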
MACNumToString(num)
Accepts a UInt64 number. Interprets it as a MAC address in big endian. Returns a string containing the corresponding MAC address in the format AA:BB:CC:DD:EE:FF (colon-separated numbers in hexadecimal form).
MACStringToNum(s)
The inverse function of MACNumToString. If the MAC address has an invalid format, it returns 0.
MACStringToOUI(s)
Accepts a MAC address in the format AA:BB:CC:DD:EE:FF (colon-separated numbers in hexadecimal form). Returns the first three octets as a UInt64 number. If the MAC address has an invalid format, it returns 0.
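The three MAC functions can be checked with a round trip. This is a sketch with illustrative aliases; hex is used only to make the OUI easier to read:

```sql
-- round_trip is expected to equal mac, and oui_hex to show the
-- first three octets (AABBCC) of the address.
SELECT
    'AA:BB:CC:DD:EE:FF' AS mac,
    MACStringToNum(mac) AS num,
    MACNumToString(num) AS round_trip,
    hex(MACStringToOUI(mac)) AS oui_hex
```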
getSizeOfEnumType
Returns the number of fields in Enum.
getSizeOfEnumType(value)
Arguments:
value: Value of type Enum.
Returned values
The number of fields with Enum input values.
An exception is thrown if the type is not Enum.
Example
SELECT getSizeOfEnumType( CAST('a' AS Enum8('a' = 1, 'b' = 2) ) ) AS x
┌─x─┐
│ 2 │
└───┘
blockSerializedSize
Returns size on disk (without taking into account compression).
blockSerializedSize(value[, value[, ...]])
Arguments
value: Any value.
Returned values
The number of bytes that will be written to disk for a block of values (without compression).
Example
Query:
SELECT blockSerializedSize(maxState(1)) as x
Result:
┌─x─┐
│ 2 │
└───┘
toColumnTypeName
Returns the name of the class that represents the data type of the column in RAM.
toColumnTypeName(value)
Arguments:
value: Any type of value.
Returned values
A string with the name of the class that is used for representing the value data type in RAM.
Example of the difference between toTypeName and toColumnTypeName
SELECT toTypeName(CAST('2018-01-01 01:02:03' AS DateTime))
┌─toTypeName(CAST('2018-01-01 01:02:03', 'DateTime'))─┐
│ DateTime │
└─────────────────────────────────────────────────────┘
SELECT toColumnTypeName(CAST('2018-01-01 01:02:03' AS DateTime))
┌─toColumnTypeName(CAST('2018-01-01 01:02:03', 'DateTime'))─┐
│ Const(UInt32) │
└───────────────────────────────────────────────────────────┘
The example shows that the DateTime data type is stored in memory as Const(UInt32).
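The Const(...) wrapper appears because the argument in the example is a constant expression. A short sketch using materialize, which turns a constant into an ordinary column, makes the difference visible; the aliases are illustrative:

```sql
-- constant_column is expected to be Const(UInt8), full_column plain UInt8.
SELECT
    toColumnTypeName(1) AS constant_column,
    toColumnTypeName(materialize(1)) AS full_column
```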
dumpColumnStructure
Outputs a detailed description of data structures in RAM.
dumpColumnStructure(value)
Arguments:
value: Any type of value.
Returned values
A string describing the structure that is used for representing the value data type in RAM.
Example
SELECT dumpColumnStructure(CAST('2018-01-01 01:02:03', 'DateTime'))
┌─dumpColumnStructure(CAST('2018-01-01 01:02:03', 'DateTime'))─┐
│ DateTime, Const(size = 1, UInt32(size = 1)) │
└──────────────────────────────────────────────────────────────┘
defaultValueOfArgumentType
Outputs the default value for the data type.
Does not include default values for custom columns set by the user.
defaultValueOfArgumentType(expression)
Arguments:
expression: Arbitrary type of value or an expression that results in a value of an arbitrary type.
Returned values
0 for numbers.
Empty string for strings.
ᴺᵁᴸᴸ for Nullable.
Example
SELECT defaultValueOfArgumentType( CAST(1 AS Int8) )
┌─defaultValueOfArgumentType(CAST(1, 'Int8'))─┐
│ 0 │
└─────────────────────────────────────────────┘
SELECT defaultValueOfArgumentType( CAST(1 AS Nullable(Int8) ) )
┌─defaultValueOfArgumentType(CAST(1, 'Nullable(Int8)'))─┐
│ ᴺᵁᴸᴸ │
└───────────────────────────────────────────────────────┘
defaultValueOfTypeName
Outputs the default value for the given type name.
Does not include default values for custom columns set by the user.
defaultValueOfTypeName(type)
Arguments:
type: A string representing a type name.
Returned values
0 for numbers.
Empty string for strings.
ᴺᵁᴸᴸ for Nullable.
Example
SELECT defaultValueOfTypeName('Int8')
┌─defaultValueOfTypeName('Int8')─┐
│ 0 │
└────────────────────────────────┘
SELECT defaultValueOfTypeName('Nullable(Int8)')
┌─defaultValueOfTypeName('Nullable(Int8)')─┐
│ ᴺᵁᴸᴸ │
└──────────────────────────────────────────┘
indexHint
The function is intended for debugging and introspection purposes. The function ignores its argument and always returns 1. Arguments are not even evaluated.
But for the purpose of index analysis, the argument of this function is analyzed as if it were present directly, without being wrapped inside the indexHint function. This allows selecting data in index ranges by the corresponding condition but without further filtering by this condition. The index in ClickHouse is sparse, and using indexHint will yield more data than specifying the same condition directly.
Syntax
SELECT * FROM table WHERE indexHint(<expression>)
Returned value
Type: UInt8.
Example
Here is an example using test data from the table ontime.
Input table:
SELECT count() FROM ontime
┌─count()─┐
│ 4276457 │
└─────────┘
The table has indexes on the fields (FlightDate, (Year, FlightDate)).
Create a query in which the index is not used.
Query:
SELECT FlightDate AS k, count() FROM ontime GROUP BY k ORDER BY k
ClickHouse processed the entire table (Processed 4.28 million rows).
Result:
┌──────────k─┬─count()─┐
│ 2017-01-01 │ 13970 │
│ 2017-01-02 │ 15882 │
........................
│ 2017-09-28 │ 16411 │
│ 2017-09-29 │ 16384 │
│ 2017-09-30 │ 12520 │
└────────────┴─────────┘
To apply the index, select a specific date.
Query:
SELECT FlightDate AS k, count() FROM ontime WHERE k = '2017-09-15' GROUP BY k ORDER BY k
By using the index, ClickHouse processed a significantly smaller number of rows (Processed 32.74 thousand rows).
Result:
┌──────────k─┬─count()─┐
│ 2017-09-15 │ 16428 │
└────────────┴─────────┘
Now wrap the expression k = '2017-09-15' in the indexHint function.
Query:
SELECT
FlightDate AS k,
count()
FROM ontime
WHERE indexHint(k = '2017-09-15')
GROUP BY k
ORDER BY k ASC
ClickHouse used the index in the same way as the previous time (Processed 32.74 thousand rows). The expression k = '2017-09-15' was not used when generating the result. In this example, the indexHint function makes it possible to see adjacent dates.
Result:
┌──────────k─┬─count()─┐
│ 2017-09-14 │ 7071 │
│ 2017-09-15 │ 16428 │
│ 2017-09-16 │ 1077 │
│ 2017-09-30 │ 8167 │
└────────────┴─────────┘
replicate
Creates an array filled with a single value.
Used for internal implementation of arrayJoin.
SELECT replicate(x, arr);
Arguments:
arr: Original array. ClickHouse creates a new array of the same length as the original and fills it with the value x.
x: The value that the resulting array will be filled with.
Returned value
An array filled with the value x.
Type: Array.
Example
Query:
SELECT replicate(1, ['a', 'b', 'c'])
Result:
┌─replicate(1, ['a', 'b', 'c'])─┐
│ [1,1,1] │
└───────────────────────────────┘
filesystemAvailable
Returns the amount of remaining space on the filesystem where the database files are located. It is always smaller than the total free space (filesystemFree) because some space is reserved for the OS.
Syntax
filesystemAvailable()
Returned value
The amount of remaining space available in bytes.
Type: UInt64.
Example
Query:
SELECT formatReadableSize(filesystemAvailable()) AS "Available space", toTypeName(filesystemAvailable()) AS "Type";
Result:
┌─Available space─┬─Type───┐
│ 30.75 GiB │ UInt64 │
└─────────────────┴────────┘
filesystemFree
Returns the total amount of free space on the filesystem where the database files are located. See also filesystemAvailable.
Syntax
filesystemFree()
Returned value
Amount of free space in bytes.
Type: UInt64.
Example
Query:
SELECT formatReadableSize(filesystemFree()) AS "Free space", toTypeName(filesystemFree()) AS "Type";
Result:
┌─Free space─┬─Type───┐
│ 32.39 GiB │ UInt64 │
└────────────┴────────┘
filesystemCapacity
Returns the capacity of the filesystem in bytes. For evaluation, the path to the data directory must be configured.
Syntax
filesystemCapacity()
Returned value
Capacity information of the filesystem in bytes.
Type: UInt64.
Example
Query:
SELECT formatReadableSize(filesystemCapacity()) AS "Capacity", toTypeName(filesystemCapacity()) AS "Type"
Result:
┌─Capacity──┬─Type───┐
│ 39.32 GiB │ UInt64 │
└───────────┴────────┘
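The three filesystem functions can be read together in one query. A minimal sketch with illustrative aliases; the last column checks the relationship described under filesystemAvailable:

```sql
-- available is expected not to exceed free, which in turn cannot exceed capacity.
SELECT
    formatReadableSize(filesystemCapacity()) AS capacity,
    formatReadableSize(filesystemFree()) AS free,
    formatReadableSize(filesystemAvailable()) AS available,
    filesystemAvailable() <= filesystemFree() AS available_not_above_free
```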
initializeAggregation
Calculates the result of an aggregate function based on a single value. This function is intended for initializing aggregate functions with the -State combinator. You can create states of aggregate functions and insert them into columns of type AggregateFunction, or use initialized aggregates as default values.
Syntax
initializeAggregation(aggregate_function, arg1, arg2, ..., argN)
Arguments
aggregate_function: Name of the aggregation function to initialize. String.
arg: Arguments of the aggregate function.
Returned value(s)
Result of aggregation for every row passed to the function.
The return type is the same as the return type of the function that initializeAggregation takes as its first argument.
Example
Query:
SELECT uniqMerge(state) FROM (SELECT initializeAggregation('uniqState', number % 3) AS state FROM numbers(10000));
Result:
┌─uniqMerge(state)─┐
│ 3 │
└──────────────────┘
Query:
SELECT finalizeAggregation(state), toTypeName(state) FROM (SELECT initializeAggregation('sumState', number % 3) AS state FROM numbers(5));
Result:
┌─finalizeAggregation(state)─┬─toTypeName(state)─────────────┐
│ 0 │ AggregateFunction(sum, UInt8) │
│ 1 │ AggregateFunction(sum, UInt8) │
│ 2 │ AggregateFunction(sum, UInt8) │
│ 0 │ AggregateFunction(sum, UInt8) │
│ 1 │ AggregateFunction(sum, UInt8) │
└────────────────────────────┴───────────────────────────────┘
Example with the AggregatingMergeTree table engine and an AggregateFunction column:
CREATE TABLE metrics
(
key UInt64,
value AggregateFunction(sum, UInt64) DEFAULT initializeAggregation('sumState', toUInt64(0))
)
ENGINE = AggregatingMergeTree
ORDER BY key
INSERT INTO metrics VALUES (0, initializeAggregation('sumState', toUInt64(42)))
finalizeAggregation
Takes the state of an aggregate function. Returns the result of aggregation (or the finalized state when using the -State combinator).
Syntax
finalizeAggregation(state)
Arguments
state: State of aggregation. AggregateFunction.
Returned value(s)
The value or values that were aggregated.
Type: the type of the values that were aggregated.
Examples
Query:
SELECT finalizeAggregation(( SELECT countState(number) FROM numbers(10)));
Result:
┌─finalizeAggregation(_subquery16)─┐
│ 10 │
└──────────────────────────────────┘
Query:
SELECT finalizeAggregation(( SELECT sumState(number) FROM numbers(10)));
Result:
┌─finalizeAggregation(_subquery20)─┐
│ 45 │
└──────────────────────────────────┘
Note that NULL values are ignored.
Query:
SELECT finalizeAggregation(arrayReduce('anyState', [NULL, 2, 3]));
Result:
┌─finalizeAggregation(arrayReduce('anyState', [NULL, 2, 3]))─┐
│ 2 │
└────────────────────────────────────────────────────────────┘
Combined example:
Query:
WITH initializeAggregation('sumState', number) AS one_row_sum_state
SELECT
number,
finalizeAggregation(one_row_sum_state) AS one_row_sum,
runningAccumulate(one_row_sum_state) AS cumulative_sum
FROM numbers(10);
Result:
┌─number─┬─one_row_sum─┬─cumulative_sum─┐
│ 0 │ 0 │ 0 │
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 3 │
│ 3 │ 3 │ 6 │
│ 4 │ 4 │ 10 │
│ 5 │ 5 │ 15 │
│ 6 │ 6 │ 21 │
│ 7 │ 7 │ 28 │
│ 8 │ 8 │ 36 │
│ 9 │ 9 │ 45 │
└────────┴─────────────┴────────────────┘
runningAccumulate
Accumulates states of an aggregate function for each row of a data block.
WARNING
The state is reset for each new data block.
Syntax
runningAccumulate(agg_state[, grouping]);
Arguments
agg_state: State of the aggregate function. AggregateFunction.
grouping: Grouping key. Optional. The state of the function is reset if the grouping value is changed. It can be any of the supported data types for which the equality operator is defined.
Returned value
Each resulting row contains a result of the aggregate function, accumulated for all the input rows from 0 to the current position. runningAccumulate resets states for each new data block or when the grouping value changes.
Type depends on the aggregate function used.
Examples
Consider how you can use runningAccumulate to find the cumulative sum of numbers without and with grouping.
Query:
SELECT k, runningAccumulate(sum_k) AS res FROM (SELECT number as k, sumState(k) AS sum_k FROM numbers(10) GROUP BY k ORDER BY k);
Result:
┌─k─┬─res─┐
│ 0 │ 0 │
│ 1 │ 1 │
│ 2 │ 3 │
│ 3 │ 6 │
│ 4 │ 10 │
│ 5 │ 15 │
│ 6 │ 21 │
│ 7 │ 28 │
│ 8 │ 36 │
│ 9 │ 45 │
└───┴─────┘
The subquery generates sumState for every number from 0 to 9. sumState returns the state of the sum function that contains the sum of a single number.
The whole query does the following:
For the first row, runningAccumulate takes sumState(0) and returns 0.
For the second row, the function merges sumState(0) and sumState(1), resulting in sumState(0 + 1), and returns 1 as a result.
For the third row, the function merges sumState(0 + 1) and sumState(2), resulting in sumState(0 + 1 + 2), and returns 3 as a result.
The actions are repeated until the block ends.
The following example shows the grouping parameter usage:
Query:
SELECT
grouping,
item,
runningAccumulate(state, grouping) AS res
FROM
(
SELECT
toInt8(number / 4) AS grouping,
number AS item,
sumState(number) AS state
FROM numbers(15)
GROUP BY item
ORDER BY item ASC
);
Result:
┌─grouping─┬─item─┬─res─┐
│ 0 │ 0 │ 0 │
│ 0 │ 1 │ 1 │
│ 0 │ 2 │ 3 │
│ 0 │ 3 │ 6 │
│ 1 │ 4 │ 4 │
│ 1 │ 5 │ 9 │
│ 1 │ 6 │ 15 │
│ 1 │ 7 │ 22 │
│ 2 │ 8 │ 8 │
│ 2 │ 9 │ 17 │
│ 2 │ 10 │ 27 │
│ 2 │ 11 │ 38 │
│ 3 │ 12 │ 12 │
│ 3 │ 13 │ 25 │
│ 3 │ 14 │ 39 │
└──────────┴──────┴─────┘
As you can see, runningAccumulate merges states for each group of rows separately.
joinGet
The function lets you extract data from the table the same way as from a dictionary.
Gets data from Join tables using the specified join key.
Only supports tables created with the ENGINE = Join(ANY, LEFT, <join_keys>) statement.
Syntax
joinGet(join_storage_table_name, `value_column`, join_keys)
Arguments
join_storage_table_name: An identifier that indicates where the search is performed. The identifier is searched in the default database (see the default_database parameter in the config file). To override the default database, use USE db_name or specify the database and the table through the separator db_name.db_table, see the example.
value_column: Name of the column of the table that contains the required data.
join_keys: List of keys.
Returned value
Returns a list of values corresponding to the list of keys.
If a key does not exist in the source table, then 0 or null will be returned based on the join_use_nulls setting.
More info about join_use_nulls in Join operation.
Example
Input table:
CREATE DATABASE db_test
CREATE TABLE db_test.id_val(`id` UInt32, `val` UInt32) ENGINE = Join(ANY, LEFT, id) SETTINGS join_use_nulls = 1
INSERT INTO db_test.id_val VALUES (1,11)(2,12)(4,13)
┌─id─┬─val─┐
│ 4 │ 13 │
│ 2 │ 12 │
│ 1 │ 11 │
└────┴─────┘
Query:
SELECT joinGet(db_test.id_val,'val',toUInt32(number)) from numbers(4) SETTINGS join_use_nulls = 1
Result:
┌─joinGet(db_test.id_val, 'val', toUInt32(number))─┐
│ 0 │
│ 11 │
│ 12 │
│ 0 │
└──────────────────────────────────────────────────┘
catboostEvaluate(path_to_model, feature_1, feature_2, …, feature_n)
Evaluates an external catboost model. CatBoost is an open-source gradient boosting library developed by Yandex for machine learning. Accepts a path to a catboost model and model arguments (features). Returns Float64.
SELECT feat_1, ..., feat_n, catboostEvaluate('/path/to/model.bin', feat_1, ..., feat_n) AS prediction
FROM data_table
Prerequisites
Build the catboost evaluation library
Before evaluating catboost models, the libcatboostmodel.<so|dylib> library must be made available. See the CatBoost documentation for how to compile it.
Next, specify the path to libcatboostmodel.<so|dylib> in the clickhouse configuration:
<clickhouse>
...
<catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>
...
</clickhouse>
For security and isolation reasons, the model evaluation does not run in the server process but in the clickhouse-library-bridge process. At the first execution of catboostEvaluate(), the server starts the library bridge process if it is not running already. Both processes communicate using an HTTP interface. By default, port 9012 is used. A different port can be specified as follows, which is useful if port 9012 is already assigned to a different service.
<library_bridge>
<port>9019</port>
</library_bridge>
Train a catboost model using libcatboost
See Training and applying models for how to train catboost models from a training data set.
throwIf(x[, message[, error_code]])
Throws an exception if the argument is non-zero.
message: Optional. A constant string providing a custom error message.
error_code: Optional. A constant integer providing a custom error code.
To use the error_code argument, the configuration parameter allow_custom_error_code_in_throwif must be enabled.
SELECT throwIf(number = 3, 'Too many') FROM numbers(10);
Received exception from server (version 19.14.1):
Code: 395. DB::Exception: Received from localhost:9000. DB::Exception: Too many.
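The optional error_code argument could be used as sketched below. This assumes that allow_custom_error_code_in_throwif can be enabled for a single query through a SETTINGS clause; the code 42 is an arbitrary value chosen for illustration:

```sql
-- Assumption: the parameter is accepted as a query-level setting.
-- The exception is expected to carry the custom code instead of the default one.
SELECT throwIf(number = 3, 'Too many', 42)
FROM numbers(10)
SETTINGS allow_custom_error_code_in_throwif = 1;
```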
identity
Returns the same value that was used as its argument. Used for debugging and testing; it allows bypassing index usage to get the query performance of a full scan. When a query is analyzed for possible use of an index, the analyzer does not look inside identity functions. Constant folding is not applied either.
Syntax
identity(x)
Example
Query:
SELECT identity(42)
Result:
┌─identity(42)─┐
│ 42 │
└──────────────┘
getSetting
Returns the current value of a custom setting.
Syntax
getSetting('custom_setting');
Parameter
custom_setting: The setting name. String.
Returned value
The current value of the setting.
Example
SET custom_a = 123;
SELECT getSetting('custom_a');
Result
123
isDecimalOverflow
Checks whether the Decimal value is out of its (or specified) precision.
Syntax
isDecimalOverflow(d, [p])
Arguments
d: Value. Decimal.
p: Precision. Optional. If omitted, the initial precision of the first argument is used. This parameter can be helpful when extracting data to another DBMS or file. UInt8.
Returned values
1: The Decimal value has more digits than its precision allows.
0: The Decimal value satisfies the specified precision.
Example
Query:
SELECT isDecimalOverflow(toDecimal32(1000000000, 0), 9),
isDecimalOverflow(toDecimal32(1000000000, 0)),
isDecimalOverflow(toDecimal32(-1000000000, 0), 9),
isDecimalOverflow(toDecimal32(-1000000000, 0));
Result:
1 1 1 1
countDigits
Returns the number of decimal digits needed to represent the value.
Syntax
countDigits(x)
Arguments
x: Int or Decimal value.
Returned value
Number of digits.
Type: UInt8.
NOTE
For Decimal values, the function takes their scales into account: it calculates the result over the underlying integer type, which is (value * scale). For example: countDigits(42) = 2, countDigits(42.000) = 5, countDigits(0.04200) = 4. That is, you can check decimal overflow for Decimal64 with countDigits(x) > 18. It is a slow variant of isDecimalOverflow.
Example
Query:
SELECT countDigits(toDecimal32(1, 9)), countDigits(toDecimal32(-1, 9)),
countDigits(toDecimal64(1, 18)), countDigits(toDecimal64(-1, 18)),
countDigits(toDecimal128(1, 38)), countDigits(toDecimal128(-1, 38));
Result:
10 10 19 19 39 39
errorCodeToName
Returned value
Variable name for the error code.
Type: LowCardinality(String).
Syntax
errorCodeToName(1)
Result:
UNSUPPORTED_METHOD
tcpPort
Returns the native interface TCP port number that this server listens on. If it is executed in the context of a distributed table, then it generates a normal column, otherwise it produces a constant value.
Syntax
tcpPort()
Arguments
None.
Returned value
The TCP port number.
Type: UInt16.
Example
Query:
SELECT tcpPort();
Result:
┌─tcpPort()─┐
│ 9000 │
└───────────┘
RANDOM NUMBER AND STRING
All the functions accept zero arguments or one argument. If an argument is passed, it can be any type, and its value is not used for anything. The only purpose of this argument is to prevent common subexpression elimination, so that two different instances of the same function return different columns with different random numbers.
NOTE
Non-cryptographic generators of pseudo-random numbers are used.
rand, rand32
Returns a pseudo-random UInt32 number, evenly distributed among all UInt32-type numbers.
Uses a linear congruential generator.
rand64
Returns a pseudo-random UInt64 number, evenly distributed among all UInt64-type numbers.
Uses a linear congruential generator.
randCanonical
Generates pseudo-random values that are independent and identically distributed, uniformly over [0, 1).
Non-deterministic. Return type is Float64.
randConstant
Produces a constant column with a random value.
Syntax
randConstant([x])
Arguments
x
— Expression resulting in any of the supported data types. The resulting value is discarded, but the expression itself is used for bypassing common subexpression elimination if the function is called multiple times in one query. Optional parameter.
Returned value
Pseudo-random number.
Type: UInt32.
Example
Query:
SELECT rand(), rand(1), rand(number), randConstant(), randConstant(1), randConstant(number)
FROM numbers(3)
Result:
┌─────rand()─┬────rand(1)─┬─rand(number)─┬─randConstant()─┬─randConstant(1)─┬─randConstant(number)─┐
│ 3047369878 │ 4132449925 │ 4044508545 │ 2740811946 │ 4229401477 │ 1924032898 │
│ 2938880146 │ 1267722397 │ 4154983056 │ 2740811946 │ 4229401477 │ 1924032898 │
│ 956619638 │ 4238287282 │ 1104342490 │ 2740811946 │ 4229401477 │ 1924032898 │
└────────────┴────────────┴──────────────┴────────────────┴─────────────────┴──────────────────────┘
randomString
randomFixedString
randomPrintableASCII
randomStringUTF8
fuzzBits
Syntax
fuzzBits([s], [prob])
Inverts bits of s, each with probability prob.
Arguments
s - String or FixedString
prob - constant Float32/64
Returned value
Fuzzed string with the same type as s.
Example
SELECT fuzzBits(materialize('abacaba'), 0.1)
FROM numbers(3)
Result:
┌─fuzzBits(materialize('abacaba'), 0.1)─┐
│ abaaaja │
│ a*cjab+ │
│ aeca2A │
└───────────────────────────────────────┘
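The behavior described for fuzzBits can be sketched in Python as follows. This is only a model of "invert each bit independently with probability prob"; the function name `fuzz_bits` and the explicit random generator argument are assumptions, and the output will not match the server's output for the same input.

```python
import random

def fuzz_bits(data: bytes, prob: float, rng: random.Random) -> bytes:
    """Flip every bit of `data` independently with probability `prob`."""
    out = bytearray(data)
    for i in range(len(out)):
        for bit in range(8):
            # rng.random() is in [0, 1), so prob=0 never flips and prob=1 always flips.
            if rng.random() < prob:
                out[i] ^= 1 << bit
    return bytes(out)

rng = random.Random(42)  # seeded only to make the sketch repeatable
for _ in range(3):
    print(fuzz_bits(b"abacaba", 0.1, rng))
```

The result always has the same length as the input, which matches the statement that the returned value has the same type as s.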
REPLACING IN STRINGS
NOTE
Functions for searching and other manipulations with strings are described separately.
replaceOne(haystack, pattern, replacement)
Replaces the first occurrence of the substring ‘pattern’ (if it exists) in ‘haystack’ by the ‘replacement’ string. ‘pattern’ and ‘replacement’ must be constants.
replaceAll(haystack, pattern, replacement), replace(haystack, pattern, replacement)
Replaces all occurrences of the substring ‘pattern’ in ‘haystack’ by the ‘replacement’ string.
replaceRegexpOne(haystack, pattern, replacement)
Replaces the first occurrence of the substring matching the regular expression ‘pattern’ in ‘haystack’ by the ‘replacement’ string. ‘pattern’ must be a constant re2 regular expression. ‘replacement’ must be a plain constant string or a constant string containing substitutions \0-\9. Substitutions \1-\9 correspond to the 1st to 9th capturing group (submatch), substitution \0 corresponds to the entire match. To use a verbatim \ character in the ‘pattern’ or ‘replacement’ string, escape it using \. Also keep in mind that string literals require extra escaping.
Example 1. Converting ISO dates to American format:
SELECT DISTINCT
EventDate,
replaceRegexpOne(toString(EventDate), '(\\d{4})-(\\d{2})-(\\d{2})', '\\2/\\3/\\1') AS res
FROM test.hits
LIMIT 7
FORMAT TabSeparated
2014-03-17 03/17/2014
2014-03-18 03/18/2014
2014-03-19 03/19/2014
2014-03-20 03/20/2014
2014-03-21 03/21/2014
2014-03-22 03/22/2014
2014-03-23 03/23/2014
Example 2. Copying a string ten times:
SELECT replaceRegexpOne('Hello, World!', '.*', '\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0') AS res
┌─res────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World! │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
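For readers who want to try the substitution syntax outside the database, the two examples above can be approximated with Python's `re` module. This is an analogy, not the re2 engine: Python writes the whole match as `\g<0>` instead of `\0`, and the wrapper name `replace_regexp_one` is an assumption.

```python
import re

def replace_regexp_one(haystack: str, pattern: str, replacement: str) -> str:
    """Replace only the first match, similar in spirit to replaceRegexpOne."""
    return re.sub(pattern, replacement, haystack, count=1)

# Example 1: ISO date to American format, using capture groups 1 to 3.
us_date = replace_regexp_one("2014-03-17", r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1")
print(us_date)

# Example 2: repeat the entire match ten times.
ten_times = replace_regexp_one("Hello, World!", ".*", r"\g<0>" * 10)
print(ten_times)
```

Raw strings (`r"..."`) play the role of the doubled backslashes that SQL string literals need.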
replaceRegexpAll(haystack, pattern, replacement)
Like ‘replaceRegexpOne’, but replaces all occurrences of the pattern. Example:
SELECT replaceRegexpAll('Hello, World!', '.', '\\0\\0') AS res
┌─res────────────────────────┐
│ HHeelllloo,, WWoorrlldd!! │
└────────────────────────────┘
As an exception, if a regular expression matches an empty substring, the replacement is made no more than once. Example:
SELECT replaceRegexpAll('Hello, World!', '^', 'here: ') AS res
┌─res─────────────────┐
│ here: Hello, World! │
└─────────────────────┘
regexpQuoteMeta(s)
The function adds a backslash before some predefined characters in the string. Predefined characters: \0, \\, |, (, ), ^, $, ., [, ], ?, *, +, {, :, -. This implementation slightly differs from re2::RE2::QuoteMeta. It escapes zero byte as \0 instead of \x00 and it escapes only required characters. For more information, see the link: RE2
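A small Python sketch of the escaping rule, built only from the character list given above. The name `regexp_quote_meta` and the character-by-character loop are illustrative assumptions, not the server implementation.

```python
# Characters listed in the description (the zero byte is handled separately).
SPECIAL = set("\\|()^$.[]?*+{:-")

def regexp_quote_meta(s: str) -> str:
    """Put a backslash before each predefined character; write NUL as \\0."""
    out = []
    for ch in s:
        if ch == "\0":
            out.append("\\0")       # zero byte becomes backslash + '0'
        elif ch in SPECIAL:
            out.append("\\" + ch)   # escape the metacharacter
        else:
            out.append(ch)          # everything else is copied as-is
    return "".join(out)

print(regexp_quote_meta("1+1=2 (approx.)"))
```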
ROUNDING
floor(x[, N])
Returns the largest round number that is less than or equal to x. A round number is a multiple of 1/10^N, or the nearest number of the appropriate data type if 1/10^N isn’t exact. ‘N’ is an integer constant, optional parameter. By default it is zero, which means to round to an integer. ‘N’ may be negative.
Examples: floor(123.45, 1) = 123.4, floor(123.45, -1) = 120.
x is any numeric type. The result is a number of the same type. For integer arguments, it makes sense to round with a negative N value (for non-negative N, the function does not do anything). If rounding causes overflow (for example, floor(-128, -1)), an implementation-specific result is returned.
ceil(x[, N]), ceiling(x[, N])
Returns the smallest round number that is greater than or equal to x. In every other way, it is the same as the floor function (see above).
trunc(x[, N]), truncate(x[, N])
Returns the round number with largest absolute value that has an absolute value less than or equal to x’s. In every other way, it is the same as the ‘floor’ function (see above).
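The three directed rounding functions differ only in the direction they move toward. A minimal sketch with Python's `decimal` module, assuming exact decimal arithmetic (so the "nearest representable number" caveat for floats does not apply); the helper names are hypothetical.

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_CEILING, ROUND_DOWN

def _round_to(x: str, n: int, mode: str) -> Decimal:
    # A "round number" is a multiple of 1/10^N; scaleb(-n) builds that step.
    step = Decimal(1).scaleb(-n)
    return Decimal(x).quantize(step, rounding=mode)

def floor_n(x: str, n: int = 0) -> Decimal:
    return _round_to(x, n, ROUND_FLOOR)    # toward negative infinity

def ceil_n(x: str, n: int = 0) -> Decimal:
    return _round_to(x, n, ROUND_CEILING)  # toward positive infinity

def trunc_n(x: str, n: int = 0) -> Decimal:
    return _round_to(x, n, ROUND_DOWN)     # toward zero

print(floor_n("123.45", 1) == Decimal("123.4"))
print(floor_n("123.45", -1) == Decimal("120"))
```

For negative inputs the difference becomes visible: flooring -123.45 to one decimal place moves away from zero, while truncating moves toward zero.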
round(x[, N])
Rounds a value to a specified number of decimal places.
The function returns the nearest number of the specified order. When the given number is equidistant from the surrounding numbers, the function uses banker’s rounding for float number types and rounds away from zero for the other number types (Decimal).
round(expression [, decimal_places])
Arguments
expression — A number to be rounded. Can be any expression returning a numeric data type.
decimal_places — An integer value.
If decimal_places > 0, the function rounds the value to the right of the decimal point.
If decimal_places < 0, the function rounds the value to the left of the decimal point.
If decimal_places = 0, the function rounds the value to an integer. In this case the argument can be omitted.
Returned value:
The rounded number of the same type as the input number.
Examples
Example of use with Float
SELECT number / 2 AS x, round(x) FROM system.numbers LIMIT 3
┌───x─┬─round(divide(number, 2))─┐
│ 0 │ 0 │
│ 0.5 │ 0 │
│ 1 │ 1 │
└─────┴──────────────────────────┘
Example of use with Decimal
SELECT cast(number / 2 AS Decimal(10,4)) AS x, round(x) FROM system.numbers LIMIT 3
┌──────x─┬─round(CAST(divide(number, 2), 'Decimal(10, 4)'))─┐
│ 0.0000 │ 0.0000 │
│ 0.5000 │ 1.0000 │
│ 1.0000 │ 1.0000 │
└────────┴──────────────────────────────────────────────────┘
Examples of rounding
Rounding to the nearest number.
round(3.2, 0) = 3
round(4.1267, 2) = 4.13
round(22,-1) = 20
round(467,-2) = 500
round(-467,-2) = -500
Banker’s rounding.
round(3.5) = 4
round(4.5) = 4
round(3.55, 1) = 3.6
round(3.65, 1) = 3.6
roundBankers
Rounds a number to a specified decimal position.
If the rounding number is halfway between two numbers, the function uses banker’s rounding.
Banker's rounding is a method of rounding fractional numbers. When the rounding number is halfway between two numbers, it's rounded to the nearest even digit at the specified decimal position. For example: 3.5 rounds up to 4, 2.5 rounds down to 2. It's the default rounding method for floating point numbers defined in IEEE 754 (https://en.wikipedia.org/wiki/IEEE_754#Roundings_to_nearest). The round function performs the same rounding for floating point numbers. The roundBankers function also rounds integers the same way, for example, roundBankers(45, -1) = 40.
In other cases, the function rounds numbers to the nearest integer.
Using banker’s rounding, you can reduce the effect that rounding numbers has on the results of summing or subtracting these numbers.
For example, sum numbers 1.5, 2.5, 3.5, 4.5 with different rounding:
No rounding: 1.5 + 2.5 + 3.5 + 4.5 = 12.
Banker’s rounding: 2 + 2 + 4 + 4 = 12.
Rounding to the nearest integer: 2 + 3 + 4 + 5 = 14.
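The summation example can be reproduced with Python's `decimal` module, which offers both rounding modes. This is a sketch under the assumption that exact decimal values are used; `round_bankers` and `round_with` are hypothetical helper names, not ClickHouse functions.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_with(value: Decimal, mode: str, places: int = 0) -> Decimal:
    # Quantize to a multiple of 10^-places using the requested tie-breaking rule.
    return value.quantize(Decimal(1).scaleb(-places), rounding=mode)

def round_bankers(x: str, places: int = 0) -> Decimal:
    return round_with(Decimal(x), ROUND_HALF_EVEN, places)

values = [Decimal("1.5"), Decimal("2.5"), Decimal("3.5"), Decimal("4.5")]

exact = sum(values)
bankers = sum(round_with(v, ROUND_HALF_EVEN) for v in values)
nearest = sum(round_with(v, ROUND_HALF_UP) for v in values)

print(exact, bankers, nearest)
```

Ties go alternately down and up under banker's rounding, so the errors cancel in the sum; rounding every tie upward accumulates a bias.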
Syntax
roundBankers(expression [, decimal_places])
Arguments
expression — A number to be rounded. Can be any expression returning a numeric data type.
decimal_places — Decimal places. An integer number.
decimal_places > 0 — The function rounds the number to the given position right of the decimal point. Example: roundBankers(3.55, 1) = 3.6.
decimal_places < 0 — The function rounds the number to the given position left of the decimal point. Example: roundBankers(24.55, -1) = 20.
decimal_places = 0 — The function rounds the number to an integer. In this case the argument can be omitted. Example: roundBankers(2.5) = 2.
Returned value
A value rounded by the banker’s rounding method.
Examples
Example of use
Query:
SELECT number / 2 AS x, roundBankers(x, 0) AS b FROM system.numbers LIMIT 10
Result:
┌───x─┬─b─┐
│ 0 │ 0 │
│ 0.5 │ 0 │
│ 1 │ 1 │
│ 1.5 │ 2 │
│ 2 │ 2 │
│ 2.5 │ 2 │
│ 3 │ 3 │
│ 3.5 │ 4 │
│ 4 │ 4 │
│ 4.5 │ 4 │
└─────┴───┘
Examples of Banker’s rounding
roundBankers(0.4) = 0
roundBankers(-3.5) = -4
roundBankers(4.5) = 4
roundBankers(3.55, 1) = 3.6
roundBankers(3.65, 1) = 3.6
roundBankers(10.35, 1) = 10.4
roundBankers(10.755, 2) = 10.76
roundToExp2(num)
Accepts a number. If the number is less than one, it returns 0. Otherwise, it rounds the number down to the nearest (whole non-negative) power of two.
roundDuration(num)
Accepts a number. If the number is less than one, it returns 0. Otherwise, it rounds the number down to numbers from the set: 1, 10, 30, 60, 120, 180, 240, 300, 600, 1200, 1800, 3600, 7200, 18000, 36000.
roundAge(num)
Accepts a number. If the number is less than 18, it returns 0. Otherwise, it rounds the number down to a number from the set: 18, 25, 35, 45, 55.
roundDown(num, arr)
Accepts a number and rounds it down to an element in the specified array. If the value is less than the lowest bound, the lowest bound is returned.
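The four bucket-style rounding functions above all round down to a fixed or user-supplied set of boundaries. A minimal Python sketch, assuming integer inputs and using the sets exactly as listed; the helper names are hypothetical.

```python
from bisect import bisect_right

DURATIONS = [1, 10, 30, 60, 120, 180, 240, 300, 600, 1200,
             1800, 3600, 7200, 18000, 36000]
AGES = [18, 25, 35, 45, 55]

def _round_down_to_set(num, bounds, below):
    """Largest bound <= num, or `below` when num is under the lowest bound."""
    i = bisect_right(bounds, num)
    return bounds[i - 1] if i > 0 else below

def round_to_exp2(num: int) -> int:
    # Highest power of two not exceeding num; 0 for anything below one.
    return 0 if num < 1 else 1 << (num.bit_length() - 1)

def round_duration(num: int) -> int:
    return _round_down_to_set(num, DURATIONS, 0)

def round_age(num: int) -> int:
    return _round_down_to_set(num, AGES, 0)

def round_down(num, arr):
    # Values under the lowest bound return the lowest bound itself.
    bounds = sorted(arr)
    return _round_down_to_set(num, bounds, bounds[0])

print(round_to_exp2(1000), round_duration(59), round_age(30), round_down(3, [5, 10]))
```

Note the one asymmetry: roundDuration and roundAge return 0 below their first boundary, while roundDown returns the lowest bound.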
SEARCHING IN STRINGS
NOTE
Functions for replacing and other manipulations with strings are described separately.
position(haystack, needle), locate(haystack, needle)
Searches for the substring needle in the string haystack.
Returns the position (in bytes) of the found substring in the string, starting from 1.
For a case-insensitive search, use the function positionCaseInsensitive.
Syntax
position(haystack, needle[, start_pos])
position(needle IN haystack)
Alias: locate(haystack, needle[, start_pos]).
NOTE
The position(needle IN haystack) syntax provides SQL compatibility; the function works the same way as position(haystack, needle).
Arguments
haystack — String in which the substring will be searched. String.
needle — Substring to be searched. String.
start_pos — Position of the first character in the string to start the search. UInt. Optional.
Returned values
Starting position in bytes (counting from 1), if the substring was found.
0, if the substring was not found.
Type: Integer.
Examples
The phrase “Hello, world!” contains a set of bytes representing a single-byte encoded text. The function returns the expected result:
Query:
SELECT position('Hello, world!', '!');
Result:
┌─position('Hello, world!', '!')─┐
│ 13 │
└────────────────────────────────┘
SELECT
position('Hello, world!', 'o', 1),
position('Hello, world!', 'o', 7)
┌─position('Hello, world!', 'o', 1)─┬─position('Hello, world!', 'o', 7)─┐
│ 5 │ 9 │
└───────────────────────────────────┴───────────────────────────────────┘
The same phrase in Russian contains characters which can’t be represented using a single byte. The function returns an unexpected result (use the positionUTF8 function for multi-byte encoded text):
Query:
SELECT position('Привет, мир!', '!');
Result:
┌─position('Привет, мир!', '!')─┐
│ 21 │
└───────────────────────────────┘
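The difference between counting in bytes and counting in Unicode code points can be checked with a short Python sketch. The two helper names are assumptions; they only model the 1-based position semantics described above (0 when the needle is absent).

```python
def position_bytes(haystack: str, needle: str, start_pos: int = 1) -> int:
    """1-based byte offset of needle in the UTF-8 encoding of haystack."""
    h = haystack.encode("utf-8")
    n = needle.encode("utf-8")
    return h.find(n, start_pos - 1) + 1   # find() gives -1 when absent, so 0

def position_code_points(haystack: str, needle: str, start_pos: int = 1) -> int:
    """1-based code point offset, in the spirit of positionUTF8."""
    return haystack.find(needle, start_pos - 1) + 1

print(position_bytes("Hello, world!", "o", 7))
print(position_bytes("Привет, мир!", "!"))
print(position_code_points("Привет, мир!", "!"))
```

Each Cyrillic letter takes two bytes in UTF-8, which is why the byte position of the exclamation mark is larger than its code point position.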
Examples for POSITION(needle IN haystack) syntax
Query:
SELECT 3 = position('c' IN 'abc');
Result:
┌─equals(3, position('abc', 'c'))─┐
│ 1 │
└─────────────────────────────────┘
Query:
SELECT 6 = position('/' IN s) FROM (SELECT 'Hello/World' AS s);
Result:
┌─equals(6, position(s, '/'))─┐
│ 1 │
└─────────────────────────────┘
positionCaseInsensitive
The same as position: returns the position (in bytes) of the found substring in the string, starting from 1. Use the function for a case-insensitive search.
Works under the assumption that the string contains a set of bytes representing a single-byte encoded text. If this assumption is not met and a character can’t be represented using a single byte, the function does not throw an exception and returns an unexpected result. If a character can be represented using two bytes, it will use two bytes and so on.
Syntax
positionCaseInsensitive(haystack, needle[, start_pos])
Arguments
haystack — String in which the substring will be searched. String.
needle — Substring to be searched. String.
start_pos — Optional parameter, position of the first character in the string to start the search. UInt.
Returned values
Starting position in bytes (counting from 1), if the substring was found.
0, if the substring was not found.
Type: Integer.
Example
Query:
SELECT positionCaseInsensitive('Hello, world!', 'hello');
Result:
┌─positionCaseInsensitive('Hello, world!', 'hello')─┐
│ 1 │
└───────────────────────────────────────────────────┘
positionUTF8
Returns the position (in Unicode points) of the found substring in the string, starting from 1.
Works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, the function does not throw an exception and returns an unexpected result. If a character can be represented using two Unicode points, it will use two and so on.
For a case-insensitive search, use the function positionCaseInsensitiveUTF8.
Syntax
positionUTF8(haystack, needle[, start_pos])
Arguments
haystack — String in which the substring will be searched. String.
needle — Substring to be searched. String.
start_pos — Optional parameter, position of the first character in the string to start the search. UInt.
Returned values
Starting position in Unicode points (counting from 1), if the substring was found.
0, if the substring was not found.
Type: Integer.
Examples
The phrase “Hello, world!” in Russian contains a set of Unicode points representing a single-point encoded text. The function returns the expected result:
Query:
SELECT positionUTF8('Привет, мир!', '!');
Result:
┌─positionUTF8('Привет, мир!', '!')─┐
│ 12 │
└───────────────────────────────────┘
For the phrase “Salut, étudiante!”, where the character é can be represented using one point (U+00E9) or two points (U+0065 U+0301), the function can return an unexpected result:
Query for the letter é, which is represented by one Unicode point U+00E9:
SELECT positionUTF8('Salut, étudiante!', '!');
Result:
┌─positionUTF8('Salut, étudiante!', '!')─┐
│ 17 │
└────────────────────────────────────────┘
Query for the letter é, which is represented by two Unicode points U+0065 U+0301:
SELECT positionUTF8('Salut, étudiante!', '!');
Result:
┌─positionUTF8('Salut, étudiante!', '!')─┐
│ 18 │
└────────────────────────────────────────┘
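The two results above differ only because of how é is encoded. A small Python sketch makes the two spellings explicit with escape sequences and shows that normalizing to a composed form (NFC) removes the discrepancy. The variable names are illustrative.

```python
import unicodedata

composed = "Salut, \u00e9tudiante!"     # é as a single code point U+00E9
decomposed = "Salut, e\u0301tudiante!"  # e followed by combining acute U+0301

# 1-based code point position of "!" in each spelling.
pos_composed = composed.find("!") + 1
pos_decomposed = decomposed.find("!") + 1

# After NFC normalization the decomposed spelling becomes the composed one.
normalized = unicodedata.normalize("NFC", decomposed)
pos_normalized = normalized.find("!") + 1

print(pos_composed, pos_decomposed, pos_normalized)
```

Both strings render identically, so the only reliable way to tell them apart is to inspect the code points or normalize before searching.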
positionCaseInsensitiveUTF8
The same as positionUTF8, but is case-insensitive. Returns the position (in Unicode points) of the found substring in the string, starting from 1.
Works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, the function does not throw an exception and returns an unexpected result. If a character can be represented using two Unicode points, it will use two and so on.
Syntax
positionCaseInsensitiveUTF8(haystack, needle[, start_pos])
Arguments
haystack — String in which the substring will be searched. String.
needle — Substring to be searched. String.
start_pos — Optional parameter, position of the first character in the string to start the search. UInt.
Returned value
Starting position in Unicode points (counting from 1), if the substring was found.
0, if the substring was not found.
Type: Integer.
Example
Query:
SELECT positionCaseInsensitiveUTF8('Привет, мир!', 'Мир');
Result:
┌─positionCaseInsensitiveUTF8('Привет, мир!', 'Мир')─┐
│ 9 │
└────────────────────────────────────────────────────┘
multiSearchAllPositions
The same as position but returns an Array of positions (in bytes) of the found corresponding substrings in the string. Positions are indexed starting from 1.
The search is performed on sequences of bytes without respect to string encoding and collation.
For case-insensitive ASCII search, use the function multiSearchAllPositionsCaseInsensitive.
For search in UTF-8, use the function multiSearchAllPositionsUTF8.
For case-insensitive UTF-8 search, use the function multiSearchAllPositionsCaseInsensitiveUTF8.
Syntax
multiSearchAllPositions(haystack, [needle1, needle2, ..., needlen])
Arguments
haystack — String in which the substring will be searched. String.
needle — Substring to be searched. String.
Returned values
Array of starting positions in bytes (counting from 1), if the corresponding substring was found and 0 if not found.
Example
Query:
SELECT multiSearchAllPositions('Hello, World!', ['hello', '!', 'world']);
Result:
┌─multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])─┐
│ [0,13,0] │
└───────────────────────────────────────────────────────────────────┘
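The example result can be reproduced with a few lines of Python that model the described semantics: one 1-based byte position per needle, or 0 when that needle is absent. The function name is an assumption.

```python
def multi_search_all_positions(haystack, needles):
    """Return the 1-based byte position of every needle (0 if not found)."""
    h = haystack.encode("utf-8")
    # bytes.find returns -1 for a miss, which becomes 0 after adding 1.
    return [h.find(n.encode("utf-8")) + 1 for n in needles]

result = multi_search_all_positions("Hello, World!", ["hello", "!", "world"])
print(result)
```

The search is case-sensitive, so neither "hello" nor "world" is found in "Hello, World!".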
multiSearchAllPositionsUTF8
See multiSearchAllPositions.
multiSearchFirstPosition(haystack, [needle1, needle2, …, needlen])
The same as position but returns the leftmost offset of the string haystack that is matched to some of the needles.
For a case-insensitive search or/and in UTF-8 format use functions multiSearchFirstPositionCaseInsensitive, multiSearchFirstPositionUTF8, multiSearchFirstPositionCaseInsensitiveUTF8.
multiSearchFirstIndex(haystack, [needle1, needle2, …, needlen])
Returns the index i (starting from 1) of the leftmost found needlei in the string haystack and 0 otherwise.
For a case-insensitive search or/and in UTF-8 format use functions multiSearchFirstIndexCaseInsensitive, multiSearchFirstIndexUTF8, multiSearchFirstIndexCaseInsensitiveUTF8.
multiSearchAny(haystack, [needle1, needle2, …, needlen])
Returns 1 if at least one string needlei matches the string haystack, and 0 otherwise.
For a case-insensitive search or/and in UTF-8 format use functions multiSearchAnyCaseInsensitive, multiSearchAnyUTF8, multiSearchAnyCaseInsensitiveUTF8.
NOTE
In all multiSearch* functions the number of needles should be less than 2^8 because of implementation specification.
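The three multiSearch variants above answer different questions about the same set of matches: where the leftmost match starts, which needle produced it, and whether anything matched at all. A minimal Python sketch of that reading follows; the function names are assumptions, and tie-breaking between needles that match at the same offset is not specified by the text, so the sketch simply keeps the first such needle.

```python
def _positions(haystack, needles):
    h = haystack.encode("utf-8")
    return [h.find(n.encode("utf-8")) + 1 for n in needles]

def multi_search_first_position(haystack, needles):
    """Leftmost 1-based byte offset matched by any needle, else 0."""
    found = [p for p in _positions(haystack, needles) if p > 0]
    return min(found) if found else 0

def multi_search_first_index(haystack, needles):
    """1-based index of the needle with the leftmost match, else 0."""
    best_index, best_pos = 0, None
    for i, p in enumerate(_positions(haystack, needles), start=1):
        if p > 0 and (best_pos is None or p < best_pos):
            best_index, best_pos = i, p
    return best_index

def multi_search_any(haystack, needles):
    """1 if at least one needle occurs in haystack, else 0."""
    return 1 if any(p > 0 for p in _positions(haystack, needles)) else 0

needles = ["World", "ello"]
print(multi_search_first_position("Hello, World!", needles))
print(multi_search_first_index("Hello, World!", needles))
print(multi_search_any("Hello, World!", needles))
```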
match(haystack, pattern)
Checks whether the string matches the regular expression pattern in re2 syntax. Re2 has a more limited syntax than Perl regular expressions.
Returns 0 if it does not match, or 1 if it matches.
Matching is based on UTF-8, e.g. . matches the Unicode code point ¥ which is represented in UTF-8 using two bytes. The regular expression must not contain null bytes. If the haystack or pattern contain a sequence of bytes that are not valid UTF-8, then the behavior is undefined. No automatic Unicode normalization is performed, if you need it you can use the normalizeUTF8*() functions for that.
For patterns to search for substrings in a string, it is better to use LIKE or ‘position’, since they work much faster.
multiMatchAny(haystack, [pattern1, pattern2, …, patternn])
The same as match, but returns 0 if none of the regular expressions are matched and 1 if any of the patterns matches. It uses hyperscan library. For patterns to search substrings in a string, it is better to use multiSearchAny since it works much faster.
NOTE
The length of any of the haystack string must be less than 2^32 bytes otherwise the exception is thrown. This restriction takes place because of hyperscan API.
multiMatchAnyIndex(haystack, [pattern1, pattern2, …, patternn])
The same as multiMatchAny, but returns any index that matches the haystack.
multiMatchAllIndices(haystack, [pattern1, pattern2, …, patternn])
The same as multiMatchAny, but returns the array of all indices that match the haystack in any order.
multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, …, patternn])
The same as multiMatchAny, but returns 1 if any pattern matches the haystack within a constant edit distance. This function relies on the experimental feature of hyperscan library, and can be slow for some corner cases. The performance depends on the edit distance value and patterns used, but it's always more expensive compared to the non-fuzzy variants.
multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, …, patternn])
The same as multiFuzzyMatchAny, but returns any index that matches the haystack within a constant edit distance.
multiFuzzyMatchAllIndices(haystack, distance, [pattern1, pattern2, …, patternn])
The same as multiFuzzyMatchAny, but returns the array of all indices in any order that match the haystack within a constant edit distance.
NOTE
multiFuzzyMatch* functions do not support UTF-8 regular expressions, and such expressions are treated as bytes because of hyperscan restriction.
NOTE
To turn off all functions that use hyperscan, use the setting SET allow_hyperscan = 0;
extract(haystack, pattern)
Extracts a fragment of a string using a regular expression. If ‘haystack’ does not match the ‘pattern’ regex, an empty string is returned. If the regex does not contain subpatterns, it takes the fragment that matches the entire regex. Otherwise, it takes the fragment that matches the first subpattern.
extractAll(haystack, pattern)
Extracts all the fragments of a string using a regular expression. If ‘haystack’ does not match the ‘pattern’ regex, an empty string is returned. Returns an array of strings consisting of all matches to the regex. In general, the behavior is the same as the ‘extract’ function (it takes the first subpattern, or the entire expression if there isn’t a subpattern).
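The "first subpattern, or the entire match if there is none" rule shared by extract and extractAll can be sketched in Python. Python's `re` module stands in for re2 here, which is an assumption: the two engines agree on the simple patterns used below but not on every construct. The function names are hypothetical.

```python
import re

def extract(haystack: str, pattern: str) -> str:
    """First capture group of the first match, or the whole match if no groups."""
    m = re.search(pattern, haystack)
    if m is None:
        return ""                      # no match: empty string
    fragment = m.group(1) if m.re.groups else m.group(0)
    return fragment or ""              # a non-participating group yields ""

def extract_all(haystack: str, pattern: str) -> list:
    """Same rule applied to every non-overlapping match."""
    regex = re.compile(pattern)
    idx = 1 if regex.groups else 0
    return [m.group(idx) or "" for m in regex.finditer(haystack)]

print(extract("abc=111", r"\w+=(\d+)"))
print(extract_all("a=1, b=22", r"\w=(\d+)"))
```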
extractAllGroupsHorizontal
Matches all groups of the haystack string using the pattern regular expression. Returns an array of arrays, where the first array includes all fragments matching the first group, the second array includes all fragments matching the second group, etc.
NOTE
The extractAllGroupsHorizontal function is slower than extractAllGroupsVertical.
Syntax
extractAllGroupsHorizontal(haystack, pattern)
Arguments
haystack — Input string. Type: String.
pattern — Regular expression with re2 syntax. Must contain groups, each group enclosed in parentheses. If pattern contains no groups, an exception is thrown. Type: String.
Returned value
Type: Array.
If haystack does not match the pattern regex, an array of empty arrays is returned.
Example
Query:
SELECT extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');
Result:
┌─extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','def','ghi'],['111','222','333']] │
└──────────────────────────────────────────────────────────────────────────────────────────┘
extractAllGroupsVertical
Matches all groups of the haystack string using the pattern regular expression. Returns an array of arrays, where each array includes matching fragments from every group. Fragments are grouped in order of appearance in the haystack.
Syntax
extractAllGroupsVertical(haystack, pattern)
Arguments
haystack — Input string. Type: String.
pattern — Regular expression with re2 syntax. Must contain groups, each group enclosed in parentheses. If pattern contains no groups, an exception is thrown. Type: String.
Returned value
Type: Array.
If haystack does not match the pattern regex, an empty array is returned.
Example
Query:
SELECT extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');
Result:
┌─extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','111'],['def','222'],['ghi','333']] │
└────────────────────────────────────────────────────────────────────────────────────────┘
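The horizontal and vertical results are transposes of each other, which a short Python sketch can show on the same input. Python's `re` module is used as a stand-in for re2 (an assumption), the function names are hypothetical, and the "no groups" exception is not modeled.

```python
import re

PATTERN = r'("[^"]+"|\w+)=("[^"]+"|\w+)'

def extract_all_groups_vertical(haystack: str, pattern: str) -> list:
    """One inner list per match, holding that match's groups in order."""
    return [list(m.groups(default="")) for m in re.finditer(pattern, haystack)]

def extract_all_groups_horizontal(haystack: str, pattern: str) -> list:
    """One inner list per group, holding that group's fragment from every match."""
    regex = re.compile(pattern)
    columns = [[] for _ in range(regex.groups)]
    for m in regex.finditer(haystack):
        for column, fragment in zip(columns, m.groups(default="")):
            column.append(fragment)
    return columns

text = "abc=111, def=222, ghi=333"
print(extract_all_groups_vertical(text, PATTERN))
print(extract_all_groups_horizontal(text, PATTERN))
```

When nothing matches, the vertical form is an empty list while the horizontal form is a list of empty lists, one per group, which mirrors the two "Returned value" descriptions.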
like(haystack, pattern), haystack LIKE pattern operator
Checks whether a string matches a simple regular expression. The regular expression can contain the metasymbols % and _.
% indicates any quantity of any bytes (including zero characters).
_ indicates any one byte.
Use the backslash (\) for escaping metasymbols. See the note on escaping in the description of the ‘match’ function.
Matching is based on UTF-8, e.g. _ matches the Unicode code point ¥ which is represented in UTF-8 using two bytes. If the haystack or pattern contain a sequence of bytes that are not valid UTF-8, then the behavior is undefined. No automatic Unicode normalization is performed, if you need it you can use the normalizeUTF8*() functions for that.
For regular expressions like %needle%, the code is more optimal and works as fast as the position function. For other regular expressions, the code is the same as for the ‘match’ function.
notLike(haystack, pattern), haystack NOT LIKE pattern operator
The same thing as ‘like’, but negative.
ilike
Case-insensitive variant of the like function. You can use the ILIKE operator instead of the ilike function.
The function ignores the language, e.g. for Turkish (i/İ), the result might be incorrect.
Syntax
ilike(haystack, pattern)
Arguments
haystack — Input string. String.
pattern — If pattern does not contain percent signs or underscores, then the pattern only represents the string itself. An underscore (_) in pattern stands for (matches) any single character. A percent sign (%) matches any sequence of zero or more characters.
Some pattern examples:
'abc' ILIKE 'abc' true
'abc' ILIKE 'a%' true
'abc' ILIKE '_b_' true
'abc' ILIKE 'c' false
Returned values
True, if the string matches pattern.
False, if the string does not match pattern.
Example
Input table:
┌─id─┬─name─────┬─days─┐
│ 1 │ January │ 31 │
│ 2 │ February │ 29 │
│ 3 │ March │ 31 │
│ 4 │ April │ 30 │
└────┴──────────┴──────┘
Query:
SELECT * FROM Months WHERE ilike(name, '%j%');
Result:
┌─id─┬─name────┬─days─┐
│ 1 │ January │ 31 │
└────┴─────────┴──────┘
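One way to reason about LIKE and ILIKE patterns is to translate them into an ordinary regular expression that must match the whole string. The Python sketch below does that; it is an illustrative model with hypothetical function names, it works on characters rather than bytes, and it uses Python's simple case folding, so it shares the language-agnostic limitation noted for ilike.

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a LIKE pattern: % -> .*, _ -> ., backslash escapes the next char."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped metasymbol, taken literally
            i += 2
            continue
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)

def like(haystack: str, pattern: str, case_insensitive: bool = False) -> bool:
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.fullmatch(like_to_regex(pattern), haystack, flags) is not None

def ilike(haystack: str, pattern: str) -> bool:
    return like(haystack, pattern, case_insensitive=True)

months = ["January", "February", "March", "April"]
print([name for name in months if ilike(name, "%j%")])
```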
ngramDistance(haystack, needle)
Calculates the 4-gram distance between haystack and needle: counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities. Returns a float number from 0 to 1; the closer to zero, the more similar the strings are. If the constant needle or haystack is more than 32Kb, throws an exception. If some of the non-constant haystack or needle strings are more than 32Kb, the distance is always one.
For case-insensitive search or/and in UTF-8 format use functions ngramDistanceCaseInsensitive, ngramDistanceUTF8, ngramDistanceCaseInsensitiveUTF8.
ngramSearch(haystack, needle)
Same as ngramDistance but calculates the non-symmetric difference between needle and haystack: the number of n-grams from needle minus the common number of n-grams, normalized by the number of needle n-grams. The closer to one, the more likely needle is in the haystack. Can be useful for fuzzy string search.
For case-insensitive search or/and in UTF-8 format use functions ngramSearchCaseInsensitive, ngramSearchUTF8, ngramSearchCaseInsensitiveUTF8.
NOTE
For UTF-8 case we use 3-gram distance. All these are not perfectly fair n-gram distances. We use 2-byte hashes to hash n-grams and then calculate the (non-)symmetric difference between these hash tables, so collisions may occur. With UTF-8 case-insensitive format we do not use fair tolower function: we zero the 5-th bit (starting from zero) of each codepoint byte and first bit of zeroth byte if bytes more than one. This works for Latin and mostly for all Cyrillic letters.
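The definition of ngramDistance (symmetric difference of two 4-gram multisets, normalized by the sum of their sizes) can be written out directly. The Python sketch below is an idealized version: it compares exact n-grams of characters instead of 2-byte hashes, so it has no collisions and will not always agree with the server. The function names and the handling of strings shorter than n are assumptions.

```python
from collections import Counter

def ngrams(s: str, n: int = 4) -> Counter:
    """Multiset of all length-n substrings of s (empty if s is shorter than n)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def ngram_distance(haystack: str, needle: str, n: int = 4) -> float:
    a, b = ngrams(haystack, n), ngrams(needle, n)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        # Neither string has any n-gram; this fallback is a choice of the sketch.
        return 0.0 if haystack == needle else 1.0
    # Symmetric difference of multisets: per n-gram, the difference in counts.
    diff = sum(abs(a[g] - b[g]) for g in a.keys() | b.keys())
    return diff / total

print(ngram_distance("abcdef", "abcdef"))
print(ngram_distance("abcdef", "abcdxx"))
print(ngram_distance("abcdef", "uvwxyz"))
```

Identical strings share every n-gram and get distance 0, while strings with no n-gram in common get distance 1.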
countSubstrings
Returns the number of substring occurrences.
For a case-insensitive search, use countSubstringsCaseInsensitive or countSubstringsCaseInsensitiveUTF8 functions.
Syntax
countSubstrings(haystack, needle[, start_pos])
Arguments
haystack — The string to search in. String.
needle — The substring to search for. String.
start_pos — Position of the first character in the string to start the search. Optional. UInt.
Returned values
Number of occurrences.
Type: UInt64.
Examples
Query:
SELECT countSubstrings('foobar.com', '.');
Result:
┌─countSubstrings('foobar.com', '.')─┐
│ 1 │
└────────────────────────────────────┘
Query:
SELECT countSubstrings('aaaa', 'aa');
Result:
┌─countSubstrings('aaaa', 'aa')─┐
│ 2 │
└───────────────────────────────┘
Query:
SELECT countSubstrings('abc___abc', 'abc', 4);
Result:
┌─countSubstrings('abc___abc', 'abc', 4)─┐
│ 1 │
└────────────────────────────────────────┘
countSubstringsCaseInsensitive
Returns the number of substring occurrences, ignoring case.
Syntax
countSubstringsCaseInsensitive(haystack, needle[, start_pos])
Arguments
haystack — The string to search in. String.
needle — The substring to search for. String.
start_pos — Position of the first character in the string to start the search. Optional. UInt.
Returned values
Number of occurrences.
Type: UInt64.
Examples
Query:
SELECT countSubstringsCaseInsensitive('aba', 'B');
Result:
┌─countSubstringsCaseInsensitive('aba', 'B')─┐
│ 1 │
└────────────────────────────────────────────┘
Query:
SELECT countSubstringsCaseInsensitive('foobar.com', 'CoM');
Result:
┌─countSubstringsCaseInsensitive('foobar.com', 'CoM')─┐
│ 1 │
└─────────────────────────────────────────────────────┘
Query:
SELECT countSubstringsCaseInsensitive('abC___abC', 'aBc', 2);
Result:
┌─countSubstringsCaseInsensitive('abC___abC', 'aBc', 2)─┐
│ 1 │
└───────────────────────────────────────────────────────┘
countSubstringsCaseInsensitiveUTF8
Returns the number of substring occurrences in UTF-8, ignoring case.
Syntax
countSubstringsCaseInsensitiveUTF8(haystack, needle[, start_pos])
Arguments
haystack — The string to search in. String.
needle — The substring to search for. String.
start_pos — Position of the first character in the string to start the search. Optional. UInt.
Returned values
Number of occurrences.
Type: UInt64.
Examples
Query:
SELECT countSubstringsCaseInsensitiveUTF8('абв', 'A');
Result:
┌─countSubstringsCaseInsensitiveUTF8('абв', 'A')─┐
│ 1 │
└────────────────────────────────────────────────┘
Query:
SELECT countSubstringsCaseInsensitiveUTF8('аБв__АбВ__абв', 'Абв');
Result:
┌─countSubstringsCaseInsensitiveUTF8('аБв__АбВ__абв', 'Абв')─┐
│ 3 │
└────────────────────────────────────────────────────────────┘
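The examples in the countSubstrings family are consistent with counting non-overlapping occurrences from a 1-based start position. A Python sketch of that reading, with hypothetical function names; it counts characters rather than bytes and uses `str.lower()` for case folding, which is only an approximation of the server's case-insensitive variants.

```python
def count_substrings(haystack: str, needle: str, start_pos: int = 1) -> int:
    """Non-overlapping occurrences of needle, searching from start_pos (1-based)."""
    return haystack.count(needle, start_pos - 1)

def count_substrings_case_insensitive(haystack: str, needle: str, start_pos: int = 1) -> int:
    # Lower-casing both sides is a simplification of real case-insensitive search.
    return haystack.lower().count(needle.lower(), start_pos - 1)

print(count_substrings("aaaa", "aa"))             # "aa" + "aa", not three overlapping hits
print(count_substrings("abc___abc", "abc", 4))    # the first "abc" is skipped
print(count_substrings_case_insensitive("аБв__АбВ__абв", "Абв"))
```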
countMatches(haystack, pattern)
Returns the number of regular expression matches for a pattern in a haystack.
Syntax
countMatches(haystack, pattern)
Arguments
haystack — The string to search in. String.
pattern — The regular expression with re2 syntax. String.
Returned value
The number of matches.
Type: UInt64.
Examples
Query:
SELECT countMatches('foobar.com', 'o+');
Result:
┌─countMatches('foobar.com', 'o+')─┐
│ 2 │
└──────────────────────────────────┘
Query:
SELECT countMatches('aaaa', 'aa');
Result:
┌─countMatches('aaaa', 'aa')────┐
│ 2 │
└───────────────────────────────┘
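Both examples can be checked with Python's `re` module, which counts non-overlapping matches from left to right. Using `re` in place of re2 is an assumption, and the function name is hypothetical; patterns that can match the empty string may be counted differently by the two engines.

```python
import re

def count_matches(haystack: str, pattern: str) -> int:
    """Number of non-overlapping matches of pattern in haystack."""
    return sum(1 for _ in re.finditer(pattern, haystack))

print(count_matches("foobar.com", "o+"))  # "oo" in foobar and "o" in com
print(count_matches("aaaa", "aa"))
```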
SPLITTING AND MERGING
splitByChar(separator, s[, max_substrings])
Splits a string into substrings separated by a specified character. It uses a constant string separator which consists of exactly one character. Returns an array of selected substrings. Empty substrings may be selected if the separator occurs at the beginning or end of the string, or if there are multiple consecutive separators.
Syntax
splitByChar(separator, s[, max_substrings])
Arguments
separator — The separator which should contain exactly one character. String.
s — The string to split. String.
max_substrings — An optional Int64 defaulting to 0. When max_substrings > 0, the returned substrings will be no more than max_substrings, otherwise the function will return as many substrings as possible.
Returned value(s)
Returns an array of selected substrings. Empty substrings may be selected when:
A separator occurs at the beginning or end of the string;
There are multiple consecutive separators;
The original string s is empty.
Example
SELECT splitByChar(',', '1,2,3,abcde');
┌─splitByChar(',', '1,2,3,abcde')─┐
│ ['1','2','3','abcde'] │
└─────────────────────────────────┘
splitByString(separator, s[, max_substrings])
Splits a string into substrings separated by a string. It uses a constant string separator that may consist of multiple characters. If the separator is empty, the string s is split into an array of single characters.
Syntax
splitByString(separator, s[, max_substrings])
Arguments
separator: The separator. String.
s: The string to split. String.
max_substrings: An optional Int64 defaulting to 0. When max_substrings > 0, no more than max_substrings substrings are returned; otherwise the function returns as many substrings as possible.
Returned value(s)
Returns an array of selected substrings. Empty substrings may be selected when:
A non-empty separator occurs at the beginning or end of the string;
There are multiple consecutive non-empty separators;
The original string s is empty while the separator is not empty.
Example
SELECT splitByString(', ', '1, 2 3, 4,5, abcde');
┌─splitByString(', ', '1, 2 3, 4,5, abcde')─┐
│ ['1','2 3','4,5','abcde'] │
└───────────────────────────────────────────┘
SELECT splitByString('', 'abcde');
┌─splitByString('', 'abcde')─┐
│ ['a','b','c','d','e'] │
└────────────────────────────┘
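The rules for empty substrings and for an empty separator can be modeled in a few lines of Python. This is a sketch of the documented behavior only: split_by_string is a hypothetical helper, max_substrings is not modeled, and ASCII input is assumed for the empty-separator case.

```python
def split_by_string(separator: str, s: str) -> list[str]:
    # Empty separator: split into single characters (ASCII input assumed).
    if separator == '':
        return list(s)
    # Non-empty separator: str.split keeps empty substrings at the edges
    # and between consecutive separators.
    return s.split(separator)

print(split_by_string(', ', '1, 2 3, 4,5, abcde'))  # ['1', '2 3', '4,5', 'abcde']
print(split_by_string('', 'abcde'))                 # ['a', 'b', 'c', 'd', 'e']
print(split_by_string(',', ',a,,b,'))               # ['', 'a', '', 'b', '']
```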
arrayStringConcat(arr[, separator])
Concatenates string representations of values listed in the array with the separator. separator is an optional parameter: a constant string, set to an empty string by default. Returns the string.
alphaTokens(s[, max_substrings]), splitByAlpha(s[, max_substrings])
Selects substrings of consecutive bytes from the ranges a-z and A-Z. Returns an array of substrings.
Syntax
alphaTokens(s[, max_substrings])
splitByAlpha(s[, max_substrings])
Arguments
s: The string to split. String.
max_substrings: An optional Int64 defaulting to 0. When max_substrings > 0, no more than max_substrings substrings are returned; otherwise the function returns as many substrings as possible.
Returned value(s)
Returns an array of selected substrings.
Example
SELECT alphaTokens('abca1abc');
┌─alphaTokens('abca1abc')─┐
│ ['abca','abc'] │
└─────────────────────────┘
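The selection rule is equivalent to collecting maximal runs of ASCII letters. A rough Python model, assuming the hypothetical helper name alpha_tokens and leaving max_substrings out:

```python
import re

def alpha_tokens(s: str) -> list[str]:
    # Each match is a maximal run of bytes from a-z or A-Z.
    return re.findall(r'[a-zA-Z]+', s)

print(alpha_tokens('abca1abc'))  # ['abca', 'abc']
```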
extractAllGroups(text, regexp)
Extracts all groups from non-overlapping substrings matched by a regular expression.
Syntax
extractAllGroups(text, regexp)
Arguments
text: String or FixedString.
regexp: Regular expression. Constant. String or FixedString.
Returned values
If the function finds at least one matching group, it returns an Array(Array(String)) column, clustered by group_id (1 to N, where N is the number of capturing groups in regexp).
If there is no matching group, it returns an empty array.
Type: Array.
Example
Query:
SELECT extractAllGroups('abc=123, 8="hkl"', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');
Result:
┌─extractAllGroups('abc=123, 8="hkl"', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','123'],['8','"hkl"']] │
└───────────────────────────────────────────────────────────────────────┘
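The shape of the result (one inner array per match, holding the capturing groups in order) can be reproduced with Python's re module. This is a sketch: Python's regex dialect stands in for the one ClickHouse uses, and extract_all_groups is a hypothetical helper name.

```python
import re

def extract_all_groups(text: str, regexp: str) -> list[list[str]]:
    # One inner list per non-overlapping match, holding groups 1..N in order.
    return [list(m.groups()) for m in re.finditer(regexp, text)]

rows = extract_all_groups('abc=123, 8="hkl"', r'("[^"]+"|\w+)=("[^"]+"|\w+)')
print(rows)  # [['abc', '123'], ['8', '"hkl"']]
```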
STRINGS
NOTE
Functions for searching and replacing in strings are described separately.
empty
Checks whether the input string is empty.
Syntax
empty(x)
A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.
The function also works for arrays or UUID.
Arguments
x: Input value. String.
Returned value
Returns 1 for an empty string or 0 for a non-empty string.
Type: UInt8.
Example
Query:
SELECT empty('');
Result:
┌─empty('')─┐
│ 1 │
└───────────┘
notEmpty
Checks whether the input string is non-empty.
Syntax
notEmpty(x)
A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.
The function also works for arrays or UUID.
Arguments
x: Input value. String.
Returned value
Returns 1 for a non-empty string or 0 for an empty string.
Type: UInt8.
Example
Query:
SELECT notEmpty('text');
Result:
┌─notEmpty('text')─┐
│ 1 │
└──────────────────┘
length
Returns the length of a string in bytes (not in characters, and not in code points). The result type is UInt64. The function also works for arrays.
lengthUTF8
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it does not throw an exception). The result type is UInt64.
char_length, CHAR_LENGTH
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it does not throw an exception). The result type is UInt64.
character_length, CHARACTER_LENGTH
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it does not throw an exception). The result type is UInt64.
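The difference between bytes, code points, and visible characters is easy to check in Python, where encoding a string to UTF-8 gives the byte count and len on the string gives the code point count. This only illustrates the units involved and does not call ClickHouse.

```python
s = 'абв'                              # three Cyrillic letters
print(len(s.encode('utf-8')))          # 6: bytes, the unit `length` counts
print(len(s))                          # 3: code points, the unit `lengthUTF8` counts

decomposed = 'a\u0302'                 # 'a' plus a combining circumflex, displayed as one character
print(len(decomposed))                 # 2 code points
print(len(decomposed.encode('utf-8'))) # 3 bytes
```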
leftPad
Pads the current string from the left with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similar to the MySQL LPAD function.
Syntax
leftPad('string', 'length'[, 'pad_string'])
Arguments
string: Input string that needs to be padded. String.
length: The length of the resulting string. UInt. If the value is less than the input string length, then the input string is returned as-is.
pad_string: The string to pad the input string with. String. Optional. If not specified, then the input string is padded with spaces.
Returned value
The resulting string of the given length.
Type: String.
Example
Query:
SELECT leftPad('abc', 7, '*'), leftPad('def', 7);
Result:
┌─leftPad('abc', 7, '*')─┬─leftPad('def', 7)─┐
│ ****abc │ def │
└────────────────────────┴───────────────────┘
leftPadUTF8
Pads the current string from the left with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similar to the MySQL LPAD function. While in the leftPad function the length is measured in bytes, in the leftPadUTF8 function it is measured in code points.
Syntax
leftPadUTF8('string','length'[, 'pad_string'])
Arguments
string: Input string that needs to be padded. String.
length: The length of the resulting string. UInt. If the value is less than the input string length, then the input string is returned as-is.
pad_string: The string to pad the input string with. String. Optional. If not specified, then the input string is padded with spaces.
Returned value
The resulting string of the given length.
Type: String.
Example
Query:
SELECT leftPadUTF8('абвг', 7, '*'), leftPadUTF8('дежз', 7);
Result:
┌─leftPadUTF8('абвг', 7, '*')─┬─leftPadUTF8('дежз', 7)─┐
│ ***абвг │ дежз │
└─────────────────────────────┴────────────────────────┘
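Why the unit of length matters can be seen with Python's rjust, which pads bytes objects by byte count and str objects by code point count. This is only an analogy for leftPad and leftPadUTF8: rjust accepts a single fill character, whereas pad_string may be longer.

```python
# Byte-oriented padding, analogous to leftPad: the target length counts bytes.
padded_bytes = b'abc'.rjust(7, b'*')
print(padded_bytes)                  # b'****abc'

# Code-point-oriented padding, analogous to leftPadUTF8.
padded_text = 'абвг'.rjust(7, '*')
print(padded_text)                   # ***абвг

# The same Cyrillic string is already 8 bytes long, so a byte-based
# target length of 7 would leave no room for padding at all.
print(len('абвг'.encode('utf-8')))   # 8
```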
rightPad
Pads the current string from the right with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similar to the MySQL RPAD function.
Syntax
rightPad('string', 'length'[, 'pad_string'])
Arguments
string: Input string that needs to be padded. String.
length: The length of the resulting string. UInt. If the value is less than the input string length, then the input string is returned as-is.
pad_string: The string to pad the input string with. String. Optional. If not specified, then the input string is padded with spaces.
Returned value
The resulting string of the given length.
Type: String.
Example
Query:
SELECT rightPad('abc', 7, '*'), rightPad('abc', 7);
Result:
┌─rightPad('abc', 7, '*')─┬─rightPad('abc', 7)─┐
│ abc**** │ abc │
└─────────────────────────┴────────────────────┘
rightPadUTF8
Pads the current string from the right with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similar to the MySQL RPAD function. While in the rightPad function the length is measured in bytes, in the rightPadUTF8 function it is measured in code points.
Syntax
rightPadUTF8('string','length'[, 'pad_string'])
Arguments
string: Input string that needs to be padded. String.
length: The length of the resulting string. UInt. If the value is less than the input string length, then the input string is returned as-is.
pad_string: The string to pad the input string with. String. Optional. If not specified, then the input string is padded with spaces.
Returned value
The resulting string of the given length.
Type: String.
Example
Query:
SELECT rightPadUTF8('абвг', 7, '*'), rightPadUTF8('абвг', 7);
Result:
┌─rightPadUTF8('абвг', 7, '*')─┬─rightPadUTF8('абвг', 7)─┐
│ абвг*** │ абвг │
└──────────────────────────────┴─────────────────────────┘
lower, lcase
Converts ASCII Latin symbols in a string to lowercase.
upper, ucase
Converts ASCII Latin symbols in a string to uppercase.
lowerUTF8
Converts a string to lowercase, assuming the string contains a set of bytes that make up a UTF-8 encoded text. It does not detect the language. E.g. for Turkish the result might not be exactly correct (i/İ vs. i/I). If the length of the UTF-8 byte sequence is different for upper and lower case of a code point, the result may be incorrect for this code point. If the string contains a sequence of bytes that are not valid UTF-8, then the behavior is undefined.
upperUTF8
Converts a string to uppercase, assuming the string contains a set of bytes that make up a UTF-8 encoded text. It does not detect the language. E.g. for Turkish the result might not be exactly correct (i/İ vs. i/I). If the length of the UTF-8 byte sequence is different for upper and lower case of a code point, the result may be incorrect for this code point. If the string contains a sequence of bytes that are not valid UTF-8, then the behavior is undefined.
isValidUTF8
Returns 1 if the set of bytes is valid UTF-8, otherwise 0.
toValidUTF8
Replaces invalid UTF-8 characters with the � (U+FFFD) character. Consecutive invalid characters are collapsed into a single replacement character.
toValidUTF8(input_string)
Arguments
input_string: Any set of bytes represented as the String data type object.
Returned value: Valid UTF-8 string.
Example
SELECT toValidUTF8('\x61\xF0\x80\x80\x80b');
┌─toValidUTF8('a����b')─┐
│ a�b │
└───────────────────────┘
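The replace-then-collapse behavior can be approximated in Python by decoding with errors='replace' and then merging runs of U+FFFD. This is a simplified sketch, not the ClickHouse algorithm: to_valid_utf8 is a hypothetical helper, and it would also merge replacement characters that were legitimately present in the input.

```python
import re

def to_valid_utf8(raw: bytes) -> str:
    # Decode, substituting U+FFFD for every undecodable byte sequence.
    text = raw.decode('utf-8', errors='replace')
    # Collapse each run of replacement characters into a single one.
    return re.sub('\ufffd+', '\ufffd', text)

print(to_valid_utf8(b'\x61\xF0\x80\x80\x80b'))  # a�b
```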
repeat
Repeats a string as many times as specified and concatenates the replicated values as a single string.
Alias: REPEAT.
Syntax
repeat(s, n)
Arguments
s: The string to repeat.
n: The number of times to repeat the string.
Returned value
A single string containing the string s repeated n times. If n < 1, the function returns an empty string.
Type: String.
Example
Query:
SELECT repeat('abc', 10);
Result:
┌─repeat('abc', 10)──────────────┐
│ abcabcabcabcabcabcabcabcabcabc │
└────────────────────────────────┘
reverse
Reverses the string (as a sequence of bytes).
reverseUTF8
Reverses a sequence of Unicode code points, assuming that the string contains a set of bytes representing a UTF-8 text. Otherwise, it does something else (it does not throw an exception).
format(pattern, s0, s1, …)
Formats the constant pattern with the strings listed in the arguments. pattern is a simplified Python format pattern. The format string contains “replacement fields” surrounded by curly braces {}. Anything that is not contained in braces is considered literal text, which is copied unchanged to the output. If you need to include a brace character in the literal text, it can be escaped by doubling: {{ and }}. Field names can be numbers (starting from zero) or empty (then they are treated as consecutive numbers).
SELECT format('{1} {0} {1}', 'World', 'Hello')
┌─format('{1} {0} {1}', 'World', 'Hello')─┐
│ Hello World Hello │
└─────────────────────────────────────────┘
SELECT format('{} {}', 'Hello', 'World')
┌─format('{} {}', 'Hello', 'World')─┐
│ Hello World │
└───────────────────────────────────┘
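Because the pattern syntax is a subset of Python's, the same patterns behave identically with Python's str.format. Note that str.format supports much more (format specifiers, attribute access) and none of that should be assumed to work in ClickHouse.

```python
numbered = '{1} {0} {1}'.format('World', 'Hello')
print(numbered)   # Hello World Hello

automatic = '{} {}'.format('Hello', 'World')
print(automatic)  # Hello World

# Doubled braces produce literal braces around the substituted field.
escaped = '{{{}}}'.format('x')
print(escaped)    # {x}
```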
concat
Concatenates the strings listed in the arguments, without a separator.
Syntax
concat(s1, s2, ...)
Arguments
Values of type String or FixedString.
Returned values
Returns the String that results from concatenating the arguments.
If any of the argument values is NULL, concat returns NULL.
Example
Query:
SELECT concat('Hello, ', 'World!');
Result:
┌─concat('Hello, ', 'World!')─┐
│ Hello, World! │
└─────────────────────────────┘
concatAssumeInjective
Same as concat, except that you must ensure that concat(s1, s2, ...) → sn is injective. This property is used to optimize GROUP BY.
A function is called “injective” if it always returns different results for different argument values. In other words, different arguments never yield an identical result.
Syntax
concatAssumeInjective(s1, s2, ...)
Arguments
Values of type String or FixedString.
Returned values
Returns the String that results from concatenating the arguments.
If any of the argument values is NULL, concatAssumeInjective returns NULL.
Example
Input table:
CREATE TABLE key_val(`key1` String, `key2` String, `value` UInt32) ENGINE = TinyLog;
INSERT INTO key_val VALUES ('Hello, ','World',1), ('Hello, ','World',2), ('Hello, ','World!',3), ('Hello',', World!',2);
SELECT * from key_val;
┌─key1────┬─key2─────┬─value─┐
│ Hello, │ World │ 1 │
│ Hello, │ World │ 2 │
│ Hello, │ World! │ 3 │
│ Hello │ , World! │ 2 │
└─────────┴──────────┴───────┘
Query:
SELECT concat(key1, key2), sum(value) FROM key_val GROUP BY concatAssumeInjective(key1, key2);
Result:
┌─concat(key1, key2)─┬─sum(value)─┐
│ Hello, World! │ 3 │
│ Hello, World! │ 2 │
│ Hello, World │ 3 │
└────────────────────┴────────────┘
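The example data is chosen so that concat is not injective: two different argument pairs produce the same string 'Hello, World!'. A short Python sketch of the two ways of grouping shows why that string appears twice in the result, presumably because the optimization groups by the arguments instead of by the concatenated value.

```python
from collections import defaultdict

rows = [('Hello, ', 'World', 1), ('Hello, ', 'World', 2),
        ('Hello, ', 'World!', 3), ('Hello', ', World!', 2)]

by_pair = defaultdict(int)    # group by the argument pair
by_concat = defaultdict(int)  # group by the concatenated string
for key1, key2, value in rows:
    by_pair[(key1, key2)] += value
    by_concat[key1 + key2] += value

print(len(by_pair))    # 3 groups, as in the result above
print(len(by_concat))  # 2 groups, what a true GROUP BY concat(...) would give
```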
substring(s, offset, length), mid(s, offset, length), substr(s, offset, length)
Returns a substring starting with the byte from the ‘offset’ index that is ‘length’ bytes long. Character indexing starts from one (as in standard SQL).
substringUTF8(s, offset, length)
The same as ‘substring’, but for Unicode code points. Works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, it returns some result (it does not throw an exception).
appendTrailingCharIfAbsent(s, c)
If the ‘s’ string is non-empty and does not contain the ‘c’ character at the end, it appends the ‘c’ character to the end.
convertCharset(s, from, to)
Returns the string ‘s’ that was converted from the encoding in ‘from’ to the encoding in ‘to’.
base58Encode(plaintext)
Accepts a String and encodes it using the Base58 encoding scheme with the "Bitcoin" alphabet.
Syntax
base58Encode(plaintext)
Arguments
plaintext: String column or constant.
Returned value
A string containing the encoded value of the first argument.
Type: String.
Example
Query:
SELECT base58Encode('Encoded');
Result:
┌─base58Encode('Encoded')─┐
│ 3dc8KtHrwM │
└─────────────────────────┘
base64Encode(s)
Encodes a FixedString or String ‘s’ into base64.
Alias: TO_BASE64.
base64Decode(s)
Decodes the base64-encoded FixedString or String ‘s’ into the original string. Raises an exception on failure.
Alias: FROM_BASE64.
tryBase64Decode(s)
Similar to base64Decode, but returns an empty string in case of error.
endsWith(s, suffix)
Checks whether the string ends with the specified suffix. Returns 1 if the string ends with the specified suffix, otherwise it returns 0.
startsWith(str, prefix)
Returns 1 if the string starts with the specified prefix, otherwise it returns 0.
SELECT startsWith('Spider-Man', 'Spi');
Returned values
1, if the string starts with the specified prefix.
0, if the string does not start with the specified prefix.
Example
Query:
SELECT startsWith('Hello, world!', 'He');
Result:
┌─startsWith('Hello, world!', 'He')─┐
│ 1 │
└───────────────────────────────────┘
trim
Removes all specified characters from the start or end of a string. By default removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string.
Syntax
trim([[LEADING|TRAILING|BOTH] trim_character FROM] input_string)
Arguments
trim_character: The characters to trim. String.
input_string: The string to trim. String.
Returned value
A string without the specified leading and/or trailing characters.
Type: String.
Example
Query:
SELECT trim(BOTH ' ()' FROM '( Hello, world! )');
Result:
┌─trim(BOTH ' ()' FROM '( Hello, world! )')─┐
│ Hello, world! │
└───────────────────────────────────────────────┘
trimLeft
Removes all consecutive occurrences of common whitespace (ASCII character 32) from the beginning of a string. It does not remove other kinds of whitespace characters (tab, no-break space, etc.).
Syntax
trimLeft(input_string)
Alias: ltrim(input_string).
Arguments
input_string: The string to trim. String.
Returned value
A string without leading common whitespace.
Type: String.
Example
Query:
SELECT trimLeft(' Hello, world! ');
Result:
┌─trimLeft(' Hello, world! ')─┐
│ Hello, world! │
└─────────────────────────────────────┘
trimRight
Removes all consecutive occurrences of common whitespace (ASCII character 32) from the end of a string. It does not remove other kinds of whitespace characters (tab, no-break space, etc.).
Syntax
trimRight(input_string)
Alias: rtrim(input_string).
Arguments
input_string: The string to trim. String.
Returned value
A string without trailing common whitespace.
Type: String.
Example
Query:
SELECT trimRight(' Hello, world! ');
Result:
┌─trimRight(' Hello, world! ')─┐
│ Hello, world! │
└──────────────────────────────────────┘
trimBoth
Removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string. It does not remove other kinds of whitespace characters (tab, no-break space, etc.).
Syntax
trimBoth(input_string)
Alias: trim(input_string).
Arguments
input_string: The string to trim. String.
Returned value
A string without leading and trailing common whitespace.
Type: String.
Example
Query:
SELECT trimBoth(' Hello, world! ');
Result:
┌─trimBoth(' Hello, world! ')─┐
│ Hello, world! │
└─────────────────────────────────────┘
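The point that only ASCII character 32 is removed, and that trim with an explicit character set removes any of the listed characters, can be mirrored with Python's strip family when it is given an explicit character set. This is an analogy under that assumption, not a call into ClickHouse.

```python
s = ' \tHello, world!  '

both = s.strip(' ')     # like trimBoth: only spaces are removed
print(repr(both))       # '\tHello, world!'  (the tab survives)

left = s.lstrip(' ')    # like trimLeft
print(repr(left))       # '\tHello, world!  '

# Like trim(BOTH ' ()' FROM ...): any of the listed characters is removed.
custom = '(   Hello, world!   )'.strip(' ()')
print(custom)           # Hello, world!
```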
CRC32(s)
Returns the CRC32 checksum of a string, using the CRC-32-IEEE 802.3 polynomial and initial value 0xffffffff (zlib implementation).
The result type is UInt32.
CRC32IEEE(s)
Returns the CRC32 checksum of a string, using CRC-32-IEEE 802.3 polynomial.
The result type is UInt32.
CRC64(s)
Returns the CRC64 checksum of a string, using CRC-64-ECMA polynomial.
The result type is UInt64.
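For comparison, zlib's CRC-32 is available in Python's standard library. The sketch below computes it for the conventional check string '123456789'. That ClickHouse's CRC32(s) returns the same number is an assumption based on the "zlib implementation" remark above and has not been verified here.

```python
import zlib

data = b'123456789'
checksum = zlib.crc32(data)  # unsigned 32-bit result, fits in UInt32
print(hex(checksum))         # 0xcbf43926
```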
normalizeQuery
Replaces literals, sequences of literals and complex aliases with placeholders.
Syntax
normalizeQuery(x)
Arguments
x: Sequence of characters. String.
Returned value
Sequence of characters with placeholders.
Type: String.
Example
Query:
SELECT normalizeQuery('[1, 2, 3, x]') AS query;
Result:
┌─query────┐
│ [?.., x] │
└──────────┘
normalizedQueryHash
Returns identical 64-bit hash values for similar queries, ignoring the values of literals. It helps to analyze the query log.
Syntax
normalizedQueryHash(x)
Arguments
x: Sequence of characters. String.
Returned value
Hash value.
Type: UInt64.
Example
Query:
SELECT normalizedQueryHash('SELECT 1 AS `xyz`') != normalizedQueryHash('SELECT 1 AS `abc`') AS res;
Result:
┌─res─┐
│ 1 │
└─────┘
normalizeUTF8NFC
Converts a string to NFC normalized form, assuming the string contains a set of bytes that make up a UTF-8 encoded text.
Syntax
normalizeUTF8NFC(words)
Arguments
words: Input string that contains UTF-8 encoded text. String.
Returned value
String transformed to NFC normalization form.
Type: String.
Example
Query:
SELECT length('â'), normalizeUTF8NFC('â') AS nfc, length(nfc) AS nfc_len;
Result:
┌─length('â')─┬─nfc─┬─nfc_len─┐
│ 2 │ â │ 2 │
└─────────────┴─────┴─────────┘
normalizeUTF8NFD
Converts a string to NFD normalized form, assuming the string contains a set of bytes that make up a UTF-8 encoded text.
Syntax
normalizeUTF8NFD(words)
Arguments
words: Input string that contains UTF-8 encoded text. String.
Returned value
String transformed to NFD normalization form.
Type: String.
Example
Query:
SELECT length('â'), normalizeUTF8NFD('â') AS nfd, length(nfd) AS nfd_len;
Result:
┌─length('â')─┬─nfd─┬─nfd_len─┐
│ 2 │ â │ 3 │
└─────────────┴─────┴─────────┘
normalizeUTF8NFKC
Converts a string to NFKC normalized form, assuming the string contains a set of bytes that make up a UTF-8 encoded text.
Syntax
normalizeUTF8NFKC(words)
Arguments
words: Input string that contains UTF-8 encoded text. String.
Returned value
String transformed to NFKC normalization form.
Type: String.
Example
Query:
SELECT length('â'), normalizeUTF8NFKC('â') AS nfkc, length(nfkc) AS nfkc_len;
Result:
┌─length('â')─┬─nfkc─┬─nfkc_len─┐
│ 2 │ â │ 2 │
└─────────────┴──────┴──────────┘
normalizeUTF8NFKD
Converts a string to NFKD normalized form, assuming the string contains a set of bytes that make up a UTF-8 encoded text.
Syntax
normalizeUTF8NFKD(words)
Arguments
words: Input string that contains UTF-8 encoded text. String.
Returned value
String transformed to NFKD normalization form.
Type: String.
Example
Query:
SELECT length('â'), normalizeUTF8NFKD('â') AS nfkd, length(nfkd) AS nfkd_len;
Result:
┌─length('â')─┬─nfkd─┬─nfkd_len─┐
│ 2 │ â │ 3 │
└─────────────┴──────┴──────────┘
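The byte lengths in the four examples can be cross-checked with Python's unicodedata module, which implements the same Unicode normalization forms. The ligature case at the end is an added illustration of what the compatibility (K) forms change and does not come from the examples above.

```python
import unicodedata

precomposed = '\u00e2'  # 'â' as a single code point, 2 bytes in UTF-8
byte_lengths = {
    form: len(unicodedata.normalize(form, precomposed).encode('utf-8'))
    for form in ('NFC', 'NFD', 'NFKC', 'NFKD')
}
print(byte_lengths)  # {'NFC': 2, 'NFD': 3, 'NFKC': 2, 'NFKD': 3}

# Compatibility forms additionally replace characters such as the
# 'fi' ligature (U+FB01) with their plain equivalents.
ligature = unicodedata.normalize('NFKC', '\ufb01')
print(ligature)      # fi
```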
encodeXMLComponent
Escapes characters so that the string can be placed into an XML text node or attribute.
The following five characters are replaced with their predefined XML entities: <, &, >, ", '.
Syntax
encodeXMLComponent(x)
Arguments
x: The sequence of characters. String.
Returned value
The sequence of characters with escape characters.
Type: String.
Example
Query:
SELECT encodeXMLComponent('Hello, "world"!');
SELECT encodeXMLComponent('<123>');
SELECT encodeXMLComponent('&clickhouse');
SELECT encodeXMLComponent('\'foo\'');
Result:
Hello, &quot;world&quot;!
&lt;123&gt;
&amp;clickhouse
&apos;foo&apos;
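The escaping rule can be sketched with Python's standard library: xml.sax.saxutils.escape covers &, < and >, and an extra mapping adds the two quote characters. The helper name encode_xml_component is hypothetical. html.unescape is shown for the reverse direction; it understands every HTML5 named entity, which is more than the five XML ones, so it is only an approximation of decodeXMLComponent.

```python
from html import unescape
from xml.sax.saxutils import escape

QUOTES = {'"': '&quot;', "'": '&apos;'}

def encode_xml_component(text: str) -> str:
    # escape() handles &, < and >; the extra map covers both quote characters.
    return escape(text, QUOTES)

print(encode_xml_component('Hello, "world"!'))  # Hello, &quot;world&quot;!
print(encode_xml_component('<123>'))            # &lt;123&gt;
print(encode_xml_component('&clickhouse'))      # &amp;clickhouse
print(unescape('&lt; &#x3A3; &gt;'))            # < Σ >
```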
decodeXMLComponent
Replaces XML predefined entities with characters. Predefined entities are &quot; &amp; &apos; &gt; &lt;
This function also replaces numeric character references with Unicode characters. Both decimal (like &#10003;) and hexadecimal (&#x2713;) forms are supported.
Syntax
decodeXMLComponent(x)
Arguments
x: A sequence of characters. String.
Returned value
The sequence of characters after replacement.
Type: String.
Example
Query:
SELECT decodeXMLComponent('&apos;foo&apos;');
SELECT decodeXMLComponent('&lt; &#x3A3; &gt;');
Result:
'foo'
< Σ >
extractTextFromHTML
A function to extract text from HTML or XHTML. It does not necessarily 100% conform to any of the HTML, XML or XHTML standards, but the implementation is reasonably accurate and it is fast. The rules are the following:
Comments are skipped. Example: <!-- test -->. A comment must end with -->. Nested comments are not possible. Note: constructions like <!--> and <!---> are not valid comments in HTML, but they are skipped by other rules.
CDATA is pasted verbatim. Note: CDATA is XML/XHTML specific, but it is processed on a best-effort basis.
script and style elements are removed with all their content. Note: it is assumed that the closing tag cannot appear inside the content. For example, in a JS string literal it has to be escaped like "<\/script>". Note: comments and CDATA are possible inside script or style; then closing tags are not searched for inside CDATA. Example: <script><![CDATA[</script>]]></script>. But they are still searched for inside comments. Sometimes it becomes complicated: <script>var x = "<!--"; </script> var y = "-->"; alert(x + y);</script> Note: script and style can be the names of XML namespaces; then they are not treated like usual script or style elements. Example: <script:a>Hello</script:a>. Note: whitespace is possible after the closing tag name: </script > but not before: < / script>.
Other tags or tag-like elements are skipped without inner content. Example: <a>.</a> Note: it is expected that this HTML is illegal: <a test=">"></a> Note: it also skips something like tags: <>, <!>, etc. Note: a tag without an end is skipped to the end of input: <hello
HTML and XML entities are not decoded. They must be processed by a separate function.
Whitespaces in the text are collapsed or inserted by specific rules:
Whitespaces at the beginning and at the end are removed.
Consecutive whitespaces are collapsed.
But if the text is separated by other elements and there is no whitespace, it is inserted.
It may cause unnatural examples: Hello<b>world</b>, Hello<!-- -->world - there is no whitespace in HTML, but the function inserts it. Also consider: Hello<p>world</p>, Hello<br>world. This behavior is reasonable for data analysis, e.g. to convert HTML to a bag of words.
Also note that correct handling of whitespaces requires the support of <pre></pre> and the CSS display and white-space properties.
Syntax
extractTextFromHTML(x)
Arguments
x: Input text. String.
Returned value
Extracted text.
Type: String.
Example
The first example contains several tags and a comment and also shows whitespace processing. The second example shows CDATA and script tag processing. In the third example text is extracted from the full HTML response received by the url function.
Query:
SELECT extractTextFromHTML(' <p> A text <i>with</i><b>tags</b>. <!-- comments --> </p> ');
SELECT extractTextFromHTML('<![CDATA[The content within <b>CDATA</b>]]> <script>alert("Script");</script>');
SELECT extractTextFromHTML(html) FROM url('http://www.donothingfor2minutes.com/', RawBLOB, 'html String');
Result:
A text with tags .
The content within <b>CDATA</b>
Do Nothing for 2 Minutes 2:00
TUPLES
tuple
A function that allows grouping multiple columns. For columns with the types T1, T2, …, it returns a Tuple(T1, T2, …) type tuple containing these columns. There is no cost to execute the function. Tuples are normally used as intermediate values for an argument of IN operators, or for creating a list of formal parameters of lambda functions. Tuples can’t be written to a table.
The function implements the operator (x, y, …).
Syntax
tuple(x, y, …)
tupleElement
A function that allows getting a column from a tuple. ‘N’ is the column index, starting from 1. ‘N’ must be a constant. ‘N’ must be a strictly positive integer no greater than the size of the tuple. There is no cost to execute the function.
The function implements the operator x.N.
Syntax
tupleElement(tuple, n)
untuple
Performs syntactic substitution of tuple elements in the call location.
Syntax
untuple(x)
You can use the EXCEPT expression to skip columns as a result of the query.
Arguments
x: A tuple function, column, or tuple of elements. Tuple.
Returned value
None.
Examples
Input table:
┌─key─┬─v1─┬─v2─┬─v3─┬─v4─┬─v5─┬─v6────────┐
│ 1 │ 10 │ 20 │ 40 │ 30 │ 15 │ (33,'ab') │
│ 2 │ 25 │ 65 │ 70 │ 40 │ 6 │ (44,'cd') │
│ 3 │ 57 │ 30 │ 20 │ 10 │ 5 │ (55,'ef') │
│ 4 │ 55 │ 12 │ 7 │ 80 │ 90 │ (66,'gh') │
│ 5 │ 30 │ 50 │ 70 │ 25 │ 55 │ (77,'kl') │
└─────┴────┴────┴────┴────┴────┴───────────┘
Example of using a Tuple-type column as the untuple function parameter:
Query:
SELECT untuple(v6) FROM kv;
Result:
┌─_ut_1─┬─_ut_2─┐
│ 33 │ ab │
│ 44 │ cd │
│ 55 │ ef │
│ 66 │ gh │
│ 77 │ kl │
└───────┴───────┘
Note: the names are implementation specific and are subject to change. You should not assume specific names of the columns after application of untuple.
Example of using an EXCEPT expression:
Query:
SELECT untuple((* EXCEPT (v2, v3),)) FROM kv;
Result:
┌─key─┬─v1─┬─v4─┬─v5─┬─v6────────┐
│ 1 │ 10 │ 30 │ 15 │ (33,'ab') │
│ 2 │ 25 │ 40 │ 6 │ (44,'cd') │
│ 3 │ 57 │ 10 │ 5 │ (55,'ef') │
│ 4 │ 55 │ 80 │ 90 │ (66,'gh') │
│ 5 │ 30 │ 25 │ 55 │ (77,'kl') │
└─────┴────┴────┴────┴───────────┘
tupleHammingDistance
Returns the Hamming Distance between two tuples of the same size.
Syntax
tupleHammingDistance(tuple1, tuple2)
Arguments
Tuples should have the same type of the elements.
Returned value
The Hamming distance.
Type: The result type is calculated the same way it is for Arithmetic functions, based on the number of elements in the input tuples.
SELECT
toTypeName(tupleHammingDistance(tuple(0), tuple(0))) AS t1,
toTypeName(tupleHammingDistance((0, 0), (0, 0))) AS t2,
toTypeName(tupleHammingDistance((0, 0, 0), (0, 0, 0))) AS t3,
toTypeName(tupleHammingDistance((0, 0, 0, 0), (0, 0, 0, 0))) AS t4,
toTypeName(tupleHammingDistance((0, 0, 0, 0, 0), (0, 0, 0, 0, 0))) AS t5
┌─t1────┬─t2─────┬─t3─────┬─t4─────┬─t5─────┐
│ UInt8 │ UInt16 │ UInt32 │ UInt64 │ UInt64 │
└───────┴────────┴────────┴────────┴────────┘
Examples
Query:
SELECT tupleHammingDistance((1, 2, 3), (3, 2, 1)) AS HammingDistance;
Result:
┌─HammingDistance─┐
│ 2 │
└─────────────────┘
Can be used with MinHash functions for detection of semi-duplicate strings:
SELECT tupleHammingDistance(wordShingleMinHash(string), wordShingleMinHashCaseInsensitive(string)) as HammingDistance FROM (SELECT 'ClickHouse is a column-oriented database management system for online analytical processing of queries.' AS string);
Result:
┌─HammingDistance─┐
│ 2 │
└─────────────────┘
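The Hamming distance itself is just the number of positions at which two equally sized tuples differ. A minimal Python sketch follows, with tuple_hamming_distance as a hypothetical helper name. Raising an error for tuples of different sizes is an assumption of this sketch.

```python
def tuple_hamming_distance(t1: tuple, t2: tuple) -> int:
    if len(t1) != len(t2):
        raise ValueError('tuples must have the same size')
    # Count the positions at which the elements differ.
    return sum(a != b for a, b in zip(t1, t2))

print(tuple_hamming_distance((1, 2, 3), (3, 2, 1)))  # 2
```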
TYPE CONVERSION
Common Issues of Numeric Conversions
When you convert a value from one data type to another, remember that in the general case it is an unsafe operation that can lead to data loss. Data loss can occur if you try to fit a value from a larger data type into a smaller data type, or if you convert values between different data types.
ClickHouse has the same behavior as C++ programs.
toInt(8|16|32|64|128|256)
Converts an input value to the Int data type. This function family includes:
toInt8(expr): Results in the Int8 data type.
toInt16(expr): Results in the Int16 data type.
toInt32(expr): Results in the Int32 data type.
toInt64(expr): Results in the Int64 data type.
toInt128(expr): Results in the Int128 data type.
toInt256(expr): Results in the Int256 data type.
Arguments
expr: Expression returning a number or a string with the decimal representation of a number. Binary, octal, and hexadecimal representations of numbers are not supported. Leading zeroes are stripped.
Returned value
Integer value in the Int8, Int16, Int32, Int64, Int128 or Int256 data type.
Functions use rounding towards zero, meaning they truncate fractional digits of numbers.
The behavior of functions for the NaN and Inf arguments is undefined. Keep the common issues of numeric conversions in mind when using these functions.
Example
Query:
SELECT toInt64(nan), toInt32(32), toInt16('16'), toInt8(8.8);
Result:
┌─────────toInt64(nan)─┬─toInt32(32)─┬─toInt16('16')─┬─toInt8(8.8)─┐
│ -9223372036854775808 │ 32 │ 16 │ 8 │
└──────────────────────┴─────────────┴───────────────┴─────────────┘
toInt(8|16|32|64|128|256)OrZero
Takes an argument of type String and tries to parse it into Int (8 | 16 | 32 | 64 | 128 | 256). If parsing fails, returns 0.
Example
Query:
SELECT toInt64OrZero('123123'), toInt8OrZero('123qwe123');
Result:
┌─toInt64OrZero('123123')─┬─toInt8OrZero('123qwe123')─┐
│ 123123 │ 0 │
└─────────────────────────┴───────────────────────────┘
toInt(8|16|32|64|128|256)OrNull
Takes an argument of type String and tries to parse it into Int (8 | 16 | 32 | 64 | 128 | 256). If parsing fails, returns NULL.
Example
Query:
SELECT toInt64OrNull('123123'), toInt8OrNull('123qwe123');
Result:
┌─toInt64OrNull('123123')─┬─toInt8OrNull('123qwe123')─┐
│ 123123 │ ᴺᵁᴸᴸ │
└─────────────────────────┴───────────────────────────┘
toInt(8|16|32|64|128|256)OrDefault
Takes an argument of type String and tries to parse it into Int (8 | 16 | 32 | 64 | 128 | 256). If parsing fails, returns the default value (in the example below it is passed as the second argument).
Example
Query:
SELECT toInt64OrDefault('123123', cast('-1' as Int64)), toInt8OrDefault('123qwe123', cast('-1' as Int8));
Result:
┌─toInt64OrDefault('123123', CAST('-1', 'Int64'))─┬─toInt8OrDefault('123qwe123', CAST('-1', 'Int8'))─┐
│ 123123 │ -1 │
└─────────────────────────────────────────────────┴──────────────────────────────────────────────────┘
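The three fallback variants differ only in what they return when parsing fails. A rough Python model with a hypothetical helper to_int_or makes that explicit. Two details are assumptions of this sketch and are not stated in the text above: only an optional minus sign followed by decimal digits is accepted, and a value outside the signed range of the target width counts as a failure.

```python
import re

def to_int_or(text: str, bits: int, fallback):
    # Assumption: accept only an optional minus sign followed by decimal digits.
    if not re.fullmatch(r'-?[0-9]+', text):
        return fallback
    value = int(text)
    # Assumption: a value outside the signed range also counts as a failure.
    if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
        return fallback
    return value

print(to_int_or('123123', 64, 0))       # 123123  (OrZero style)
print(to_int_or('123qwe123', 8, None))  # None    (OrNull style)
print(to_int_or('123qwe123', 8, -1))    # -1      (OrDefault style)
```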
toUInt(8|16|32|64|256)
Converts an input value to the UInt data type. This function family includes:
toUInt8(expr) — Results in the UInt8 data type.
toUInt16(expr) — Results in the UInt16 data type.
toUInt32(expr) — Results in the UInt32 data type.
toUInt64(expr) — Results in the UInt64 data type.
toUInt256(expr) — Results in the UInt256 data type.
Arguments
expr — Expression returning a number or a string with the decimal representation of a number. Binary, octal, and hexadecimal representations of numbers are not supported. Leading zeroes are stripped.
Returned value
Integer value in the UInt8, UInt16, UInt32, UInt64 or UInt256 data type.
Functions use rounding towards zero, meaning they truncate fractional digits of numbers.
The behavior of the functions for negative arguments and for NaN and Inf arguments is undefined. If you pass a string with a negative number, for example '-32', ClickHouse raises an exception. Keep numeric conversion issues in mind when using these functions.
Example
Query:
SELECT toUInt64(nan), toUInt32(-32), toUInt16('16'), toUInt8(8.8);
Result:
┌───────toUInt64(nan)─┬─toUInt32(-32)─┬─toUInt16('16')─┬─toUInt8(8.8)─┐
│ 9223372036854775808 │ 4294967264 │ 16 │ 8 │
└─────────────────────┴───────────────┴────────────────┴──────────────┘
toUInt(8|16|32|64|256)OrZero
toUInt(8|16|32|64|256)OrNull
toUInt(8|16|32|64|256)OrDefault
toFloat(32|64)
toFloat(32|64)OrZero
toFloat(32|64)OrNull
toFloat(32|64)OrDefault
toDate
Converts the argument to the Date data type.
If the argument is DateTime or DateTime64, it truncates it, leaving the date component of the DateTime:
SELECT
now() AS x,
toDate(x)
┌───────────────────x─┬─toDate(now())─┐
│ 2022-12-30 13:44:17 │ 2022-12-30 │
└─────────────────────┴───────────────┘
If the argument is a string, it is parsed as Date or DateTime. If it is parsed as DateTime, the date component is used:
SELECT
toDate('2022-12-30') AS x,
toTypeName(x)
┌──────────x─┬─toTypeName(toDate('2022-12-30'))─┐
│ 2022-12-30 │ Date │
└────────────┴──────────────────────────────────┘
SELECT
toDate('2022-12-30 01:02:03') AS x,
toTypeName(x)
┌──────────x─┬─toTypeName(toDate('2022-12-30 01:02:03'))─┐
│ 2022-12-30 │ Date │
└────────────┴───────────────────────────────────────────┘
If the argument is a number and it looks like a UNIX timestamp (is greater than 65535), it is interpreted as a DateTime, then truncated to Date in the current timezone. The timezone argument can be specified as a second argument of the function. The truncation to Date depends on the timezone:
SELECT
now() AS current_time,
toUnixTimestamp(current_time) AS ts,
toDateTime(ts) AS time_Amsterdam,
toDateTime(ts, 'Pacific/Apia') AS time_Samoa,
toDate(time_Amsterdam) AS date_Amsterdam,
toDate(time_Samoa) AS date_Samoa,
toDate(ts) AS date_Amsterdam_2,
toDate(ts, 'Pacific/Apia') AS date_Samoa_2
Row 1:
──────
current_time: 2022-12-30 13:51:54
ts: 1672404714
time_Amsterdam: 2022-12-30 13:51:54
time_Samoa: 2022-12-31 01:51:54
date_Amsterdam: 2022-12-30
date_Samoa: 2022-12-31
date_Amsterdam_2: 2022-12-30
date_Samoa_2: 2022-12-31
The example above demonstrates how the same UNIX timestamp can be interpreted as different dates in different time zones.
If the argument is a number and it is smaller than 65536, it is interpreted as the number of days since 1970-01-01 (a UNIX day) and converted to Date. It corresponds to the internal numeric representation of the Date data type. Example:
SELECT toDate(12345)
┌─toDate(12345)─┐
│ 2003-10-20 │
└───────────────┘
This conversion does not depend on timezones.
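The two numeric interpretations can be modelled in a few lines of Python. This is a sketch of the rule as described in this section, not ClickHouse's implementation. The function name to_date is hypothetical, and fixed UTC offsets (+1 for Amsterdam in winter, +13 for Pacific/Apia) stand in for named time zones.

```python
from datetime import date, datetime, timedelta, timezone

def to_date(number: int, utc_offset_hours: int = 0) -> date:
    """Model of toDate for a numeric argument."""
    if number >= 65536:
        # Looks like a UNIX timestamp: the resulting date depends on the time zone.
        tz = timezone(timedelta(hours=utc_offset_hours))
        return datetime.fromtimestamp(number, tz).date()
    # Small numbers are a count of days since 1970-01-01; no time zone involved.
    return date(1970, 1, 1) + timedelta(days=number)

print(to_date(12345))           # 2003-10-20
print(to_date(1672404714, 1))   # 2022-12-30 (UTC+1)
print(to_date(1672404714, 13))  # 2022-12-31 (UTC+13)
```

The same timestamp yields two different dates, while the day count 12345 yields the same date regardless of the offset.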
If the argument does not fit in the range of the Date type, the behavior is implementation-defined: the result can saturate to the maximum supported date or overflow:
SELECT toDate(10000000000.)
┌─toDate(10000000000.)─┐
│ 2106-02-07 │
└──────────────────────┘
The function toDate can also be written in alternative forms:
SELECT
now() AS time,
toDate(time),
DATE(time),
CAST(time, 'Date')
┌────────────────time─┬─toDate(now())─┬─DATE(now())─┬─CAST(now(), 'Date')─┐
│ 2022-12-30 13:54:58 │ 2022-12-30 │ 2022-12-30 │ 2022-12-30 │
└─────────────────────┴───────────────┴─────────────┴─────────────────────┘
Have a nice day working with dates and times.
toDateOrZero
toDateOrNull
toDateOrDefault
toDateTime
toDateTimeOrZero
toDateTimeOrNull
toDateTimeOrDefault
toDate32
Converts the argument to the Date32 data type. If the value is outside the range, toDate32 returns the border values supported by Date32. If the argument has Date type, borders of Date are taken into account.
Syntax
toDate32(expr)
Arguments
Returned value
A calendar date.
Type: Date32.
Example
The value is within the range:
SELECT toDate32('1955-01-01') AS value, toTypeName(value);
┌──────value─┬─toTypeName(toDate32('1955-01-01'))─┐
│ 1955-01-01 │ Date32 │
└────────────┴────────────────────────────────────┘
The value is outside the range:
SELECT toDate32('1899-01-01') AS value, toTypeName(value);
┌──────value─┬─toTypeName(toDate32('1899-01-01'))─┐
│ 1900-01-01 │ Date32 │
└────────────┴────────────────────────────────────┘
With a Date-type argument:
SELECT toDate32(toDate('1899-01-01')) AS value, toTypeName(value);
┌──────value─┬─toTypeName(toDate32(toDate('1899-01-01')))─┐
│ 1970-01-01 │ Date32 │
└────────────┴────────────────────────────────────────────┘
toDate32OrZero
The same as toDate32 but returns the min value of Date32 if an invalid argument is received.
Example
Query:
SELECT toDate32OrZero('1899-01-01'), toDate32OrZero('');
Result:
┌─toDate32OrZero('1899-01-01')─┬─toDate32OrZero('')─┐
│ 1900-01-01 │ 1900-01-01 │
└──────────────────────────────┴────────────────────┘
toDate32OrNull
The same as toDate32 but returns NULL if an invalid argument is received.
Example
Query:
SELECT toDate32OrNull('1955-01-01'), toDate32OrNull('');
Result:
┌─toDate32OrNull('1955-01-01')─┬─toDate32OrNull('')─┐
│ 1955-01-01 │ ᴺᵁᴸᴸ │
└──────────────────────────────┴────────────────────┘
toDate32OrDefault
Converts the argument to the Date32 data type. If the value is outside the range, toDate32OrDefault returns the lower border value supported by Date32. If the argument has Date type, borders of Date are taken into account. Returns the default value if an invalid argument is received.
Example
Query:
SELECT
toDate32OrDefault('1930-01-01', toDate32('2020-01-01')),
toDate32OrDefault('xx1930-01-01', toDate32('2020-01-01'));
Result:
┌─toDate32OrDefault('1930-01-01', toDate32('2020-01-01'))─┬─toDate32OrDefault('xx1930-01-01', toDate32('2020-01-01'))─┐
│ 1930-01-01 │ 2020-01-01 │
└─────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────┘
toDateTime64
Converts the argument to the DateTime64 data type.
Syntax
toDateTime64(expr, scale, [timezone])
Arguments
scale - Tick size (precision): 10^(-precision) seconds. Valid range: [ 0 : 9 ].
timezone - Time zone of the specified DateTime64 object.
Returned value
A calendar date and time of day, with sub-second precision.
Type: DateTime64.
Example
The value is within the range:
SELECT toDateTime64('1955-01-01 00:00:00.000', 3) AS value, toTypeName(value);
┌───────────────────value─┬─toTypeName(toDateTime64('1955-01-01 00:00:00.000', 3))─┐
│ 1955-01-01 00:00:00.000 │ DateTime64(3) │
└─────────────────────────┴────────────────────────────────────────────────────────┘
As decimal with precision:
SELECT toDateTime64(1546300800.000, 3) AS value, toTypeName(value);
┌───────────────────value─┬─toTypeName(toDateTime64(1546300800., 3))─┐
│ 2019-01-01 00:00:00.000 │ DateTime64(3) │
└─────────────────────────┴──────────────────────────────────────────┘
Without the decimal point the value is still treated as Unix Timestamp in seconds:
SELECT toDateTime64(1546300800000, 3) AS value, toTypeName(value);
┌───────────────────value─┬─toTypeName(toDateTime64(1546300800000, 3))─┐
│ 2282-12-31 00:00:00.000 │ DateTime64(3) │
└─────────────────────────┴────────────────────────────────────────────┘
With timezone:
SELECT toDateTime64('2019-01-01 00:00:00', 3, 'Asia/Istanbul') AS value, toTypeName(value);
┌───────────────────value─┬─toTypeName(toDateTime64('2019-01-01 00:00:00', 3, 'Asia/Istanbul'))─┐
│ 2019-01-01 00:00:00.000 │ DateTime64(3, 'Asia/Istanbul') │
└─────────────────────────┴─────────────────────────────────────────────────────────────────────┘
toDecimal(32|64|128|256)
Converts value to the Decimal data type with precision of S. The value can be a number or a string. The S (scale) parameter specifies the number of decimal places.
toDecimal32(value, S)
toDecimal64(value, S)
toDecimal128(value, S)
toDecimal256(value, S)
toDecimal(32|64|128|256)OrNull
Converts an input string to a Nullable(Decimal(P,S)) data type value. This family of functions includes:
toDecimal32OrNull(expr, S) — Results in the Nullable(Decimal32(S)) data type.
toDecimal64OrNull(expr, S) — Results in the Nullable(Decimal64(S)) data type.
toDecimal128OrNull(expr, S) — Results in the Nullable(Decimal128(S)) data type.
toDecimal256OrNull(expr, S) — Results in the Nullable(Decimal256(S)) data type.
These functions should be used instead of the toDecimal*() functions if you prefer to get a NULL value instead of an exception in the event of an input value parsing error.
Arguments
expr — Expression, returns a value in the String data type. ClickHouse expects the textual representation of the decimal number. For example, '1.111'.
S — Scale, the number of decimal places in the resulting value.
Returned value
A value in the Nullable(Decimal(P,S)) data type. The value contains:
Number with S decimal places, if ClickHouse interprets the input string as a number.
NULL, if ClickHouse can’t interpret the input string as a number or if the input number contains more than S decimal places.
Examples
Query:
SELECT toDecimal32OrNull(toString(-1.111), 5) AS val, toTypeName(val);
Result:
┌────val─┬─toTypeName(toDecimal32OrNull(toString(-1.111), 5))─┐
│ -1.111 │ Nullable(Decimal(9, 5)) │
└────────┴────────────────────────────────────────────────────┘
Query:
SELECT toDecimal32OrNull(toString(-1.111), 2) AS val, toTypeName(val);
Result:
┌──val─┬─toTypeName(toDecimal32OrNull(toString(-1.111), 2))─┐
│ ᴺᵁᴸᴸ │ Nullable(Decimal(9, 2)) │
└──────┴────────────────────────────────────────────────────┘
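The two failure conditions (unparseable input, or more than S decimal places) can be modelled with Python's decimal module. This is a rough sketch of the documented rule only. The function name is hypothetical, the precision limit P is not checked, and how trailing zeros or exponent notation are counted in ClickHouse is not covered by the text above.

```python
from decimal import Decimal, InvalidOperation

def to_decimal_or_null(text: str, scale: int):
    """Return a Decimal with `scale` decimal places, or None (NULL)."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None  # not a number
    if not value.is_finite():
        return None  # reject NaN and Infinity spellings
    if -value.as_tuple().exponent > scale:
        return None  # more decimal places than the scale allows
    return value.quantize(Decimal(1).scaleb(-scale))

print(to_decimal_or_null("-1.111", 5))  # -1.11100
print(to_decimal_or_null("-1.111", 2))  # None
print(to_decimal_or_null("abc", 2))     # None
```

Replacing None with Decimal(0) at the two failure points gives the corresponding OrZero behaviour.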
toDecimal(32|64|128|256)OrDefault
Converts an input string to a Decimal(P,S) data type value. This family of functions includes:
toDecimal32OrDefault(expr, S) — Results in the Decimal32(S) data type.
toDecimal64OrDefault(expr, S) — Results in the Decimal64(S) data type.
toDecimal128OrDefault(expr, S) — Results in the Decimal128(S) data type.
toDecimal256OrDefault(expr, S) — Results in the Decimal256(S) data type.
These functions should be used instead of the toDecimal*() functions if you prefer to get a default value instead of an exception in the event of an input value parsing error.
Arguments
expr — Expression, returns a value in the String data type. ClickHouse expects the textual representation of the decimal number. For example, '1.111'.
S — Scale, the number of decimal places in the resulting value.
Returned value
A value in the Decimal(P,S) data type. The value contains:
Number with S decimal places, if ClickHouse interprets the input string as a number.
Default Decimal(P,S) data type value, if ClickHouse can’t interpret the input string as a number or if the input number contains more than S decimal places.
Examples
Query:
SELECT toDecimal32OrDefault(toString(-1.111), 5) AS val, toTypeName(val);
Result:
┌────val─┬─toTypeName(toDecimal32OrDefault(toString(-1.111), 5))─┐
│ -1.111 │ Decimal(9, 5) │
└────────┴───────────────────────────────────────────────────────┘
Query:
SELECT toDecimal32OrDefault(toString(-1.111), 2) AS val, toTypeName(val);
Result:
┌─val─┬─toTypeName(toDecimal32OrDefault(toString(-1.111), 2))─┐
│ 0 │ Decimal(9, 2) │
└─────┴───────────────────────────────────────────────────────┘
toDecimal(32|64|128|256)OrZero
Converts an input value to the Decimal(P,S) data type. This family of functions includes:
toDecimal32OrZero(expr, S) — Results in the Decimal32(S) data type.
toDecimal64OrZero(expr, S) — Results in the Decimal64(S) data type.
toDecimal128OrZero(expr, S) — Results in the Decimal128(S) data type.
toDecimal256OrZero(expr, S) — Results in the Decimal256(S) data type.
These functions should be used instead of the toDecimal*() functions if you prefer to get a 0 value instead of an exception in the event of an input value parsing error.
Arguments
expr — Expression, returns a value in the String data type. ClickHouse expects the textual representation of the decimal number. For example, '1.111'.
S — Scale, the number of decimal places in the resulting value.
Returned value
A value in the Decimal(P,S) data type. The value contains:
Number with S decimal places, if ClickHouse interprets the input string as a number.
0 with S decimal places, if ClickHouse can’t interpret the input string as a number or if the input number contains more than S decimal places.
Example
Query:
SELECT toDecimal32OrZero(toString(-1.111), 5) AS val, toTypeName(val);
Result:
┌────val─┬─toTypeName(toDecimal32OrZero(toString(-1.111), 5))─┐
│ -1.111 │ Decimal(9, 5) │
└────────┴────────────────────────────────────────────────────┘
Query:
SELECT toDecimal32OrZero(toString(-1.111), 2) AS val, toTypeName(val);
Result:
┌──val─┬─toTypeName(toDecimal32OrZero(toString(-1.111), 2))─┐
│ 0.00 │ Decimal(9, 2) │
└──────┴────────────────────────────────────────────────────┘
toString
Functions for converting between numbers, strings (but not fixed strings), dates, and dates with times. All these functions accept one argument.
When converting to or from a string, the value is formatted or parsed using the same rules as for the TabSeparated format (and almost all other text formats). If the string can’t be parsed, an exception is thrown and the request is canceled.
When converting dates to numbers or vice versa, the date corresponds to the number of days since the beginning of the Unix epoch. When converting dates with times to numbers or vice versa, the date with time corresponds to the number of seconds since the beginning of the Unix epoch.
The date and date-with-time formats for the toDate/toDateTime functions are defined as follows:
YYYY-MM-DD
YYYY-MM-DD hh:mm:ss
As an exception, if converting from UInt32, Int32, UInt64, or Int64 numeric types to Date, and if the number is greater than or equal to 65536, the number is interpreted as a Unix timestamp (and not as the number of days) and is rounded to the date. This allows support for the common occurrence of writing ‘toDate(unix_timestamp)’, which otherwise would be an error and would require writing the more cumbersome ‘toDate(toDateTime(unix_timestamp))’.
Conversion between a date and a date with time is performed the natural way: by adding a null time or dropping the time.
Conversion between numeric types uses the same rules as assignments between different numeric types in C++.
Additionally, when toString is applied to a DateTime argument, it can take a second String argument containing the name of the time zone, for example Asia/Yekaterinburg. In this case, the time is formatted according to the specified time zone.
Example
Query:
SELECT
now() AS now_local,
toString(now(), 'Asia/Yekaterinburg') AS now_yekat;
Result:
┌───────────now_local─┬─now_yekat───────────┐
│ 2016-06-15 00:11:21 │ 2016-06-15 02:11:21 │
└─────────────────────┴─────────────────────┘
Also see the toUnixTimestamp function.
toFixedString(s, N)
Converts a String type argument to a FixedString(N) type (a string with fixed length N). N must be a constant. If the string has fewer bytes than N, it is padded with null bytes to the right. If the string has more bytes than N, an exception is thrown.
toStringCutToZero(s)
Accepts a String or FixedString argument. Returns the String with the content truncated at the first zero byte found.
Example
Query:
SELECT toFixedString('foo', 8) AS s, toStringCutToZero(s) AS s_cut;
Result:
┌─s─────────────┬─s_cut─┐
│ foo\0\0\0\0\0 │ foo │
└───────────────┴───────┘
Query:
SELECT toFixedString('foo\0bar', 8) AS s, toStringCutToZero(s) AS s_cut;
Result:
┌─s──────────┬─s_cut─┐
│ foo\0bar\0 │ foo │
└────────────┴───────┘
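The padding and truncation rules of toFixedString and toStringCutToZero can be modelled on Python byte strings. This is a sketch under the rules stated above; the helper names are hypothetical and ValueError merely stands in for the ClickHouse exception.

```python
def to_fixed_string(data: bytes, n: int) -> bytes:
    """Pad with null bytes on the right up to n bytes; fail if data is longer."""
    if len(data) > n:
        raise ValueError(f"string of {len(data)} bytes does not fit FixedString({n})")
    return data.ljust(n, b"\0")

def to_string_cut_to_zero(data: bytes) -> bytes:
    """Keep everything before the first zero byte."""
    return data.split(b"\0", 1)[0]

s = to_fixed_string(b"foo", 8)
print(s)                         # b'foo\x00\x00\x00\x00\x00'
print(to_string_cut_to_zero(s))  # b'foo'

s = to_fixed_string(b"foo\0bar", 8)
print(s)                         # b'foo\x00bar\x00'
print(to_string_cut_to_zero(s))  # b'foo'
```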
reinterpretAsUInt(8|16|32|64)
reinterpretAsInt(8|16|32|64)
reinterpretAsFloat(32|64)
reinterpretAsDate
reinterpretAsDateTime
These functions accept a string and interpret the bytes placed at the beginning of the string as a number in host order (little endian). If the string isn’t long enough, the functions work as if the string is padded with the necessary number of null bytes. If the string is longer than needed, the extra bytes are ignored. A date is interpreted as the number of days since the beginning of the Unix Epoch, and a date with time is interpreted as the number of seconds since the beginning of the Unix Epoch.
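As an illustration of the byte-level rule (little-endian, zero padding, extra bytes ignored), here is a Python sketch for the unsigned integer case. The function name and the width parameter (the number of bytes of the target type) are hypothetical.

```python
def reinterpret_as_uint(data: bytes, width: int) -> int:
    """Interpret the first `width` bytes of data as a little-endian unsigned integer."""
    padded = data[:width].ljust(width, b"\0")  # ignore extra bytes, pad short input
    return int.from_bytes(padded, "little")

print(reinterpret_as_uint(b"1", 4))                     # 49, the byte value of '1'
print(reinterpret_as_uint(b"\x01\x02", 4))              # 513, i.e. 0x0201
print(reinterpret_as_uint(b"\xff\x00\x00\x00\x07", 4))  # 255, the fifth byte is ignored
```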
reinterpretAsString
This function accepts a number or date or date with time and returns a string containing bytes representing the corresponding value in host order (little endian). Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a string that is one byte long.
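The reverse direction can be sketched the same way. The helper below is hypothetical and handles only unsigned integers; width is the size of the source type in bytes.

```python
def reinterpret_as_string(value: int, width: int) -> bytes:
    """Little-endian bytes of an unsigned integer, with trailing null bytes dropped."""
    return value.to_bytes(width, "little").rstrip(b"\0")

print(reinterpret_as_string(255, 4))  # b'\xff': a UInt32 value of 255 becomes one byte
print(reinterpret_as_string(256, 4))  # b'\x00\x01': only trailing nulls are dropped
```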
reinterpretAsFixedString
This function accepts a number or date or date with time and returns a FixedString containing bytes representing the corresponding value in host order (little endian). Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a FixedString that is one byte long.
reinterpretAsUUID
Accepts a 16-byte string and returns a UUID containing bytes representing the corresponding value in network byte order (big-endian). If the string isn't long enough, the function works as if the string is padded with the necessary number of null bytes at the end. If the string is longer than 16 bytes, the extra bytes at the end are ignored.
Syntax
reinterpretAsUUID(fixed_string)
Arguments
fixed_string — Big-endian byte string. FixedString.
Returned value
The UUID type value. UUID.
Examples
String to UUID.
Query:
SELECT reinterpretAsUUID(reverse(unhex('000102030405060708090a0b0c0d0e0f')));
Result:
┌─reinterpretAsUUID(reverse(unhex('000102030405060708090a0b0c0d0e0f')))─┐
│ 08090a0b-0c0d-0e0f-0001-020304050607 │
└───────────────────────────────────────────────────────────────────────┘
Going back and forth from String to UUID.
Query:
WITH
generateUUIDv4() AS uuid,
identity(lower(hex(reverse(reinterpretAsString(uuid))))) AS str,
reinterpretAsUUID(reverse(unhex(str))) AS uuid2
SELECT uuid = uuid2;
Result:
┌─equals(uuid, uuid2)─┐
│ 1 │
└─────────────────────┘
reinterpret(x, T)
Uses the same in-memory byte sequence as the value x and reinterprets it as the destination type.
Syntax
reinterpret(x, type)
Arguments
x — Any type.
type — Destination type. String.
Returned value
Destination type value.
Examples
Query:
SELECT reinterpret(toInt8(-1), 'UInt8') as int_to_uint,
reinterpret(toInt8(1), 'Float32') as int_to_float,
reinterpret('1', 'UInt32') as string_to_int;
Result:
┌─int_to_uint─┬─int_to_float─┬─string_to_int─┐
│ 255 │ 1e-45 │ 49 │
└─────────────┴──────────────┴───────────────┘
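The three results above can be reproduced with Python's struct module. This is only a sketch: it assumes little-endian layout and zero padding when the source value has fewer bytes than the destination type, which is consistent with the output shown but is not stated explicitly in this section.

```python
import struct

# Int8 -1 and UInt8 255 share the same single byte 0xFF.
int8_bytes = struct.pack("<b", -1)
as_uint8 = struct.unpack("<B", int8_bytes)[0]

# Int8 1, zero-padded to 4 bytes, is the smallest positive Float32 (2**-149).
float_bytes = struct.pack("<b", 1).ljust(4, b"\0")
as_float32 = struct.unpack("<f", float_bytes)[0]

# The string '1' is the byte 0x31; zero-padded to 4 bytes it reads as UInt32 49.
string_bytes = b"1".ljust(4, b"\0")
as_uint32 = struct.unpack("<I", string_bytes)[0]

print(as_uint8, as_float32 == 2.0 ** -149, as_uint32)
```

The value 2**-149 is approximately 1.4e-45, which the result table displays as 1e-45.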
CAST(x, T)
Converts an input value to the specified data type. Unlike the reinterpret function, CAST tries to present the same value using the new data type. If the conversion cannot be done, an exception is raised. Several syntax variants are supported.
Syntax
CAST(x, T)
CAST(x AS t)
x::t
Arguments
x — A value to convert. May be of any type.
T — The name of the target data type. String.
t — The target data type.
Returned value
Converted value.
NOTE
If the input value does not fit the bounds of the target type, the result overflows. For example, CAST(-1, 'UInt8') returns 255.
Examples
Query:
SELECT
CAST(toInt8(-1), 'UInt8') AS cast_int_to_uint,
CAST(1.5 AS Decimal(3,2)) AS cast_float_to_decimal,
'1'::Int32 AS cast_string_to_int;
Result:
┌─cast_int_to_uint─┬─cast_float_to_decimal─┬─cast_string_to_int─┐
│ 255 │ 1.50 │ 1 │
└──────────────────┴───────────────────────┴────────────────────┘
Query:
SELECT
'2016-06-15 23:00:00' AS timestamp,
CAST(timestamp AS DateTime) AS datetime,
CAST(timestamp AS Date) AS date,
CAST(timestamp, 'String') AS string,
CAST(timestamp, 'FixedString(22)') AS fixed_string;
Result:
┌─timestamp───────────┬────────────datetime─┬───────date─┬─string──────────────┬─fixed_string──────────────┐
│ 2016-06-15 23:00:00 │ 2016-06-15 23:00:00 │ 2016-06-15 │ 2016-06-15 23:00:00 │ 2016-06-15 23:00:00\0\0\0 │
└─────────────────────┴─────────────────────┴────────────┴─────────────────────┴───────────────────────────┘
Conversion to FixedString(N) only works for arguments of type String or FixedString.
Type conversion to Nullable and back is supported.
Example
Query:
SELECT toTypeName(x) FROM t_null;
Result:
┌─toTypeName(x)─┐
│ Int8 │
│ Int8 │
└───────────────┘
Query:
SELECT toTypeName(CAST(x, 'Nullable(UInt16)')) FROM t_null;
Result:
┌─toTypeName(CAST(x, 'Nullable(UInt16)'))─┐
│ Nullable(UInt16) │
│ Nullable(UInt16) │
└─────────────────────────────────────────┘
See also
cast_keep_nullable setting
accurateCast(x, T)
Converts x to the T data type.
The difference from cast(x, T) is that accurateCast does not allow overflow of numeric types during the cast if the value x does not fit the bounds of type T. For example, accurateCast(-1, 'UInt8') throws an exception.
Example
Query:
SELECT cast(-1, 'UInt8') as uint8;
Result:
┌─uint8─┐
│ 255 │
└───────┘
Query:
SELECT accurateCast(-1, 'UInt8') as uint8;
Result:
Code: 70. DB::Exception: Received from localhost:9000. DB::Exception: Value in column Int8 cannot be safely converted into type UInt8: While processing accurateCast(-1, 'UInt8') AS uint8.
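The contrast between the two functions can be modelled for integer targets in Python. This is a sketch, not ClickHouse's implementation. The helper names are hypothetical, OverflowError stands in for the ClickHouse exception, and two's-complement wrap-around for signed targets is an assumption.

```python
def int_bounds(bits: int, signed: bool):
    """Smallest and largest value of an integer type with the given width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

def wrapping_cast(value: int, bits: int, signed: bool) -> int:
    """Model of cast: out-of-range values wrap around modulo 2**bits."""
    wrapped = value % (2 ** bits)
    if signed and wrapped >= 2 ** (bits - 1):
        wrapped -= 2 ** bits  # assumption: two's-complement wrap for signed targets
    return wrapped

def accurate_cast(value: int, bits: int, signed: bool) -> int:
    """Model of accurateCast: out-of-range values are an error."""
    low, high = int_bounds(bits, signed)
    if not low <= value <= high:
        raise OverflowError(f"{value} cannot be safely converted")
    return value

print(wrapping_cast(-1, 8, signed=False))  # 255
print(accurate_cast(5, 8, signed=False))   # 5
```

Calling accurate_cast(-1, 8, signed=False) raises OverflowError instead of returning 255.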
accurateCastOrNull(x, T)
Converts input value x to the specified data type T. Always returns a Nullable type and returns NULL if the cast value is not representable in the target type.
Syntax
accurateCastOrNull(x, T)
Parameters
x — Input value.
T — The name of the returned data type.
Returned value
The value, converted to the specified data type T.
Example
Query:
SELECT toTypeName(accurateCastOrNull(5, 'UInt8'));
Result:
┌─toTypeName(accurateCastOrNull(5, 'UInt8'))─┐
│ Nullable(UInt8) │
└────────────────────────────────────────────┘
Query:
SELECT
accurateCastOrNull(-1, 'UInt8') as uint8,
accurateCastOrNull(128, 'Int8') as int8,
accurateCastOrNull('Test', 'FixedString(2)') as fixed_string;
Result:
┌─uint8─┬─int8─┬─fixed_string─┐
│ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │
└───────┴──────┴──────────────┘
accurateCastOrDefault(x, T[, default_value])
Converts input value x to the specified data type T. If the cast value is not representable in the target type, returns default_value if it is specified, or the default value of the type otherwise.
Syntax
accurateCastOrDefault(x, T[, default_value])
Parameters
x — Input value.
T — The name of the returned data type.
default_value — Default value of returned data type.
Returned value
The value converted to the specified data type T.
Example
Query:
SELECT toTypeName(accurateCastOrDefault(5, 'UInt8'));
Result:
┌─toTypeName(accurateCastOrDefault(5, 'UInt8'))─┐
│ UInt8 │
└───────────────────────────────────────────────┘
Query:
SELECT
accurateCastOrDefault(-1, 'UInt8') as uint8,
accurateCastOrDefault(-1, 'UInt8', 5) as uint8_default,
accurateCastOrDefault(128, 'Int8') as int8,
accurateCastOrDefault(128, 'Int8', 5) as int8_default,
accurateCastOrDefault('Test', 'FixedString(2)') as fixed_string,
accurateCastOrDefault('Test', 'FixedString(2)', 'Te') as fixed_string_default;
Result:
┌─uint8─┬─uint8_default─┬─int8─┬─int8_default─┬─fixed_string─┬─fixed_string_default─┐
│ 0 │ 5 │ 0 │ 5 │ │ Te │
└───────┴───────────────┴──────┴──────────────┴──────────────┴──────────────────────┘
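For integer targets, the OrNull and OrDefault variants reduce to a range check followed by a fallback. The Python sketch below mirrors the integer columns of the examples. The helper names are hypothetical, and 0 is used as the type default because that is what the result table shows for UInt8 and Int8.

```python
def fits(value: int, bits: int, signed: bool) -> bool:
    """True if value is representable in an integer type of the given width."""
    low = -(2 ** (bits - 1)) if signed else 0
    high = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return low <= value <= high

def accurate_cast_or_null(value: int, bits: int, signed: bool):
    return value if fits(value, bits, signed) else None  # None plays the role of NULL

def accurate_cast_or_default(value: int, bits: int, signed: bool, default: int = 0) -> int:
    return value if fits(value, bits, signed) else default

print(accurate_cast_or_null(-1, 8, signed=False))        # None
print(accurate_cast_or_default(-1, 8, signed=False))     # 0
print(accurate_cast_or_default(-1, 8, signed=False, default=5))  # 5
print(accurate_cast_or_default(128, 8, signed=True, default=5))  # 5
```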
toInterval(Year|Quarter|Month|Week|Day|Hour|Minute|Second)
Converts a Number type argument to an Interval data type.
Syntax
toIntervalSecond(number)
toIntervalMinute(number)
toIntervalHour(number)
toIntervalDay(number)
toIntervalWeek(number)
toIntervalMonth(number)
toIntervalQuarter(number)
toIntervalYear(number)
Arguments
number — Duration of interval. Positive integer number.
Returned values
The value in Interval data type.
Example
Query:
WITH
toDate('2019-01-01') AS date,
INTERVAL 1 WEEK AS interval_week,
toIntervalWeek(1) AS interval_to_week
SELECT
date + interval_week,
date + interval_to_week;
Result:
┌─plus(date, interval_week)─┬─plus(date, interval_to_week)─┐
│ 2019-01-08 │ 2019-01-08 │
└───────────────────────────┴──────────────────────────────┘
parseDateTimeBestEffort
parseDateTime32BestEffort
Converts a date and time in the String representation to DateTime data type.
The function parses ISO 8601, RFC 1123 - 5.2.14 RFC-822 Date and Time Specification, ClickHouse’s and some other date and time formats.
Syntax
parseDateTimeBestEffort(time_string [, time_zone])
Arguments
time_string — String containing a date and time to convert. String.
time_zone — Time zone. The function parses time_string according to the time zone. String.
Supported non-standard formats
A string containing 9..10 digit unix timestamp.
A string with a date and a time component: YYYYMMDDhhmmss, DD/MM/YYYY hh:mm:ss, DD-MM-YY hh:mm, YYYY-MM-DD hh:mm:ss, etc.
A string with a date, but no time component: YYYY, YYYYMM, YYYY*MM, DD/MM/YYYY, DD-MM-YY, etc.
A string with a day and time: DD, DD hh, DD hh:mm. In this case YYYY-MM are substituted as 2000-01.
A string that includes the date and time along with time zone offset information: YYYY-MM-DD hh:mm:ss ±h:mm, etc. For example, 2020-12-12 17:36:00 -5:00.
For all of the formats with a separator, the function parses month names expressed by their full name or by the first three letters of the month name. Examples: 24/DEC/18, 24-Dec-18, 01-September-2018.
Returned value
time_string converted to the DateTime data type.
Examples
Query:
SELECT parseDateTimeBestEffort('23/10/2020 12:12:57')
AS parseDateTimeBestEffort;
Result:
┌─parseDateTimeBestEffort─┐
│ 2020-10-23 12:12:57 │
└─────────────────────────┘
Query:
SELECT parseDateTimeBestEffort('Sat, 18 Aug 2018 07:22:16 GMT', 'Asia/Istanbul')
AS parseDateTimeBestEffort;
Result:
┌─parseDateTimeBestEffort─┐
│ 2018-08-18 10:22:16 │
└─────────────────────────┘
Query:
SELECT parseDateTimeBestEffort('1284101485')
AS parseDateTimeBestEffort;
Result (assuming the server time zone is UTC):
┌─parseDateTimeBestEffort─┐
│ 2010-09-10 06:51:25 │
└─────────────────────────┘
Query:
SELECT parseDateTimeBestEffort('2018-10-23 10:12:12')
AS parseDateTimeBestEffort;
Result:
┌─parseDateTimeBestEffort─┐
│ 2018-10-23 10:12:12 │
└─────────────────────────┘
Query:
SELECT parseDateTimeBestEffort('10 20:19');
Result:
┌─parseDateTimeBestEffort('10 20:19')─┐
│ 2000-01-10 20:19:00 │
└─────────────────────────────────────┘
parseDateTimeBestEffortUS
This function behaves like parseDateTimeBestEffort for ISO date formats, e.g. YYYY-MM-DD hh:mm:ss, and other date formats where the month and date components can be unambiguously extracted, e.g. YYYYMMDDhhmmss, YYYY-MM, DD hh, or YYYY-MM-DD hh:mm:ss ±h:mm. If the month and the date components cannot be unambiguously extracted, e.g. MM/DD/YYYY, MM-DD-YYYY, or MM-DD-YY, it prefers the US date format instead of DD/MM/YYYY, DD-MM-YYYY, or DD-MM-YY. As an exception to the latter, if the value in the month position is greater than 12 and less than or equal to 31, this function falls back to the behavior of parseDateTimeBestEffort, e.g. 15/08/2020 is parsed as 2020-08-15.
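The disambiguation rule for separator-based dates can be sketched in Python as follows. The function name is hypothetical, and the sketch handles only the a/b/YYYY and a-b-YYYY shapes with a four-digit year; it is a model of the rule described above, not the actual parser.

```python
import re
from datetime import date

def parse_us_date(text: str) -> date:
    """Prefer MM/DD/YYYY, but fall back to DD/MM/YYYY when the first
    number cannot be a month (greater than 12 and at most 31)."""
    first, second, year = (int(part) for part in re.split(r"[/-]", text))
    if 12 < first <= 31:
        day, month = first, second   # fallback: first number is the day
    else:
        month, day = first, second   # US order
    return date(year, month, day)

print(parse_us_date("03/04/2020"))  # 2020-03-04 (March 4, US order)
print(parse_us_date("15/08/2020"))  # 2020-08-15 (fallback, 15 cannot be a month)
print(parse_us_date("08-15-2020"))  # 2020-08-15
```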
parseDateTimeBestEffortOrNull
parseDateTime32BestEffortOrNull
Same as for parseDateTimeBestEffort except that it returns NULL when it encounters a date format that cannot be processed.
parseDateTimeBestEffortOrZero
parseDateTime32BestEffortOrZero
Same as for parseDateTimeBestEffort except that it returns zero date or zero date time when it encounters a date format that cannot be processed.
parseDateTimeBestEffortUSOrNull
Same as parseDateTimeBestEffortUS function except that it returns NULL when it encounters a date format that cannot be processed.
parseDateTimeBestEffortUSOrZero
Same as parseDateTimeBestEffortUS function except that it returns zero date (1970-01-01) or zero date with time (1970-01-01 00:00:00) when it encounters a date format that cannot be processed.
parseDateTime64BestEffort
Same as the parseDateTimeBestEffort function, but also parses milliseconds and microseconds and returns the DateTime64 data type.
Syntax
parseDateTime64BestEffort(time_string [, precision [, time_zone]])
Parameters
time_string — String containing a date or date with time to convert. String.
precision — Required precision. 3 for milliseconds, 6 for microseconds. Default: 3. Optional. UInt8.
Returned value
time_string converted to the DateTime64 data type.
Examples
Query:
SELECT parseDateTime64BestEffort('2021-01-01') AS a, toTypeName(a) AS t
UNION ALL
SELECT parseDateTime64BestEffort('2021-01-01 01:01:00.12346') AS a, toTypeName(a) AS t
UNION ALL
SELECT parseDateTime64BestEffort('2021-01-01 01:01:00.12346',6) AS a, toTypeName(a) AS t
UNION ALL
SELECT parseDateTime64BestEffort('2021-01-01 01:01:00.12346',3,'Asia/Istanbul') AS a, toTypeName(a) AS t
FORMAT PrettyCompactMonoBlock;
Result:
┌──────────────────────────a─┬─t──────────────────────────────┐
│ 2021-01-01 01:01:00.123000 │ DateTime64(3) │
│ 2021-01-01 00:00:00.000000 │ DateTime64(3) │
│ 2021-01-01 01:01:00.123460 │ DateTime64(6) │
│ 2020-12-31 22:01:00.123000 │ DateTime64(3, 'Asia/Istanbul') │
└────────────────────────────┴────────────────────────────────┘
parseDateTime64BestEffortUS
Same as for parseDateTime64BestEffort, except that this function prefers US date format (MM/DD/YYYY etc.) in case of ambiguity.
parseDateTime64BestEffortOrNull
Same as for parseDateTime64BestEffort except that it returns NULL when it encounters a date format that cannot be processed.
parseDateTime64BestEffortOrZero
Same as for parseDateTime64BestEffort except that it returns zero date or zero date time when it encounters a date format that cannot be processed.
parseDateTime64BestEffortUSOrNull
Same as for parseDateTime64BestEffort, except that this function prefers US date format (MM/DD/YYYY etc.) in case of ambiguity and returns NULL when it encounters a date format that cannot be processed.
parseDateTime64BestEffortUSOrZero
Same as for parseDateTime64BestEffort, except that this function prefers US date format (MM/DD/YYYY etc.) in case of ambiguity and returns zero date or zero date time when it encounters a date format that cannot be processed.
toLowCardinality
Converts input parameter to the LowCardinality version of same data type.
To convert data from the LowCardinality data type use the CAST function. For example, CAST(x as String).
Syntax
toLowCardinality(expr)
Arguments
expr — Expression resulting in one of the supported data types.
Returned values
Result of expr.
Type: LowCardinality(expr_result_type)
Example
Query:
SELECT toLowCardinality('1');
Result:
┌─toLowCardinality('1')─┐
│ 1 │
└───────────────────────┘
toUnixTimestamp64Milli
toUnixTimestamp64Micro
toUnixTimestamp64Nano
Converts a DateTime64 to an Int64 value with fixed sub-second precision. The input value is scaled up or down appropriately depending on its precision.
NOTE
The output value is a timestamp in UTC, not in the timezone of DateTime64.
Syntax
toUnixTimestamp64Milli(value)
Arguments
value — DateTime64 value with any precision.
Returned value
value converted to the Int64 data type.
Examples
Query:
WITH toDateTime64('2019-09-16 19:20:12.345678910', 6) AS dt64
SELECT toUnixTimestamp64Milli(dt64);
Result:
┌─toUnixTimestamp64Milli(dt64)─┐
│ 1568650812345 │
└──────────────────────────────┘
Query:
WITH toDateTime64('2019-09-16 19:20:12.345678910', 6) AS dt64
SELECT toUnixTimestamp64Nano(dt64);
Result:
┌─toUnixTimestamp64Nano(dt64)─┐
│ 1568650812345678000 │
└─────────────────────────────┘
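The scaling in both examples can be checked with integer arithmetic. The Python sketch below models a DateTime64 value as a count of ticks since the epoch at a given scale. The function name is hypothetical, and the use of floor division when scaling down is an assumption that only matters for negative values.

```python
def rescale_ticks(ticks: int, from_scale: int, to_scale: int) -> int:
    """Convert a tick count between sub-second scales (3 = milli, 6 = micro, 9 = nano)."""
    if to_scale >= from_scale:
        return ticks * 10 ** (to_scale - from_scale)  # scale up: append zeros
    return ticks // 10 ** (from_scale - to_scale)     # scale down: drop digits

micro = 1568650812345678  # the example value at scale 6
print(rescale_ticks(micro, 6, 3))  # 1568650812345
print(rescale_ticks(micro, 6, 9))  # 1568650812345678000
```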
fromUnixTimestamp64Milli
fromUnixTimestamp64Micro
fromUnixTimestamp64Nano
Converts an Int64 to a DateTime64 value with fixed sub-second precision and optional timezone. The input value is scaled up or down appropriately depending on its precision. Note that the input value is treated as a UTC timestamp, not as a timestamp in the given (or implicit) timezone.
Syntax
fromUnixTimestamp64Milli(value [, timezone])
Arguments
value — Int64 value with any precision.
timezone — String (optional) timezone name of the result.
Returned value
value converted to the DateTime64 data type.
Example
Query:
WITH CAST(1234567891011, 'Int64') AS i64
SELECT fromUnixTimestamp64Milli(i64, 'UTC');
Result:
┌─fromUnixTimestamp64Milli(i64, 'UTC')─┐
│ 2009-02-13 23:31:31.011 │
└──────────────────────────────────────┘
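As a cross-check of the example above, the same conversion can be sketched with Python's standard library. The helper names from_unix_millis and format_millis are hypothetical. The sketch treats the integer as milliseconds since the Unix epoch in UTC, as the description states, and uses integer arithmetic to avoid floating-point rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_unix_millis(value: int) -> datetime:
    """Interpret an integer as milliseconds since the Unix epoch, in UTC."""
    return EPOCH + timedelta(milliseconds=value)


def format_millis(dt: datetime) -> str:
    """Format a datetime with exactly three fractional digits."""
    return dt.strftime("%Y-%m-%d %H:%M:%S.") + f"{dt.microsecond // 1000:03d}"


dt = from_unix_millis(1234567891011)
print(format_millis(dt))  # 2009-02-13 23:31:31.011
```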
formatRow
Converts arbitrary expressions into a string using the given format.
Syntax
formatRow(format, x, y, ...)
Arguments
format — Text format, for example CSV.
x, y, ... — Expressions.
Returned value
A formatted string. For text formats, it is usually terminated with a newline character.
Example
Query:
SELECT formatRow('CSV', number, 'good')
FROM numbers(3);
Result:
┌─formatRow('CSV', number, 'good')─┐
│ 0,"good"
│
│ 1,"good"
│
│ 2,"good"
│
└──────────────────────────────────┘
Note: If the format contains a suffix or prefix, it is written in each row.
Example
Query:
SELECT formatRow('CustomSeparated', number, 'good')
FROM numbers(3)
SETTINGS format_custom_result_before_delimiter='<prefix>\n', format_custom_result_after_delimiter='<suffix>'
Result:
┌─formatRow('CustomSeparated', number, 'good')─┐
│ <prefix>
0 good
<suffix> │
│ <prefix>
1 good
<suffix> │
│ <prefix>
2 good
<suffix> │
└──────────────────────────────────────────────┘
Note: Only row-based formats are supported in this function.
formatRowNoNewline
Converts arbitrary expressions into a string using the given format. Differs from formatRow in that this function trims the last newline character (\n), if any.
Syntax
formatRowNoNewline(format, x, y, ...)
Arguments
format — Text format, for example CSV.
x, y, ... — Expressions.
Returned value
A formatted string.
Example
Query:
SELECT formatRowNoNewline('CSV', number, 'good')
FROM numbers(3);
Result:
┌─formatRowNoNewline('CSV', number, 'good')─┐
│ 0,"good" │
│ 1,"good" │
│ 2,"good" │
└───────────────────────────────────────────┘
URLs
None of these functions follow the RFC. They are maximally simplified for improved performance.
protocol
Extracts the protocol from a URL.
Examples of typical returned values: http, https, ftp, mailto, tel, magnet…
domain
Extracts the hostname from a URL.
domain(url)
Arguments
url — URL. Type: String.
The URL can be specified with or without a scheme. Examples:
svn+ssh://some.svn-hosting.com:80/repo/trunk
some.svn-hosting.com:80/repo/trunk
https://clickhouse.com/time/
For these examples, the domain function returns the following results:
some.svn-hosting.com
some.svn-hosting.com
clickhouse.com
Returned values
Host name, if ClickHouse can parse the input string as a URL.
Empty string, if ClickHouse cannot parse the input string as a URL.
Type: String.
Example
SELECT domain('svn+ssh://some.svn-hosting.com:80/repo/trunk');
┌─domain('svn+ssh://some.svn-hosting.com:80/repo/trunk')─┐
│ some.svn-hosting.com │
└────────────────────────────────────────────────────────┘
domainWithoutWWW
Returns the domain and removes no more than one ‘www.’ from the beginning of it, if present.
topLevelDomain
Extracts the top-level domain from a URL.
topLevelDomain(url)
Arguments
url — URL. Type: String.
The URL can be specified with or without a scheme. Examples:
svn+ssh://some.svn-hosting.com:80/repo/trunk
some.svn-hosting.com:80/repo/trunk
https://clickhouse.com/time/
Returned values
Domain name, if ClickHouse can parse the input string as a URL.
Empty string, if ClickHouse cannot parse the input string as a URL.
Type: String.
Example
SELECT topLevelDomain('svn+ssh://www.some.svn-hosting.com:80/repo/trunk');
┌─topLevelDomain('svn+ssh://www.some.svn-hosting.com:80/repo/trunk')─┐
│ com │
└────────────────────────────────────────────────────────────────────┘
firstSignificantSubdomain
Returns the “first significant subdomain”. The first significant subdomain is a second-level domain if it is ‘com’, ‘net’, ‘org’, or ‘co’. Otherwise, it is a third-level domain. For example, firstSignificantSubdomain('https://news.clickhouse.com/') = 'clickhouse', firstSignificantSubdomain('https://news.clickhouse.com.tr/') = 'clickhouse'. The list of “insignificant” second-level domains and other implementation details may change in the future.
cutToFirstSignificantSubdomain
Returns the part of the domain that includes top-level subdomains up to the “first significant subdomain” (see the explanation above).
For example:
cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/') = 'clickhouse.com.tr'.
cutToFirstSignificantSubdomain('www.tr') = 'tr'.
cutToFirstSignificantSubdomain('tr') = ''.
cutToFirstSignificantSubdomainWithWWW
Returns the part of the domain that includes top-level subdomains up to the “first significant subdomain”, without stripping "www".
For example:
cutToFirstSignificantSubdomainWithWWW('https://news.clickhouse.com.tr/') = 'clickhouse.com.tr'.
cutToFirstSignificantSubdomainWithWWW('www.tr') = 'www.tr'.
cutToFirstSignificantSubdomainWithWWW('tr') = ''.
cutToFirstSignificantSubdomainCustom
Returns the part of the domain that includes top-level subdomains up to the first significant subdomain. Accepts a custom TLD list name.
This can be useful if you need a fresh TLD list or have a custom one.
Configuration example:
<!-- <top_level_domains_path>/var/lib/clickhouse/top_level_domains/</top_level_domains_path> -->
<top_level_domains_lists>
<!-- https://publicsuffix.org/list/public_suffix_list.dat -->
<public_suffix_list>public_suffix_list.dat</public_suffix_list>
<!-- NOTE: path is under top_level_domains_path -->
</top_level_domains_lists>
Syntax
cutToFirstSignificantSubdomainCustom(URL, TLD)
Parameters
URL — URL. Type: String.
TLD — Custom TLD list name. Type: String.
Returned value
Part of the domain that includes top-level subdomains up to the first significant subdomain.
Type: String.
Example
Query:
SELECT cutToFirstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'public_suffix_list');
Result:
┌─cutToFirstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'public_suffix_list')─┐
│ foo.there-is-no-such-domain │
└───────────────────────────────────────────────────────────────────────────────────────────────┘
cutToFirstSignificantSubdomainCustomWithWWW
Returns the part of the domain that includes top-level subdomains up to the first significant subdomain without stripping www. Accepts a custom TLD list name.
This can be useful if you need a fresh TLD list or have a custom one.
Configuration example:
<!-- <top_level_domains_path>/var/lib/clickhouse/top_level_domains/</top_level_domains_path> -->
<top_level_domains_lists>
<!-- https://publicsuffix.org/list/public_suffix_list.dat -->
<public_suffix_list>public_suffix_list.dat</public_suffix_list>
<!-- NOTE: path is under top_level_domains_path -->
</top_level_domains_lists>
Syntax
cutToFirstSignificantSubdomainCustomWithWWW(URL, TLD)
Parameters
URL — URL. Type: String.
TLD — Custom TLD list name. Type: String.
Returned value
Part of the domain that includes top-level subdomains up to the first significant subdomain without stripping www.
Type: String.
Example
Query:
SELECT cutToFirstSignificantSubdomainCustomWithWWW('www.foo', 'public_suffix_list');
Result:
┌─cutToFirstSignificantSubdomainCustomWithWWW('www.foo', 'public_suffix_list')─┐
│ www.foo │
└──────────────────────────────────────────────────────────────────────────────┘
firstSignificantSubdomainCustom
Returns the first significant subdomain. Accepts a custom TLD list name.
This can be useful if you need a fresh TLD list or have a custom one.
Configuration example:
<!-- <top_level_domains_path>/var/lib/clickhouse/top_level_domains/</top_level_domains_path> -->
<top_level_domains_lists>
<!-- https://publicsuffix.org/list/public_suffix_list.dat -->
<public_suffix_list>public_suffix_list.dat</public_suffix_list>
<!-- NOTE: path is under top_level_domains_path -->
</top_level_domains_lists>
Syntax
firstSignificantSubdomainCustom(URL, TLD)
Parameters
URL — URL. Type: String.
TLD — Custom TLD list name. Type: String.
Returned value
First significant subdomain.
Type: String.
Example
Query:
SELECT firstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'public_suffix_list');
Result:
┌─firstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'public_suffix_list')─┐
│ foo │
└──────────────────────────────────────────────────────────────────────────────────────────┘
port(URL[, default_port = 0])
Returns the port, or default_port if there is no port in the URL (or in case of a validation error).
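The fallback behaviour can be sketched in Python as follows. This uses Python's urllib.parse rather than ClickHouse's own simplified parser, so edge cases may differ. The helper name port_or_default and the example.com URLs are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def port_or_default(url: str, default_port: int = 0) -> int:
    """Return the explicit port of `url`, or `default_port` if absent or invalid."""
    try:
        port = urlsplit(url).port  # None when the URL has no explicit port
    except ValueError:
        # Raised for a non-numeric or out-of-range port.
        return default_port
    return default_port if port is None else port


print(port_or_default("http://example.com:8123/"))       # 8123
print(port_or_default("http://example.com/", 80))        # 80
print(port_or_default("http://example.com:99999/", 80))  # 80
```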
path
Returns the path. Example: /top/news.html
The path does not include the query string.
pathFull
The same as above, but including query string and fragment. Example: /top/news.html?page=2#comments
queryString
Returns the query string. Example: page=1&lr=213. The query string does not include the initial question mark, nor # and everything after it.
fragment
Returns the fragment identifier. The fragment does not include the initial hash symbol.
queryStringAndFragment
Returns the query string and fragment identifier. Example: page=1#29390.
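The relationship between path, pathFull, queryString, fragment, and queryStringAndFragment can be illustrated with Python's urllib.parse. This is only a sketch of which part of the URL each function corresponds to, not ClickHouse's parser. The host example.com is an assumption; the path, query string, and fragment are taken from the examples above.

```python
from urllib.parse import urlsplit

url = "http://example.com/top/news.html?page=2#comments"
parts = urlsplit(url)

path = parts.path            # no query string
query_string = parts.query   # no leading '?', nothing from '#' onward
fragment = parts.fragment    # no leading '#'
path_full = f"{path}?{query_string}#{fragment}"
query_string_and_fragment = f"{query_string}#{fragment}"

print(path)                       # /top/news.html
print(path_full)                  # /top/news.html?page=2#comments
print(query_string)               # page=2
print(fragment)                   # comments
print(query_string_and_fragment)  # page=2#comments
```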
extractURLParameter(URL, name)
Returns the value of the ‘name’ parameter in the URL, if present. Otherwise, an empty string. If there are many parameters with this name, it returns the first occurrence. This function works under the assumption that the parameter name is encoded in the URL exactly the same way as in the passed argument.
extractURLParameters(URL)
Returns an array of name=value strings corresponding to the URL parameters. The values are not decoded in any way.
extractURLParameterNames(URL)
Returns an array of name strings corresponding to the names of URL parameters. The values are not decoded in any way.
URLHierarchy(URL)
Returns an array containing the URL, truncated at the end by the symbols / and ? in the path and query string. Consecutive separator characters are counted as one. The cut is made at the position after all the consecutive separator characters.
URLPathHierarchy(URL)
The same as above, but without the protocol and host in the result. The / element (root) is not included.
URLPathHierarchy('https://example.com/browse/CONV-6788') =
[
'/browse/',
'/browse/CONV-6788'
]
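The cutting rule can be sketched in Python as follows. This is a rough emulation written from the description above, not ClickHouse's implementation. The function name url_path_hierarchy is hypothetical, and dropping the fragment is an assumption of this sketch.

```python
from urllib.parse import urlsplit

SEPARATORS = "/?"


def url_path_hierarchy(url: str) -> list:
    """Return prefixes of the path and query string, cut after each separator run."""
    parts = urlsplit(url)
    # Path plus query string, without protocol and host (fragment dropped here).
    tail = parts.path + ("?" + parts.query if parts.query else "")
    result = []
    i = 0
    # Skip the leading separators so the root "/" is not emitted.
    while i < len(tail) and tail[i] in SEPARATORS:
        i += 1
    while i < len(tail):
        if tail[i] in SEPARATORS:
            # Consume the whole run of separators, then cut after it.
            while i < len(tail) and tail[i] in SEPARATORS:
                i += 1
            result.append(tail[:i])
        else:
            i += 1
    # The full string is the last element unless a cut already ended there.
    if tail.strip(SEPARATORS) and (not result or result[-1] != tail):
        result.append(tail)
    return result


print(url_path_hierarchy("https://example.com/browse/CONV-6788"))
# ['/browse/', '/browse/CONV-6788']
```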
decodeURLComponent(URL)
Returns the decoded URL. Example:
SELECT decodeURLComponent('http://127.0.0.1:8123/?query=SELECT%201%3B') AS DecodedURL;
┌─DecodedURL─────────────────────────────┐
│ http://127.0.0.1:8123/?query=SELECT 1; │
└────────────────────────────────────────┘
netloc
Extracts network locality (username:password@host:port) from a URL.
Syntax
netloc(URL)
Arguments
url — URL. String.
Returned value
username:password@host:port.
Type: String.
Example
Query:
SELECT netloc('http://paul@www.example.com:80/');
Result:
┌─netloc('http://paul@www.example.com:80/')─┐
│ paul@www.example.com:80 │
└───────────────────────────────────────────┘
cutWWW
Removes no more than one ‘www.’ from the beginning of the URL’s domain, if present.
cutQueryString
Removes query string. The question mark is also removed.
cutFragment
Removes the fragment identifier. The number sign is also removed.
cutQueryStringAndFragment
Removes the query string and fragment identifier. The question mark and number sign are also removed.
cutURLParameter(URL, name)
Removes the name parameter from the URL, if present. This function does not encode or decode characters in parameter names, e.g. Client ID and Client%20ID are treated as different parameter names.
Syntax
cutURLParameter(URL, name)
Arguments
url — URL. String.
name — Name of the URL parameter to remove. String or Array of Strings.
Returned value
URL with the name URL parameter removed.
Type: String.
Example
Query:
SELECT
cutURLParameter('http://bigmir.net/?a=b&c=d&e=f#g', 'a') as url_without_a,
cutURLParameter('http://bigmir.net/?a=b&c=d&e=f#g', ['c', 'e']) as url_without_c_and_e;
Result:
┌─url_without_a────────────────┬─url_without_c_and_e──────┐
│ http://bigmir.net/?c=d&e=f#g │ http://bigmir.net/?a=b#g │
└──────────────────────────────┴──────────────────────────┘
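The behaviour shown above, including the rule that parameter names are compared without decoding, can be sketched in Python. This is an illustrative emulation using urllib.parse, not ClickHouse's implementation. The function name cut_url_parameter is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def cut_url_parameter(url, names):
    """Remove query parameters whose raw (undecoded) name is in `names`."""
    if isinstance(names, str):
        names = [names]
    parts = urlsplit(url)
    kept = [
        pair
        for pair in parts.query.split("&")
        # Compare the raw text before '=', with no percent-decoding.
        if pair.split("=", 1)[0] not in names
    ]
    return urlunsplit(parts._replace(query="&".join(kept)))


url = "http://bigmir.net/?a=b&c=d&e=f#g"
print(cut_url_parameter(url, "a"))         # http://bigmir.net/?c=d&e=f#g
print(cut_url_parameter(url, ["c", "e"]))  # http://bigmir.net/?a=b#g
```

Because names are compared as raw text, asking this sketch to remove 'Client ID' leaves a parameter written as Client%20ID in place.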
UUID
The functions for working with UUID are listed below.
generateUUIDv4
Generates a version 4 UUID.
Syntax
generateUUIDv4([x])
Arguments
x — Expression resulting in any of the supported data types. The resulting value is discarded, but the expression itself is used to bypass common subexpression elimination if the function is called multiple times in one query. Optional parameter.
Returned value
The UUID type value.
Usage example
This example demonstrates creating a table with the UUID type column and inserting a value into the table.
CREATE TABLE t_uuid (x UUID) ENGINE=TinyLog
INSERT INTO t_uuid SELECT generateUUIDv4()
SELECT * FROM t_uuid
┌────────────────────────────────────x─┐
│ f4bf890f-f9dc-4332-ad5c-0c18e73f28e9 │
└──────────────────────────────────────┘
Usage example for generating multiple values in one row
SELECT generateUUIDv4(1), generateUUIDv4(2)
┌─generateUUIDv4(1)────────────────────┬─generateUUIDv4(2)────────────────────┐
│ 2d49dc6e-ddce-4cd0-afb8-790956df54c1 │ 8abf8c13-7dea-4fdf-af3e-0e18767770e6 │
└──────────────────────────────────────┴──────────────────────────────────────┘
toUUID (x)
Converts a String value to the UUID type.
toUUID(String)
Returned value
The UUID type value.
Usage example
SELECT toUUID('61f0c404-5cb3-11e7-907b-a6006ad3dba0') AS uuid
┌─────────────────────────────────uuid─┐
│ 61f0c404-5cb3-11e7-907b-a6006ad3dba0 │
└──────────────────────────────────────┘
toUUIDOrNull (x)
Takes an argument of type String and tries to parse it as a UUID. If parsing fails, returns NULL.
toUUIDOrNull(String)
Returned value
The Nullable(UUID) type value.
Usage example
SELECT toUUIDOrNull('61f0c404-5cb3-11e7-907b-a6006ad3dba0T') AS uuid
┌─uuid─┐
│ ᴺᵁᴸᴸ │
└──────┘
toUUIDOrZero (x)
Takes an argument of type String and tries to parse it as a UUID. If parsing fails, returns the zero UUID.
toUUIDOrZero(String)
Returned value
The UUID type value.
Usage example
SELECT toUUIDOrZero('61f0c404-5cb3-11e7-907b-a6006ad3dba0T') AS uuid
┌─────────────────────────────────uuid─┐
│ 00000000-0000-0000-0000-000000000000 │
└──────────────────────────────────────┘
UUIDStringToNum
Accepts a string containing 36 characters in the format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx and returns a FixedString(16) as its binary representation, with its format optionally specified by variant (Big-endian by default).
Syntax
UUIDStringToNum(string[, variant = 1])
Arguments
string — String of 36 characters or FixedString(36). String.
variant — Integer, representing a variant as specified by RFC4122. 1 = Big-endian (default), 2 = Microsoft.
Returned value
FixedString(16)
Usage examples
SELECT
'612f3c40-5d3b-217e-707b-6a546a3d7b29' AS uuid,
UUIDStringToNum(uuid) AS bytes
┌─uuid─────────────────────────────────┬─bytes────────────┐
│ 612f3c40-5d3b-217e-707b-6a546a3d7b29 │ a/<@];!~p{jTj={) │
└──────────────────────────────────────┴──────────────────┘
SELECT
'612f3c40-5d3b-217e-707b-6a546a3d7b29' AS uuid,
UUIDStringToNum(uuid, 2) AS bytes
┌─uuid─────────────────────────────────┬─bytes────────────┐
│ 612f3c40-5d3b-217e-707b-6a546a3d7b29 │ @</a;]~!p{jTj={) │
└──────────────────────────────────────┴──────────────────┘
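The two byte layouts in the examples above can be reproduced with Python's standard uuid module, which may help when checking values outside ClickHouse. This sketch assumes that variant 1 corresponds to Python's bytes attribute (all fields big-endian) and variant 2 to bytes_le (first three fields little-endian), which matches the outputs shown above.

```python
import uuid

u = uuid.UUID("612f3c40-5d3b-217e-707b-6a546a3d7b29")

big_endian = u.bytes    # all fields in big-endian byte order
microsoft = u.bytes_le  # first three fields byte-swapped (little-endian)

print(big_endian)  # b'a/<@];!~p{jTj={)'
print(microsoft)   # b'@</a;]~!p{jTj={)'

# Converting back requires knowing which layout the 16 bytes use.
restored_be = str(uuid.UUID(bytes=big_endian))
restored_ms = str(uuid.UUID(bytes_le=microsoft))
print(restored_be == restored_ms == str(u))  # True
```

Only the first eight bytes differ between the two layouts; the last eight bytes are identical in both.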
UUIDNumToString
Accepts binary containing a binary representation of a UUID, with its format optionally specified by variant (Big-endian by default), and returns a string containing 36 characters in text format.
Syntax
UUIDNumToString(binary[, variant = 1])
Arguments
binary — FixedString(16) as a binary representation of a UUID.
variant — Integer, representing a variant as specified by RFC4122. 1 = Big-endian (default), 2 = Microsoft.
Returned value
String.
Usage example
SELECT
'a/<@];!~p{jTj={)' AS bytes,
UUIDNumToString(toFixedString(bytes, 16)) AS uuid
┌─bytes────────────┬─uuid─────────────────────────────────┐
│ a/<@];!~p{jTj={) │ 612f3c40-5d3b-217e-707b-6a546a3d7b29 │
└──────────────────┴──────────────────────────────────────┘
SELECT
'@</a;]~!p{jTj={)' AS bytes,
UUIDNumToString(toFixedString(bytes, 16), 2) AS uuid
┌─bytes────────────┬─uuid─────────────────────────────────┐
│ @</a;]~!p{jTj={) │ 612f3c40-5d3b-217e-707b-6a546a3d7b29 │
└──────────────────┴──────────────────────────────────────┘