String functions

SQL Function Description

ASCII(string)

Returns the numeric value of the first character of string.

Returns NULL if string is NULL.

For example:

Returns

ASCII('abc')

97

ASCII(CAST(NULL AS VARCHAR))

NULL

CHAR_LENGTH(string)

CHARACTER_LENGTH(string)

Returns the number of characters in string.

CHR(integer)

Returns the ASCII character having the binary equivalent to integer.

If integer is larger than 255, we will get the modulus of integer divided by 255 first, and returns CHR of the modulus.

Returns NULL if integer is NULL.

For example:

Returns

CHR(97)

a

CHR(353)

(n.b. integer is higher than 255)

a

ASCII(CAST(NULL AS VARCHAR))

NULL

CONCAT(string1, string2, …)

Returns a string that concatenates string1, string2, …

Returns NULL if any argument is NULL.

For example:

Returns

CONCAT('AA', 'BB', 'CC')

AABBCC

CONCAT_WS(string1, string2, string3, …)

Returns a string that concatenates string2, string3, … with a separator string1.

The separator is added between the strings to be concatenated.

Returns NULL if string1 is NULL.

Compared with CONCAT, CONCAT_WS automatically skips NULL arguments.

For example:

Returns

CONCAT_WS('~', 'AA', CAST(NULL AS VARCHAR), 'BB', '', 'CC')

AA~BB~~CC

GROK(text_field, pattern)

Extract values out of text with grok patterns.

This function returns a map where the keys are specified in the pattern argument.

Grok provides a simplified pattern syntax and uses regular expressions behind the scenes to do the parsing. You can learn more about it here and here.

For example, given the the input record

{"raw_text":"[2022-06-10T21:26:22Z] \"GET\" 6977"}

The function:

GROK('raw_text', '[%{TIMESTAMP_ISO8601:timestamp}] "%{DATA:method}" %{NUMBER:bytes_sent}')

will return:

{"timestamp":"2002-06-10 21:26:22",
 "method":"GET",
 "bytes_sent":6977}

INITCAP(string)

Returns a new form of string with the first character of each word converted to uppercase and the rest characters to lowercase. Here a word means a sequences of alphanumeric characters.

UPPER(string)

Returns string in uppercase.

string1 || string2

Returns the concatenation of string1 and string2.

POSITION(string1 IN string2)

Returns the position (start from 1) of the first occurrence of string1 in string2; returns 0 if string1 can’t be found in string2.

TRIM([ BOTH | LEADING | TRAILING ] string1 FROM string2)

Returns a string that removes leading and/or trailing characters string2 from string1. By default, whitespaces at both sides are removed.

LOWER(string)

Returns string in lowercase.

LTRIM(string)

Returns a string that removes the left whitespaces from string.

For example:

Returns

LTRIM(' FOO')

FOO

RTRIM(string)

Returns a string that removes the right whitespaces from string.

For example:

Returns

RTRIM('FOO ')

FOO

REPEAT(string, integer)

Returns a string that repeats the base string integer times.

For example:

Returns

REPEAT('FOO bar_', 2)

FOO bar_FOO bar_

REGEXP_REPLACE(string1, string2, string3)

Returns a string from string1 with all the substrings that match a regular expression string2 consecutively being replaced with string3.

For example:

Returns

REGEXP_REPLACE('apple banana cherry', '\ba\w*\b', 'fruit')

The regex \ba\w*\b matches any word beginning with a

fruit banana cherry

OVERLAY(string1 PLACING string2 FROM integer1 [ FOR integer2 ])

Returns a string that replaces integer2 characters of string1 with string2 from position integer1. If integer2 is not specified then the length of string2 is used.

For example:

Returns

OVERLAY('WIBBLE' PLACING 'BAR' FROM 1)

BARBLE

OVERLAY('ABCDE' PLACING 'FOO' FROM 2)

AFOOE

OVERLAY('ABCDE' PLACING 'FOO' FROM 2 FOR 2)

AFOODE

SUBSTRING(string FROM integer1 [ FOR integer2 ])

Returns a substring of string starting from position integer1 with length integer2 (to the end by default).

REPLACE(string1, string2, string3)

Returns a new string which replaces all the occurrences of string2 with string3 (non-overlapping) from string1.

For example:

Returns

REPLACE('HELLO WORLD','HELLO','SALUTATIONS')

SALUTATIONS WORLD

REPLACE('foobar','foo','wibble')

wibblebar

REGEXP_EXTRACT(string1, string2[, integer])

Returns a string from string1 which extracted with a specified regular expression string2 and a regex match group index integer.

The regex match group index starts from 1 and 0 means matching the whole regex. In addition, the regex match group index shouldn’t exceed the number of the defined groups.

For example:

Returns

REGEXP_EXTRACT('foothebar', 'foo(.*?)(bar)', 1)

This creates two capture groups:

  1. Zero or more characters after foo and before bar

  2. The string bar

the

LPAD(string1, integer, string2)

Returns a new string from string1 left-padded with string2 to a length of integer characters. If the length of string1 is shorter than integer, returns string1 shortened to integer characters.

For example:

Returns

LPAD('hi',4,'??')

??hi

LPAD('hi',1,'??')

h

RPAD(string1, integer, string2)

Returns a new string from string1 right-padded with string2 to a length of integer characters. If the length of string1 is shorter than integer, returns string1 shortened to integer characters. For example:

Returns

RPAD('hi',4,'??')

hi??

RPAD('hi',1,'??')

h

DECODE(binary, string)

Decodes the binary argument into a STRING using the provided string character set (one of US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16).

If either argument is NULL, the result will also be NULL.

FROM_BASE64(string)

Returns the base64-decoded result from string.

Returns NULL if string is NULL.

For example:

Returns

FROM_BASE64('aGVsbG8gd29ybGQ=')

hello world

TO_BASE64(string)

Returns the base64-encoded result from string.

Returns NULL if string is NULL.

For example:

Returns

TO_BASE64('hello world')

aGVsbG8gd29ybGQ=

ENCODE(string1, string2)

Encodes the string1 into a BINARY using the provided string2 character set (one of US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16).

If either argument is NULL, the result will also be NULL.

INSTR(string1, string2)

Returns the position of the first occurrence of string2 in string1.

Returns NULL if any of arguments is NULL.

LEFT(string, integer)

Returns the leftmost integer characters from the string. Returns an empty string if integer is negative.

Returns NULL if any argument is NULL.

RIGHT(string, integer)

Returns the rightmost integer characters from the string. Returns an empty string if integer is negative.

Returns NULL if any argument is NULL.

LCOATE(string1, string2[, integer])

Returns the position of the first occurrence of string1 in string2 after position integer. Returns 0 if not found.

Returns NULL if any of arguments is NULL.

PARSE_URL(string1, string2[, string3])

Returns the specified string2 part from the URL in string1.

Valid values for string2 are

  • HOST

  • PATH

  • QUERY

    • The value of a particular query key can be extracted by providing the key as the third argument string3

  • REF

  • PROTOCOL

  • AUTHORITY

  • FILE

  • USERINFO

Returns NULL if any of arguments is NULL.

For example:

Returns

PARSE_URL('https://docs.decodable.co/welcome.html',
          'HOST')

docs.decodable.co

PARSE_URL('https://docs.decodable.co/welcome.html?foo=bar&zaphod=beeblebrox',
          'QUERY',
          'zaphod')

beeblebrox

REGEXP(string1, string2)

Returns TRUE if any (possibly empty) substring of string1 matches the Java regular expression string2, otherwise FALSE.

Returns NULL if any of arguments is NULL.

REVERSE(string)

Returns the reversed string.

Returns NULL if string is NULL.

SPLIT_INDEX(string1, string2, integer1)

Splits string1 by the delimiter string2, returns the integer1 (zero-based) string of the split strings.

Returns NULL if integer is negative.

Returns NULL if any of arguments is NULL.

For example:

Returns

SPLIT_INDEX('THIS IS A STRING',' ',1)

IS

STR_TO_MAP(string1[, string2, string3]])

Returns a MAP after splitting the string1 into key/value pairs using pair delimiter string2 and key-value delimiter string3.

The default delimiter by which the string is split is ,.

The default key-value delimiter is =.

Returns

STR_TO_MAP('FOO=BAR,WIBBLE=SNARF')

{WIBBLE=SNARF, FOO=BAR}

STR_TO_MAP('FOO/BAR!WIBBLE/SNARF','!' ,'/')

{WIBBLE=SNARF, FOO=BAR}

SUBSTR(string [, integer1[, integer2]])

Returns a substring of string starting from position integer1 with length integer2 (to the end by default).

IP_COUNTRY(ip_address)

Returns the 2 character ISO country code for a given ip address.

Returns NULL if a country can’t be found or the ip address is invalid.

For example:

Returns

IP_COUNTRY('108.162.167.61')

CA