
External Function

External functions in Databend let you define custom data processing operations that are implemented and executed on an external server, written in a programming language such as Python. Key features of external functions include:

  • Scalability: External functions are well suited to complex, resource-intensive data operations.

  • External Libraries: They can use external libraries and dependencies, so existing functionality can be integrated rather than reimplemented.

  • Advanced Logic: They can implement sophisticated data processing logic for complex scenarios.

Supported Programming Languages

The following table lists the supported languages and the libraries required to create external functions in Databend.

Language | Required Library
Python | udf.py: The library is not yet publicly available. Until it is released, download the 'udf.py' file from this link and save it in the same directory as your Python script.

Managing External Functions

To manage external functions in Databend, use the following commands:

Usage Examples

This section demonstrates how to create an external function in each of the Supported Programming Languages.

Creating an External Function in Python

  1. Enable external server support by adding the following parameters to the [query] section in the databend-query.toml configuration file.
databend-query.toml
[query]
...
enable_udf_server = true
# List the allowed server addresses, separating multiple addresses with commas.
# For example, ['http://0.0.0.0:8815', 'http://example.com']
udf_server_allow_list = ['http://0.0.0.0:8815']
...
  2. Define your function. The following code defines and runs an external server in Python that exposes a custom function, gcd, which calculates the greatest common divisor of two integers and can be executed remotely:
external_function.py
from udf import *

@udf(
    input_types=["INT", "INT"],
    result_type="INT",
    skip_null=True,
)
def gcd(x: int, y: int) -> int:
    while y != 0:
        (x, y) = (y, x % y)
    return x

if __name__ == '__main__':
    # Create an external server listening at '0.0.0.0:8815'
    server = UDFServer("0.0.0.0:8815")
    # Add the defined functions
    server.add_function(gcd)
    # Start the external server
    server.serve()

@udf is a decorator used for defining external functions in Databend. It supports the following parameters:

Parameter | Description
input_types | A list of strings or Arrow data types that specify the input data types.
result_type | A string or an Arrow data type that specifies the return value type.
name | An optional string specifying the function name. If not provided, the original name will be used.
io_threads | Number of I/O threads used per data chunk for I/O-bound functions.
skip_null | A boolean value specifying whether to skip NULL values. If set to True, NULL values will not be passed to the function, and the corresponding return value is set to NULL. Default is False.
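
The effect of skip_null described above can be sketched in plain Python. This is only an illustration of the documented behavior, not the udf.py implementation: the skip_null_sketch decorator is a hypothetical stand-in, and it assumes that NULL reaches Python as None and that a NULL in any argument makes the result NULL.

```python
import functools

def skip_null_sketch(func):
    """Hypothetical stand-in for skip_null=True; not part of udf.py."""
    @functools.wraps(func)
    def wrapper(*args):
        # Assumption: a NULL argument arrives as None. The wrapped function
        # is not called for such rows, and the result is NULL (None).
        if any(arg is None for arg in args):
            return None
        return func(*args)
    return wrapper

@skip_null_sketch
def gcd(x, y):
    # Same Euclidean loop as in the example above.
    while y != 0:
        x, y = y, x % y
    return x

print(gcd(12, 18))    # 6
print(gcd(None, 18))  # None
```

With skip_null left at its default of False, the function body itself would have to handle NULL inputs.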

The following table maps Databend data types to their Python equivalents:

Databend Type | Python Type
BOOLEAN | bool
TINYINT (UNSIGNED) | int
SMALLINT (UNSIGNED) | int
INT (UNSIGNED) | int
BIGINT (UNSIGNED) | int
FLOAT | float
DOUBLE | float
DECIMAL | decimal.Decimal
DATE | datetime.date
TIMESTAMP | datetime.datetime
VARCHAR | str
VARIANT | any
MAP(K,V) | dict
ARRAY(T) | list[T]
TUPLE(T...) | tuple(T...)
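
To make the mapping concrete, here is a minimal sketch of handler bodies that work with the Python types listed above. The function names days_between and array_total are hypothetical, and the @udf decorator is deliberately left out so the snippet runs without udf.py; the type strings the decorator accepts for these Databend types are not shown here.

```python
import datetime
from decimal import Decimal

def days_between(start: datetime.date, end: datetime.date) -> int:
    # DATE, DATE -> INT: DATE values arrive as datetime.date objects.
    return (end - start).days

def array_total(values: list) -> Decimal:
    # ARRAY(DECIMAL) -> DECIMAL: the array arrives as a list of decimal.Decimal.
    return sum(values, Decimal("0"))

print(days_between(datetime.date(2024, 1, 1), datetime.date(2024, 3, 1)))
print(array_total([Decimal("1.10"), Decimal("2.25")]))
```

To expose such functions, each would be decorated with @udf and added to the server in the same way as gcd.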
  3. Run the Python file to start the external server:
python3 external_function.py
  4. Register the function gcd in Databend with the CREATE FUNCTION command:
CREATE FUNCTION gcd (INT, INT) RETURNS INT LANGUAGE python HANDLER = 'gcd' ADDRESS = 'http://0.0.0.0:8815';
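
The handler logic can also be checked locally, without a running server or udf.py. The sketch below repeats the gcd loop from the example as an undecorated function and compares it with Python's built-in math.gcd on random non-negative inputs; the check_gcd helper is a hypothetical name introduced only for this illustration.

```python
import math
import random

def gcd(x: int, y: int) -> int:
    # Same Euclidean loop as the handler above, without the @udf decorator.
    while y != 0:
        x, y = y, x % y
    return x

def check_gcd(trials: int = 1000, seed: int = 0) -> bool:
    # Compare against math.gcd on random non-negative integer pairs.
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.randint(0, 10**6)
        b = rng.randint(0, 10**6)
        if gcd(a, b) != math.gcd(a, b):
            return False
    return True

print(check_gcd())
```

A check like this only covers the function body; the server, the allow list, and the CREATE FUNCTION registration still need to be verified against a running Databend instance.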