User-defined Functions
Create, deploy, and manage custom Phoenix UDFs with configuration, implementation steps, and known limitations.
As of Phoenix 4.4.0 we have added the ability to allow users to create and deploy their own custom or domain-specific UDFs to the cluster.
Overview
Users can create temporary or permanent user-defined (domain-specific) scalar functions.
UDFs can be used like built-in functions in queries such as SELECT, UPSERT, DELETE, and when creating functional indexes.
Temporary functions are session-scoped and are not accessible from other sessions.
Permanent function metadata is stored in the SYSTEM.FUNCTION table.
Phoenix also supports tenant-specific functions. Functions created in one tenant-specific connection are not visible to other tenant-specific connections.
Only global tenant (no-tenant) functions are visible to all connections.
Phoenix leverages the HBase dynamic class loader to load UDF JARs from HDFS at the Phoenix client and region server without restarting services.
Configuration
Add the following parameters to hbase-site.xml on the Phoenix client:
<property>
<name>phoenix.functions.allowUserDefinedFunctions</name>
<value>true</value>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>${hbase.tmp.dir}/hbase</value>
<description>The directory shared by region servers and into
which HBase persists. The URL should be 'fully-qualified'
to include the filesystem scheme. For example, to specify the
HDFS directory '/hbase' where the HDFS instance's namenode is
running at namenode.example.org on port 9000, set this value to:
hdfs://namenode.example.org:9000/hbase. By default, we write
to whatever ${hbase.tmp.dir} is set too -- usually /tmp --
so change this configuration or else all data will be lost on
machine restart.</description>
</property>
<property>
<name>hbase.dynamic.jars.dir</name>
<value>${hbase.rootdir}/lib</value>
<description>
The directory from which the custom udf jars can be loaded
dynamically by the phoenix client/region server without the need to restart. However,
an already loaded udf class would not be un-loaded. See
HBASE-1936 for more details.
</description>
</property>The last two configuration values should match the HBase server-side configuration.
As with other configuration properties, phoenix.functions.allowUserDefinedFunctions
may be specified at JDBC connection time as a connection property.
Example:
Properties props = new Properties();
props.setProperty("phoenix.functions.allowUserDefinedFunctions", "true");
Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost", props);The following optional parameter is used by the dynamic class loader to copy JARs from HDFS into the local filesystem:
<property>
<name>hbase.local.dir</name>
<value>${hbase.tmp.dir}/local/</value>
<description>Directory on the local filesystem to be used
as a local storage.</description>
</property>Creating Custom UDFs
Implement your custom UDF by following How to write custom UDF.
Compile your code into a JAR, then deploy the JAR to HDFS. It is recommended
to add the JAR to the HDFS directory configured by hbase.dynamic.jars.dir.
CREATE FUNCTION query.Dropping the UDFs
You can drop functions using the DROP FUNCTION query.
Dropping a function deletes the metadata for that function from Phoenix.
How to write custom UDF
You can follow these steps to write your UDF (for more detail, see this blog post):
- Create a new class derived from
org.apache.phoenix.expression.function.ScalarFunction. - Implement
getDataType()to determine the function return type. - Implement
evaluate()to calculate the result for each row. The method receivesorg.apache.phoenix.schema.tuple.Tuplewith the current row state and anorg.apache.hadoop.hbase.io.ImmutableBytesWritableto populate with the function result. The method returnsfalseif there is not enough information to calculate the result (usually because one argument is unknown), andtrueotherwise.
Below are additional optimization-related steps.
-
To contribute to scan start/stop key formation, custom functions need to override the following two methods from
ScalarFunction:/** * Determines whether or not a function may be used to form * the start/stop key of a scan * @return the zero-based position of the argument to traverse * into to look for a primary key column reference, or * {@value #NO_TRAVERSAL} if the function cannot be used to * form the scan key. */ public int getKeyFormationTraversalIndex() { return NO_TRAVERSAL; } /** * Manufactures a KeyPart used to construct the KeyRange given * a constant and a comparison operator. * @param childPart the KeyPart formulated for the child expression * at the {@link #getKeyFormationTraversalIndex()} position. * @return the KeyPart for constructing the KeyRange for this * function. */ public KeyPart newKeyPart(KeyPart childPart) { return null; } -
Additionally, to enable
ORDER BYoptimization or in-placeGROUP BY, override:/** * Determines whether or not the result of the function invocation * will be ordered in the same way as the input to the function. * Returning YES enables an optimization to occur when a * GROUP BY contains function invocations using the leading PK * column(s). * @return YES if the function invocation will always preserve order for * the inputs versus the outputs and false otherwise, YES_IF_LAST if the * function preserves order, but any further column reference would not * continue to preserve order, and NO if the function does not preserve * order. */ public OrderPreserving preservesOrder() { return OrderPreserving.NO; }
Limitations
- The JAR containing UDFs must be manually added to and deleted from HDFS. There is ongoing work to add SQL statements for JAR add/remove (PHOENIX-1890).
- The dynamic class loader copies UDF JARs to
{hbase.local.dir}/jarsat the Phoenix client and region server when a UDF is used in queries. These JARs must be deleted manually when a function is deleted. - Functional indexes must be rebuilt manually if the function implementation changes (PHOENIX-1907).
- Once loaded, a JAR is not unloaded. Use a different JAR for modified implementations to avoid restarting the cluster (PHOENIX-1907).
- To list functions, query the
SYSTEM."FUNCTION"table (PHOENIX-1921).