Scala UDFs
Scala UDFs must be configured before they can be used. For information about configuring user-defined functions, see the System Administrator Guide. Set up the Spark provider container with the iisuspark command (see Download Spark Container Image from Repository) and enable remote UDF processing in udf_engines.conf.
The function body must contain regular Scala 2 code, where the last statement is the result of the UDF (no return keyword). Use the libraries shipped with the official Spark distribution (https://spark.apache.org/releases/spark-release-3-5-1.html). Additionally, you can access Azure, AWS, and GCS storage, Iceberg tables, and Ingres or Vector through JDBC and the Spark Vector Connector. Adding third-party libraries that are not contained in the container is currently not supported.
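For example, a minimal UDF whose last expression yields the result could look like the following sketch (the function name plus_one is illustrative; the create syntax mirrors the larger example later in this section):
create or replace function plus_one(a int) return(int) AS language scala source='
// The last expression is the value returned by the UDF; no return keyword is used.
a + 1
';\g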
Data Type Mapping
To write Scala functions, you must be aware of how SQL types map to Scala types and vice versa.
With user-defined NULL handling, all parameters as well as the return value are wrapped in a Scala Option.
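As a sketch of what this implies, assuming NULL handling is enabled so that an int parameter arrives as Option[Int], the body could look like this (the function name null_safe_inc is hypothetical):
create or replace function null_safe_inc(a int) return(int) AS language scala source='
// With user-defined NULL handling, a is an Option[Int] and the result is
// likewise an Option: a NULL input (None) propagates to a NULL result.
a.map(_ + 1)
';\g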
Implementing UDF Functions
Apart from entering regular Scala code, it is also possible to keep state across the internal calls made to the UDF for each row of the table. This is done by implementing the pre-defined UdfCacheAccess trait within the Scala body of the UDF, as shown below:
trait UdfCacheAccess extends Serializable {
  def createCacheables(spark: SparkSession): Seq[(String, Any)]
  // The trait also provides retrieveCacheable(name: String) for looking up a
  // cached value by its key, as used in the example below.
}
This can be used by the UDF for one-time tasks, such as loading a pre-trained ML model, as shown below:
create or replace function scalaudf(a int) return(int) AS language scala source='
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.SparkContext
import com.actian.spark_vector.provider.udf.UdfCacheAccess

// Load the pre-trained model once and register it under the key "lmodel".
val myCache = new UdfCacheAccess {
  override def createCacheables(spark: SparkSession): Seq[(String, Any)] = {
    val sc = spark.sparkContext
    val model = LogisticRegressionModel.load(sc, "file:///opt/user_mount/udf/mllib_model")
    Seq(("lmodel", model))
  }
}

// Retrieve the cached model and score the input value.
val v = Vectors.dense(a.toDouble)
val model = myCache.retrieveCacheable("lmodel").asInstanceOf[LogisticRegressionModel]
model.predict(v).toInt',varfi=true;\g
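Once created, the function can be invoked like any scalar SQL function; for instance (the table and column names are illustrative):
select scalaudf(measurement) from readings;\g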
Last modified date: 01/27/2026