How to implement a UDF (user-defined function) in Hive?
October 10, 2018 at 4:35 pm #6636 DataFlair Team
How to write a Hive UDF (user-defined function)?
October 10, 2018 at 4:36 pm #6637 DataFlair Team
There are two different interfaces for writing Apache Hive UDFs:
– Simple API
– Complex API
As long as our function reads and returns primitive types, that is, basic Hadoop and Hive writable types such as Text, IntWritable, LongWritable, and DoubleWritable, we can use the simple API (org.apache.hadoop.hive.ql.exec.UDF).
With the simple API, building a Hive UDF involves little more than writing a class with one method (evaluate). Here is an example:
Simple API – Hive UDF Example
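import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
public class SimpleUDFExample extends UDF {
    // Hive calls evaluate() once per row with the column value.
    public Text evaluate(Text input) {
        return new Text("Hello " + input.toString());
    }
}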
To write code for objects that are not writable types, such as struct, map, and array types, the org.apache.hadoop.hive.ql.udf.generic.GenericUDF API offers a way.
In addition, this API requires us to manage object inspectors manually for the function arguments, and to verify the number and types of the arguments we receive. An object inspector offers a consistent interface over an underlying object type, so that different object implementations can all be accessed in a consistent way from within Hive. For example, we could implement a struct as a Map, so long as we provide a corresponding object inspector.
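To make the object inspector idea concrete, here is a minimal plain-Java sketch. It does not use Hive's classes: StructInspectorSketch, MapBackedInspector, and ListBackedInspector are hypothetical names invented for illustration. It only shows how the same caller code can read a struct stored as a Map or as a List when each representation comes with its own inspector.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the inspector idea; NOT the Hive API.
interface StructInspectorSketch {
    // Returns the value of the named field from an opaque struct object.
    Object getField(Object struct, String fieldName);
}

// Reads fields from a struct stored as a Map keyed by field name.
class MapBackedInspector implements StructInspectorSketch {
    @Override
    public Object getField(Object struct, String fieldName) {
        return ((Map<?, ?>) struct).get(fieldName);
    }
}

// Reads fields from a struct stored positionally as a List.
class ListBackedInspector implements StructInspectorSketch {
    private final List<String> fieldNames;

    ListBackedInspector(List<String> fieldNames) {
        this.fieldNames = fieldNames;
    }

    @Override
    public Object getField(Object struct, String fieldName) {
        int index = fieldNames.indexOf(fieldName);
        return index < 0 ? null : ((List<?>) struct).get(index);
    }
}

class ObjectInspectorSketch {
    // The caller only talks to the inspector, never to the concrete representation.
    static String describe(Object struct, StructInspectorSketch inspector) {
        return inspector.getField(struct, "name") + ":" + inspector.getField(struct, "age");
    }

    public static void main(String[] args) {
        Map<String, Object> asMap = new HashMap<>();
        asMap.put("name", "alice");
        asMap.put("age", 30);
        List<Object> asList = Arrays.asList("alice", 30);

        // Both lines print alice:30
        System.out.println(describe(asMap, new MapBackedInspector()));
        System.out.println(describe(asList, new ListBackedInspector(Arrays.asList("name", "age"))));
    }
}
```

Hive's real ObjectInspector hierarchy is richer than this (it also carries type categories and handles primitives, lists, and maps), but the separation between data and the object that knows how to read it is the same.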
With the GenericUDF API, we need to implement three methods:
// Like the evaluate method of the simple API: takes the actual arguments and returns the result.
abstract Object evaluate(GenericUDF.DeferredObject[] arguments) throws HiveException;
// Returns a string representation of the function call. The exact content does not affect the result.
abstract String getDisplayString(String[] children);
// Called once, before any evaluate() calls. Receives an array of object inspectors that represent the arguments of the function,
// and returns an object inspector describing the function's return type.
// This is where you validate that the function receives the correct number and types of arguments.
abstract ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException;
To understand this properly, let’s take an example.
Complex API – Apache Hive UDF Example
Here we create a function called containsString that takes two arguments:
– A list of Strings
– A String
It returns true or false depending on whether the list contains the given string, for example:
containsString(List("a", "b", "c"), "b"); // true
containsString(List("a", "b", "c"), "d"); // false
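The behaviour described above can be sketched in plain Java as follows. This is a model of the initialize / evaluate / getDisplayString contract, not a real GenericUDF: the class ContainsStringSketch and the ArgType enum are hypothetical stand-ins for Hive's object inspectors, and returning null for a null argument is an assumption. A real implementation would extend GenericUDF and work with ObjectInspector[] and DeferredObject[] instead.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the GenericUDF contract; the types here are
// hypothetical stand-ins, NOT Hive classes.
class ContainsStringSketch {
    // Stand-in for the type information Hive passes as object inspectors.
    enum ArgType { STRING, STRING_LIST, INT }

    private boolean initialized = false;

    // Mirrors initialize(): called once, validates the argument count and
    // types, and reports the return type (modeled here as a Java class).
    Class<?> initialize(ArgType[] argTypes) {
        if (argTypes.length != 2) {
            throw new IllegalArgumentException("containsString takes exactly 2 arguments");
        }
        if (argTypes[0] != ArgType.STRING_LIST || argTypes[1] != ArgType.STRING) {
            throw new IllegalArgumentException("expected (list of strings, string)");
        }
        initialized = true;
        return Boolean.class;
    }

    // Mirrors evaluate(): called per row with the actual argument values.
    Object evaluate(Object[] arguments) {
        if (!initialized) {
            throw new IllegalStateException("initialize() must be called first");
        }
        List<?> list = (List<?>) arguments[0];
        Object needle = arguments[1];
        if (list == null || needle == null) {
            return null; // assumption: a null argument yields a null result
        }
        return list.contains(needle);
    }

    // Mirrors getDisplayString(): a readable form of the call.
    String getDisplayString(String[] children) {
        return "containsString(" + String.join(", ", children) + ")";
    }

    public static void main(String[] args) {
        ContainsStringSketch udf = new ContainsStringSketch();
        udf.initialize(new ArgType[] { ArgType.STRING_LIST, ArgType.STRING });
        List<String> values = Arrays.asList("a", "b", "c");
        System.out.println(udf.evaluate(new Object[] { values, "b" })); // true
        System.out.println(udf.evaluate(new Object[] { values, "d" })); // false
    }
}
```

Doing the validation once in initialize keeps the per-row evaluate path free of repeated type checks, which is the main reason the API splits the two steps.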
There is much more to learn about this topic. For details, follow the link: Hive UDF – User Defined Function with Example