How to implement a UDF (user-defined function) in Hive?
October 10, 2018 at 4:35 pm #6636 DataFlair Team (Spectator)
How do you write a Hive UDF (user-defined function)?
-
October 10, 2018 at 4:36 pm #6637 DataFlair Team (Spectator)
Hive UDF
There are two different interfaces for writing an Apache Hive UDF:
– Simple API
– Complex API
As long as our function reads and returns primitive types, we can use the simple API (org.apache.hadoop.hive.ql.exec.UDF). Here, primitive types means the basic Hadoop and Hive writable types, such as Text, IntWritable, LongWritable, and DoubleWritable.
Simple API
With the simple UDF API, building a Hive UDF involves little more than writing a class with one method (evaluate). Let's look at an example:
Simple API – Hive UDF Example
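import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class SimpleUDFExample extends UDF
{
    // Hive locates evaluate() by reflection and calls it once per row.
    public Text evaluate(Text input)
    {
        // A NULL column value arrives as null; return NULL instead of throwing.
        if (input == null) return null;
        return new Text("Hello " + input.toString());
    }
}
Complex API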
To work with objects that are not writable types, such as struct, map, and array types, use the org.apache.hadoop.hive.ql.udf.generic.GenericUDF API.
With this API we have to manage the object inspectors for the function arguments ourselves, and verify the number and types of the arguments we receive. An object inspector offers a consistent interface over the underlying object types, so that different object implementations can all be accessed in the same way from within Hive. For example, we could implement a struct as a Map as long as we provide a corresponding object inspector.
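To make the object inspector idea more concrete, here is a minimal, dependency-free sketch. Note that FieldInspector, MapBackedInspector, ArrayBackedInspector and InspectorDemo are hypothetical names made up for this illustration; they are not Hive's ObjectInspector classes. The sketch only shows the principle: the same "struct" is stored once as a Map and once as an array, and the calling code reads it the same way through an inspector.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the idea behind Hive's object inspectors.
// This is NOT Hive's ObjectInspector API.
interface FieldInspector {
    // Reads the named field out of an opaque row object.
    Object getField(Object row, String fieldName);
}

// A struct stored as a Map from field name to value.
class MapBackedInspector implements FieldInspector {
    @Override
    public Object getField(Object row, String fieldName) {
        return ((Map<?, ?>) row).get(fieldName);
    }
}

// The same struct stored as a positional Object[]; the inspector knows the field order.
class ArrayBackedInspector implements FieldInspector {
    private final String[] fieldNames;

    ArrayBackedInspector(String... fieldNames) {
        this.fieldNames = fieldNames;
    }

    @Override
    public Object getField(Object row, String fieldName) {
        Object[] values = (Object[]) row;
        for (int i = 0; i < fieldNames.length; i++) {
            if (fieldNames[i].equals(fieldName)) {
                return values[i];
            }
        }
        return null; // unknown field
    }
}

class InspectorDemo {
    // The caller never touches the concrete representation, only the inspector.
    static String greet(Object row, FieldInspector inspector) {
        return "Hello " + inspector.getField(row, "name");
    }

    public static void main(String[] args) {
        Map<String, Object> asMap = new HashMap<>();
        asMap.put("name", "hive");
        Object[] asArray = {"hive", 42};

        // Both calls go through the same greet() code path.
        System.out.println(greet(asMap, new MapBackedInspector()));
        System.out.println(greet(asArray, new ArrayBackedInspector("name", "id")));
    }
}
```

In a real GenericUDF, Hive hands you its own inspector objects, so you would not write these classes yourself; the sketch is only meant to show why the function code can stay independent of how the data is stored.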
With this API we need to implement three methods:
// Like the evaluate method of the simple API: takes the actual arguments and returns the result.
public abstract Object evaluate(GenericUDF.DeferredObject[] arguments) throws HiveException;
// Returns a string representation of the function; Hive displays it in EXPLAIN output.
public abstract String getDisplayString(String[] children);
// Called once, before any evaluate() calls. You receive an array of object inspectors that represent the arguments of the function.
// This is where you validate that the function receives the correct number and types of arguments.
public abstract ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException;
To see how this works, let's take an example.
Complex API – Apache Hive UDF Example
Here we create a function called containsString. It takes two arguments:
A list of Strings
A String
It returns true or false depending on whether the list contains the string we pass in, for example:
containsString(List("a", "b", "c"), "b"); // true
containsString(List("a", "b", "c"), "d"); // false
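As a rough illustration of the containsString behaviour and the "validate once, then evaluate per row" lifecycle, here is a dependency-free sketch. It deliberately does not use Hive classes: ContainsStringSketch and its method signatures are hypothetical and only mirror the roles of initialize, evaluate and getDisplayString. A real implementation would extend GenericUDF and work with Hive's object inspectors instead of Class objects.

```java
import java.util.Arrays;
import java.util.List;

// Dependency-free sketch: the names mirror the GenericUDF lifecycle,
// but this class is hypothetical and does not use Hive classes.
class ContainsStringSketch {
    // Plays the role of initialize(): runs once, validates argument count and types.
    void initialize(Class<?>[] argumentTypes) {
        if (argumentTypes.length != 2) {
            throw new IllegalArgumentException("containsString takes exactly 2 arguments");
        }
        if (!List.class.isAssignableFrom(argumentTypes[0]) || argumentTypes[1] != String.class) {
            throw new IllegalArgumentException("expected (list of strings, string)");
        }
    }

    // Plays the role of evaluate(): runs once per row on the actual values.
    Boolean evaluate(Object[] arguments) {
        List<?> list = (List<?>) arguments[0];
        String needle = (String) arguments[1];
        if (list == null || needle == null) {
            return null; // SQL-style: NULL in, NULL out
        }
        return list.contains(needle);
    }

    // Plays the role of getDisplayString(): a readable name for the call.
    String getDisplayString(String[] children) {
        return "containsString(" + String.join(", ", children) + ")";
    }

    public static void main(String[] args) {
        ContainsStringSketch udf = new ContainsStringSketch();
        udf.initialize(new Class<?>[] {List.class, String.class});

        List<String> letters = Arrays.asList("a", "b", "c");
        System.out.println(udf.evaluate(new Object[] {letters, "b"})); // true
        System.out.println(udf.evaluate(new Object[] {letters, "d"})); // false
    }
}
```

The argument checks live in initialize so that a wrong call fails once, up front, rather than on every row; evaluate can then cast without re-checking.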
There is much more to learn about this topic; for details, see: Hive UDF – User Defined Function with Example