How to implement a UDF (user-defined function) in Hive?

    • #6636
      DataFlair Team
      Spectator

      How do I write a Hive UDF (user-defined function)?

    • #6637
      DataFlair Team
      Spectator

      Hive UDF

      There are two different interfaces for writing Apache Hive UDFs:

      – Simple API
      – Complex API

      As long as our function reads and returns primitive types, we can use the simple API (org.apache.hadoop.hive.ql.exec.UDF). Here, primitive types means the basic Hadoop and Hive writable types, such as Text, IntWritable, LongWritable and DoubleWritable.

      Simple API
      With the simple UDF API, building a Hive UDF involves little more than writing a class with one method (evaluate). Here is an example:

      Simple API – Hive UDF Example

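      import org.apache.hadoop.hive.ql.exec.UDF;
      import org.apache.hadoop.io.Text;

      public class SimpleUDFExample extends UDF {
        // Hive locates evaluate() by reflection and calls it for each row.
        public Text evaluate(Text input) {
          if (input == null) {
            return null; // a NULL argument yields a NULL result
          }
          return new Text("Hello " + input.toString());
        }
      }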

      Complex API

      To write code that works with objects that are not writable types, such as struct, map and array types, we need the complex API: org.apache.hadoop.hive.ql.udf.generic.GenericUDF.

      With this API we must manage the object inspectors for the function arguments ourselves, and verify the number and types of the arguments we receive. An object inspector offers a consistent interface over an underlying object type, so that different object implementations can all be accessed in the same way from within Hive. For example, we could implement a struct as a Map, so long as we provide a corresponding object inspector.
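      The idea behind object inspectors can be illustrated without Hive at all. The following is a minimal plain-Java sketch, not Hive's real API: StructReader, MapBackedStructReader, ListBackedStructReader and InspectorDemo are hypothetical names invented for illustration. It shows the same "struct" stored two different ways and read through one interface.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the idea behind Hive's object inspectors:
// one interface, several underlying representations. Not a Hive class.
interface StructReader {
    Object getField(Object struct, String fieldName);
}

// Reads a struct that is stored as a Map from field name to value.
class MapBackedStructReader implements StructReader {
    @Override
    public Object getField(Object struct, String fieldName) {
        return ((Map<?, ?>) struct).get(fieldName);
    }
}

// Reads a struct that is stored as a positional List of values.
class ListBackedStructReader implements StructReader {
    private final List<String> fieldNames;

    ListBackedStructReader(List<String> fieldNames) {
        this.fieldNames = fieldNames;
    }

    @Override
    public Object getField(Object struct, String fieldName) {
        int index = fieldNames.indexOf(fieldName);
        return index < 0 ? null : ((List<?>) struct).get(index);
    }
}

class InspectorDemo {
    // The caller never needs to know how the struct is actually stored.
    static String describe(StructReader reader, Object struct) {
        return reader.getField(struct, "name") + " is " + reader.getField(struct, "age");
    }
}
```

      Calling InspectorDemo.describe with a Map-backed struct and a MapBackedStructReader, or with a List-backed struct and a ListBackedStructReader, produces the same string. In Hive the inspectors passed to initialize() play this role for the function's arguments.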

      With this API we need to implement three methods:

      // This is like the evaluate method of the simple API. It takes the actual arguments and returns the result.

      abstract Object evaluate(GenericUDF.DeferredObject[] arguments) throws HiveException;

      // Returns a readable string representation of the function call. Hive uses it when it needs to print the function, for example in EXPLAIN output.

      abstract String getDisplayString(String[] children);

      // Called once, before any evaluate() calls. It receives an array of object inspectors that represent the arguments of the function.

      // This is where we validate that the function is receiving the correct number and types of arguments. It returns the object inspector for the function's return type.

      abstract ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException;

      To understand this properly, let’s take an example.

      Complex API – Apache Hive UDF Example

      Here we create a function called containsString that takes two arguments:

      – A list of strings
      – A string

      It returns true or false depending on whether the list contains the given string, for example:

      containsString(List("a", "b", "c"), "b"); // true

      containsString(List("a", "b", "c"), "d"); // false
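      To make the call order of the three methods concrete, here is a simplified sketch of containsString that runs without Hive on the classpath. MiniGenericUDF and ContainsStringSketch are hypothetical stand-ins, and plain Class objects take the place of object inspectors. A real implementation would extend GenericUDF, check the arguments with ListObjectInspector and StringObjectInspector, and read each value from a DeferredObject. Returning null for a null argument is a design choice of this sketch.

```java
import java.util.List;

// Hypothetical, simplified stand-in for GenericUDF so the sketch compiles
// without Hive. The real methods take ObjectInspector[] and DeferredObject[].
abstract class MiniGenericUDF {
    abstract Class<?> initialize(Class<?>[] argumentTypes);
    abstract Object evaluate(Object[] arguments);
    abstract String getDisplayString(String[] children);
}

class ContainsStringSketch extends MiniGenericUDF {
    @Override
    Class<?> initialize(Class<?>[] argumentTypes) {
        // Called once: validate the argument count and types before any row.
        if (argumentTypes.length != 2) {
            throw new IllegalArgumentException("containsString takes exactly 2 arguments");
        }
        if (!List.class.isAssignableFrom(argumentTypes[0])
                || !String.class.equals(argumentTypes[1])) {
            throw new IllegalArgumentException("expected (list of strings, string)");
        }
        return Boolean.class; // the declared return type
    }

    @Override
    Object evaluate(Object[] arguments) {
        // Called per row with the actual values.
        List<?> list = (List<?>) arguments[0];
        Object needle = arguments[1];
        if (list == null || needle == null) {
            return null; // sketch's choice: null in, null out
        }
        return list.contains(needle);
    }

    @Override
    String getDisplayString(String[] children) {
        return "containsString(" + String.join(", ", children) + ")";
    }
}
```

      The caller (Hive, in the real case) calls initialize once with the argument types and then calls evaluate once per row. Rejecting bad arguments in initialize means a wrongly typed query fails before any data is processed.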

      There is much more to learn about this topic. For details, see: Hive UDF – User Defined Function with Example
