How does Apache Pig deal with schema and schema-less data?

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

Viewing 1 reply thread

Author

Posts
- September 20, 2018 at 11:15 am #4608
  
  DataFlair Team
  Spectator
  
  How does Pig deal with schema and schema-less data?
- September 20, 2018 at 11:16 am #4609
  
  DataFlair Team
  Spectator
  
  Yes, Apache Pig can work with data both with and without a schema. Here is how it handles each case:
  
  – If the schema includes only the field name, the field's data type defaults to bytearray.
  – If a field has a name, we can access it either by that name or by positional notation. If the field has no name, we can access it only by positional notation, i.e. $ followed by the index number ($0, $1, and so on).
  – When an operation combines several relations (JOIN, COGROUP, and so on), the resulting relation has a null schema if any of the input relations has no schema.
  – When the schema is null, Pig treats each field as a bytearray, and the real data type of the field is determined dynamically.
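
  The points above can be illustrated with a short Pig Latin sketch. The input file name `students.txt` and the field names `name`, `age`, and `gpa` are assumptions made up for this example (a tab-separated file with three columns); they are not from the original post.

  ```pig
  -- Hypothetical input: students.txt, tab-separated, three columns per line.

  -- Full schema: field names and types are both declared.
  typed = LOAD 'students.txt' AS (name:chararray, age:int, gpa:float);

  -- Names only: each field defaults to bytearray.
  named = LOAD 'students.txt' AS (name, age, gpa);
  DESCRIBE named;   -- should report all three fields as bytearray

  -- A named field can be referenced by name or by position.
  by_name = FOREACH named GENERATE name, $1;

  -- No schema at all: fields have no names, so only $0, $1, ... work.
  raw = LOAD 'students.txt';
  DESCRIBE raw;     -- should report that the schema is unknown
  by_pos = FOREACH raw GENERATE $0, $2;

  -- An explicit cast gives a bytearray field a concrete type when it is used.
  next_age = FOREACH raw GENERATE $0, (int)$1 + 1;

  -- Combining a relation that has a schema with one that has none:
  -- the result is expected to have a null (unknown) schema.
  joined = JOIN typed BY name, raw BY $0;
  DESCRIBE joined;
  ```

  Running the DESCRIBE statements in the Grunt shell is a quick way to check which schema, if any, Pig has attached to each relation.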