What is the difference between schema on Read and Schema on Write?

This topic has 4 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

Viewing 4 reply threads

Author

Posts
- September 20, 2018 at 4:17 pm #5816
  
  DataFlair Team
  Spectator
  
  What is the difference between schema on Read and Schema on Write?
  In Hive why schema on Read has been preferred over Schema on Write?
- September 20, 2018 at 4:17 pm #5817
  
  DataFlair Team
  Spectator
  
  Schema-on-Write (RDBMS):
  
  Prescriptive Data Modeling:
  Create static DB schema
  Transform data into RDBMS
  Query data in RDBMS format
  New columns must be added explicitly before new data can propagate into the system.
  Good for Known Unknowns (Repetition)
  Schema-on-Read (Hadoop):
  
  Descriptive Data Modeling:
  Copy data in its native format
  Create schema + parser
  Query Data in its native format (does ETL on the fly)
  New data can start flowing any time and will appear retroactively once the schema/parser properly describes it.
  Good for Unknown Unknowns (Exploration)
- September 20, 2018 at 4:17 pm #5819
  
  DataFlair Team
  Spectator
  
  Schema-on-Read vs. Schema-on-Write:
  
  In traditional databases, the table’s schema is imposed during the data load time, if the data being loaded does not conform to the schema then the data load is rejected, this process is know as Schema-on-Write. Here the data is being checked against the schema when written into the database(during data load).
  
  Now in HIVE, the data schema is not verified during the load time, rather it is verified while processing the query. Hence this process in HIVE called Schema-on-Read.
  
  Now, which way is better? Schema-on-Read or Schema-on-Write?
  
  Schema-on-Read:
  Schema-on-Read helps in very fast initial data load, since the data does not have to follow any internal schema(internal database format) to read or parse or serialize, as it is just a copy/move of a file.
  This type of movement of data is more flexible incase of huge data or having two schemas for same underlying data.
  
  Schema-on-Write:
  Schema-on-Write helps in faster performance of the query, as the data is already loaded in a particular format and it is easy to locate the column index or compress the data. However, it takes longer time to load data into the database.
  
  So, in scenarios of large data load or where the schema is not known at load time and there are no indexes to apply, as the query is not formulated, Schema-on-Read(property of HIVE) is more efficient than Schema-on-write.
- September 20, 2018 at 4:17 pm #5820
  
  DataFlair Team
  Spectator
  
  Schema on write is defined as creating a schema for data before writing into the database.
  The schema; the columns, rows, tables, and relationships are all defined first for the specific purpose that database will serve
  
  The main advantages of schema on write are precision and query speed..
  
  Schema on read differs from schema on write because schema only created when reading the data. Structured is applied to the data only when it’s read, this allows unstructured data to be stored in the database. Since it’s not necessary to define the schema before storing the data it makes it easier to bring in new data sources on the fly.
  
  The main advantages of schema on read are flexibility in purpose and query power.
- September 20, 2018 at 4:18 pm #5822
  
  DataFlair Team
  Spectator
  
  Schema on read Vs Schema on write.
  
  In traditional way(RDBMS), Before storing data, a schema is created for a data before it is written into the database. So suppose we create a Schema consisting of 10 Columns and we try to load data which could satisfy only 9 columns that data would be rejected because of Schema on write, here the data is read against schema before it is written to the database.
  The main advantages of RDBMS are that its Query response time is immediate and precision.
  
  With HIVE, we have Schema on read, which means the data is not verified before loading but rather when a query is issued, here we have very fast initial load as the data is not read.
Author

Posts

Viewing 4 reply threads

You must be logged in to reply to this topic.

What is the difference between schema on Read and Schema on Write?

About DataFlair

Trending Data Science Courses

Free Big Data Courses

Trending Programming Courses

Trending Web Dev Courses

Trending Courses

Trending Python Courses

Trending Java Courses

Trending DSA Courses