What are the benefits of DataFrame in Spark?
September 20, 2018 at 9:48 pm, #6412, DataFlair Team (Spectator)
What are the advantages of DataFrame in Apache Spark?
-
September 20, 2018 at 9:49 pm, #6413, DataFlair Team (Spectator)
The following are the benefits of DataFrames.
1. A DataFrame is a distributed collection of data organized into named columns.
2. DataFrames are conceptually equivalent to a table in a relational database, but with richer optimizations under the hood.
3. DataFrames can be queried both with SQL and through the DataFrame API.
4. We can process structured and semi-structured data from many formats and sources through them, such as Avro, CSV, Elasticsearch, and Cassandra. They also work with storage systems such as HDFS, Hive tables, and MySQL.
5. DataFrames are optimized by the Catalyst optimizer. Catalyst includes a general library for representing trees and applying rules to transform them. DataFrame queries go through Catalyst tree transformations in four phases:
– Analyzing a logical plan to resolve references
– Logical plan optimization
– Physical planning
– Code generation to compile parts of the query to Java bytecode
6. The DataFrame API is available in several programming languages, for example Java, Scala, Python, and R.
7. DataFrames provide Hive compatibility. We can run unmodified Hive queries on an existing Hive warehouse.
8. They can scale from kilobytes of data on a single laptop to petabytes of data on a large cluster.
9. DataFrames integrate easily with big data tools and frameworks via Spark Core.
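To make point 5 more concrete, here is a minimal toy sketch of rule-based tree transformation, the idea behind the logical optimization phase. It is plain Python and not Spark's actual code (Catalyst itself is written in Scala). All names here (`Literal`, `Attribute`, `Add`, `transform_up`, `fold_constants`, `drop_add_zero`, `optimize`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union


# Toy expression nodes (hypothetical, not Spark classes).
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str  # stands in for a named column


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Attribute, Add]
Rule = Callable[[Expr], Expr]


def transform_up(node: Expr, rule: Rule) -> Expr:
    """Rewrite the children first, then apply the rule to the rebuilt node."""
    if isinstance(node, Add):
        node = Add(transform_up(node.left, rule), transform_up(node.right, rule))
    return rule(node)


def fold_constants(node: Expr) -> Expr:
    """Rule: Add(Literal(a), Literal(b)) becomes Literal(a + b)."""
    if (isinstance(node, Add)
            and isinstance(node.left, Literal)
            and isinstance(node.right, Literal)):
        return Literal(node.left.value + node.right.value)
    return node


def drop_add_zero(node: Expr) -> Expr:
    """Rule: x + 0 and 0 + x become x."""
    if isinstance(node, Add):
        if node.right == Literal(0):
            return node.left
        if node.left == Literal(0):
            return node.right
    return node


def optimize(tree: Expr, rules: Sequence[Rule]) -> Expr:
    """Apply every rule over the whole tree until nothing changes."""
    while True:
        new_tree = tree
        for rule in rules:
            new_tree = transform_up(new_tree, rule)
        if new_tree == tree:
            return tree
        tree = new_tree


# (x + (1 + 2)) + 0
tree = Add(Add(Attribute("x"), Add(Literal(1), Literal(2))), Literal(0))
optimized = optimize(tree, [fold_constants, drop_add_zero])
print(optimized)
# prints: Add(left=Attribute(name='x'), right=Literal(value=3))
```

The point of the sketch is that each optimization is a small, independent rule, and the optimizer simply reapplies the rules until the tree reaches a fixed point. The real optimizer works on full query plans rather than on a three-node arithmetic toy like this one.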
There is much more to know about DataFrames. For details, see the Spark SQL DataFrame Tutorial.