What are the benefits of DataFrame in Spark?

    • #6412
      DataFlair Team
      Moderator

      What are the advantages of DataFrame in Apache Spark?

    • #6413
      DataFlair Team
      Moderator

      The following are the benefits of DataFrames:

      1. A DataFrame is a distributed collection of data, organized into named columns.

      2. DataFrames are conceptually similar to a table in a relational database, but with richer optimizations under the hood.

      3. DataFrames can be queried both with SQL and through the DataFrame API.

      4. We can process both structured and unstructured data through DataFrames, from sources such as Avro, CSV, Elasticsearch, and Cassandra. They also work with storage systems such as HDFS, Hive tables, and MySQL.

      5. DataFrame queries are optimized by the Catalyst optimizer. Catalyst provides a general library for representing trees and applying rules to transform them. It uses tree transformations in four phases:

      – Analyzing the logical plan to resolve references
      – Logical plan optimization
      – Physical planning
      – Code generation to compile parts of the query to Java bytecode.

      6. The DataFrame API is available in several programming languages: Java, Scala, Python, and R.

      7. DataFrames provide Hive compatibility: we can run unmodified Hive queries on an existing Hive warehouse.

      8. DataFrames can scale from kilobytes of data on a single laptop to petabytes of data on a large cluster.

      9. DataFrames integrate easily with big data tools and frameworks via Spark Core.
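To make the tree-transformation idea from point 5 a little more concrete, here is a toy sketch in plain Python. It is not Spark code and does not use any Spark API: the classes `Literal`, `Attribute`, and `Add`, and the functions `transform_up` and `fold_constants`, are hypothetical names invented for this illustration. It only shows the general shape of a rule-based rewrite, of the kind the logical optimization phase is built on: an expression is held as a tree, and a rule replaces a matching subtree with a simpler one.

```python
from dataclasses import dataclass


# Hypothetical toy expression nodes (not Spark classes).
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def transform_up(node, rule):
    """Apply `rule` to every node of the tree, children first."""
    if isinstance(node, Add):
        node = Add(transform_up(node.left, rule), transform_up(node.right, rule))
    return rule(node)


def fold_constants(node):
    """Rewrite rule: replace Add(Literal, Literal) with a single Literal."""
    if (
        isinstance(node, Add)
        and isinstance(node.left, Literal)
        and isinstance(node.right, Literal)
    ):
        return Literal(node.left.value + node.right.value)
    return node


# The expression x + (1 + 2), held as a tree.
tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
optimized = transform_up(tree, fold_constants)
print(optimized)  # Add(left=Attribute(name='x'), right=Literal(value=3))
```

The rule never needs to know where in the tree it is being applied. The traversal handles that, which is why many small rules can be written independently and combined.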

      There is much more to know about DataFrames. Follow the link: Spark SQL DataFrame Tutorial
