What are the advantages of DataFrame in Apache Spark?

This topic has 2 replies, 1 voice, and was last updated 7 years, 10 months ago by DataFlair Team.

Author

Posts
- September 20, 2018 at 9:37 pm #6386
  
  DataFlair Team
  Spectator
  
  What are the features of DataFrame in Spark?
  List the characteristics of DataFrame in Apache Spark.
- September 20, 2018 at 9:38 pm #6387
  
  DataFlair Team
  Spectator
  
  Introduction
  A DataFrame is a distributed collection of data organized into named columns. It is conceptually similar to a table in a relational database.
  We can construct DataFrames from a wide array of sources, such as structured data files, tables in Hive, external databases, or existing RDDs.
  
  Like RDDs, DataFrames are evaluated lazily (lazy evaluation). In other words, computation happens only when an action (e.g. displaying a result or saving output) requires it.
  
  Out of the box, DataFrames support reading data from the most popular formats, including JSON files, Parquet files, and Hive tables. They can also read from distributed file systems (HDFS), local file systems, cloud storage (S3), and external relational database systems through JDBC. In addition, through Spark SQL's external data sources API, DataFrames can be extended to support third-party data formats or sources. Existing third-party extensions already include Avro, CSV, ElasticSearch, and Cassandra.
  
  There is much more to know about DataFrames. For details, see: Spark SQL DataFrame Tutorial – An Introduction to DataFrame
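The lazy evaluation described in the reply above can be illustrated with a small pure-Python model. This is only a sketch of the idea, not Spark's API: the `LazyFrame` class, its methods, and the sample rows are hypothetical names invented for this illustration. Transformations merely record a step in a plan, and nothing runs until an action (`collect`) is called.

```python
from typing import Callable, Iterable, List, Optional


class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (hypothetical, NOT Spark's API)."""

    def __init__(self, rows: Iterable[dict], plan: Optional[List[Callable]] = None):
        self._rows = rows
        self._plan = list(plan or [])  # recorded transformations, not yet executed

    def filter(self, predicate: Callable[[dict], bool]) -> "LazyFrame":
        # Transformation: extend the plan only; no row is examined here.
        step = lambda rows: (r for r in rows if predicate(r))
        return LazyFrame(self._rows, self._plan + [step])

    def select(self, *columns: str) -> "LazyFrame":
        # Transformation: keep only the named columns, again deferred.
        step = lambda rows: ({c: r[c] for c in columns} for r in rows)
        return LazyFrame(self._rows, self._plan + [step])

    def collect(self) -> List[dict]:
        # Action: run every recorded step and materialize the result.
        rows = iter(self._rows)
        for step in self._plan:
            rows = step(rows)
        return list(rows)


calls = []  # records each time the predicate actually runs


def is_adult(row: dict) -> bool:
    calls.append(row["name"])
    return row["age"] >= 18


people = [
    {"name": "Ann", "age": 34},
    {"name": "Bob", "age": 12},
    {"name": "Cy", "age": 51},
]

adults = LazyFrame(people).filter(is_adult).select("name")
calls_before_action = len(calls)  # the plan is built, but no row has been touched

result = adults.collect()         # the action triggers the whole plan
calls_after_action = len(calls)

print(calls_before_action, calls_after_action, result)
```

In Spark the recorded plan is additionally optimized before it runs, which this toy model does not attempt.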
- September 20, 2018 at 9:38 pm #6388
  
  DataFlair Team
  Spectator
  
  pratapajay
  Member
  
  DataFrame = framing the data: we frame it like a relational table for better performance.
  A DataFrame is a distributed collection of data organised in rows and columns. Conceptually, it is like a table in a relational database. We can create a DataFrame from different types of data: Hive data, JSON, CSV, structured files, relational databases, and RDDs (provided we can map the data to a schema).
  We can create a temporary table/view from a DataFrame and run SQL queries on it. Because a DataFrame carries its data and schema together, Spark can optimize these queries, which is why they tend to return results faster.
  Like an RDD, it is also evaluated lazily (lazy evaluation), which allows optimized use of resources.
  
  For a complete introduction to DataFrames, follow this link: DataFrame