Introduction DataFrame consists of two words data and frame, means data has to be fit in some kind of frame. We can understand a frame as a schema of the relational database.
In Spark, DataFrame is a collection of distributed data over the network with some schema. We can understand it as the data formatted as row/column manner. DataFrame can be created from Hive data, JSON file, CSV, Structured data or raw data that can be framed in structured data. We can also create a DataFrame from RDD if some schema can be applied on that RDD.
Temporary view or table can also be created from DataFrame as it has data and schema. We can also run SQL query on created table/view to get the faster result.
It is also evaluated lazily (Lazy Evaluation) for better resource utilization.