DataFrame: A Data Frame is used for storing data into tables. It is equivalent to a table in a relational database but with richer optimization. It is a data abstraction and domain-specific language (DSL) applicable on structure and semi-structured data. It is distributed collection of data in the form of named column and row. It has a matrix-like structure whose column may be different types (numeric, logical, factor, or character ).we can say data frame has two-dimensional array like structure where each column contains the value of one variable and row contains one set of values for each column. It combines feature of list and matrices.
RDD is the representation of set of records, immutable collection of objects with distributed computing. RDD is large collection of data or RDD is an array of reference of partitioned objects. Each and every datasets in RDD is logically partitioned across many servers so that they can be computed on different nodes of the cluster. RDDs are fault tolerant i.e. self-recovered/recomputed in the case of failure. The dataset could be data loaded externally by the users which can be in the form of JSON file, CSV file, text file or database via JDBC with no specific data structure.