

{"id":2603,"date":"2017-05-16T12:34:25","date_gmt":"2017-05-16T12:34:25","guid":{"rendered":"http:\/\/data-flair.training\/blogs\/?p=2603"},"modified":"2018-11-16T17:38:29","modified_gmt":"2018-11-16T12:08:29","slug":"apache-spark-sql-dataframe-tutorial","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/","title":{"rendered":"Spark SQL DataFrame Tutorial &#8211; An Introduction to DataFrame"},"content":{"rendered":"<h2>1. Objective<\/h2>\n<p>In this <strong>Spark SQL DataFrame<\/strong> tutorial, we will learn what a DataFrame is in <strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-tutorial-quickstart-introduction\/\">Apache Spark<\/a><\/strong> and why it is needed. The tutorial covers the limitations of Spark RDD and how DataFrame overcomes them. It also covers how to create a DataFrame in Spark, features of DataFrame such as <strong>custom memory management<\/strong> and the optimized execution plan, and the limitations of DataFrame.<\/p>\n<p>So, let&#8217;s start the Spark SQL DataFrame tutorial.<\/p>\n<div id=\"attachment_42476\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-SQL-DataFrame-Tutorial-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-42476\" class=\"size-full wp-image-42476\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-SQL-DataFrame-Tutorial-01.jpg\" alt=\"Spark SQL DataFrame Tutorial - An Introduction to DataFrame\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-SQL-DataFrame-Tutorial-01.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-SQL-DataFrame-Tutorial-01-150x79.jpg 150w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-SQL-DataFrame-Tutorial-01-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-SQL-DataFrame-Tutorial-01-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-SQL-DataFrame-Tutorial-01-1024x536.jpg 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-SQL-DataFrame-Tutorial-01-520x272.jpg 520w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-42476\" class=\"wp-caption-text\">Spark SQL DataFrame Tutorial &#8211; An Introduction to DataFrame<\/p><\/div>\n<h2>2. What is Spark SQL DataFrame?<\/h2>\n<p><strong>DataFrame<\/strong> was introduced in Spark release 1.3.0. A DataFrame is a Dataset organized into named columns. It is similar to a table in a relational database or a data frame in <a href=\"http:\/\/data-flair.training\/blogs\/r-programming-tutorial\/\">R<\/a> or Python, with query optimization applied under the hood.<br \/>\nThe idea behind DataFrame is to allow processing of large amounts of structured data. A DataFrame contains rows with a schema. The <strong>schema<\/strong> describes the structure of the data.<br \/>\nDataFrame in Apache Spark goes beyond <a href=\"http:\/\/data-flair.training\/blogs\/rdd-in-apache-spark\/\">RDD<\/a> but retains the <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-rdd-features\/\">features of RDD<\/a> as well. The features common to RDD and DataFrame are <strong>immutability<\/strong>, <strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-in-memory-computing\/\">in-memory<\/a><\/strong> computation, resilience, and distributed computing capability. A DataFrame lets the user impose a structure onto a distributed collection of data. 
It thus provides a higher-level abstraction.<br \/>\nWe can build a DataFrame from different data sources, for example structured data files, tables in <a href=\"http:\/\/data-flair.training\/blogs\/apache-hive-tutorial-introductory-guide\/\">Hive<\/a>, external databases, or existing RDDs. The DataFrame Application Programming Interface (API) is available in several languages: <a href=\"http:\/\/data-flair.training\/blogs\/why-you-should-learn-scala-introductory-tutorial\/\">Scala<\/a>, Java, Python, and R.<br \/>\nIn both Scala and Java, a DataFrame is represented as a Dataset of rows. In the Scala API, DataFrame is a type alias of Dataset[Row]. In the Java API, the user uses Dataset&lt;Row&gt; to represent a DataFrame.<\/p>\n<h2>3. Why DataFrame?<\/h2>\n<p>DataFrame is one step ahead of <strong>RDD<\/strong> because it provides custom memory management and an optimized execution plan.<br \/>\n<strong>a. Custom Memory Management:<\/strong> This is also known as Project <strong>Tungsten<\/strong>. A lot of memory is saved because the data is stored in off-heap memory in a binary format. This also avoids garbage-collection overhead for that data. Expensive Java serialization is avoided as well, since the data is already in binary form and its schema is known.<br \/>\n<strong>b. Optimized Execution Plan:<\/strong> This is the work of the <strong>query optimizer<\/strong>, which creates an optimized execution plan for a query. Once the optimized plan is created, the final execution takes place on Spark RDDs.<\/p>\n<h2>4. Features of Apache Spark DataFrame<\/h2>\n<p>Some of the <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-rdd-limitations\/\">limitations of Spark RDD<\/a> were:<\/p>\n<ul>\n<li>It does not have a built-in optimization engine.<\/li>\n<li>It has no provision to handle structured data.<\/li>\n<\/ul>\n<p>To overcome these limitations, DataFrame was introduced. 
Some of the key features of DataFrame in Spark are:<br \/>\ni. A DataFrame is a distributed collection of data organized into named columns. It is equivalent to a table in an RDBMS.<br \/>\nii. It can deal with both structured and unstructured data formats and sources, for example Avro, CSV, Elasticsearch, and Cassandra. It also works with storage systems such as <a href=\"http:\/\/data-flair.training\/blogs\/apache-hadoop-hdfs-introduction-tutorial\/\">HDFS<\/a>, Hive tables, and MySQL.<br \/>\niii. It is optimized by Catalyst, which has general libraries to represent trees. DataFrame uses <strong>Catalyst tree transformation<\/strong> in four phases:<\/p>\n<ul>\n<li>Analyzing the logical plan to resolve references<\/li>\n<li>Logical plan optimization<\/li>\n<li>Physical planning<\/li>\n<li>Code generation to compile parts of a query to Java bytecode<\/li>\n<\/ul>\n<p>You can refer to this guide to <a href=\"http:\/\/data-flair.training\/blogs\/spark-sql-optimization-catalyst-optimizer\/\">learn Spark SQL optimization phases<\/a> in detail.<br \/>\niv. The DataFrame APIs are available in several programming languages, for example Java, Scala, Python, and R.<br \/>\nv. It provides Hive compatibility. We can run unmodified Hive queries on an existing Hive warehouse.<br \/>\nvi. It can scale from kilobytes of data on a single laptop to petabytes of data on a large cluster.<br \/>\nvii. DataFrame integrates easily with <a href=\"http:\/\/data-flair.training\/blogs\/why-learn-big-data-use-cases\/\">Big data<\/a> tools and frameworks via <strong>Spark core<\/strong>.<\/p>\n<h2>5. Creating DataFrames in Apache Spark<\/h2>\n<p>The <strong>SparkSession<\/strong> class is the entry point to all the <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-features\/\">functionality of Spark<\/a>. 
To create a basic SparkSession, start from<br \/>\n<em>SparkSession.builder()<\/em><br \/>\nand call <em>getOrCreate()<\/em> on the builder it returns.<br \/>\nUsing a SparkSession, an application can create a DataFrame from an existing RDD, from a Hive table, or from Spark data sources. <strong><a href=\"http:\/\/data-flair.training\/blogs\/spark-sql-tutorial\/\">Spark SQL<\/a><\/strong> can operate on a variety of data sources through the DataFrame interface. We can also register a DataFrame as a temporary view and then run SQL queries on its data.<\/p>\n<h2>6. Limitations of DataFrame in Spark<\/h2>\n<ul>\n<li>The Spark SQL DataFrame API does not provide <strong>compile-time type safety<\/strong>. Mistakes such as referring to a column that does not exist are therefore caught only at run time, and data whose structure is unknown is hard to manipulate.<\/li>\n<li>Once a domain object is converted into a DataFrame, the original domain object cannot be regenerated from it.<\/li>\n<\/ul>\n<h2>7. Conclusion<\/h2>\n<p>Hence, the DataFrame API in Spark SQL improves the performance and scalability of Spark. It avoids the garbage-collection cost of constructing individual objects for each row in the dataset.<br \/>\nThe Spark DataFrame API differs from the RDD API because it is an API for building a relational query plan that Spark\u2019s Catalyst optimizer can then execute. The DataFrame API suits developers who are familiar with building query plans. 
It is a less natural fit for the majority of developers.<br \/>\nTo experiment with DataFrame in Spark, <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-installation-in-standalone-mode\/\">install Apache Spark in standalone mode<\/a> or follow the guide to <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-installation-on-multi-node-cluster-step-by-step-guide\/\">Spark installation on a multi-node cluster<\/a>.<\/p>\n<p><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Apache_Spark\">Reference for Spark<\/a><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. 
Objective In this Spark SQL DataFrame tutorial, we will learn what is DataFrame in Apache Spark and the need of Spark Dataframe. The tutorial covers the limitation of Spark RDD and How DataFrame&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":42476,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[909,3535,4634,6055,7109,8271,15692],"class_list":["post-2603","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-spark","tag-apache-spark-dataframe","tag-dataframe-in-apache-spark","tag-features-of-spark-dataframe","tag-how-to-create-dataframe-in-spark","tag-introduction-to-spark-dataframe","tag-limitations-of-spark-dataframe","tag-what-is-dataframe-in-spark"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Spark SQL DataFrame Tutorial - An Introduction to DataFrame - DataFlair<\/title>\n<meta name=\"description\" content=\"Learn what is Dataframe in Apache Spark &amp; need of Dataframe, features of Dataframe, how to create dataframe in Spark &amp; limitations of Spark SQL DataFrame.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Spark SQL DataFrame Tutorial - An Introduction to DataFrame - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Learn what is Dataframe in Apache Spark &amp; need of Dataframe, features of Dataframe, how to create dataframe in Spark &amp; limitations of Spark SQL DataFrame.\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-05-16T12:34:25+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-11-16T12:08:29+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-SQL-DataFrame-Tutorial-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Spark SQL DataFrame Tutorial - An Introduction to DataFrame - DataFlair","description":"Learn what is Dataframe in Apache Spark & need of Dataframe, features of Dataframe, how to create dataframe in Spark & limitations of Spark SQL DataFrame.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/","og_locale":"en_US","og_type":"article","og_title":"Spark SQL DataFrame Tutorial - An Introduction to DataFrame - DataFlair","og_description":"Learn what is Dataframe in Apache Spark & need of Dataframe, features of Dataframe, how to create dataframe in Spark & limitations of Spark SQL DataFrame.","og_url":"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2017-05-16T12:34:25+00:00","article_modified_time":"2018-11-16T12:08:29+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-SQL-DataFrame-Tutorial-01.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"Spark SQL DataFrame Tutorial &#8211; An Introduction to DataFrame","datePublished":"2017-05-16T12:34:25+00:00","dateModified":"2018-11-16T12:08:29+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/"},"wordCount":870,"commentCount":3,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-SQL-DataFrame-Tutorial-01.jpg","keywords":["Apache Spark DataFrame","dataframe in Apache Spark","Features of Spark DataFrame","How to create dataframe in Spark","Introduction to Spark DataFrame","Limitations of Spark DataFrame","What is Dataframe in Spark"],"articleSection":["Apache Spark Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/","url":"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/","name":"Spark SQL DataFrame Tutorial - An Introduction to DataFrame - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-SQL-DataFrame-Tutorial-01.jpg","datePublished":"2017-05-16T12:34:25+00:00","dateModified":"2018-11-16T12:08:29+00:00","description":"Learn what is Dataframe in Apache Spark & need of Dataframe, features of Dataframe, how to create dataframe in Spark & limitations of Spark SQL DataFrame.","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-SQL-DataFrame-Tutorial-01.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-SQL-DataFrame-Tutorial-01.jpg","width":1200,"height":628,"caption":"Spark SQL DataFrame Tutorial - An Introduction to DataFrame"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Apache Spark Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/spark\/"},{"@type":"ListItem","position":3,"name":"Spark SQL DataFrame Tutorial &#8211; An Introduction to 
DataFrame"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The 
DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2603","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=2603"}],"version-history":[{"count":5,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2603\/revisions"}],"predecessor-version":[{"id":42477,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2603\/revisions\/42477"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/42476"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=2603"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=2603"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=2603"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}