In how many ways can we create RDDs in Apache Spark? Explain.

This topic contains 1 reply, has 1 voice, and was last updated by  dfbdteam5 7 months ago.

Viewing 2 posts - 1 through 2 (of 2 total)


    List the ways of creating RDDs in Spark.
    Describe how RDDs are created in Apache Spark.
    How can we create RDD in Apache Spark?



    There are three ways to create an RDD in Spark:

    1. The first method is used when data is already available in an external system such as the local filesystem, HDFS, or HBase. An RDD can be created by calling the textFile method of SparkContext with a path/URL as the argument.

    scala> val data = sc.textFile("File1.txt")
    Here sc is the SparkContext object, which is available by default in the spark-shell.
    The file File1.txt must exist in the SPARK_HOME directory (or be referenced by a full path/URL).

    2. The second approach is to parallelize an existing collection in the driver program using the parallelize method of SparkContext:

    scala> val arr1 = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    scala> val rdd1 = sc.parallelize(arr1)

    3. The third way is to create a new RDD by applying a transformation to an existing one, e.g. with map:

    scala> val newRDD = rdd1.map(data => data * 2)

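    Note that map is lazy: the doubling only executes when an action such as collect is invoked. As a rough local analogue (no Spark required, so no cluster and no laziness), the same transformation logic can be checked with a plain Scala collection; this sketch only mirrors the element-wise behaviour of the rdd1.map example above:

```scala
object RddMapAnalogue {
  def main(args: Array[String]): Unit = {
    // Same source values as arr1 in the parallelize example
    val arr1 = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

    // rdd1.map(data => data * 2) yields the same elements as a plain
    // Scala map over the source collection (eagerly, not lazily):
    val doubled = arr1.map(data => data * 2)

    println(doubled.mkString(", "))
  }
}
```

    In the spark-shell the equivalent check would be newRDD.collect(), which triggers the computation and returns the doubled values to the driver.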
    See in depth: Creating RDD in Spark

