In how many ways can we create RDDs in Apache Spark? Explain.

This topic contains 1 reply, has 1 voice, and was last updated by  dfbdteam5 7 months ago.

Viewing 2 posts - 1 through 2 (of 2 total)


    List the ways of creating RDDs in Spark.
    Describe how RDDs are created in Apache Spark.
    How can we create RDD in Apache Spark?



    There are three ways to create an RDD in Spark:

    1. The first method is used when data is already available in an external system such as the local filesystem, HDFS, or HBase. An RDD can be created by calling the textFile method of SparkContext with a path/URL as the argument.

    scala> val data = sc.textFile("File1.txt")
    Here sc is the SparkContext object, which is available by default in the spark-shell.
    The file File1.txt must exist in the SPARK_HOME directory (or be referenced by a full path/URL).

    2. The second approach is to parallelize an existing collection in the driver program using the parallelize method of SparkContext:

    scala> val arr1 = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    scala> val rdd1 = sc.parallelize(arr1)

    3. The third way is to create a new RDD by applying a transformation to an existing one, e.g. with map:

    scala> val newRDD = rdd1.map(data => data * 2)

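    Note that map is lazy: the doubling only executes when an action such as collect is invoked. As a rough local analogue (no Spark required, so no cluster and no laziness), the same transformation logic can be checked with a plain Scala collection; this sketch only mirrors the element-wise behaviour of the rdd1.map example above:

```scala
object RddMapAnalogue {
  def main(args: Array[String]): Unit = {
    // Same source values as arr1 in the parallelize example
    val arr1 = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

    // rdd1.map(data => data * 2) yields the same elements as a plain
    // Scala map over the source collection (eagerly, not lazily):
    val doubled = arr1.map(data => data * 2)

    println(doubled.mkString(", "))
  }
}
```

    In the spark-shell the equivalent check would be newRDD.collect(), which triggers the computation and returns the doubled values to the driver.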
    See in depth: Creating RDD in Spark

