In how many ways can we create RDDs in Apache Spark? Explain.

This topic contains 1 reply, has 1 voice, and was last updated by  dfbdteam5 7 months ago.

    #5895

    dfbdteam5
    Moderator

    List the ways of creating RDDs in Spark.
    Describe how RDDs are created in Apache Spark.
    How can we create RDD in Apache Spark?

    #5896

    dfbdteam5
    Moderator

    There are three ways to create an RDD in Apache Spark.

    1. The first method is used when the data is already available in an external storage system such as the local filesystem, HDFS, or HBase.
    An RDD can be created by calling the textFile method of SparkContext with a path or URL as the argument.

    scala> val data = sc.textFile("File1.txt")
    sc is the SparkContext object.
    You need to create the file File1.txt in the Spark home directory.
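
    The same textFile call also reads data from distributed storage such as HDFS. A minimal sketch, assuming a hypothetical HDFS URL (replace it with one valid for your cluster); the optional second argument sets a minimum number of partitions:

    scala> // hypothetical HDFS path, not part of the original example
    scala> val hdfsData = sc.textFile("hdfs://namenode:9000/user/data/File1.txt", 4)
    scala> hdfsData.count()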

    2. The second approach is to parallelize an existing collection in the driver program.

    scala> val arr1 = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    scala> val rdd1 = sc.parallelize(arr1)
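
    As a small extension of the same idea, parallelize also accepts an optional number of partitions, and an action such as collect returns the data to the driver; the partition count below (3) is an arbitrary choice for illustration:

    scala> val rdd2 = sc.parallelize(arr1, 3)  // 3 partitions, chosen arbitrarily
    scala> rdd2.getNumPartitions               // returns 3
    scala> rdd2.collect()                      // Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)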

    3. The third way is to create a new RDD by applying a transformation to an existing RDD.

    scala> val newRDD = rdd1.map(data => data * 2)
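
    Since transformations are lazy, nothing is computed until an action runs. A short sketch continuing the example above (the bigValues name is illustrative):

    scala> val bigValues = newRDD.filter(x => x > 10)  // keep doubled values greater than 10
    scala> bigValues.collect()                         // Array(12, 14, 16, 18, 20)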

    See in depth: Creating RDD in Spark

