In how many ways can we create RDDs in Apache Spark? Explain.


    • #5895
      DataFlair Team
      Spectator

      List the ways of creating RDDs in Spark.
      Describe how RDDs are created in Apache Spark.
      How can we create RDDs in Apache Spark?

    • #5896
      DataFlair Team
      Spectator

      There are three methods to create an RDD in Spark.

      1. The first method is used when data is already available in an external system such as the local filesystem, HDFS, or HBase.
      An RDD can be created by calling the textFile method of SparkContext with a path/URL as the argument.

      scala> val data = sc.textFile("File1.txt")
      Here sc is the SparkContext object, which is available by default in the Spark shell.
      You need to create the file File1.txt in the SPARK_HOME directory, or pass the full path to the file.
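
      You can verify the RDD with an action. As a quick sketch (assuming File1.txt exists at that path):

      scala> data.count()   // number of lines in the file
      scala> data.first()   // the first line of the file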

      2. The second approach is to parallelize an existing collection in the driver program:

      scala> val arr1 = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
      scala> val rdd1 = sc.parallelize(arr1)
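
      parallelize also accepts an optional second argument that sets the number of partitions. As an illustrative sketch (rdd2 is a hypothetical name, not from the original answer):

      scala> val rdd2 = sc.parallelize(arr1, 4)
      scala> rdd2.getNumPartitions   // returns 4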

      3. The third way is to create a new RDD by applying a transformation to an existing RDD:

      scala> val newRDD = rdd1.map(data => data * 2)
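
      Since map is a transformation, it is evaluated lazily; the result is computed only when an action such as collect is called:

      scala> newRDD.collect()   // Array(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)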

      See in depth: Creating RDD in Spark
