textFile Vs wholeTextFile in Spark
September 20, 2018 at 2:31 pm (#5183), DataFlair Team
Explain textFile vs. wholeTextFiles in Spark.
September 20, 2018 at 2:31 pm (#5186), DataFlair Team
- Both are methods of org.apache.spark.SparkContext.
textFile():
- def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String]
- Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and return it as an RDD of Strings.
- For example, sc.textFile("/home/hdadmin/wc-data.txt") creates an RDD in which each line of the file is one element.
- This is the familiar, everyday way of loading line-oriented text in Spark.
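To make the one-line-per-element behaviour concrete, here is a minimal local-mode sketch. It assumes spark-core is on the classpath and uses master `local[2]`; the temporary directory, the file name `wc-data.txt` and its three lines are hypothetical stand-ins, not the file from the post. The word count is only there to show that downstream operations work on individual lines.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object TextFileSketch {
  // Hypothetical sample content: three lines, so textFile should yield three elements.
  val sample = "hello spark\nhello world\nbye\n"

  /** Returns (number of elements, first element, word counts) for the sample file. */
  def run(): (Long, String, Map[String, Int]) = {
    val conf = new SparkConf().setAppName("textFile-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Write the sample to a temp file so the sketch does not depend on /home/hdadmin.
      val dir = Files.createTempDirectory("textfile-sketch")
      val file = Files.write(dir.resolve("wc-data.txt"), sample.getBytes(StandardCharsets.UTF_8))

      // RDD[String]: one element per line; minPartitions is only a suggested minimum.
      val lines = sc.textFile(file.toString, minPartitions = 2)

      // Word count built on the line-oriented RDD.
      val counts = lines
        .flatMap(_.split(" "))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
        .collect()
        .toMap

      (lines.count(), lines.first(), counts)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (n, first, counts) = run()
    println(s"$n lines, first = '$first', counts = $counts")
  }
}
```

Splitting the file into two input splits does not change the result: each line still lands in exactly one partition, so the count stays at three.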
wholeTextFiles():
- def wholeTextFiles(path: String, minPartitions: Int = defaultMinPartitions): RDD[(String, String)]
- Read a directory of text files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI.
- Rather than an RDD of lines, wholeTextFiles() returns a pair RDD in which each file becomes a single record.
- For example, given a directory containing a few files, wholeTextFiles() creates a pair RDD whose key is each file's full path and whose value is the entire content of that file as one string:
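val myfilerdd = sc.wholeTextFiles("/home/hdadmin/MyFiles")  // RDD[(String, String)]: (path, content)
val keyrdd = myfilerdd.keys       // RDD[String] of file paths
keyrdd.collect()
val filerdd = myfilerdd.values    // RDD[String] of whole-file contents
filerdd.collect()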
Output of keyrdd.collect():
Array[String] = Array(
file:/home/hdadmin/MyFiles/JavaSparkPi.java,
file:/home/hdadmin/MyFiles/sumnumber.txt,
file:/home/hdadmin/MyFiles/JavaHdfsLR.java,
file:/home/hdadmin/MyFiles/JavaPageRank.java,
file:/home/hdadmin/MyFiles/JavaLogQuery.java,
file:/home/hdadmin/MyFiles/wc-data.txt,
file:/home/hdadmin/MyFiles/nosum.txt)

Output of filerdd.collect() (truncated):
Array[String] =
Array("/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an “AS IS” BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* …
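The practical difference shows up when both methods are pointed at the same directory: textFile flattens every file into one RDD of lines and loses the file boundaries, while wholeTextFiles yields one (path, content) record per file, so per-file logic stays simple. The sketch below assumes spark-core on the classpath and local mode; the files `a.txt` and `b.txt` and their contents are hypothetical. Because each value holds a whole file in memory, the Spark documentation recommends wholeTextFiles for small files; large files are allowed but may perform poorly.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.spark.{SparkConf, SparkContext}

object WholeTextFilesSketch {
  // Writes one hypothetical sample file into dir.
  private def write(dir: Path, name: String, text: String): Unit =
    Files.write(dir.resolve(name), text.getBytes(StandardCharsets.UTF_8))

  /** Returns (textFile element count, wholeTextFiles element count, lines per file name). */
  def run(): (Long, Long, Map[String, Int]) = {
    val sc = new SparkContext(
      new SparkConf().setAppName("wholeTextFiles-sketch").setMaster("local[2]"))
    try {
      val dir = Files.createTempDirectory("wholetext-sketch")
      write(dir, "a.txt", "one\ntwo\nthree\n")
      write(dir, "b.txt", "four\nfive\n")
      val path = dir.toString

      // textFile on a directory: all lines of all files, file boundaries lost.
      val lineCount = sc.textFile(path).count()

      // wholeTextFiles: one (path, content) pair per file.
      val files = sc.wholeTextFiles(path)
      val fileCount = files.count()

      // Per-file processing: split each file's content into lines and count them.
      // Keys look like "file:/tmp/.../a.txt", so keep only the last path segment.
      val linesPerFile = files
        .map { case (filePath, content) =>
          (filePath.split("/").last, content.split("\n").count(_.nonEmpty))
        }
        .collect()
        .toMap

      (lineCount, fileCount, linesPerFile)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Here textFile sees five elements (one per line), whereas wholeTextFiles sees two (one per file).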