It returns a new dataset formed by selecting the elements of the source on which the function returns true. The function is a predicate: it accepts an element as its parameter and returns a Boolean value, either true or false. Only the elements that satisfy the condition are kept, and those that don't are filtered out, so the new RDD is the set of elements for which the function returns true.
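
The behaviour described above can be sketched in plain Python, without a Spark cluster. This is only an illustration of the semantics, not Spark's implementation; the names `is_even`, `source`, and `kept` are made up for this example, and Python's built-in `filter` stands in for the RDD operation.

```python
def is_even(n: int) -> bool:
    """Predicate (hypothetical example): takes one element, returns True or False."""
    return n % 2 == 0


# Stand-in for the source dataset (a plain list here, not an RDD).
source = [1, 2, 3, 4, 5, 6]

# Keep only the elements for which the predicate returns True.
# A new collection is built; the source itself is not modified.
kept = list(filter(is_even, source))

print(kept)    # [2, 4, 6]
print(source)  # [1, 2, 3, 4, 5, 6]
```

With PySpark, assuming an existing `SparkContext` named `sc`, the analogous call would be `sc.parallelize(source).filter(is_even).collect()`.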