Package com.hindog.spark.rdd

package rdd

Provides merge-join capability for Spark RDDs. See com.hindog.spark.rdd.PairRDDFunctions for further documentation.

Type Members

  1. case class MergeJoinPartition[K, V, W](index: Int, left: Dependency[(K, V)], right: Dependency[(K, W)]) extends Partition with Product with Serializable

    :: @DeveloperApi ::

    Stores the partition index and references to the left and right partitions to be joined.

    index    The partition index
    left     The left Dependency to be used in the join
    right    The right Dependency to be used in the join

    Attributes
    protected
  2. class MergeJoinRDD[K, V, W, Out] extends RDD[Out]

    :: @DeveloperApi ::

    RDD implementation for merge-join that uses a shuffle to partition and sort by keys using an implicit Ordering for K, and then delegates to an instance of MergeJoin to perform the actual merge logic.

    An optimization avoids the shuffle in some cases where left or right is already guaranteed to be partition-sorted (i.e., via repartitionAndSortWithinPartitions).

    Annotations
    @DeveloperApi()
  3. class PairRDDFunctions[K, V] extends Serializable

    Merge-join operators that provide scalable equivalents to the existing Spark RDD join, leftOuterJoin, rightOuterJoin, and fullOuterJoin operators.

    Refer to the documentation for MergeJoin for implementation details.
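
The MergeJoinRDD description above says that both sides are sorted by key using an implicit Ordering for K and then merged. As a rough illustration of that idea only, here is a minimal plain-Scala sketch of an inner merge join over two sequences that are already sorted by key. It does not use Spark and is not this package's implementation; the names MergeJoinSketch and innerMergeJoin are hypothetical.

```scala
object MergeJoinSketch {

  /** Inner-joins two sequences that are ALREADY sorted by key.
    * Hypothetical helper for illustration; not part of com.hindog.spark.rdd.
    */
  def innerMergeJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)])
                             (implicit ord: Ordering[K]): Vector[(K, (V, W))] = {
    val out = Vector.newBuilder[(K, (V, W))]
    var l = left
    var r = right
    while (l.nonEmpty && r.nonEmpty) {
      val cmp = ord.compare(l.head._1, r.head._1)
      if (cmp < 0) l = l.tail        // left key is smaller: advance the left side
      else if (cmp > 0) r = r.tail   // right key is smaller: advance the right side
      else {
        val key = l.head._1
        // Take the run of equal keys from each side, then emit their cross product.
        val (lRun, lRest) = l.span(kv => ord.equiv(kv._1, key))
        val (rRun, rRest) = r.span(kv => ord.equiv(kv._1, key))
        for ((_, v) <- lRun; (_, w) <- rRun) out += ((key, (v, w)))
        l = lRest
        r = rRest
      }
    }
    out.result()
  }
}
```

Calling MergeJoinSketch.innerMergeJoin(left, right) on two key-sorted lists returns only the pairs whose keys appear on both sides. Because each input is walked once from front to back, only the current run of equal keys needs to be held at a time, which is the property that makes a merge join attractive once the inputs are sorted.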

Value Members

  1. implicit def toPairRDDFunctions[K, V](rdd: RDD[(K, V)])(implicit arg0: ClassTag[K], arg1: ClassTag[V]): PairRDDFunctions[K, V]
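
The signature above is an implicit conversion that takes ClassTag evidence for K and V. As a hedged sketch of how such a conversion makes extra operators available on an existing type, the following plain-Scala analogue wraps a Seq[(K, V)] instead of an RDD. Everything here (EnrichmentSketch, PairSeqFunctions, toPairSeqFunctions, describe) is hypothetical and only mirrors the shape of the signature; it is not the package's code.

```scala
import scala.language.implicitConversions
import scala.reflect.ClassTag

object EnrichmentSketch {

  // Hypothetical wrapper, analogous in shape to PairRDDFunctions[K, V].
  class PairSeqFunctions[K, V](self: Seq[(K, V)])
                              (implicit kt: ClassTag[K], vt: ClassTag[V]) extends Serializable {
    // Hypothetical operator: reports the runtime classes captured by the ClassTags.
    def describe: String =
      s"Seq[(${kt.runtimeClass.getSimpleName}, ${vt.runtimeClass.getSimpleName})] with ${self.size} pairs"
  }

  // Same shape as toPairRDDFunctions: the context bounds supply the implicit ClassTags.
  implicit def toPairSeqFunctions[K: ClassTag, V: ClassTag](seq: Seq[(K, V)]): PairSeqFunctions[K, V] =
    new PairSeqFunctions(seq)
}
```

After `import EnrichmentSketch._`, an expression such as `Seq(1 -> "a", 2 -> "b").describe` compiles because the compiler inserts the conversion when it cannot find describe on Seq. Presumably the package's conversion is brought into scope in a similar way, by importing the members of com.hindog.spark.rdd.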
