Class/Object

org.apache.spark.rdd.mergejoin

MergeJoin

Related Docs: object MergeJoin | package mergejoin

Permalink

class MergeJoin[K, V, W, O] extends Iterator[O]

:: DeveloperApi ::

Merge-join implementation that will create a spill-able collection for the right-side to be iterated over for each matching key on the left side. This enables joins that don't require that values for any given key to be required to fit in memory, but it *will* try to buffer as many values as possible using Spark's built-in 'ExternalSorter', and it's a private class so that's why this class is packaged here.

There are numerous optimizations in place to try to minimize the work being done in the join:

1) Since the Joiner returns a value Iterator for each key, we don't need to invoke join logic on each value iteration-- instead we perform join logic for each unique key. This also allows each 'Joiner' to optimize via the methods below based on the join being performed.

2) In cases were we have a key on one side but not the other, we skip creation of the spillable collection and write the output tuples directly according to the Joiner's leftOuter/rightOuter method.

3) In cases where there are no values to emit for a particular key, the Joiner can emit an empty Iterator, in which case we will immediately move to the next key without emitting+filtering tuples for those values.

4) In cases where we have a key on both sides, we invoke the Joiner's inner method. The default implementation will create a spill-able collection for the right side that will buffer as many values as possible in memory before spilling to disk... so we only pay the penalty for spilling to disk on keys where it is absolutely necessary.

Annotations
@DeveloperApi()
Linear Supertypes
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. MergeJoin
  2. Iterator
  3. TraversableOnce
  4. GenTraversableOnce
  5. AnyRef
  6. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Instance Constructors

  1. new MergeJoin(context: TaskContext, left: Iterator[(K, V)], right: Iterator[(K, W)], joiner: Joiner[K, V, W, O])(implicit ord: Ordering[K])

    Permalink

    context

    The TaskContext we are executing within

    left

    The left side of the join, pre-ordered by Ordering[K]

    right

    The right side of the join, pre-ordered by Ordering[K]

Type Members

  1. class GroupedIterator[B >: A] extends AbstractIterator[Seq[B]] with Iterator[Seq[B]]

    Permalink
    Definition Classes
    Iterator

Value Members

  1. final def !=(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  3. def ++[B >: O](that: ⇒ GenTraversableOnce[B]): Iterator[B]

    Permalink
    Definition Classes
    Iterator
  4. def /:[B](z: B)(op: (B, O) ⇒ B): B

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  5. def :\[B](z: B)(op: (O, B) ⇒ B): B

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  6. final def ==(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  7. def addString(b: StringBuilder): StringBuilder

    Permalink
    Definition Classes
    TraversableOnce
  8. def addString(b: StringBuilder, sep: String): StringBuilder

    Permalink
    Definition Classes
    TraversableOnce
  9. def addString(b: StringBuilder, start: String, sep: String, end: String): StringBuilder

    Permalink
    Definition Classes
    TraversableOnce
  10. def aggregate[B](z: ⇒ B)(seqop: (B, O) ⇒ B, combop: (B, B) ⇒ B): B

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  11. final def asInstanceOf[T0]: T0

    Permalink
    Definition Classes
    Any
  12. def buffered: BufferedIterator[O]

    Permalink
    Definition Classes
    Iterator
  13. def clone(): AnyRef

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  14. def collect[B](pf: PartialFunction[O, B]): Iterator[B]

    Permalink
    Definition Classes
    Iterator
    Annotations
    @migration
    Migration

    (Changed in version 2.8.0) collect has changed. The previous behavior can be reproduced with toSeq.

  15. def collectFirst[B](pf: PartialFunction[O, B]): Option[B]

    Permalink
    Definition Classes
    TraversableOnce
  16. def contains(elem: Any): Boolean

    Permalink
    Definition Classes
    Iterator
  17. def copyToArray[B >: O](xs: Array[B], start: Int, len: Int): Unit

    Permalink
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  18. def copyToArray[B >: O](xs: Array[B]): Unit

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  19. def copyToArray[B >: O](xs: Array[B], start: Int): Unit

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  20. def copyToBuffer[B >: O](dest: Buffer[B]): Unit

    Permalink
    Definition Classes
    TraversableOnce
  21. def corresponds[B](that: GenTraversableOnce[B])(p: (O, B) ⇒ Boolean): Boolean

    Permalink
    Definition Classes
    Iterator
  22. def count(p: (O) ⇒ Boolean): Int

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  23. var currentIterator: Iterator[O]

    Permalink
    Attributes
    protected
  24. def drop(n: Int): Iterator[O]

    Permalink
    Definition Classes
    Iterator
  25. def dropWhile(p: (O) ⇒ Boolean): Iterator[O]

    Permalink
    Definition Classes
    Iterator
  26. def duplicate: (Iterator[O], Iterator[O])

    Permalink
    Definition Classes
    Iterator
  27. final def eq(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  28. def equals(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  29. def exists(p: (O) ⇒ Boolean): Boolean

    Permalink
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  30. def filter(p: (O) ⇒ Boolean): Iterator[O]

    Permalink
    Definition Classes
    Iterator
  31. def filterNot(p: (O) ⇒ Boolean): Iterator[O]

    Permalink
    Definition Classes
    Iterator
  32. def finalize(): Unit

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  33. def find(p: (O) ⇒ Boolean): Option[O]

    Permalink
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  34. def finish: Iterator[O]

    Permalink
    Attributes
    protected
  35. var finished: Boolean

    Permalink
    Attributes
    protected
  36. def flatMap[B](f: (O) ⇒ GenTraversableOnce[B]): Iterator[B]

    Permalink
    Definition Classes
    Iterator
  37. def fold[A1 >: O](z: A1)(op: (A1, A1) ⇒ A1): A1

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  38. def foldLeft[B](z: B)(op: (B, O) ⇒ B): B

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  39. def foldRight[B](z: B)(op: (O, B) ⇒ B): B

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  40. def forall(p: (O) ⇒ Boolean): Boolean

    Permalink
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  41. def foreach[U](f: (O) ⇒ U): Unit

    Permalink
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  42. final def getClass(): Class[_]

    Permalink
    Definition Classes
    AnyRef → Any
  43. def grouped[B >: O](size: Int): GroupedIterator[B]

    Permalink
    Definition Classes
    Iterator
  44. def hasDefiniteSize: Boolean

    Permalink
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  45. def hasNext: Boolean

    Permalink
    Definition Classes
    MergeJoin → Iterator
  46. def hashCode(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  47. def indexOf[B >: O](elem: B): Int

    Permalink
    Definition Classes
    Iterator
  48. def indexWhere(p: (O) ⇒ Boolean): Int

    Permalink
    Definition Classes
    Iterator
  49. def isEmpty: Boolean

    Permalink
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  50. final def isInstanceOf[T0]: Boolean

    Permalink
    Definition Classes
    Any
  51. def isTraversableAgain: Boolean

    Permalink
    Definition Classes
    Iterator → GenTraversableOnce
  52. var leftRemaining: BufferedIterator[(K, V)]

    Permalink
    Attributes
    protected
  53. def length: Int

    Permalink
    Definition Classes
    Iterator
  54. def map[B](f: (O) ⇒ B): Iterator[B]

    Permalink
    Definition Classes
    Iterator
  55. def max[B >: O](implicit cmp: Ordering[B]): O

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  56. def maxBy[B](f: (O) ⇒ B)(implicit cmp: Ordering[B]): O

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  57. def min[B >: O](implicit cmp: Ordering[B]): O

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  58. def minBy[B](f: (O) ⇒ B)(implicit cmp: Ordering[B]): O

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  59. def mkString: String

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  60. def mkString(sep: String): String

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  61. def mkString(start: String, sep: String, end: String): String

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  62. final def ne(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  63. def next(): O

    Permalink
    Definition Classes
    MergeJoin → Iterator
  64. def nextIterator(): Iterator[O]

    Permalink
    Attributes
    protected
  65. def nonEmpty: Boolean

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  66. final def notify(): Unit

    Permalink
    Definition Classes
    AnyRef
  67. final def notifyAll(): Unit

    Permalink
    Definition Classes
    AnyRef
  68. implicit val ord: Ordering[K]

    Permalink
    Attributes
    protected
  69. def padTo[A1 >: O](len: Int, elem: A1): Iterator[A1]

    Permalink
    Definition Classes
    Iterator
  70. def partition(p: (O) ⇒ Boolean): (Iterator[O], Iterator[O])

    Permalink
    Definition Classes
    Iterator
  71. def patch[B >: O](from: Int, patchElems: Iterator[B], replaced: Int): Iterator[B]

    Permalink
    Definition Classes
    Iterator
  72. def product[B >: O](implicit num: Numeric[B]): B

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  73. def reduce[A1 >: O](op: (A1, A1) ⇒ A1): A1

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  74. def reduceLeft[B >: O](op: (B, O) ⇒ B): B

    Permalink
    Definition Classes
    TraversableOnce
  75. def reduceLeftOption[B >: O](op: (B, O) ⇒ B): Option[B]

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  76. def reduceOption[A1 >: O](op: (A1, A1) ⇒ A1): Option[A1]

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  77. def reduceRight[B >: O](op: (O, B) ⇒ B): B

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  78. def reduceRightOption[B >: O](op: (O, B) ⇒ B): Option[B]

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  79. def reversed: List[O]

    Permalink
    Attributes
    protected[this]
    Definition Classes
    TraversableOnce
  80. var rightRemaining: BufferedIterator[(K, W)]

    Permalink
    Attributes
    protected
  81. def sameElements(that: Iterator[_]): Boolean

    Permalink
    Definition Classes
    Iterator
  82. def scanLeft[B](z: B)(op: (B, O) ⇒ B): Iterator[B]

    Permalink
    Definition Classes
    Iterator
  83. def scanRight[B](z: B)(op: (O, B) ⇒ B): Iterator[B]

    Permalink
    Definition Classes
    Iterator
  84. def seq: Iterator[O]

    Permalink
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  85. def size: Int

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  86. def slice(from: Int, until: Int): Iterator[O]

    Permalink
    Definition Classes
    Iterator
  87. def sliding[B >: O](size: Int, step: Int): GroupedIterator[B]

    Permalink
    Definition Classes
    Iterator
  88. def span(p: (O) ⇒ Boolean): (Iterator[O], Iterator[O])

    Permalink
    Definition Classes
    Iterator
  89. def sum[B >: O](implicit num: Numeric[B]): B

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  90. final def synchronized[T0](arg0: ⇒ T0): T0

    Permalink
    Definition Classes
    AnyRef
  91. def take(n: Int): Iterator[O]

    Permalink
    Definition Classes
    Iterator
  92. def takeLeftValuesForKey(key: K): Iterator[(K, V)]

    Permalink
    Attributes
    protected
  93. def takeRightValuesForKey(key: K): Iterator[(K, W)]

    Permalink
    Attributes
    protected
  94. def takeWhile(p: (O) ⇒ Boolean): Iterator[O]

    Permalink
    Definition Classes
    Iterator
  95. def to[Col[_]](implicit cbf: CanBuildFrom[Nothing, O, Col[O]]): Col[O]

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  96. def toArray[B >: O](implicit arg0: ClassTag[B]): Array[B]

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  97. def toBuffer[B >: O]: Buffer[B]

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  98. def toIndexedSeq: IndexedSeq[O]

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  99. def toIterable: Iterable[O]

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  100. def toIterator: Iterator[O]

    Permalink
    Definition Classes
    Iterator → GenTraversableOnce
  101. def toList: List[O]

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  102. def toMap[T, U](implicit ev: <:<[O, (T, U)]): Map[T, U]

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  103. def toSeq: Seq[O]

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  104. def toSet[B >: O]: Set[B]

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  105. def toStream: Stream[O]

    Permalink
    Definition Classes
    Iterator → GenTraversableOnce
  106. def toString(): String

    Permalink
    Definition Classes
    Iterator → AnyRef → Any
  107. def toTraversable: Traversable[O]

    Permalink
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  108. def toVector: Vector[O]

    Permalink
    Definition Classes
    TraversableOnce → GenTraversableOnce
  109. final def wait(): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  110. final def wait(arg0: Long, arg1: Int): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  111. final def wait(arg0: Long): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  112. def withFilter(p: (O) ⇒ Boolean): Iterator[O]

    Permalink
    Definition Classes
    Iterator
  113. def zip[B](that: Iterator[B]): Iterator[(O, B)]

    Permalink
    Definition Classes
    Iterator
  114. def zipAll[B, A1 >: O, B1 >: B](that: Iterator[B], thisElem: A1, thatElem: B1): Iterator[(A1, B1)]

    Permalink
    Definition Classes
    Iterator
  115. def zipWithIndex: Iterator[(O, Int)]

    Permalink
    Definition Classes
    Iterator

Inherited from Iterator[O]

Inherited from TraversableOnce[O]

Inherited from GenTraversableOnce[O]

Inherited from AnyRef

Inherited from Any

Ungrouped