Why Is The Combine Function Called Three Times?
I'm trying to understand the combine transformer in a apache beam pipeline. Considering the following example pipeline: def test_combine(data): logging.info('test combine')
Solution 1:
It looks it's happening due to the MapReduce structure. When using Combiners, the output that one combiner has is used as a input.
As an example, imagine summing 3 numbers (1, 2, 3). The combiner MAY sum first 1 and 2 (3) and use that number as input with 3 (3 + 3 = 6). In your case [1, 2, 3]
seems to be used as an input in the next combiner.
An example that really helped me understand this:
p = beam.Pipeline()
defmake_list(elements):
print(elements)
return elements
(p | Create(range(30))
| beam.core.CombineGlobally(make_list))
p.run()
See that the element [1,..,10]
is used in the next combiner.
Post a Comment for "Why Is The Combine Function Called Three Times?"