Does Spark Streaming guarantee the order of the data when reducing?


I'm wondering whether the order of records in a stream is guaranteed when calling reduceByKey in Apache Spark Streaming. Part of my computation needs the last value.

Here's an example:

```java
JavaPairDStream<String, Double> pairs; // ...
pairs.reduceByKey(new Function2<Double, Double, Double>() {
    @Override
    public Double call(Double first, Double second) throws Exception {
        // keep the "last" value seen for each key
        return second;
    }
});
```

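No, it isn't. The point of MapReduce is to parallelize tasks, and once they are parallelized you cannot guarantee order. Previous results may be shuffled on their way to the reducer. Note also that the reducer won't wait for all results to arrive; it just grabs two values and starts reducing. That is why the function passed to reduce or reduceByKey should be associative and commutative: `(first, second) -> second` is associative but not commutative, so its result depends on the order in which values happen to arrive.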
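To make this concrete, here is a plain-Java sketch that does not use Spark at all: it folds the same three values in two different arrival orders. The `Reading` type, its timestamp field and the helper names are hypothetical, introduced only for illustration. The question's "return second" function gives a different answer for each order, while a function that keeps the reading with the later timestamp does not.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

/**
 * Plain-Java sketch (no Spark) of why "return second" depends on arrival order.
 * The list order stands in for the order in which a reducer happens to see values.
 */
public class ReduceOrderDemo {

    // Hypothetical value type: a reading tagged with the time it was produced.
    public static final class Reading {
        final long timestamp;
        final double value;

        public Reading(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // The question's reduce function: associative, but not commutative.
    public static final BinaryOperator<Reading> KEEP_SECOND = (first, second) -> second;

    // Order-independent alternative: keep the reading with the latest timestamp.
    // Associative and commutative as long as timestamps are unique per key.
    public static final BinaryOperator<Reading> KEEP_LATEST =
            (a, b) -> a.timestamp >= b.timestamp ? a : b;

    // Folds the readings left to right in the given arrival order.
    public static double reduceInArrivalOrder(List<Reading> arrived, BinaryOperator<Reading> f) {
        Reading acc = arrived.get(0);
        for (int i = 1; i < arrived.size(); i++) {
            acc = f.apply(acc, arrived.get(i));
        }
        return acc.value;
    }

    public static void main(String[] args) {
        Reading r1 = new Reading(1L, 10.0);
        Reading r2 = new Reading(2L, 20.0);
        Reading r3 = new Reading(3L, 30.0);
        List<Reading> produced = Arrays.asList(r1, r2, r3); // order the data was produced
        List<Reading> arrived = Arrays.asList(r3, r1, r2);  // one possible arrival order

        System.out.println("keep second: " + reduceInArrivalOrder(produced, KEEP_SECOND)
                + " vs " + reduceInArrivalOrder(arrived, KEEP_SECOND));
        System.out.println("keep latest: " + reduceInArrivalOrder(produced, KEEP_LATEST)
                + " vs " + reduceInArrivalOrder(arrived, KEEP_LATEST));
    }
}
```

Running it prints `keep second: 30.0 vs 20.0` and `keep latest: 30.0 vs 30.0`. Adapting this to a Spark job would presumably mean carrying a timestamp inside each value and passing a keep-the-latest function to reduceByKey; treat that as an assumption about how you might apply it, not as something the answer above prescribes.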

Once created, the distributed dataset (`distData`) can be operated on in parallel. For example, we might call `distData.reduce((a, b) => a + b)` to add up the elements of the array.
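Addition works for this because it is associative and commutative, so it doesn't matter how the elements are split across partitions or in which order the partial sums are combined. The plain-Java sketch below (no Spark; the class and method names are hypothetical) reduces each partition locally, then merges the partial results in two different orders, once with addition and once with subtraction, which is neither associative nor commutative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntBinaryOperator;

/**
 * Plain-Java sketch (no Spark) of a partitioned reduce: each partition is reduced
 * locally, then the partial results are merged in whatever order they arrive.
 */
public class PartitionedReduce {

    // Reduces each (non-empty) partition left to right, then merges the partial
    // results in the order given by mergeOrder (indices into partitions).
    public static int reduce(List<int[]> partitions, int[] mergeOrder, IntBinaryOperator f) {
        int[] partials = new int[partitions.size()];
        for (int p = 0; p < partitions.size(); p++) {
            int[] part = partitions.get(p);
            int acc = part[0];
            for (int i = 1; i < part.length; i++) {
                acc = f.applyAsInt(acc, part[i]);
            }
            partials[p] = acc;
        }
        int result = partials[mergeOrder[0]];
        for (int k = 1; k < mergeOrder.length; k++) {
            result = f.applyAsInt(result, partials[mergeOrder[k]]);
        }
        return result;
    }

    public static void main(String[] args) {
        // The elements 1..5 split into three partitions.
        List<int[]> partitions = Arrays.asList(new int[]{1, 2}, new int[]{3, 4}, new int[]{5});
        int[] orderA = {0, 1, 2};
        int[] orderB = {2, 0, 1};

        IntBinaryOperator sum = (a, b) -> a + b;   // associative and commutative
        IntBinaryOperator minus = (a, b) -> a - b; // neither

        System.out.println("sum:   " + reduce(partitions, orderA, sum) + " vs " + reduce(partitions, orderB, sum));
        System.out.println("minus: " + reduce(partitions, orderA, minus) + " vs " + reduce(partitions, orderB, minus));
    }
}
```

This prints `sum:   15 vs 15` and `minus: -5 vs 7`: the sum is the same whichever partial result arrives first, while subtraction gives a different answer for each merge order.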

