Does Spark Streaming provide a guarantee on the order of the data when reducing?
I'm wondering whether, when calling reduceByKey in Apache Spark Streaming, the order of the records in the stream is guaranteed. Part of my computation needs the last value.
Here's an example:
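    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.streaming.api.java.JavaPairDStream;

    JavaPairDStream<String, Double> pairs; // ...
    // Intended to keep the last value seen for each key
    JavaPairDStream<String, Double> lastValues = pairs.reduceByKey(
        new Function2<Double, Double, Double>() {
            @Override
            public Double call(Double first, Double second) throws Exception {
                return second;
            }
        });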
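No, it isn't. The whole point of MapReduce is to parallelize tasks, and once tasks run in parallel you cannot guarantee their order. Earlier results may be shuffled on their way to the reduce processor. Note that the reducer doesn't wait for all results to arrive; it just grabs two values and starts reducing. That is why the function passed to reduceByKey should be commutative and associative. A function that simply returns its second argument is not commutative, so which value survives depends on the order in which the values happen to arrive.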
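To make this concrete, here is a simplified plain-Java model with no Spark dependency. `ReduceOrderDemo`, `reduce`, `Stamped` and the sample data are hypothetical names made up for illustration. Each partition is reduced locally, then the partial results are merged in an explicit arrival order standing in for the shuffle. With the question's "keep the second value" function, the result changes with that order. The second reducer assumes each value can be tagged with the time it was produced, and keeps the one with the largest timestamp. Taking a maximum is commutative and associative, so it gives the same answer in any order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BinaryOperator;

public class ReduceOrderDemo {

    // Hypothetical record: a value tagged with the time it was produced.
    record Stamped(long timestamp, double value) {}

    // Simplified model of a distributed reduce: each partition is reduced
    // locally, then the partial results are merged in arrivalOrder, which
    // stands in for whatever order the shuffle happens to deliver them.
    static <T> T reduce(List<List<T>> partitions, List<Integer> arrivalOrder,
                        BinaryOperator<T> f) {
        List<T> partials = new ArrayList<>();
        for (List<T> part : partitions) {
            T acc = part.get(0);
            for (int i = 1; i < part.size(); i++) {
                acc = f.apply(acc, part.get(i));
            }
            partials.add(acc);
        }
        T result = partials.get(arrivalOrder.get(0));
        for (int i = 1; i < arrivalOrder.size(); i++) {
            result = f.apply(result, partials.get(arrivalOrder.get(i)));
        }
        return result;
    }

    // The function from the question: always keep the second argument.
    // It is not commutative, so the answer depends on arrival order.
    static final BinaryOperator<Double> KEEP_SECOND = (first, second) -> second;

    // Keep the value with the larger timestamp (ties broken by value).
    // A maximum is commutative and associative, so order does not matter.
    static final BinaryOperator<Stamped> KEEP_LATEST = BinaryOperator.maxBy(
            Comparator.comparingLong(Stamped::timestamp)
                      .thenComparingDouble(Stamped::value));

    // Sample data: two partitions for one key; 3.0 was produced last.
    static List<List<Double>> sampleValues() {
        return List.of(List.of(1.0, 2.0), List.of(3.0));
    }

    static List<List<Stamped>> sampleStamped() {
        return List.of(
                List.of(new Stamped(1, 1.0), new Stamped(2, 2.0)),
                List.of(new Stamped(3, 3.0)));
    }

    public static void main(String[] args) {
        System.out.println(reduce(sampleValues(), List.of(0, 1), KEEP_SECOND)); // 3.0
        System.out.println(reduce(sampleValues(), List.of(1, 0), KEEP_SECOND)); // 2.0
        System.out.println(reduce(sampleStamped(), List.of(1, 0), KEEP_LATEST).value()); // 3.0
    }
}
```

In Spark terms this would mean mapping each record to a value that carries its own timestamp before calling reduceByKey; where that timestamp comes from is an assumption about your data, not something Spark supplies here.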
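Once created, the distributed dataset (distData) can be operated on in parallel. For example, we might call distData.reduce((a, b) => a + b) to add up the elements of the array. That works because addition gives the same result no matter how the elements are ordered or grouped.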
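A plain-Java analogue (no Spark involved; `SumOrderDemo` and `sumInRandomOrder` are illustrative names, and shuffling with a fixed seed merely stands in for arbitrary arrival order) shows why a reduction like a + b is safe to parallelize: every ordering produces the same total.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SumOrderDemo {

    // Shuffles the data to mimic values reaching the reducer in an
    // unpredictable order, then adds them up with (a, b) -> a + b.
    static int sumInRandomOrder(List<Integer> data, long seed) {
        List<Integer> arrived = new ArrayList<>(data);
        Collections.shuffle(arrived, new Random(seed));
        return arrived.stream().reduce(0, (a, b) -> a + b);
    }

    public static void main(String[] args) {
        List<Integer> distData = List.of(1, 2, 3, 4, 5);
        for (long seed = 0; seed < 5; seed++) {
            // Integer addition is commutative and associative: 15 every time.
            System.out.println(sumInRandomOrder(distData, seed));
        }
        // A parallel stream splits and recombines the work; still 15.
        System.out.println(distData.parallelStream().reduce(0, Integer::sum));
    }
}
```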