Slow performance in Spark Streaming
I am using Spark Streaming 1.1.0 locally (not in a cluster). I created a simple app that parses the data (about 10,000 entries), stores it in a stream, and makes some transformations on it. Here is the code:
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Milliseconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

def main(args: Array[String]) {
  val master = "local[8]"
  val conf = new SparkConf().setAppName("Tester").setMaster(master)
  val sc = new StreamingContext(conf, Milliseconds(110000))

  val stream = sc.receiverStream(new MyReceiver("localhost", 9999))
  val parsedStream = parse(stream)

  parsedStream.foreachRDD(rdd =>
    println(rdd.first() + "\nrule starts " + System.currentTimeMillis()))

  val result1 = parsedStream
    .filter(entry => entry.symbol.contains("walking")
      && entry.symbol.contains("true")
      && entry.symbol.contains("id0"))
    .map(_.time)

  val result2 = parsedStream
    .filter(entry => entry.symbol == "disappear" && entry.symbol.contains("id0"))
    .map(_.time)

  val result3 = result1
    .transformWith(result2, (rdd1, rdd2: RDD[Int]) => rdd1.subtract(rdd2))

  result3.foreachRDD(rdd =>
    println(rdd.first() + "\nrule ends " + System.currentTimeMillis()))

  sc.start()
  sc.awaitTermination()
}

def parse(stream: DStream[String]) = {
  stream.flatMap { line =>
    val entries = line.split("assert").filter(entry => !entry.isEmpty)
    entries.map { tuple =>
      val pattern = """\s*[(](.+)[,]\s*([0-9]+)+\s*[)]\s*[)]\s*[,|\.]\s*""".r
      tuple match {
        case pattern(symbol, time) => new Data(symbol, time.toInt)
      }
    }
  }
}

case class Data(symbol: String, time: Int)
I set a batch duration of 110,000 milliseconds in order to receive all the data in one batch. I believed that, locally, Spark would be fast, but in this case it takes 3.5 seconds to execute the rule (between "rule starts" and "rule ends"). Am I doing something wrong, or is this the expected time? Any advice?
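(As a cross-check on those println timestamps, which come from two separate output jobs, Spark can report its own per-batch timing. A minimal sketch, assuming ssc is the StreamingContext from the code above, called sc there:)

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Print Spark's own measurement for each completed batch: processingDelay is
// the time spent actually running the batch's jobs, schedulingDelay is how
// long the batch waited before it could start.
ssc.addStreamingListener(new StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    println("batch ran in " + info.processingDelay.getOrElse(-1L) + " ms" +
      " (scheduling delay " + info.schedulingDelay.getOrElse(-1L) + " ms)")
  }
})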
I was using case matching in a lot of jobs and it killed performance, even more so when I introduced a JSON parser. Try tweaking the batch time on the StreamingContext; that made quite a bit of difference for me. How many local workers do you have?
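To illustrate the parsing point: in the question's parse, the regex is built with .r inside the map, so it is recompiled for every entry. A minimal sketch with the pattern hoisted out so it is compiled once per JVM, and with collect dropping non-matching entries instead of throwing a MatchError (just one way to shave parsing cost, reusing the same Data shape as the question):

import org.apache.spark.streaming.dstream.DStream

case class Data(symbol: String, time: Int) // same shape as in the question

object Parser {
  // Compiled once when the object is initialized, not once per record.
  private val Pattern = """\s*[(](.+)[,]\s*([0-9]+)+\s*[)]\s*[)]\s*[,|\.]\s*""".r

  def parse(stream: DStream[String]): DStream[Data] =
    stream.flatMap { line =>
      line.split("assert")
        .filter(_.nonEmpty)
        .collect { case Pattern(symbol, time) => Data(symbol, time.toInt) }
    }
}

Note that collect changes the behavior slightly: malformed entries are silently skipped rather than failing the job with a MatchError.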