Slow performance in Spark Streaming
I am using Spark Streaming 1.1.0 locally (not in a cluster). I created a simple app that parses the data (about 10,000 entries), stores it in a stream, and makes some transformations on it. Here is the code:
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Milliseconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

def main(args: Array[String]) {
  val master = "local[8]"
  val conf = new SparkConf().setAppName("Tester").setMaster(master)
  val sc = new StreamingContext(conf, Milliseconds(110000))

  val stream = sc.receiverStream(new MyReceiver("localhost", 9999))
  val parsedStream = parse(stream)

  parsedStream.foreachRDD(rdd =>
    println(rdd.first() + "\nrule starts " + System.currentTimeMillis()))

  val result1 = parsedStream
    .filter(entry => entry.symbol.contains("walking")
      && entry.symbol.contains("true")
      && entry.symbol.contains("id0"))
    .map(_.time)

  val result2 = parsedStream
    .filter(entry => entry.symbol == "disappear" && entry.symbol.contains("id0"))
    .map(_.time)

  val result3 = result1
    .transformWith(result2, (rdd1, rdd2: RDD[Int]) => rdd1.subtract(rdd2))

  result3.foreachRDD(rdd =>
    println(rdd.first() + "\nrule ends " + System.currentTimeMillis()))

  sc.start()
  sc.awaitTermination()
}

def parse(stream: DStream[String]) = {
  stream.flatMap { line =>
    val entries = line.split("assert").filter(entry => !entry.isEmpty)
    entries.map { tuple =>
      val pattern = """\s*[(](.+)[,]\s*([0-9]+)+\s*[)]\s*[)]\s*[,|\.]\s*""".r
      tuple match {
        case pattern(symbol, time) => new Data(symbol, time.toInt)
      }
    }
  }
}

case class Data(symbol: String, time: Int)
I set a batch duration of 110,000 milliseconds in order to receive all the data in one batch. I believed that, locally, Spark would be fast, but in this case it takes 3.5 seconds to execute the rule (between "rule starts" and "rule ends"). Am I doing something wrong, or is this the expected time? Any advice?
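(As a cross-check on those println timestamps, which come from two separate output jobs, Spark can report its own per-batch timing. A minimal sketch, assuming ssc is the StreamingContext from the code above, called sc there:)

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Print Spark's own measurement for each completed batch: processingDelay is
// the time spent actually running the batch's jobs, schedulingDelay is how
// long the batch waited before it could start.
ssc.addStreamingListener(new StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    println("batch ran in " + info.processingDelay.getOrElse(-1L) + " ms" +
      " (scheduling delay " + info.schedulingDelay.getOrElse(-1L) + " ms)")
  }
})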
I was using case matching in a lot of jobs and it killed performance, even more so when I introduced a JSON parser. Try tweaking the batch time on the StreamingContext; that made quite a bit of difference for me. How many local workers do you have?
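To illustrate the parsing point: in the question's parse, the regex is built with .r inside the map, so it is recompiled for every entry. A minimal sketch with the pattern hoisted out so it is compiled once per JVM, and with collect dropping non-matching entries instead of throwing a MatchError (just one way to shave parsing cost, reusing the same Data shape as the question):

import org.apache.spark.streaming.dstream.DStream

case class Data(symbol: String, time: Int) // same shape as in the question

object Parser {
  // Compiled once when the object is initialized, not once per record.
  private val Pattern = """\s*[(](.+)[,]\s*([0-9]+)+\s*[)]\s*[)]\s*[,|\.]\s*""".r

  def parse(stream: DStream[String]): DStream[Data] =
    stream.flatMap { line =>
      line.split("assert")
        .filter(_.nonEmpty)
        .collect { case Pattern(symbol, time) => Data(symbol, time.toInt) }
    }
}

Note that collect changes the behavior slightly: malformed entries are silently skipped rather than failing the job with a MatchError.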