Spark on Scala: Adobe Analytics Reference Architecture

Building production-ready, real-world data applications with Spark and Scala by mixing the right amounts of FP and OOP

Adobe Analytics processes billions of transactions a day across major web and mobile properties to power the Adobe Experience Cloud. During recent years, we have started to modernize our data processing stack, adopting open source technologies like Hadoop MapReduce (MR), Storm, and Spark, to name a few. My team has been using Spark and Scala for about four years now.

We started with a refactoring project for our Video Analytics product, which was initially developed using MR and Kafka as building blocks. That worked well for some time, but we kept pushing MR to obtain lower end-to-end latency; at one point we were running it in a tight one-minute loop across millions of events. Our jobs were stateful, and soon we needed to add features that would have meant two or more MR jobs orchestrated with something like Oozie.

We took this opportunity to consider Spark for a major refactoring, encouraged by earlier prototypes and relying on the following features:

- Spark allows you to define arbitrarily complex processing pipelines without the need for external coordination.
- It supports stateful streaming aggregations, so we could reduce our latency by using micro-batches of seconds instead of minutes.
- Finally, the high-level APIs would mean increased developer productivity.
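To make the stateful micro-batch idea concrete, here is a toy sketch in plain Scala (deliberately not using Spark's own APIs, and with a hypothetical `(key, value)` event shape): each micro-batch is a small slice of the event stream, and running per-key state is carried from one batch to the next, the way Spark Streaming's stateful operators carry aggregation state between micro-batches.

```scala
// Toy illustration of stateful aggregation over micro-batches (not Spark itself).
object MicroBatchSketch {
  type Event = (String, Int)    // hypothetical (key, value) event shape
  type State = Map[String, Int] // running sum per key

  // Fold one micro-batch into the accumulated state.
  def updateState(state: State, batch: Seq[Event]): State =
    batch.foldLeft(state) { case (acc, (k, v)) =>
      acc.updated(k, acc.getOrElse(k, 0) + v)
    }

  // Process the stream as a sequence of micro-batches,
  // emitting the accumulated state after each batch.
  def run(batches: Seq[Seq[Event]]): Seq[State] =
    batches.scanLeft(Map.empty[String, Int])(updateState).tail
}
```

The key property is that no external orchestrator is needed: the state hand-off between batches lives inside the one pipeline, which is what made two-or-more chained MR jobs unnecessary.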