Proceedings of the Institute for System Programming of the RAS

A Survey of the State of the Stream Data Processing Field

https://doi.org/10.15514/ISPRAS-2017-29(1)-13

Abstract

The article examines how modern frameworks for stream data processing are built and used. It focuses on the architecture of these frameworks and on the advantages and disadvantages that follow from it. It also addresses the problem of objectively evaluating the characteristics of stream processing frameworks.

About the Author

R. S. Samarev
Bauman Moscow State Technical University
Russia


For citation:


Samarev R.S. Survey of streaming processing field. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2017;29(1):231-260. (In Russ.) https://doi.org/10.15514/ISPRAS-2017-29(1)-13



Content is available under the Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)