

{"id":14928,"date":"2018-05-04T04:39:18","date_gmt":"2018-05-04T04:39:18","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=14928"},"modified":"2018-05-04T04:39:18","modified_gmt":"2018-05-04T04:39:18","slug":"kafka-streams","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/kafka-streams\/","title":{"rendered":"Kafka Streams | Stream, Real-Time Processing &amp; Features"},"content":{"rendered":"<p>In our previous <strong>Kafka tutorial<\/strong>, we discussed<strong> ZooKeeper in Kafka<\/strong>. Today, in this Kafka Streams tutorial, we will learn the actual meaning of Streams in Kafka.<\/p>\n<p>Also, we will see Kafka Stream architecture, use cases, and Kafka streams feature. Moreover, we will discuss stream processing topology in Apache Kafka.<\/p>\n<p><span style=\"font-weight: 400\">Kafka Streams is a client library for building applications and microservices, especially, where the input and output data are stored in Apache <strong>Kafka Clusters<\/strong>. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Basically, with the benefits of Kafka&#8217;s server-side cluster technology, Kafka Streams combines the simplicity of writing and deploying standard <strong>Java <\/strong>and <strong>Scala<\/strong> applications on the client side.\u00a0<\/span><\/p>\n<p>So, let&#8217;s begin with Apache Kafka Streams.<\/p>\n<p>First, let&#8217;s discuss a little about Stream and Real-Time Kafka Processing<\/p>\n<h2><span style=\"font-weight: 400\">Stream &amp; Real-Time Processing in Kafka\u00a0<\/span><\/h2>\n<p><span style=\"font-weight: 400\">The real-time processing of data continuously, concurrently, and in a record-by-record fashion is what we call Kafka Stream processing.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Real-time processing in Kafka is one of the <strong>applications of Kafka<\/strong>.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">Basically, Kafka Real-time processing includes a continuous stream of data. Hence, after the analysis of that data, we get some useful data out of it.<\/span><\/p>\n<p><span style=\"font-weight: 400\"> Now, while it comes to Kafka, real-time processing typically involves reading data from a topic (source), doing some analysis or transformation work, and then writing the results back to another topic (sink). To do this type of work, there are several options.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">Either we can write our own custom code with a <strong>Kafka Consumer<\/strong> to read the data and write that data via a<strong> Kafka Producer.\u00a0<\/strong><\/span><\/p>\n<p><span style=\"font-weight: 400\">Or we use a full-fledged stream processing framework like <strong>Spark Streaming<\/strong>, <strong>Flink<\/strong>, <strong>Storm<\/strong>, etc.<\/span><br \/>\n<span style=\"font-weight: 400\">However, there is an alternative to the above options, i.e. Kafka Streams. So, let\u2019s learn about Kafka Streams.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">What is Kafka Streams?<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Kafka Streams, a client library, we use it to process and analyze data stored in Kafka. <\/span><\/p>\n<p><span style=\"font-weight: 400\">It relied on important streams processing concepts like properly distinguishing between event time and processing time, windowing support, and simple yet efficient management and real-time querying of application state.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">In addition, Kafka Streams has a low barrier to entry that means we can quickly write and run a small-scale proof-of-concept on a single machine. For that, we only need to run additional instances of our application on multiple machines to scale up to high-volume production workloads.<\/span><\/p>\n<p><span style=\"font-weight: 400\"> Moreover, by leveraging Kafka&#8217;s parallelism model, it transparently handles the load balancing of multiple instances of the same application.<\/span><\/p>\n<p><strong>Some key points related to Kafka Streams:<\/strong><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Kafka Stream can be easily embedded in any <strong>Java<\/strong> application and integrated with any existing packaging, deployment and operational tools that users have for their streaming applications because it is a simple and lightweight client library.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">There are no external dependencies on systems other than Apache Kafka itself as the internal messaging layer. <\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">In order to enable very fast and efficient stateful operations (windowed joins and aggregations), it supports the fault-tolerant local state.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">To guarantee that each record will be processed once and only once even when there is a failure on either Streams clients or <strong>Kafka brokers<\/strong> in the middle of processing, it offers exactly-once processing semantics.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">In order to achieve millisecond processing latency, employs one-record-at-a-time processing. Also, with the late arrival of records, it supports event-time based windowing operations.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Along with a high-level Streams DSL and a low-level Processor API, it offers necessary stream processing primitives.<\/span><\/li>\n<\/ul>\n<h2><span style=\"font-weight: 400\">Stream Processing Topology in Kafka<\/span><\/h2>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Kafka Streams most important abstraction is a <\/span><b>stream<\/b><span style=\"font-weight: 400\">. Basically, it represents an unbounded, continuously updating data set. In other words, on order, replayable, and fault-tolerant sequence of immutable data records, where a data record is defined as a key-value pair, is what we call a stream.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Moreover, any program that makes use of the Kafka Streams library, is a <\/span><b>stream processing application<\/b><span style=\"font-weight: 400\">. Through one or more processor topologies, it defines its computational logic, especially where a processor topology is a graph of stream processors (nodes) that are connected by streams (edges).<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">In the Stream processor topology, there is a node we call a <\/span><b>stream processor<\/b><span style=\"font-weight: 400\">. It represents a processing step to transform data in streams by receiving one input record at a time from its upstream processors in the topology, applying its operation to it. Also, may subsequently produce one or more output records to its downstream processors.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Two special processors in the topology of Kafka Streams are:<\/span><\/p>\n<h3><span style=\"font-weight: 400\">a. Source Processor<\/span><\/h3>\n<p><span style=\"font-weight: 400\">It is a special type of stream processor which does not have any upstream processors. By consuming records from one or multiple Kafka topics and forwarding them to its down-stream processors it produces an input stream to its topology.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">b. Sink Processor<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Unlike, Source Processor, this stream processor does not have down-stream processors. Basically, it sends any received records from its up-stream processors to a specified <strong>Kafka topic<\/strong>.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Note: While processing the current record, other remote systems can also be accessed in normal processor nodes. Thus, the processed results can either be streamed back into Kafka or written to an external system.<\/span><\/p>\n<div id=\"attachment_14932\" style=\"width: 644px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Processor-Topology.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-14932\" class=\"wp-image-14932 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Processor-Topology.png\" alt=\"Kafka streams\" width=\"634\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Processor-Topology.png 634w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Processor-Topology-150x150.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Processor-Topology-300x297.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Processor-Topology-100x100.png 100w\" sizes=\"auto, (max-width: 634px) 100vw, 634px\" \/><\/a><p id=\"caption-attachment-14932\" class=\"wp-caption-text\">Kafka stream Processor Topology<\/p><\/div>\n<h2><span style=\"font-weight: 400\">Kafka Streams Architecture<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Basically, by building on the Kafka producer and consumer libraries and leveraging the native capabilities of Kafka to offer data parallelism, distributed coordination, fault tolerance, and operational simplicity, Kafka Streams simplifies application development.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Below image\u00a0describes the anatomy of an application that uses the Kafka Streams library.\u00a0<\/span><\/p>\n<div id=\"attachment_14933\" style=\"width: 644px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Kafka-Stream-Architecture.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-14933\" class=\"wp-image-14933 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Kafka-Stream-Architecture.png\" alt=\"Kafka streams\" width=\"634\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Kafka-Stream-Architecture.png 634w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Kafka-Stream-Architecture-150x150.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Kafka-Stream-Architecture-300x297.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Kafka-Stream-Architecture-100x100.png 100w\" sizes=\"auto, (max-width: 634px) 100vw, 634px\" \/><\/a><p id=\"caption-attachment-14933\" class=\"wp-caption-text\">Kafka Streams\u00a0Architecture<\/p><\/div>\n<h3><span style=\"font-weight: 400\">a. Streams Partitions and Tasks<\/span><\/h3>\n<p><span style=\"font-weight: 400\">However, for storing and transporting, the messaging layer of Kafka partitions data.\u00a0Similarly, for processing data Kafka Streams partitions it. <\/span><\/p>\n<p><span style=\"font-weight: 400\">So, we can say partitioning is what enables data locality, elasticity, scalability, high performance, and fault tolerance. In the context of parallelism there are close links between Kafka Streams and Kafka:<\/span><\/p>\n<ul>\n<li><span style=\"font-weight: 400\">Each Kafka streams partition is a sequence of data records in order and maps to a Kafka topic partition.<\/span><\/li>\n<li><span style=\"font-weight: 400\">A data record in the stream maps to a Kafka message from that topic.<\/span><\/li>\n<li><span style=\"font-weight: 400\">In both Kafka and Kafka Streams, the keys of data records determine the partitioning of data, i.e., keys of data records decide the route to specific partitions within topics.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Moreover, by breaking an application&#8217;s processor topology into multiple tasks, it gets scaled.\u00a0However, on the basis of input stream partitions for the application, Kafka Streams creates a fixed number of tasks, with each task assigned a list of partitions from the input streams in Kafka (i.e., Kafka topics).<\/span><span style=\"font-weight: 400\">\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">Also, without manual intervention, Kafka stream tasks can be processed independently as well as in parallel.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Below image describes two tasks each assigned with one partition of the input streams.<\/span><\/p>\n<div id=\"attachment_14934\" style=\"width: 644px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Stream-Partitions-and-Tasks.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-14934\" class=\"wp-image-14934 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Stream-Partitions-and-Tasks.png\" alt=\"Kafka streams\" width=\"634\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Stream-Partitions-and-Tasks.png 634w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Stream-Partitions-and-Tasks-150x150.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Stream-Partitions-and-Tasks-300x297.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Stream-Partitions-and-Tasks-100x100.png 100w\" sizes=\"auto, (max-width: 634px) 100vw, 634px\" \/><\/a><p id=\"caption-attachment-14934\" class=\"wp-caption-text\">Kafka stream Architecture- Streams Partitions and Tasks<\/p><\/div>\n<h3><span style=\"font-weight: 400\">b. Threading Model<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Kafka Streams allows the user to configure the number of threads that the library can use for parallelizing process within an application instance. <\/span><\/p>\n<p><span style=\"font-weight: 400\">However, with their processor topologies independently, each thread can execute one or more tasks. For example, below image describes one stream thread running two-stream tasks.<\/span><\/p>\n<div id=\"attachment_14935\" style=\"width: 644px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Threading-Model.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-14935\" class=\"wp-image-14935 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Threading-Model.png\" alt=\"Kafka streams\" width=\"634\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Threading-Model.png 634w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Threading-Model-150x150.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Threading-Model-300x297.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Threading-Model-100x100.png 100w\" sizes=\"auto, (max-width: 634px) 100vw, 634px\" \/><\/a><p id=\"caption-attachment-14935\" class=\"wp-caption-text\">Kafka stream architecture- Threading Model<\/p><\/div>\n<h3><span style=\"font-weight: 400\">c. Local State Stores<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Kafka Streams\u00a0offers so-called<em> state stores<\/em>.\u00a0Basically, we use it to store and query data by stream processing applications, which is an important capability while implementing stateful operations. <\/span><\/p>\n<p><span style=\"font-weight: 400\">For example, the Kafka Streams DSL automatically creates and manages such state stores when you are calling stateful operators such as join() or aggregate(), or when you are windowing a stream.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In Kafka Streams application, every stream task may embed one or more local state stores that even APIs can access to the store and query data required for processing. Moreover, such local state stores Kafka Streams offers fault-tolerance and automatic recovery.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Below image describes two stream tasks with their dedicated local state stores.<\/span><\/p>\n<div id=\"attachment_14936\" style=\"width: 644px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Local-state-store.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-14936\" class=\"wp-image-14936 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Local-state-store.png\" alt=\"Kafka streams\" width=\"634\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Local-state-store.png 634w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Local-state-store-150x150.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Local-state-store-300x297.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Local-state-store-100x100.png 100w\" sizes=\"auto, (max-width: 634px) 100vw, 634px\" \/><\/a><p id=\"caption-attachment-14936\" class=\"wp-caption-text\">Kafka streams- Local State Store<\/p><\/div>\n<h3><span style=\"font-weight: 400\">d. Fault Tolerance<\/span><\/h3>\n<p><span style=\"font-weight: 400\">However, integrated natively within Kafka, it is built on fault-tolerance capabilities. While stream data is persisted to Kafka it is available even if the application fails and needs to re-process it. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Moreover, to handle failures, tasks in Kafka Streams leverage the fault-tolerance capability offered by the Kafka consumer client.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">In addition, here local state stores are also robust to failures. Hence, it maintains a replicated changelog Kafka topic in which it tracks any state updates, for each state store. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Kafka Streams guarantees to restore their associated state stores to the content before the failure by replaying the corresponding changelog topics prior to resuming the processing on the newly started tasks if tasks run on a machine that fails and is restarted on another machine. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Hence, failure handling is completely transparent to the end-user.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Implementing Kafka Streams<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Basically, built with Kafka Streams, a stream processing application looks like:<\/span><\/p>\n<h3><span style=\"font-weight: 400\">a. Providing Stream Configurations<\/span><\/h3>\n<p><b>Properties streamsConfiguration = new Properties();<\/b><br \/>\n<b>streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, &#8220;Streaming-QuickStart&#8221;);<\/b><br \/>\n<b>streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, &#8220;localhost:9092&#8221;);<\/b><br \/>\n<b>streamsConfiguration.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());<\/b><br \/>\n<b>streamsConfiguration.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());<\/b><\/p>\n<h3><span style=\"font-weight: 400\">b. Getting Topic and Serdes<\/span><\/h3>\n<p><b>String topic = configReader.getKStreamTopic();<\/b><br \/>\n<b>String producerTopic = configReader.getKafkaTopic();<\/b><br \/>\n<b>final Serde stringSerde = Serdes.String();<\/b><br \/>\n<b>final Serde longSerde = Serdes.Long();<\/b><\/p>\n<h3><span style=\"font-weight: 400\">c. Building the Stream and Fetching Data<\/span><\/h3>\n<p><b>KStreamBuilder builder = new KStreamBuilder();<\/b><br \/>\n<b>KStream&lt;String, String&gt; inputStreamData = builder.stream(stringSerde, stringSerde, producerTopic);<\/b><\/p>\n<h3><span style=\"font-weight: 400\">d. Processing of Kafka Stream<\/span><\/h3>\n<p><b>KStream&lt;String, Long&gt; processedStream = inputStreamData.mapValues(record -&gt; record.length() )<\/b><br \/>\n<span style=\"font-weight: 400\">There is a list of other transformation operations provided for KStream, apart from join and aggregate operations. Hence, each of these operations may generate either one or more KStream objects. Also, can be translated into one or more connected processors into the underlying processor topology. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Moreover, to compose a complex processor topology, all of these transformation methods can be chained together.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Among these transformations, filter, map, mapValues, etc., are stateless transformation operations with which users can pass a customized function as a parameter, such as a predicate for the filter, KeyValueMapper for the map, etc. as per their usage in a language.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">e. Writing Streams Back to Kafka<\/span><\/h3>\n<p><b>processedStream.to(stringSerde, longSerde, topic);<\/b><br \/>\n<span style=\"font-weight: 400\">Here, even after initialization of internal structures, the processing\u00a0doesn&#8217;t start. So, by calling the start() method, we have to explicitly start the Kafka Streams thread:<\/span><br \/>\n<b>KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);<\/b><br \/>\n<b>streams.start();<\/b><br \/>\n<span style=\"font-weight: 400\">Hence, the last step is closing the Stream.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Features of Kafka Streams<\/span><\/h2>\n<ol>\n<li><span style=\"font-weight: 400\">The best features are elasticity, high scalability, and fault-tolerance.<\/span><\/li>\n<li><span style=\"font-weight: 400\">Deploy to containers, VMs, bare metal, cloud.<\/span><\/li>\n<li><span style=\"font-weight: 400\">For small, medium, &amp; large use cases, it is equally viable.<\/span><\/li>\n<li><span style=\"font-weight: 400\">It is fully in integration with <strong>Kafka security<\/strong>.<\/span><\/li>\n<li><span style=\"font-weight: 400\">Write standard <strong>Java applications<\/strong>.<\/span><\/li>\n<li><span style=\"font-weight: 400\">Exactly-once processing semantics.<\/span><\/li>\n<li><span style=\"font-weight: 400\">There is no need of separate processing cluster.<\/span><\/li>\n<li><span style=\"font-weight: 400\">It is developed on Mac, <strong>Linux<\/strong>, Windows.<\/span><\/li>\n<\/ol>\n<h2><span style=\"font-weight: 400\">Kafka Streams Use Cases<\/span><\/h2>\n<h3><span style=\"font-weight: 400\">a. The New York Times<\/span><\/h3>\n<p><span style=\"font-weight: 400\">In order to store and distribute, in real-time, published content to the various applications and systems that make it available to the readers, it uses Apache Kafka and the Kafka Streams.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">b. Zalando<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Zalando uses Kafka as an ESB (Enterprise Service Bus) as the leading online fashion retailer in Europe. That helps them in transitioning from a monolithic to a micro-services architecture. Moreover, using Kafka for processing event streams their technical team does near-real-time business intelligence.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">c. LINE<\/span><\/h3>\n<p><span style=\"font-weight: 400\">To communicate to one another LINE uses Apache Kafka as a central data hub for their services. As in Line, hundreds of billions of messages are produced daily and are used to execute various business logic, threat detection, search indexing and data analysis. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Also, Kafka helps LINE to reliably transform and filter topics enabling sub-topics consumers can efficiently consume and meanwhile retains easy maintainability.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">d. Pinterest<\/span><\/h3>\n<p><span style=\"font-weight: 400\">In order to power the real-time, predictive budgeting system of their advertising infrastructure, Pinterest uses Apache Kafka and the Kafka Streams at large scale. There spend predictions are more accurate than ever, with Kafka Streams.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">e. Rabobank<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Apache Kafka powers digital nervous system, the Business Event Bus of the Rabobank. It is one of the 3 largest banks in the Netherlands. By using Kafka Streams, this service alerts customers in real-time on financial events.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Conclusion<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Hence, we have learned the concept of Apache Kafka Streams in detail. We discussed Stream Processing and Real-Time Processing. Moreover, we saw Stream Processing Topology and its special processor. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Afterward, we move on to Kafka Stream architecture and implementing Kafka Streams. Finally, we looked at features and use cases of Kafka Streams. Still, if any doubt occurs feel free to ask. We will definitely respond to you back.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In our previous Kafka tutorial, we discussed ZooKeeper in Kafka. Today, in this Kafka Streams tutorial, we will learn the actual meaning of Streams in Kafka. Also, we will see Kafka Stream architecture, use&#46;&#46;&#46;<\/p>\n","protected":false},"author":5,"featured_media":15817,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[6601,7928,7948,7949,7950,7951,7952,7953,11396,11397,13884,13885,13886,15803],"class_list":["post-14928","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-kafka","tag-implementing-kafka-streams","tag-kafka-real-time-processing","tag-kafka-stream-architecture","tag-kafka-stream-features","tag-kafka-stream-processing","tag-kafka-stream-tutorial","tag-kafka-stream-use-cases","tag-kafka-streams","tag-real-time-processing","tag-real-time-processing-in-kafka","tag-stream-processing","tag-stream-processing-in-kafka","tag-stream-processing-topology","tag-what-is-kafka-stream"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Kafka Streams | Stream, Real-Time Processing &amp; Features - DataFlair<\/title>\n<meta name=\"description\" content=\"Kafka Streams,Stream Processing,Real-Time Processing,Stream Processing Topology,Kafka Stream Architecture,what is Kafka Streams,Kafka Stream features\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/kafka-streams\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Kafka Streams | Stream, Real-Time Processing &amp; Features - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Kafka Streams,Stream Processing,Real-Time Processing,Stream Processing Topology,Kafka Stream Architecture,what is Kafka Streams,Kafka Stream features\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/kafka-streams\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-05-04T04:39:18+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Apache-Kafka-Streams-01-2.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Kafka Streams | Stream, Real-Time Processing &amp; Features - DataFlair","description":"Kafka Streams,Stream Processing,Real-Time Processing,Stream Processing Topology,Kafka Stream Architecture,what is Kafka Streams,Kafka Stream features","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/kafka-streams\/","og_locale":"en_US","og_type":"article","og_title":"Kafka Streams | Stream, Real-Time Processing &amp; Features - DataFlair","og_description":"Kafka Streams,Stream Processing,Real-Time Processing,Stream Processing Topology,Kafka Stream Architecture,what is Kafka Streams,Kafka Stream features","og_url":"https:\/\/data-flair.training\/blogs\/kafka-streams\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2018-05-04T04:39:18+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Apache-Kafka-Streams-01-2.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/kafka-streams\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/kafka-streams\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/7f83c342f5d1632d6f7b4b0b0f447823"},"headline":"Kafka Streams | Stream, Real-Time Processing &amp; Features","datePublished":"2018-05-04T04:39:18+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/kafka-streams\/"},"wordCount":2073,"commentCount":0,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/kafka-streams\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Apache-Kafka-Streams-01-2.jpg","keywords":["Implementing Kafka Streams","Kafka Real Time Processing","Kafka Stream Architecture","Kafka Stream features","Kafka Stream Processing","Kafka Stream Tutorial","Kafka Stream Use cases","Kafka Streams","real time processing","Real time processing in Kafka","stream processing","Stream processing in Kafka","Stream Processing Topology","what is kafka stream"],"articleSection":["Apache Kafka Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/kafka-streams\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/kafka-streams\/","url":"https:\/\/data-flair.training\/blogs\/kafka-streams\/","name":"Kafka Streams | Stream, Real-Time Processing &amp; Features - DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/kafka-streams\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/kafka-streams\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Apache-Kafka-Streams-01-2.jpg","datePublished":"2018-05-04T04:39:18+00:00","description":"Kafka Streams,Stream Processing,Real-Time Processing,Stream Processing Topology,Kafka Stream Architecture,what is Kafka Streams,Kafka Stream features","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/kafka-streams\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/kafka-streams\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/kafka-streams\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Apache-Kafka-Streams-01-2.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/05\/Apache-Kafka-Streams-01-2.jpg","width":1200,"height":628,"caption":"Kafka Stream tutorial"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/kafka-streams\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Apache Kafka Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/kafka\/"},{"@type":"ListItem","position":3,"name":"Kafka Streams | Stream, Real-Time Processing &amp; Features"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/7f83c342f5d1632d6f7b4b0b0f447823","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/4cf3a74600d131330b8c481d519afd1574093ed89f6d3396a95393ad223eb7cd?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/4cf3a74600d131330b8c481d519afd1574093ed89f6d3396a95393ad223eb7cd?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/4cf3a74600d131330b8c481d519afd1574093ed89f6d3396a95393ad223eb7cd?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"DataFlair Team creates expert-level guides on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our goal is to empower learners with easy-to-understand content. Explore our resources for career growth and practical learning.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam1\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/14928","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=14928"}],"version-history":[{"count":0,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/14928\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/15817"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=14928"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=14928"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=14928"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}