Imagine you are rolling out a new IoT-enabled device, or collecting telemetry from millions of mobile apps or games. Architecting this type of solution poses many challenges. How do you ingest messages at this scale? More importantly, how do you process the messages without creating a performance bottleneck? How do you scale message ingest and message processing independently? How do you keep message ingress and egress loosely coupled while retaining the ability to scale out as needed?
Enter Azure Event Hubs.
Azure Event Hubs is a highly scalable publish-subscribe ingestor that can take in millions of events per second so that you can process and analyze the massive amounts of data produced by your connected devices and applications. Once data is collected into Event Hubs, you can transform and store it using any real-time analytics provider or with batching/storage adapters.
Azure Event Hubs is a new member of the Azure Service Bus family, adding to the existing topics and queues offerings. Event Hubs can process a very high volume of messages quickly. It provides FIFO (First In, First Out) messaging much like queues and topics, and also supports pub-sub messaging just like topics.
Planning for Azure Event Hubs.
You must put some effort into capacity planning before you create an Event Hub. To make the right decisions, let's go over a couple of details about Event Hubs. Event Hubs are partitioned. The minimum number of partitions is 8 and the maximum (public) number of partitions is 32. If you need more partitions, you must open a support ticket. At that time, you can request up to 1024 (and higher) partitions for your Event Hub.
A partition is an ordered sequence of events that is held in an Event Hub. As newer events arrive, they are added to the end of this sequence. A partition can be thought of as a “commit log.”
Partitions retain data for a configured retention time that is set at the Event Hub level and applies across all partitions in the Event Hub. Events expire on a time basis; you cannot explicitly delete them. Each partition is independent and contains its own sequence of data. As a result, partitions often grow at different rates.
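To make the "commit log" idea concrete, here is a toy in-memory model of a single partition. It is only a sketch of the behaviour described above (append to the end, expire by age, no explicit delete) and says nothing about how Event Hubs is implemented internally. The PartitionLog class and its members are hypothetical names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy model of one partition: an ordered, append-only log with time-based expiry.
// Hypothetical illustration only; not part of any Azure SDK.
public sealed class PartitionLog
{
    private readonly Queue<(long Offset, DateTime EnqueuedUtc, string Body)> _events =
        new Queue<(long Offset, DateTime EnqueuedUtc, string Body)>();
    private readonly TimeSpan _retention;
    private long _nextOffset;

    public PartitionLog(TimeSpan retention)
    {
        _retention = retention;
    }

    public int Count => _events.Count;

    // New events are always added to the end of the sequence.
    public long Append(string body, DateTime nowUtc)
    {
        _events.Enqueue((_nextOffset, nowUtc, body));
        return _nextOffset++;
    }

    // Events leave the log only by ageing out; there is no delete-by-id.
    public void Expire(DateTime nowUtc)
    {
        while (_events.Count > 0 && nowUtc - _events.Peek().EnqueuedUtc > _retention)
        {
            _events.Dequeue();
        }
    }
}

public static class PartitionLogDemo
{
    public static void Main()
    {
        var start = new DateTime(2015, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        var log = new PartitionLog(TimeSpan.FromDays(1));

        log.Append("first", start);
        log.Append("second", start.AddHours(20));

        // 25 hours in, only the first event is older than the 1-day retention.
        log.Expire(start.AddHours(25));
        Console.WriteLine(log.Count); // 1
    }
}
```

Because each partition keeps its own log like this, two partitions of the same Event Hub can hold very different numbers of events at any moment.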
Each partition has a performance target of 1 MB ingress or 1000 operations and 2 MB egress per second. By default, each Event Hub is created with 16 partitions. This corresponds to 16 MB ingress or 16,000 operations and 32 MB egress per second.
Although this is a lot, you also need to look at message size. With messages of 1 KB each, a partition reaches its 1 MB ingress target at about the same point as its target of 1000 operations per second, so the default 16 partitions could technically absorb 16,000 messages per second. Larger messages hit the megabyte limit first, and smaller messages hit the operations limit first.
Calculating the number of partitions you need starts with calculating the potential throughput in megabytes per second. Then calculate the number of messages per second. This gives you the first part of the equation. The second part comes from the number of processing nodes required to meet your performance targets. Since a single partition cannot be processed concurrently by multiple nodes, you must have at least as many partitions as you have backend node instances. For example, a system that must process 30 messages simultaneously, where each process requires a dedicated processing node, requires the Event Hub to have at least 30 partitions.
Number of partitions = MAX (cumulative throughput required for the stream, given 1 MB per partition; number of nodes needed by the backend processing application)
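The sizing rule above can be sketched in a few lines of C#. This is a rough planning aid rather than an official calculator: the PartitionPlanner class and its parameter names are made up for this example, and the per-partition targets (1 MB or 1000 operations of ingress per second) are the figures quoted in this article.

```csharp
using System;

// Hypothetical helper that applies the partition sizing rule from this article.
public static class PartitionPlanner
{
    // Per-partition ingress targets as quoted in the article.
    public const double IngressMbPerPartition = 1.0;
    public const int EventsPerSecondPerPartition = 1000;

    public static int RequiredPartitions(double ingressMbPerSecond, int eventsPerSecond, int backendNodes)
    {
        // Partitions needed to absorb the byte rate.
        int forBytes = (int)Math.Ceiling(ingressMbPerSecond / IngressMbPerPartition);

        // Partitions needed to absorb the event rate.
        int forEvents = (int)Math.Ceiling(eventsPerSecond / (double)EventsPerSecondPerPartition);

        // A partition is read by one node at a time, so we also need
        // at least one partition per backend node.
        return Math.Max(Math.Max(forBytes, forEvents), backendNodes);
    }
}

public static class PartitionPlannerDemo
{
    public static void Main()
    {
        // 12 MB/s, 9,000 events/s, 30 processing nodes: the node count dominates.
        Console.WriteLine(PartitionPlanner.RequiredPartitions(12.0, 9000, 30)); // 30

        // 16 MB/s, 16,000 events/s, 4 processing nodes: throughput dominates.
        Console.WriteLine(PartitionPlanner.RequiredPartitions(16.0, 16000, 4)); // 16
    }
}
```

Remember that the result still has to fit within the partition limits of the service, and that it is worth leaving headroom because the count is hard to change later.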
Changing the number of partitions once in production can cause quite a few headaches, because it means creating a new Event Hub and reconfiguring publishers to use it. While events pile up in the new Event Hub, let the backend nodes drain the old one. Once it is empty, you can reconfigure your backend nodes to consume events from the new Event Hub. Switching the backend processing nodes to the new Event Hub too early would break the order of events. From a billing perspective, the number of partitions is irrelevant because there is no charge for partitions.
Once the application is deployed, you can provision throughput units to scale the Event Hub's throughput capacity. A single throughput unit has the capacity of 1 MB per second of ingress events (events sent into an Event Hub), but no more than 1000 ingress events, management operations or control API calls per second. It has 2 MB per second of egress events (events consumed from an Event Hub) and 84 GB of event storage (sufficient for the default 24-hour retention period). While partitions are a data organization concept, throughput units are purely a capacity concept.
Throughput units are billed per hour and are purchased ahead of time. Once purchased, throughput units are billed for a minimum of one hour. Up to 20 throughput units can be purchased for a Service Bus namespace, and there is an Azure account limit of 20 throughput units. These throughput units are shared across all Event Hubs in a given namespace. Throughput units are provisioned on a best-effort basis and may not always be available for immediate purchase. If you require a specific capacity, it is recommended that you purchase those throughput units ahead of time. If you require more than 20 throughput units, you can contact Microsoft Azure Service Bus support to purchase more on a commitment basis in blocks of 20, up to the first 100 throughput units. Beyond that, you can also purchase blocks of 100 throughput units.
It is recommended that you carefully balance throughput units and partitions in order to achieve optimal scale with Event Hubs. A single partition has a maximum scale of one throughput unit. The number of throughput units should be less than or equal to the number of partitions in an Event Hub.
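As a rough illustration of how these numbers combine, the sketch below estimates how many throughput units a namespace would need for a given load, and shows where the 84 GB storage figure comes from (1 MB per second sustained for 24 hours). ThroughputPlanner and its members are hypothetical names for this example; the capacities are the ones quoted above, and the result does not account for the purchase limits.

```csharp
using System;

// Hypothetical helper based on the per-unit capacities quoted in this article.
public static class ThroughputPlanner
{
    public const double IngressMbPerUnit = 1.0;
    public const int IngressEventsPerUnit = 1000;
    public const double EgressMbPerUnit = 2.0;

    public static int RequiredThroughputUnits(double ingressMbPerSecond, int ingressEventsPerSecond, double egressMbPerSecond)
    {
        int forIngressBytes = (int)Math.Ceiling(ingressMbPerSecond / IngressMbPerUnit);
        int forIngressEvents = (int)Math.Ceiling(ingressEventsPerSecond / (double)IngressEventsPerUnit);
        int forEgressBytes = (int)Math.Ceiling(egressMbPerSecond / EgressMbPerUnit);

        // The tightest of the three limits decides; at least one unit is always needed.
        return Math.Max(1, Math.Max(forIngressBytes, Math.Max(forIngressEvents, forEgressBytes)));
    }

    // Storage filled by one unit ingesting at its full 1 MB/s for the whole retention period.
    public static double StorageGbPerUnit(TimeSpan retention)
    {
        return IngressMbPerUnit * retention.TotalSeconds / 1024.0;
    }
}

public static class ThroughputPlannerDemo
{
    public static void Main()
    {
        // 3.5 MB/s in, 2,500 events/s in, 9 MB/s out: egress needs the most units.
        Console.WriteLine(ThroughputPlanner.RequiredThroughputUnits(3.5, 2500, 9.0)); // 5

        // 86,400 MB over 24 hours is a little over 84 GB.
        Console.WriteLine((int)ThroughputPlanner.StorageGbPerUnit(TimeSpan.FromHours(24))); // 84
    }
}
```

Since the units are shared across every Event Hub in the namespace, the inputs should be the totals for the namespace, not for a single hub.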
If the total ingress throughput or the total ingress event rate across all Event Hubs in a namespace exceeds the aggregate throughput unit allowances, senders will be throttled and receive errors indicating that the ingress quota has been exceeded. If the total egress throughput or the total event egress rate across all Event Hubs in a namespace exceeds the aggregate throughput unit allowances, receivers are throttled and receive errors indicating that the egress quota has been exceeded. Ingress and egress quotas are enforced separately, so that no sender can cause event consumption to slow down, nor can a receiver prevent events from being sent into Event Hubs.
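Since a throttled sender receives an error rather than blocking, a publisher usually wants to pause and retry. The following is a minimal sketch of exponential backoff around an arbitrary send action. It is deliberately independent of the Service Bus SDK: ThrottleRetry is a hypothetical helper, and the caller supplies the isThrottled check. With the Microsoft.ServiceBus.Messaging client I would expect throttling to surface as a ServerBusyException, but treat that as an assumption to verify against the SDK version you use.

```csharp
using System;
using System.Threading;

// Hypothetical retry helper; not part of any Azure SDK.
public static class ThrottleRetry
{
    // Runs 'send', retrying with a doubling delay while 'isThrottled' says the
    // failure was a quota error. Returns the number of attempts that were made.
    // Any other exception, or a throttle on the last attempt, propagates to the caller.
    public static int SendWithBackoff(Action send, Func<Exception, bool> isThrottled, int maxAttempts, TimeSpan initialDelay)
    {
        var delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                send();
                return attempt;
            }
            catch (Exception ex) when (isThrottled(ex) && attempt < maxAttempts)
            {
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}

public static class ThrottleRetryDemo
{
    public static void Main()
    {
        int calls = 0;

        // Simulated sender that is "throttled" on its first two calls.
        Action flakySend = () =>
        {
            calls++;
            if (calls < 3)
            {
                throw new TimeoutException("simulated quota exceeded");
            }
        };

        int attempts = ThrottleRetry.SendWithBackoff(
            flakySend,
            ex => ex is TimeoutException,
            5,
            TimeSpan.FromMilliseconds(10));

        Console.WriteLine(attempts); // 3
    }
}
```

Backing off on the sender side only helps with short bursts; if senders are throttled continuously, the namespace needs more throughput units.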
Enough Theory: Let's Create an Event Hub
An Event Hub can be created from the Azure Management Portal, under the Service Bus area. Click +NEW at the bottom of the portal and navigate to the Event Hub service.
If you have an existing Service Bus namespace, you can reuse it as I have done. Otherwise, you will need a new namespace before you create the Event Hub. A wizard will appear to assist you with the creation. Enter an Event Hub name, select a Region that hosts the services that will process the events, and click next. Be sure that you are creating the resource in the correct subscription and the desired Azure Service Bus namespace.
The next panel prompts for important information. This is where you specify the number of partitions for your Event Hub. Then, because this is a Standard Event Hub, you can specify the number of days that an event stays in the Event Hub. This is especially handy when you need to replay events from a week ago. The maximum is 30 days and the minimum is 1 day.
Security and Connectivity
To publish events to an Event Hub, you must create a Shared Access Policy that allows you to send events. Navigate to the Service Bus namespace and click on the Event Hubs tab. From the list, choose the newly created Event Hub. Then click on Configure. On this screen, you can create a Send-enabled Shared Access Policy.
Now we need to retrieve a SAS key connection string to authenticate and interact with the Event Hub. Select the Dashboard tab and click on the View Connection String link. From the wizard, copy the connection string that is enabled for Send operations. Click the copy button to store the connection string in your clipboard. This will allow you to paste the connection string into your project's configuration.
Create Event Sender in Code
As my sample will be pretty basic I will just create a simple C# console project in Visual Studio.
In the Solution Explorer, right-click on References and select Manage NuGet Packages…
In the Search Online box type Azure Service Bus.
Install Microsoft Azure Service Bus version 2.6.1 or later.
After that, it's simply a matter of code. The sample below synchronously sends very simple messages to the hub:

    using System;
    using System.Diagnostics;
    using System.Text;
    using System.Threading;
    using Microsoft.ServiceBus.Messaging;

    namespace EventHubSender
    {
        class Program
        {
            static string eventHubName = "{put your hub name here}";
            static string connectionString = "Endpoint=sb://{put your namespace here}.servicebus.windows.net/;SharedAccessKeyName=SendRule;SharedAccessKey={put your access key here}";

            static void Main(string[] args)
            {
                Console.WriteLine("Press Ctrl-C to stop the sender process");
                Console.WriteLine("Press Enter to start now");
                Console.ReadLine();
                SendingRandomMessages();
            }

            static void SendingRandomMessages()
            {
                // One client is created up front and reused for every send.
                var eventHubClient = EventHubClient.CreateFromConnectionString(connectionString, eventHubName);
                while (true)
                {
                    try
                    {
                        // The payload is just a random GUID encoded as UTF-8.
                        var message = Guid.NewGuid().ToString();
                        Console.WriteLine("{0} > Sending message: {1}", DateTime.Now, message);

                        // Time the synchronous send so the duration printed below is meaningful.
                        var sw = Stopwatch.StartNew();
                        eventHubClient.Send(new EventData(Encoding.UTF8.GetBytes(message)));
                        sw.Stop();
                        Console.WriteLine("Duration: " + sw.ElapsedMilliseconds.ToString());
                    }
                    catch (Exception exception)
                    {
                        Console.ForegroundColor = ConsoleColor.Red;
                        Console.WriteLine("{0} > Exception: {1}", DateTime.Now, exception.Message);
                        Console.ResetColor();
                    }

                    // Pause between messages: roughly five sends per second.
                    Thread.Sleep(200);
                }
            }
        }
    }
You will notice above that certain configuration items are marked {put your… here}. Replace {put your namespace here} with the name of the Service Bus namespace you created in the Azure portal. Replace {put your hub name here} with the name of the Event Hub you created. Finally, you'll see SharedAccessKeyName=. Replace the SendRule value I used with the name of the data ingress shared access policy you created in your Event Hub. To replace {put your access key here} with the correct value, go to the Dashboard page of your Event Hub and click the Connection Information key at the bottom of the page. A dialog containing connection strings will appear. Copy the connection string for the data ingress shared access policy you created and paste it into Notepad, because it contains more information than you need. Copy only the SharedAccessKey value at the end of the connection string into {put your access key here}, then save and close the file.
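If you would rather assemble the connection string from its parts than edit the placeholders by hand, a small helper can do it. The sketch below assumes the usual Service Bus layout, Endpoint=sb://<namespace>.servicebus.windows.net/ followed by the policy name and key. EventHubConnectionString is a hypothetical name, and the sample values are placeholders, not real credentials.

```csharp
using System;

// Hypothetical helper that assembles a Service Bus connection string from its parts.
public static class EventHubConnectionString
{
    public static string Build(string namespaceName, string policyName, string accessKey)
    {
        Require(namespaceName, nameof(namespaceName));
        Require(policyName, nameof(policyName));
        Require(accessKey, nameof(accessKey));

        return "Endpoint=sb://" + namespaceName + ".servicebus.windows.net/" +
               ";SharedAccessKeyName=" + policyName +
               ";SharedAccessKey=" + accessKey;
    }

    // Rejects empty values and placeholders that were never filled in.
    private static void Require(string value, string name)
    {
        if (string.IsNullOrWhiteSpace(value) || value.Contains("{"))
        {
            throw new ArgumentException("Value is missing or still a placeholder.", name);
        }
    }
}

public static class EventHubConnectionStringDemo
{
    public static void Main()
    {
        // Placeholder values for illustration only.
        string connectionString = EventHubConnectionString.Build("mynamespace", "SendRule", "not-a-real-key");
        Console.WriteLine(connectionString);
    }
}
```

In a real project, keep the access key out of source control and read it from configuration instead of hard-coding it.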
Finally, running this application is really easy:
Output
Back in the Azure Portal, the dashboard for our Event Hub shows events being received:
For more in-depth information, see these excellent articles and blogs:
- MSDN Event Hub Programming Guide – https://msdn.microsoft.com/en-us/library/azure/dn789972.aspx
- MSDN Magazine article by Bruno Terkaly – https://msdn.microsoft.com/en-us/magazine/dn948106.aspx
- Event Hubs Overview on MSDN – https://msdn.microsoft.com/en-us/library/azure/dn836025.aspx
- Chris Myers Blog – http://www.bloggedbychris.com/2014/08/16/exploring-azure-event-hubs-code-implementation/
- DevScrum.NET blog – http://blog.devscrum.net/2014/09/azure-event-hubs-high-volume-processing/
Hope this helps.