Tuesday 28 August 2012

Windows Azure Queues–The Complete Works

 

My last post concentrated on Service Bus Queues, and I get a lot of questions from customers about when to use Azure Queues vs. Service Bus Queues. In this post I try to establish the decision points which help one make better choices between the two.

Digging deeper into Azure Queues: the expectation is that Windows Azure Queues will be a lower-cost alternative to Service Bus Queues. Windows Azure Queues, going ahead referred to as WAQ:

  • Are an asynchronous, reliable-delivery messaging construct.
  • Highly available, durable and performance efficient. The exact throughput a WAQ can sustain is an area of some research.
  • Ideally, messages are processed At-Least-Once.
  • REST-based interface support.
  • WAQ doesn’t have a limit on the number of messages stored in a queue.
  • TTL for a WAQ message is 1 week; post that it will be garbage collected.
  • Metadata support exists in the form of name-value pairs.
  • Maximum message size is 64 KB.
  • A message inside a WAQ can be put in binary; when read back it comes out as XML.
  • No guarantee on the sequencing of messages.
  • No support for duplicate message identification.
  • Parameters of WAQ include
    • MessageID: GUID
    • Visibility Timeout: default is 30 seconds, maximum is 2 hours. Ideally, read and process the message and then issue a delete.
    • PopReceipt: on reading the queue there is a visibility timeout associated with the message; the receiver reads the message, tries to complete some processing and then may decide to issue a delete. A message which has been read has a PopReceipt associated with it. The PopReceipt is used while issuing a delete, together with the MessageId.

      PopReceipt is:

      • Property of CloudQueueMessage
      • Set every time a message is popped from the queue (GetMessage or GetMessages)
      • Used to identify the last consumer to pop the message
      • A valid pop receipt is required to delete a message
      • An exception is thrown if an invalid pop receipt is passed
      • PopReceipt is used in conjunction with the MessageId to issue a Delete of a message for which a visibility timeout is set. We have the following scenarios:

        A Delete is issued within the visibility timeout: the message is deleted from the queue. The assumption here is that the message has been read and the required processing has been done; term it the happy path.

        A Delete is issued post expiry of the visibility timeout: this is assumed to be the exception flow (e.g. the receiver process has crashed) and the message remains available in the queue for re-processing. This failure recovery process rarely happens, and it is there for your protection, but it can lead to a message being picked up more than once. Each message has a property, DequeueCount, that tells you how many times it has been picked up for processing. In the example above, when receiver A first received the message the DequeueCount would be 0; when receiver B picked up the message, after receiver A’s tardiness, the DequeueCount would be 1. This becomes a strategy to detect a problem or poison message and route it to a log, repair and resubmit process.

      • A poison message is a message that is continually failing to be processed correctly. This is usually caused by some data in the contents that causes the processing code to fail. Since the processing fails, the message’s visibility timeout expires and it reappears on the queue. The repair-and-resubmit process is sometimes a queue that is managed by system management software. There is a need to check for, and set a threshold on, this DequeueCount for messages.
    • MessageTTL: this specifies the time-to-live interval for the message, in seconds. The maximum time-to-live allowed is 7 days. If this parameter is omitted, the default time-to-live is 7 days. If a message is not deleted from a queue within its time-to-live, it will be garbage collected and deleted by the storage system.

Notes: It is important to note that all queue names must be lower case. The CreateIfNotExist() method checks whether the queue already exists in Windows Azure, and if it doesn’t it will create it for you.
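Below is a minimal sketch of the read/process/delete cycle described above, assuming the 2012-era Microsoft.WindowsAzure.StorageClient library; the queue names, connection string and poison-message threshold are illustrative only.

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class QueueBasics
{
    public static void Run(string connectionString)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudQueueClient client = account.CreateCloudQueueClient();

        // Queue names must be lower case.
        CloudQueue queue = client.GetQueueReference("orders");
        CloudQueue poisonQueue = client.GetQueueReference("orders-poison");
        queue.CreateIfNotExist();
        poisonQueue.CreateIfNotExist();

        // Send a message (64 KB maximum).
        queue.AddMessage(new CloudQueueMessage("process-order-42"));

        // Read with the default 30-second visibility timeout, process, then delete.
        CloudQueueMessage msg = queue.GetMessage();
        if (msg != null)
        {
            if (msg.DequeueCount > 3)
            {
                // Poison message: route it to a repair-and-resubmit queue instead.
                poisonQueue.AddMessage(new CloudQueueMessage(msg.AsString));
                queue.DeleteMessage(msg);   // delete uses MessageId + PopReceipt internally
            }
            else
            {
                // ... application-specific processing here ...
                queue.DeleteMessage(msg);
            }
        }
    }
}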

 

Comparison of Azure Queues with Service Bus Queues

A good post which covers this comparison can be found here: http://preps2.wordpress.com/2011/09/17/comparison-of-windows-azure-storage-queues-and-service-bus-queues/

 

Design Consideration for Azure Queues

Messages are pushed into the queue; the receiver reads the message, processes it and deletes it. The general technique used for reading messages from a queue is polling. The use of a classic queue listener with a polling mechanism may not be the optimal choice when using Windows Azure queues, because the Windows Azure pricing model measures storage transactions in terms of application requests performed against the queue, regardless of whether the queue is empty or not. If the number of messages in the queue increases, “load leveling” will kick in and more receiver roles will spin up. These receivers will continue to run and accrue cost.

The cost of a single queue listener using a polling mechanism

Assume a hypothetical situation where a single queue listener is constantly polling for messages in the queue, and the business transaction data arrives at regular intervals. However, let’s assume:

  • The solution is busy processing workload just 25% of the time during a standard 8-hour business day.
  • That results in 6 hours (8 hours * 75%) of “idle time” when there may not be any transactions coming through the system.
  • Furthermore, the solution will not receive any data at all during the 16 non-business hours every day.

Total idle time = 22 hours. During that time there is still dequeue work, i.e. GetMessage() called from the polling function, which amounts to:

22 hrs X 60 min X 60 polls/min – assuming polling at 1-second intervals = 79,200 transactions/day

Cost of 100,000 transactions = $0.01

The storage transactions generated by a single dequeue thread in the above scenario will add approximately 79,200 / 100,000 X $0.01 X 30 days = $0.238/month for 1 queue listener in polling mode.

Architects will not plan for a single queue listener for the entire application; chances are the number of queue listeners will be high, and there are going to be different queues for different requirements. Assuming a total of 200 queues used in an application with polling:

200 queues X $0.238 ≈ $47.60 per month – this is the cost incurred when the solution was not performing any computations at all, just checking the queues to see if any work items are available.

 

Addressing The Polling Hell

To address the polling hell, the following techniques can be used:

  • Back off polling: a method to lessen the number of transactions against your queue and therefore reduce the cost and bandwidth used (a minimal sketch follows this list). A good implementation can be found here: http://www.wadewegner.com/2012/04/simple-capped-exponential-back-off-for-queues/
  • Triggering (push-based model): A listener subscribes to an event that is triggered (either by the publisher itself or by a queue service manager) whenever a message arrives on a queue. The listener in turn can initiate message processing, thus not having to poll the queue in order to determine whether or not any new work is available. The implementation specifics of a push-based model are made easier with the introduction of internal endpoints for roles. An internal endpoint in a Windows Azure role is essentially the internal IP address automatically assigned to a role instance by the Windows Azure fabric. This IP address, along with a dynamically allocated port, creates an endpoint that is only accessible from within the hosting datacenter, with some further visibility restrictions. Once registered in the service configuration, the internal endpoint can be used for spinning up a WCF service host in order to make a communication contract accessible to the other role instances. A publish/subscribe implementation based on this is straightforward. The limitations of this approach are:
      • Internal endpoints must be defined ahead of time – these are registered in the service definition and locked down at design time; in case the endpoints were dynamic, a small registry could be implemented for the same purpose.
      • The discoverability of internal endpoints is limited to a given deployment – the role environment doesn’t have explicit knowledge of all other internal endpoints exposed by other Azure hosted services;
      • Internal endpoints are not reachable across hosted service deployments – this could render itself as a limiting factor when developing a cloud application that needs to exchange data with other cloud services deployed in a separate hosted service environment even if it’s affinitized to the same datacenter;
      • Internal endpoints are only visible within the same datacenter environment – a complex cloud solution that takes advantage of a true geo-distributed deployment model cannot rely on internal endpoints for cross-datacenter communication;
      • The event relay via internal endpoints cannot scale as the number of participants grows – internal endpoints are only useful when the number of participating role instances is limited, and with the underlying messaging pattern still being a point-to-point connection, the role instances cannot take advantage of multicast messaging via internal endpoints.

 Note: Given that an application is not a large-scale application spread across geo locations, the pub/sub model can still be implemented using the above approach. The limitations hit hard in large-scale geo-distributed applications; for such an application the idea would be to go for Service Bus.

  • Look at Service Bus Queues as an alternative, after a complete cost analysis, as the pub/sub implementation on Service Bus is out of the box.
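A minimal sketch of capped exponential back-off polling, in the spirit of the approach referenced above; it assumes the same 2012-era StorageClient types, and the interval values and processing delegate are illustrative.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public class BackOffQueueListener
{
    static readonly TimeSpan MinInterval = TimeSpan.FromSeconds(1);
    static readonly TimeSpan MaxInterval = TimeSpan.FromSeconds(60);

    // Polls the queue, doubling the sleep interval while the queue stays empty
    // and resetting it as soon as a message is found.
    public static void Listen(CloudQueue queue, Action<CloudQueueMessage> process)
    {
        TimeSpan interval = MinInterval;
        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage();
            if (msg != null)
            {
                process(msg);
                queue.DeleteMessage(msg);
                interval = MinInterval;      // work found: go back to aggressive polling
            }
            else
            {
                // Queue empty: back off exponentially, capped at MaxInterval.
                interval = TimeSpan.FromSeconds(
                    Math.Min(interval.TotalSeconds * 2, MaxInterval.TotalSeconds));
            }
            Thread.Sleep(interval);
        }
    }
}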

Dynamic Scaling

Dynamic scaling is the technical capability of a given solution to adapt to fluctuating workloads by increasing and reducing working capacity and processing power at runtime. The Windows Azure platform natively supports dynamic scaling through the provisioning of a distributed computing infrastructure on which compute hours can be purchased as needed.

It is important to differentiate between the following 2 types of dynamic scaling on the Windows Azure platform:

  • Role instance scaling refers to adding and removing web or worker role instances to handle the point-in-time workload. This often includes changing the instance count in the service configuration. Increasing the instance count will cause the Windows Azure runtime to start new instances, whereas decreasing the instance count will cause it to shut down running instances. It can take on the order of 10 minutes for a new instance to come online.
  • Process (thread) scaling refers to maintaining sufficient capacity in terms of processing threads in a given role instance by tuning the number of threads up and down depending on the current workload.

Dynamic scaling in a queue-based messaging solution calls for a combination of the following general recommendations:

  • Monitor key performance indicators including CPU utilization, queue depth, response times and message processing latency.
  • Dynamically increase or decrease the number of role instances to cope with the spikes in workload, either predictable or unpredictable.
  • Programmatically expand and trim down the number of processing threads to adapt to variable load conditions handled by a given role instance.
  • Partition and process fine-grained workloads concurrently using the Task Parallel Library in the .NET Framework 4 (a sketch follows this list).
  • Maintain a viable capacity in solutions with highly volatile workload in anticipation of sudden spikes to be able to handle them without the overhead of setting up additional instances.
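A minimal sketch of fine-grained concurrent processing with the Task Parallel Library, again assuming the 2012-era StorageClient types; the batch size and degree of parallelism are illustrative.

using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.StorageClient;

public static class ParallelQueueProcessor
{
    // Pulls a batch of messages in a single storage transaction and processes
    // them concurrently.
    public static void ProcessBatch(CloudQueue queue)
    {
        var batch = queue.GetMessages(32).ToList();   // up to 32 messages per call

        Parallel.ForEach(batch,
            new ParallelOptions { MaxDegreeOfParallelism = 4 },
            msg =>
            {
                // ... application-specific processing here ...
                queue.DeleteMessage(msg);
            });
    }
}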

 

Note: To implement a dynamic scaling capability, consider the Microsoft Enterprise Library Autoscaling Application Block, which enables automatic scaling behavior in solutions running on Windows Azure. The Autoscaling Application Block provides the functionality needed to define and monitor autoscaling in a Windows Azure application, and it covers latency impact, storage transaction costs and dynamic scale requirements.

 

Additional Consideration for Queues

HTTP 503 Server Busy on Queue Operations

At present, the scalability target for a single Windows Azure queue is “constrained” at 500 transactions/sec. If an application attempts to exceed this target, for example by performing queue operations from multiple role instances running hundreds of dequeue threads, it may result in an HTTP 503 “Server Busy” response from the storage service. I have found the Transient Fault Handling Application Block pretty handy as a retry mechanism - http://msdn.microsoft.com/en-us/library/hh680905(v=pandp.50).aspx
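A minimal retry sketch using the Transient Fault Handling Application Block; the class and namespace names follow the 2012-era Enterprise Library integration pack and should be verified against the version you install, and the back-off values are illustrative.

using System;
using Microsoft.Practices.TransientFaultHandling;
using Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling;
using Microsoft.WindowsAzure.StorageClient;

public static class ReliableQueueReader
{
    public static CloudQueueMessage GetMessageWithRetry(CloudQueue queue)
    {
        // Exponential back-off: 5 attempts, waits growing between 1 and 30 seconds.
        var retryPolicy = new RetryPolicy<StorageTransientErrorDetectionStrategy>(
            new ExponentialBackoff(5,
                TimeSpan.FromSeconds(1),
                TimeSpan.FromSeconds(30),
                TimeSpan.FromSeconds(2)));

        // Transient failures such as HTTP 503 "Server Busy" are retried by the policy.
        return retryPolicy.ExecuteAction(() => queue.GetMessage());
    }
}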

 

Important References

Saturday 25 August 2012

Windows Azure Service Bus – Messaging Features

 

The Service Bus is the single most important component, be it in an Enterprise Integration scenario or in the cloud (which, by the way, happens to be mass-scale integration of a massive number of applications). The expectations from a Service Bus in the cloud are very many; when compared to an Enterprise scenario, the Enterprise Service Bus caters to a bare minimum of the following features:

  • Messaging Services
  • Management Services
  • Security Services
  • Metadata Services
  • Mediation Services
  • Interface Service

An ESB is a messaging expert; without getting into the history of traditional EAI, EAI broker and MOM architectures, messaging is a feature which has seen significant improvement in ESBs over the past 2 decades. In this post I’m specifically concentrating on Windows Azure Service Bus messaging capabilities and comparing them with a standard ESB implementation. Before delving into the details of Azure Service Bus messaging, let’s set the context with the messaging features of a standard ESB.

 

ESB – Messaging features – What to expect

 

The Message: A message is typically composed of 3 basic parts: the header, the properties and the message payload. The header is used by the messaging system and the application developer to provide information such as the destination, reply-to destination, message type and message expiration time. The properties section is generally a set of name-value pairs. These properties are essentially parts of the message payload or body that get promoted to a special section of the message so that filtering can be applied to the message by a consumer or by specialized routers. The format of the message payload can vary across messaging implementations, for example plain text, binary or XML.

An ESB is a messaging expert, so it can manage whatever type of messaging you can throw at it. The types of messages which can potentially be exchanged in a mid-sized organization between different business and support applications can be very many, and cloud scale can be a very different playground. There will obviously be a standard set of messaging patterns supported by an ESB; they are the following:

  • Point to point messaging: P2P messages can also be marked as persistent or non-persistent.
  • Point to point request/response: the request/response messaging pattern for most ESBs can be synchronous or asynchronous in nature. Applications and services can work in fire-and-forget mode, which allows an application to go about its business once a message is asynchronously delivered. A variant of this is the Reply Forward pattern, whereby the response to the message is sent to another destination.
  • Broadcast message
  • Broadcast request/response
  • Publish subscribe: pub/sub is self explanatory; a common misconception regarding pub/sub is that it is lightweight compared to point to point. A pub/sub message can be delivered just as reliably as a point to point message can; a message delivered on a point to point queue can be delivered with little additional overhead if it is not marked persistent. A reliable pub/sub message is delivered using a combination of persistent messages and durable subscriptions. When an application registers to receive messages on a specific topic it can specify that the subscription is durable. A durable subscription will survive even if the subscribing client fails. This means that if the intended receiver of a message becomes unavailable for any reason, the message server will continue to store messages on behalf of the receiver until the receiver becomes available again.

image

  • Store and forward: an ESB provides message queuing and guaranteed-delivery semantics which ensure that “unavailable” applications will get their data queued and delivered at a later time. The message delivery semantics can cover a range of options from exactly-once to at-least-once to at-most-once delivery. A message marked as persistent will utilize the store-and-forward mechanism.

image

In an ESB the concept of store and forward should be capable of being repeated across multiple servers that are chained together. In this scenario each message server uses store and forward and message acknowledgements to get the message to the next server in the chain. Each server-to-server handoff maintains the minimum reliability and QoS that are specified by the sender. It would be interesting to understand how Azure really manages this internally; MSFT has not given out the details on the same. This is where the idea of dynamic routing comes into play.

Transacted messages are an important aspect of messaging – in simpler words, “transactional messaging”. An ESB is predominantly built around a loosely coupled architecture; introducing the idea of producers and consumers of a message participating in one global transaction defeats the purpose of that loosely coupled architecture. What is effective in the ESB scenario is the local transaction. The local transaction is in the context of an individual sender or an individual receiver, where multiple operations are grouped as a single transaction. An example is the grouping together of multiple messages in all-or-nothing fashion. The transaction follows the convention of separating send and receive operations. From a sender’s perspective the messages are held by the message server until a commit command is issued, in which case the messages are sent to the receiver. In case of a rollback the messages are discarded.

image

There are specific situations where the sending or receiving within a local transaction must be combined with the update of another transactional resource, such as a database, or with the transactional completion of workflow code. This typically involves an underlying transaction manager that takes care of coordinating the prepare, commit or rollback operations for each resource participating in the transaction. ESBs in general provide interfaces for accomplishing this, allowing a message producer or consumer to participate in a transaction with any other resource that is compliant with the X/Open XA two-phase commit transactional protocol. This ideally becomes a distributed transaction.

Having covered enough on standard ESB messaging, what Azure Service Bus has to offer is next.

Azure Service Bus Messaging

Azure Service Bus consists of a bare minimum of the following features. The focus of this post is Service Bus Messaging.

image

 

On July 16, 2012 Microsoft released the beta of Microsoft Service Bus 1.0 for Windows Server. This release had been kept tightly under wraps for several months, and my team was fortunate enough to have the opportunity to evaluate the early bits and help shape this release. A separate blog post on the same will be out soon.

 

The Service Bus server component mentioned above is a clear replacement for MSMQ.

 

Azure Service Bus supports the following messaging patterns. Without getting too overwhelmed by the earlier discussion of messaging types, there is a direct comparison of the two towards the end of this post.

At a high level, Azure Service Bus supports the following types of messaging patterns:

  • Relayed Messaging: the Message Session Relay Protocol (MSRP) used in the computer networking world is a protocol for transmitting a series of related messages in the context of a communications session; MSRP messages can also be transmitted by using intermediaries. The relay messaging pattern is similar in many ways to MSRP. Service Bus in Windows Azure provides a highly load-balanced relay service that supports a variety of transport protocols and WS standards, including SOAP, WS-* and even REST. The relay service supports the following messaging types:

 

  • One way messaging
  • Request/ Response
  • Point to Point
  • Publish / Subscribe scenarios
  • Bidirectional socket communication for increased point to point efficiency.

 

In the relay messaging pattern an on-premises service connects to the relay service through an outbound port and creates a bidirectional socket for communication tied to a particular rendezvous address. The client can then communicate with the on-premises service by sending messages to the relay service targeting the rendezvous address. The relay service will relay messages to the on-premises service through the bidirectional socket already in place. The client does not need a direct connection to the on-premises service, nor is it required to know where the service resides, and the on-premises service does not need any inbound ports open on the firewall. To support this at a code level, WCF in the .NET Framework supports relay bindings. The relay service requires the server and client components to be online at the same time, so persistent and durable messaging is not something which the relay can support outright in its vanilla form. Looking at the Jan 2012 release of Azure, it supported only relay messaging, which in my personal opinion was “half-baked ESB messaging”. HTTP-style communications in which the requests are not typically long lived, and clients that connect only occasionally, such as browsers and mobile applications, don’t fit the bill for relay messaging.
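A minimal sketch of hosting an on-premises service on the relay through a WCF relay binding; it assumes the Microsoft.ServiceBus assembly, and the service namespace, path and ACS issuer/key shown are placeholders rather than real values.

using System;
using System.ServiceModel;
using Microsoft.ServiceBus;

[ServiceContract]
public interface IEchoService
{
    [OperationContract]
    string Echo(string text);
}

public class EchoService : IEchoService
{
    public string Echo(string text) { return text; }
}

public class RelayHost
{
    public static void Main()
    {
        var host = new ServiceHost(typeof(EchoService));

        // Rendezvous address: sb://<namespace>.servicebus.windows.net/echo
        Uri address = ServiceBusEnvironment.CreateServiceUri("sb", "mynamespace", "echo");

        var endpoint = host.AddServiceEndpoint(
            typeof(IEchoService), new NetTcpRelayBinding(), address);

        // 2012-era shared-secret (ACS) credentials; replace with your own issuer/key.
        endpoint.Behaviors.Add(new TransportClientEndpointBehavior
        {
            TokenProvider = TokenProvider.CreateSharedSecretTokenProvider("owner", "key")
        });

        host.Open();   // opens the outbound connection to the relay
        Console.WriteLine("Listening on the relay. Press Enter to exit.");
        Console.ReadLine();
        host.Close();
    }
}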

Whether relay messaging only supports synchronous behavior is something which needs more discussion.

In July 2012 MSFT decided to correct the mistake with the introduction of brokered messaging.

 

Brokered Messaging

Brokered messaging is the asynchronous, temporally decoupled option for messaging. Producers (senders) and consumers (receivers) do not have to be online at the same time; the messaging infrastructure reliably stores messages until the consuming party is ready to receive them. This allows the components of distributed applications to be disconnected, and to connect whenever desired and download their messages. The core components of the Service Bus brokered infrastructure are Queues, Topics and Subscriptions. These components enable new asynchronous messaging scenarios such as

  • Temporal decoupling
  • Publish/Subscribe.

Brokered messaging essentially filled the gap for persistent, durable messaging.

Service Bus Queues

Service Bus Queues are a decoupled messaging construct. In the Service Bus they have the following characteristics:

FIFO – delivery of messages to one or more consumers in a sequenced order.

Load leveling is a perceived benefit, which is a standard benefit of using a queue. Since the senders and receivers are decoupled, the message sending and consumption strategies can be many: offline receive, or fanning out more receivers in case there are too many messages in the queue.

Note: Rolling out more receiver instances on Windows Azure will take on the order of 10 minutes.

At a feature level the queue has the following functionality:

Receive and Delete – a single-stage receive: the Service Bus marks the message as consumed as soon as it returns it to the receiver. This is the simplest option, suited to scenarios that can tolerate a message being lost if the receiver fails while processing it.

PeekLock – the receive operation is two-stage, which makes it possible to support applications that cannot tolerate missing messages. When Service Bus receives the request it finds the next message to be consumed, locks it to prevent other consumers from receiving it, and then returns it to the application. After the application finishes processing the message it completes the second stage of the receive process by calling Complete on the received message, which marks the message as consumed. In cases where the application is unable to process the message it can call Abandon, and the Service Bus will unlock the message and make it available to be received again by other applications. A timeout is associated with PeekLock, beyond which the Service Bus unlocks the message.

If a message is read and no Complete is issued before the lock expires, the Service Bus treats it as abandoned. Going back to the standard implementation of store and forward, this gives At-Least-Once delivery.
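A minimal sketch of the two-stage PeekLock receive just described, assuming the Microsoft.ServiceBus.Messaging client library and a QueueClient created with ReceiveMode.PeekLock; the wait time is illustrative.

using System;
using Microsoft.ServiceBus.Messaging;

public static class PeekLockReceiver
{
    public static void ReceiveOne(QueueClient client)
    {
        BrokeredMessage message = client.Receive(TimeSpan.FromSeconds(30));
        if (message == null) return;   // nothing available within the wait time

        try
        {
            // ... application-specific processing here ...
            message.Complete();        // second stage: mark the message as consumed
        }
        catch (Exception)
        {
            // Processing failed: unlock the message so it can be received again.
            message.Abandon();
            throw;
        }
    }
}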

If the scenario cannot tolerate duplicate processing, then additional logic is required in the application to detect duplicates. This can be achieved based upon the MessageId property of the message, which remains constant across delivery attempts; this is known as Exactly-Once processing.

 

Topics & Subscription

Topics and subscriptions were introduced as a deliberate move to support publish/subscribe in a more structured manner. In normal queue-based communication we see a single sender and a single receiver; topics and subscriptions provide one-to-many communication in a pure pub/sub manner. Useful for scaling to very large numbers of recipients, each published message is made available to each subscription registered with the topic. Messages are sent to a topic and delivered to one or more associated subscriptions depending on the filter rules that can be set on a per-subscription basis. The subscriptions can use additional filters to restrict the messages that they want to receive. Messages are sent to a topic in the same manner as to a queue, but are received from a subscription.

image

 

While the topic receives all the messages, each subscription picks the subset of messages it needs. There is still a requirement to filter the messages coming down to a subscription; the volume of messages can be large, so filters give you another chance to apply a where clause and have more targeted messaging. The filter expression is a where clause on one of the properties, based on the SQL-92 standard; an example is given below.

namespaceManager.CreateSubscription("StoreTopic", "Dashboard", new SqlFilter("StoreName = 'Store1'"));  // the topic path "StoreTopic" is illustrative
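A slightly fuller sketch of the same idea – creating the topic and a filtered subscription, publishing a message carrying the filtered property, and receiving from the subscription – assuming the Microsoft.ServiceBus and Microsoft.ServiceBus.Messaging libraries; the topic path, subscription name and connection string are illustrative.

using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

public static class TopicSample
{
    public static void Run(string connectionString)
    {
        var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);

        if (!namespaceManager.TopicExists("StoreTopic"))
            namespaceManager.CreateTopic("StoreTopic");

        // Only messages whose StoreName property is 'Store1' reach this subscription.
        if (!namespaceManager.SubscriptionExists("StoreTopic", "Dashboard"))
            namespaceManager.CreateSubscription("StoreTopic", "Dashboard",
                new SqlFilter("StoreName = 'Store1'"));

        // Publish a message carrying the property the filter inspects.
        var topicClient = TopicClient.CreateFromConnectionString(connectionString, "StoreTopic");
        var message = new BrokeredMessage("daily-sales");
        message.Properties["StoreName"] = "Store1";
        topicClient.Send(message);

        // Receive from the subscription rather than from the topic itself.
        var subscriptionClient = SubscriptionClient.CreateFromConnectionString(
            connectionString, "StoreTopic", "Dashboard");
        BrokeredMessage received = subscriptionClient.Receive();
        if (received != null) received.Complete();
    }
}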

 

image

 

Important notes


  • Filters can be SQL-92 expressions, Correlation filters or Tagging filters
  • Support for 2000 rules per subscription.
  • Each matched rule yields a message copy

What the additional overhead of having subscriptions and filters is, from a compute standpoint, is something which one needs to understand.



Partitioning is one more targeted-messaging construct, which allows an additional rule to be added to the filter by which the incoming messages can be logically subdivided.


Example below


image



Composite Patterns of Messaging on Service Bus

CQRS – I have written about this in one of my earlier posts. In relation to messaging it makes perfect sense; what they call it in the messaging world is “Update Read Separation”.


  • Reads on partitioned stores
  • All writes through messages
  • Distribution via fan-out
  • Trades timeliness and instant feedback for robustness and scale


Diagnostics and Statistics


In the cloud world, diagnostics and statistics are pretty much at a reset, with new tools and new challenges. If one were to use messaging in Service Bus, “diagnostics” deserves a special mention, as messaging can be of some assistance here.


The strategy for this could be to:



  • Flow diagnostics events from backend services to the diagnostic queues.
  • Vary the TTL by severity: verbose errors short lived, fatal error reports long lived (a sketch follows this list).
  • Filter by severity or by the needs of different audiences.
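A minimal sketch of varying the TTL of a diagnostics event by severity before publishing it to a diagnostics topic, assuming Microsoft.ServiceBus.Messaging; the severity values and TTL figures are illustrative.

using System;
using Microsoft.ServiceBus.Messaging;

public static class DiagnosticsPublisher
{
    public static void Publish(TopicClient diagnosticsTopic, string severity, string details)
    {
        var message = new BrokeredMessage(details);
        message.Properties["Severity"] = severity;    // subscriptions can filter on this

        // Verbose events expire quickly; fatal reports are kept for longer.
        message.TimeToLive = severity == "Verbose"
            ? TimeSpan.FromMinutes(10)
            : TimeSpan.FromDays(7);

        diagnosticsTopic.Send(message);
    }
}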

image



Correlation Pattern


Sometimes there is a need to set up reply paths between a sender and a receiver: a sender needs to receive back a response on a different queue. The sender sends a correlation id along with the queue name (the response queue) where it wishes to receive the response. The message arriving on the receiver’s queue gets picked up by an application which, post processing, sends a response to the sender’s correlated response queue.


The 3 correlation models supported in Service Bus are



  • Message Correlation
  • Subscription Correlation
  • Session Correlation

image



N to 1 Correlation: this is a scenario where multiple senders send in the same correlation id or reply queue. What this ideally means is multiple senders, with the responses all going back to a single response queue.


N to M Correlation: this is a scenario where multiple senders send in different correlation ids or reply queues. What this ideally means is multiple senders, with the responses going back to multiple response queues.


Correlation in Service Bus


Message Correlation (Queues)



  • Originator sets the MessageId or CorrelationId; the Receiver copies it to the reply
  • Reply sent to an Originator-owned Queue indicated by ReplyTo
  • Originator receives and dispatches on the CorrelationId (a sketch follows below)
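A minimal sketch of this queue-based message correlation, assuming Microsoft.ServiceBus.Messaging; the reply queue name "responses" is illustrative.

using System;
using Microsoft.ServiceBus.Messaging;

public static class MessageCorrelationSample
{
    // Originator: send a request, noting where and under which id the reply should come back.
    public static string SendRequest(QueueClient requestQueue, string payload)
    {
        var request = new BrokeredMessage(payload)
        {
            MessageId = Guid.NewGuid().ToString(),
            ReplyTo = "responses"                    // originator-owned reply queue
        };
        requestQueue.Send(request);
        return request.MessageId;                     // dispatch incoming replies on this id
    }

    // Receiver: copy the request's id into the reply's CorrelationId.
    public static void Reply(BrokeredMessage request, QueueClient responseQueue, string result)
    {
        var reply = new BrokeredMessage(result) { CorrelationId = request.MessageId };
        responseQueue.Send(reply);
    }
}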

Subscription Correlation (Topics)



  • Originator sets the MessageId or CorrelationId; the Receiver copies it to the reply
  • Originator has a Subscription on a shared reply Topic with a rule covering the Id
  • Originator receives and dispatches on the CorrelationId

Session Correlation



  • Originator sets some SessionId on the outbound session
  • Receiver reuses the SessionId for the reply session
  • Originator filters on the known SessionId using a session receiver (a sketch follows below)
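A minimal sketch of session correlation, assuming Microsoft.ServiceBus.Messaging and a session-enabled reply queue (RequiresSession set on the queue); the receiver is expected to set SessionId on its reply to the ReplyToSessionId it finds on the request, and the wait time is illustrative.

using System;
using Microsoft.ServiceBus.Messaging;

public static class SessionCorrelationSample
{
    public static BrokeredMessage SendAndWaitForReply(
        QueueClient requestQueue, QueueClient replyQueue, string payload)
    {
        string sessionId = Guid.NewGuid().ToString();

        // Originator stamps the outbound message with the session the reply should use.
        var request = new BrokeredMessage(payload) { ReplyToSessionId = sessionId };
        requestQueue.Send(request);

        // Originator listens only to its own session on the shared reply queue.
        MessageSession session = replyQueue.AcceptMessageSession(sessionId);
        return session.Receive(TimeSpan.FromSeconds(30));
    }
}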


Additional features



  • Local Transaction support exists in Service Bus Messaging
  • Message Scheduling
  • Dead Lettering
  • Duplicate Detection
  • Prefetching


Summary


In principle, Service Bus Messaging supports pretty much all the messaging types via relay or brokered messaging, and in addition it has a lot more. The next post compares Azure Queues to Azure Service Bus Queues.


The codebase for all supported messaging on Service Bus can be found on my GitHub here (still a work in progress) - https://github.com/ajayso/Azure-Service-Bus---Messaging-Samples.git