Ajay Solanki: 2012

Sunday 2 December 2012

Architecting Database Applications on Windows Azure

90% of application developed on premise or on cloud are likely to have a database of some nature. The database of choice could be very many ranging from blobs , tables, big data i.e No SQL to SQL Azure.

In the last decade we have seen most database tend to relational in nature its only in last 2 years we have had an adventures of No SQL databases. In the post I go around the areas which needs special consideration for SQL Azure. Below are some the few.

SQL Azure is not Microsoft SQL Server, on the contrary its a managed version of SQL Server.The SQL database is a multi tenanted system in which many database instances are hosted on a Single SQL Server running on a physical server or node. That is the very reason to expect a very different performance characteristics what one would expect from pure SQL Server
Every instance of SQL Database in Azure has one primary and 2 secondary replica. SQL Database uses quorum commit in which a commit is deemed successful as soon as primary and secondary replica have completed the commit. One of the many reason why a write will be slower.
Security in SQL Azure

Use secure connections via using TDS protocol secure encrypted connecting on port 1433.
Handle authentication and authorization separately- SQL Database provides security administration to create logins and users in a way similar to SQL Server. Security administration for the database level is almost the same as in SQL Server. However, the server level administration is different because SQL Database is assembled from distinct physical machines. Therefore, SQL Database uses the master database for server level administration in order to manage the users and logins.

To manage network access, use the SQL Database service firewall that handles the network access control. You can configure firewall rules that grant or deny access to specific IP or range of IPs. The firewall can be indication of further latency

Connection Timeouts -SQL Database offers high availability (HA) out of the box by maintaining three replicas spread out on different nodes in the datacenter and considering a transaction to be complete as soon as two of the three replicas have been updated. In addition, a fault detection mechanism will automatically launch one of the database copies if needed: when a fault is detected, the primary replica is substituted with the second replica. However, this can trigger a short-term configuration modification in the SQL Database management and result in a short connection timeout (up to 30 seconds) to the database.
Back up Issues - HA is also enforced in the scope of the datacenter itself; there is no data redundancy across geographic locations. This means that any major datacenter fault can cause a permanent loss of data.

To protect your data, be sure to back up the SQL Database instance to Windows Azure storage in a different datacenter. To reduce data transfer costs, you can choose a datacenter in the same region.To mitigate the risk of this connection timeout, it is a best practice to implement an application retry policy for reconnecting to the database. To reduce the overall reconnection time, consider a back-off reconnection strategy that increases the amount of time for each connection attempt. There is snapshot recovery of SQL Database as of current, with the acquisition of Stor Simple this can be possible.

Use of CQRS or EF: With the slow writes and requiring fast reads. CQRS can be considered as it segregates the reads from writes given the uncanny love for EF. A short prototype can help one decide following areas are to be watched out for

Preventing exceptions resulting from closed connections in the connection pool: First, the EF uses ADO.Net to handle its database connections. Because creating database connections can be time-consuming, a connection pool is used, which can lead to an issue. Specifically, SQL Database and the cloud environment can cause a database connection to be closed for various reasons, such as a network problem or resource shortage. But even though the connection was closed, it still remains in the connection pool, and the EF ObjectContext will try to grab the closed connection from the pool, resulting in an exception. To mitigate this issue, use a retry policy for the entity connection as offered by the Transient Fault Handling Application Block so that multiple attempts can occur in order to accomplish the command.
Early loading: EF offers early and lazy loading both developers are not aware how queries are fired to the database which can result in performance degradation. Lazy loading may add to the problem as the reads can also be potentially slow if the data is spread across separate tables because multiple round trips will be required to traverse each object. This problem can be eliminated by using eager loading, which enables joining information in separate tables (connected by foreign key) in a single query.
Avoid LINQ query which uses distribution transactions.

Handling connection failures: SQL Database is a distributed system which applications access over a network in a Windows Azure datacenter. Connections across this network are subject to failures that can lead to the connections being killed.

Specifically, when there is a failure in either a data node or the SQL Server instance it hosts, SQL Database moves activity off of that node or instance. Each primary replica it hosts is demoted and an associated secondary replica is promoted. As part of this process, connections to the now demoted primary server are killed. However, it can take several seconds for the information about the new primary replica to propagate through SQL Database, so it is essential that applications handle this transient failure appropriately.

In addition SQL Database is a multitenant system in which each data node hosts many instances. Connections to these instances compete for the resources provided by the data node. In times of high load, SQL Database can throttle connections that are consuming a lot of resources. This throttling represents another transient failure that applications need to handle appropriately.

Designing applications to handle connection failures-The first step in handling connection failures is to determine whether the failure is transient. If it is, the application should wait a brief time for the transient problem to be resolved and then retry the operation until it succeeds. Use of Transient Application Block is a must here.

Throttling: The physical resource on which the SQL Database is hosted is shared among many applications. MSFT does not provide any way to reserve a guaranteed level of resource availability. Instead SQL databases throttles connection to instances that consume too many resources. SQL Database consider resource use as a 10 second interval referred to as throttling sleep interval. Instances that make use of many resources in these intervals may be throttled for one or more of the sleep interval until resource level reaches acceptable levels. Two types of throttling soft and hard depending how severely resource usage limits are exceeded. Figuring out throttling issues is if transient connection failures are high.
No chatty applications. Use Windows Azure caching techniques to avoid chatty database calls.
Monitoring Limitations: SQL Database has fewer monitoring options that SQL Server-Various monitoring methods available in SQL Server, such as audit login, running traces and performance counters, are not supported in SQL Database. However, one monitoring option, Dynamic Management Views (DMVs), is supported, although not to the same extent as in SQL Server.
Backup and Restore-SQL Database provides fault tolerance internally by using triplet copies of each data committed. However, even the strongest database box won’t prevent data corruption due to hardware malfunctions, internal application faults or human errors. Therefore, the DBA for any application needs to be concerned with database backup and restore. We recommend the following practices.

First consider scheduling a backup task every day to create recent restore points.

Second, consider setting the backup target to Azure storage by creating and exporting a BACPAC file from the SQL Database to Windows Azure blob storage; you can either use the Windows Azure portal or an API command (check out sqldacexamples for more information). Be sure to make the link specific. If you do choose to back up to Windows Azure storage, make sure that the storage account is located at a different datacenter (but on the same region to minimize data transfer rates) to prevent loss in the event of a major datacenter failure.

Third, consider using Microsoft SQL Data Sync to sync data between SQL Database instances (copy redundancy) or to sync a SQL Database instance to an on-premises Microsoft SQL Server database (be aware that SQL Data Sync currently does not provide versioning). Finally it’s worth mentioning that if you are planning on a major application upgrade, you should manually back up your databases to prevent an unexpected regression.

SQL Database provides fault tolerance internally by using triplet copies of each data committed. However, even the strongest database box won’t prevent data corruption. Therefore, the DBA for any application needs to be concerned with database backup and restore.

Scaling out the database-SQL Database instance size is limited and performance is not guaranteed

SQL Database is a multi-tenanted system in which the physical resources are shared among many tenants. This resource sharing affects both the maximum instance size supported by SQL Database and the performance characteristics of each instance. Microsoft currently limits the instance size to 150 GB and does not guarantee a specific performance level.

Using sharding to scale out the database with SQL Database Federations

The solution to both the database size problem and the performance problem is to scale the database horizontally into more than one database instance using a technique known as sharding.

SQL Database Federations is the managed sharing feature of SQL Database. A federated database comprises a root database to which all connections are made and one or more federations. Each federation comprises one or more SQL Database instances to which federated data is distributed depending on the value of a federation key that must be present in every federated table. A restriction is that the federation key must be present in each clustered or unique index in the federated tables. The only distribution algorithm currently supported is range, with the federation key restricted to one of a small number of data types.

SQL Database Federations provides explicit support to split a federation instance in two and ensure that the federated data is allocated to the correct database in a transactionally consistent manner. SQL Federations also provides support to merge two instances, but this causes the data in one of the instances to be lost.

An application using a federated database connects to the root database and specifies the USE FEDERATION statement to indicate which instance the connection should be routed to. This provides the benefit of connection pooling on both the client and the server.

SQL Federations provides the ability to scale out a SQL Database application to a far larger aggregate size than can be provided by a single instance. Since each individual instance has the same performance characteristics, SQL Federations allows an application to scale out performance by using many instances.

The solution to both the database size problem and the performance problem is to scale the database horizontally into more than one database instance using a technique known as sharding.

Synchronizing Data- Where should the SQL Database instance be located? Windows Azure is a global cloud service available in eight datacenters on three continents. A website can be hosted in multiple Windows Azure datacenters, and Windows Azure Traffic Manager can be configured to allow users to access the closest datacenter. The question therefore arises of where to locate the SQL Database instance to store application data. Which datacenter should it be in?

There is an increasing interest in hybrid solutions, in which part of the application remains on-premises and part is migrated to Windows Azure. Again, the problem arises of how to handle data. Should it be stored on premises and a VPN set up to allow cloud services hosted in Windows Azure to access it? Or should it be stored in the cloud?

Using Microsoft SQL Data Sync

Microsoft SQL Data Sync provides a solution for both of these situations. It can be used to configure bi-directional data synchronization between two SQL Database instances, or between a Microsoft SQL Server database and a SQL Database instance. It uses a hub-and-spoke topology in which the hub must be a SQL Database instance.

Consequently, Microsoft SQL Data Sync can be used together with Windows Azure Traffic Manager to create truly global applications where both the cloud service and the SQL Database instance are local to each datacenter. This minimizes application latency, which improves the user experience.

Similarly, Microsoft SQL Data Sync can be configured to synchronize data between an on-premises SQL Server database and a SQL Database instance. This removes the need to privilege one location over the other, and again improves application performance by keeping the database close to the application.

Monday 26 November 2012

Narwhal- Big Data for US Election

Codename Narwhal is Obama secret data integration project which started 9 months has paid off. The team of data scientists , developers, and digital advertising experts, putting there heads together to really get big data to help the team make better decision.

4Gb/s, 10k requests per second, 2,000 nodes, 3 datacenters, 180TB and 8.5 billion requests. Design, deploy, dismantle in 583 days to elect the President…

Key Take Away

While the entire platform is built on Amazon, its a greatly proven architecture, a lot of the application were build around the start up strategy “the open source culture” and the “idea of core Platform Services in the form of Narwhal Services” has made the overall picture very simple.

Background

At a very high level Narwhal integrated data across multiple applications

Facebook,
National list of voters and lot more data from the swing state.
Swing state data : As a standard sales principle the 60% of customer who are on the fence are to be targeted well.
Public voting records
Responses coming directly from the voters
Tracking voters across the web
Serving ads to public with targeted messages on the campaign sites
Analysing what does a voter read online
Obama supporter on Facebook – cross sell sending emails to a supported about there friends in the swing states encouraging them to vote.

The starting of the data architecture was the database of registered voters from Democratic National Committee & keeping it up to date. Playing around with this data in terms of adding voter data mix seeing the trends.

Data collection has been highly private and running analysis this data to decide next probable strategies.

High Level Analytics

If we look into the Analytics -The Obama campaign had a list of every registered voter in the battleground states. The job of the campaign’s much-heralded data scientists was to use the information they had amassed to determine which voters the campaign should target— and what each voter needed to hear.

What did the data help them in

Deep Targeting of voters

Race wise targeting example Latino community using diversity

The data scientist really came down to the following Individual estimates for each swing state voter’s behaviour.

These four numbers were included in the campaign’s voter database, and each score, typically on a scale of 1 to 100, predicted a different element of how that voter was likely to behave.
Two of the numbers calculated voters’ likelihood of supporting Obama, and of actually showing up to the polls. These estimates had been used in 2008. But the analysts also used data about individual voters to make new, more complicated predictions.
If a voter supported Obama, but didn’t vote regularly, how likely was he or she to respond to the campaign’s reminders to get to the polls?

The final estimate was the one that had proved most elusive to earlier campaigns—and that may be most influential in the future.

Micro targeting another numerical scoring mechanism is been used widely so more data on the same is here.

The Complete Architecture

The central piece or key application block is Amazon's cloud computing services for computing and storage power. At its peak, the IT infrastructure for the Obama campaign took up "a significant amount of resources in AWS's Northern Virginia data center,".

Narwhal Services

The key architectural decision marking in an ambiguous architecture situation is to get the core perfect. The Obama team build the core Narwhal a set of services that acted as an interface to a shared data stores for all application. Moreover the service layer was REST based which allowed building applications in any development language and platform making it possible to quickly develop new applications and to integrate existing ones into the campaign's system. Those apps include sophisticated analytics programs like Dreamcatcher, a tool developed to "microtarget" voters based on sentiments within text. And there's Dashboard, the "virtual field office" application that helped volunteers communicate and collaborate.

With introduction of Narwhal Services Layer this gave the option decoupling all application and allow each application scale up individually and at the same time allow to share data across all application. Given the nature of the business and the need to build the application on the fly it was important to build something like a Narwhal Services Layers

Platform Agnostic Development

With all services exposed as REST Based the option for developers to build an application in any language and platform.

The team

The idea of recruiting people who already knew the territory, snapping up both local talent and people from out of town with Internet bona fides—veterans from companies like Google, Facebook, Twitter, and TripIt.

"All these guys have had experience working in startups and experience in scaling apps from nothing to huge in really tight situations like we were in the campaign,".

The need to hire engineers who understand APIs—engineers that spend a lot of time on the Internet building platforms.

The Technical Stack

Narwhal. Written in Python, the API side of Narwhal exposes data elements through standard HTTP requests. While it was designed to work on top of any data store, the Obama tech team relied on Amazon's MySQL-based Relational Database Service (RDS). The "snapshot" capability of RDS allowed images of databases to be dumped into Simple Storage Service (S3) instances without having to run backups.

Even with the rapidly growing sets of shared data, the Obama tech team was able to stick with RDS for the entire campaign—though it required some finesse.

They were some limitations with RDS but they were largely self-inflicted . They were able to work around those and stretch how far we were able to take RDS. If the campaign had been longer, it would have definitely had to migrate to big EC2 boxes with MySQL on them instead."

The team also tested Amazon's DynamoDB "NoSQL" database when it was introduced. While it didn't replace the SQL-based RDS service as Narwhal's data store, it was pressed into service for some of the other parts of the campaign's infrastructure. In particular, it was used in conjunction with the campaign's social networking "get-out-the-vote" efforts.

The integration element of Narwhal was built largely using programs that run off Amazon's Simple Queue Service (SQS). It pulled in streams of data from NGP VAN's and Blue State Digital's applications, polling data providers, and many more, and handed them off to worker applications—which in turn stuffed the data into SQS queues for processing and conversion from the vendors' APIs. Another element of Narwhal that used SQS was its e-mail infrastructure for applications, using worker applications to process e-mails, storing them in S3 to pass them in bulk from one stage of handling to another.

Initially, Narwhal development was shared across all the engineers. As the team grew near the beginning of 2012, however, Narwhal development was broken into two groups—an API team that developed the interfaces required for the applications being developed in-house by the campaign, and an integration team that handled connecting the data streams from vendors' applications.

The applications

As the team supporting Narwhal grew, the pace of application development accelerated as well, with more applications being put in the hands of the field force. Perhaps the most visible of those applications to the people on the front lines were Dashboard and Call Tool.

Written in Rails, Dashboard was launched in early 2012. "It's a little unconventional in that it never talks to a database directly—just to Narwhal through the API," Ecker said. "We set out to build this online field office so that it would let people organize into groups and teams in local neighbourhoods, and have message boards and join constituency groups."

An Obama campaign video demonstrating how to use Dashboard.

Enlarge / The Dashboard Web application, still live, helped automate the recruitment and outreach to would-be Obama campaign volunteers.

Dashboard didn't replace real-world field offices; rather, it was designed to overcome the problems posed by the absence of a common tool set in the 2008 election, making it easier for volunteers to be recruited and connected with people in their area. It also handled some of the metrics of running a field organization by tracking activities such as canvassing, voter registration, and phone calls to voters.

The Obama campaign couldn't mandate Dashboard's use. But the developer team evolved the program as it developed relationships with people in the field, and Dashboard use started to pick up steam. Part of what drove adoption of Dashboard was its heavy social networking element, which made it a sort of Facebook for Obama supporters.

Enlarge / Call Tool offered supporters a way to join in on specific affinity-group calling programs.

Call Tool was the Obama campaign's tool to drive its get-out-the-vote (GOTV) and other voter contact efforts. It allowed volunteers anywhere to join a call campaign, presenting a random person's phone number and a script with prompts to follow. Call Tool also allowed for users to enter notes about calls that could be processed by "collaborative filtering" on the back end—identifying if a number was bad, or if the person at that number spoke only Spanish, for instance—to ensure that future calls were handled properly.

Both Call Tool and Dashboard—as well as nearly all of the other volunteer-facing applications coded by the Obama campaign's IT team—integrated with another application called Identity. Identity was a single-sign-on application that tracked volunteer activity across various activities and allowed for all sorts of campaign metrics, such as tracking the number of calls made with Call Tool and displaying them in Dashboard as part of group "leaderboards." The leaderboards were developed to "gamify" activities like calling, allowing for what Ecker called "friendly competition" within groups or regions.

All of the data collected through various volunteer interactions and other outreach found its way into Narwhal's data store, where it could be mined for other purposes. Much of the data was streamed into Dreamcatcher and into a Vertica columnar database cluster used by the analytics team for deep dives into the data.

A good comparison http://communities-dominate.blogs.com/brands/2012/11/orca-meets-narwhal-how-the-obama-ground-game-crushed-romney-a-look-behind-the-math.html

Solving real business problems with Cloud………………. Just the beginning

Friday 16 November 2012

StorSimple Likely to address Gaps in Azure Storage

Cloud Integrated storage primarily for backup, archival and disaster recovery story sounds an interesting proposition for MSTF. Looking for an pure applicability standpoint If one takes a closer look at enterprise grade application which is deployed on the cloud the following areas which are data concerns to the customer

Backup of structured and unstructured data
Archival Strategy and Implementation with Quick Retrieval of archived data.
Virtual Machine Backup and Restoration
Disaster Recovery.
Snapshot Recover for data
Stringent Data Security
Applications level backup and recoveries, windows file shares, SharePoint libraries and version control.

If I take a good closer look at Windows Azure what we have in the name of DR is maintaining 3 copies of the data across the data center which kind addresses the availability aspect, the Archival Strategy is totally missed out, Snapshot recovery to a specific point in time is not possible.

Application level snapshot with version control is non existent. StorSimple does bring a unique value proposition for addressing storage in complete scheme of things for both on premise and cloud.

It would be interesting to see how does the Azure Storage end up harnessing the benefits of StorSimple to fill in the gaps of its storage strategy. Moreover relooking into the Sql Azure storage to utilize StorSimple for backup / restore , snapshot restore, archival would be good. It would be long before these features start showing up into Sql Azure, I’m hoping it comes by end of next year as of current Sql Azure has no backup/ restore or archival features.

In addition of Azure , Office 365 can also end up leverage StorSimple.

They are quite a few gaps in the storage strategy of Windows Azure as current.

The complete article can be found here http://blogs.msdn.com/b/windowsazure/archive/2012/11/15/microsoft-acquires-storsimple.aspx.

Wednesday 7 November 2012

Solving Azure Storage latency issues via FNS

The Azure Storage access has been plagued with latency issues until MSFT decided to change the network design to FNS.FNS(Flat Network Storage) is a good way to solve the networking issues which arise due to a hierarchical network structure. Azure embracing FNS as Gen 2 storage SKU is a very welcome move. The isolation of compute and storage network is very much required. Having a separate durable network which allows to read , write azure storage at faster speed. This non functional requirement has always been a must required for Azure, the earlier speeds were very slow.

Moreover application plumbing code of managing the latency if the slower reads, write will get some relief.

The patterns are changing and framework codebase are likely to change as well.The scalability numbers of azure storage have to be tested based on the documentation following are the numbers

Within a storage account, all of the objects are grouped into partitions as described here. Therefore, it is important to understand the performance targets of a single partition for our storage abstractions, which are (the below Queue and Table throughputs were achieved using an object size of 1KB):

Single Queue– all of the messages in a queue are accessed via a single queue partition. A single queue is targeted to be able to process:
- Up to 2,000 messages per second

Single Table Partition– a table partition are all of the entities in a table with the same partition key value, and usually tables have many partitions. The throughput target for a single table partition is:
- Up to 2,000 entities per second
- Note, this is for a single partition, and not a single table. Therefore, a table with good partitioning, can process up to the 20,000 entities/second, which is the overall account target described above.

Single Blob– the partition key for blobs is the “container name + blob name”, therefore we can partition blobs down to a single blob per partition to spread out blob access across our servers. The target throughput of a single blob is:
- Up to 60 Bytes/sec

Some of the definite goods of FNS

The flat network design in order to provide very high bandwidth network connectivity for storage clients. This new network design and resulting bandwidth improvements allows us to support Windows Azure Virtual Machines, where we store VM persistent disks as durable network attached blobs in Windows Azure Storage. Additionally, the new network design enables scenarios such as MapReduce and HPC that can require significant bandwidth between compute and storage.

Segregation of customer VM based compute from storage from a networking standpoint makes it easier to provide for multi tenancy.

The FNS design does call for a new network design and a software load balancer on the contrary the 10GBps network speed for storage node network solves many of the design challenges at the application level.

The changes to new storage hardware and to a high bandwidth network comprise the significant improvements in our second generation storage (Gen 2), when compared to our first generation (Gen 1) hardware, as outlined below:

Above are my thoughts The original article can be found here - http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx

Sunday 23 September 2012

Hadoop, Sql Azure the BI and Analytics Dilemma

MSFT teams in Redmond haven’t made any official statement on Analysis Service (SSAS) for Sql Azure. With Azure making big commitment on Hadoop the dilemma becomes two fold building BI & Analytical capabilities for Sql Azure and Hadoop. I’m sure countless discussions have happened in those Redmond buildings deciding the same. Some guess work around this “MSFT will release Analysis Services for Sql Azure early next year”. The Analysis Service should involve support for both Sql and No Sql.
I started with a simple application in the financial industry vertical in cloud. I soon reached a point where I was pushed to corner to make some hard decision on cloud as the application besides the transactional features required tons of BI and Analytics spread across both structured and unstructured world.
Sql Azure unwillingly became the choice of structured database and Hadoop and Azure became willing choice for unstructured data. Hadoop provide very quick and optimal search across terra bytes of data and Sql Azure with its limited on cloud offering gave transactional. The BI and Analytical capability became a nightmare.
The MSFT teams on Sql Azure are tight lipped about an “Analysis framework in Sql Azure which can do both structured & un structured”. For right now I have a Sql 2012 Analysis Service running on Azure virtual machines which did pretty much what I wanted. But then “unwillingly I have to say SSAS of Sql 2012 is coming in Sql Azure” pure guesswork.
The application as of current looks like this

What below is my thought process pure guesswork? The shift to No Sql is evident MSFT – Azure platform has to embrace the No Sql platform completely this involves extending the BI and Analytics. What seems to be emergent is an Analysis Service architecture is expected to support both Sql and No Sql running out of Sql Azure platform. The Sql Server 2012Analysis Service - xvelocity an in memory analytics engine is step to move into the PaaS model for Sql Azure.This new engine is delivered within the following modules

xVelocity for Data Warehousing: is a memory optimized columnstore index for high speed data querying (relational queries).
xVelocity for Business Intelligence: is the in-memory analytics engine for Analysis Services (Tabular Model) and PowerPivot.

As the name in-memory engine implies all the data is stored in memory. Although todays computer systems are equipped with gigabytes of memory, memory still is an expensive resource. Therefore we need to be able to analyze the memory usage of the Analysis Services in-memory engine to understand how much memory is consumed by the different applications. The SQL Azure Analysis Service Framework should deliver a massively parallel processing infrastructure with a software solution that embeds both SQL and MapReduce analytic processing for deeper analytic insights on multi-structured data and new analytic capabilities driven by data science. Analysis service is most likely to uses an integrated MapReduce analytics engine for embedded analytic processing, simplifying enterprise access for big data analytics. Sql Server Analysis Service formally supported only SQL so that any business intelligence tool that generates standard SQL or any business analyst that knows SQL can immediately invoke the power of data science without having to learn programming languages or new interfaces. What is expected Sql Analysis Services Framework expected to look like is

While writing my applications I have found pushing data into Hadoop Azure cumbersome. I have resorted to writing data poll mechanism which will pull data from the Azure Blob storage push to Hadoop Head Node. This is not a documented way, it works. Find the code here. https://github.com/ajayso/AzureHeadNodeFileReceiver.git

Tuesday 4 September 2012

Real-World Windows Azure Case Study: Hearst Newspapers- Getting Critical about it.

I’ve been following the real world examples where Azure has been deployed in volumes, with absolute regret I must state there are very few. I started reading the Hearst Newspaper case study deployed on Azure, I started digging around for some answers this what I have come as my findings.

For the complete case study find it here - http://blogs.msdn.com/b/windowsazure/archive/2012/08/29/real-world-windows-azure-hearst-newspapers-powers-premium-digital-news-service-with-windows-azure.aspx.

In blue is the excerpts from the case study mentioned above and in the professional red are my comments.

Setting the background Hearst Newspapers owns 15 newspapers including Houston Chronicle, San Francisco Chronicle, Albany Times Union and San Antonio Express-News.

The FY 11 Annual Review find here http://www.hearst.com/annualreview2011/

Digital News Service with Windows Azure is what Hearst was trying to implement. Hearst Newspaper in the internet world pretty much lived off no fee website and business goal was to get test subscription based premium content delivered to mobile apps.

Apparently the sentence which nails the debate is “ We also planned to start a new initiative with an app for the Apple iPad and subscription through iTunes store”

The following points seem to have missed the architects eye in this solution.

“Azure has come out with Mobile Services recently” I’m sure as an architect one would have a foresight what’s the roadmap of Azure. http://weblogs.asp.net/scottgu/archive/2012/08/28/announcing-windows-azure-mobile-services.aspx
To support the argument further with Azure Mobile Services supporting iPad application example: http://chrisrisner.com/Windows-Azure-Mobile-Services-and-iOS , Thriving on the MVC architecture to support iOs applications using Object C. Mobile Services end points are just looking for data to come across in JSON format.
Customer Value Prop Missed out “Needed a solution that would tie these systems together that would orchestrate which offers were available to which users, based on their subscription status or purchase”. I think this requirement is not addressed or missed out.

The Content Management Story

At Hearst multiple content managements serve up for multiple existing news website. I don’t know the intricate details “The database of print subscriber was on another system”. This sounds an organization which has grown its IT in a very unorganized manner. I could be wrong. Content Management Systems need a master central information and rules repository for generic information & governance considering the risk the Media company are up against, and a localized repository can replicate the rules and generic information.The publishing of content to various website can happen from localized or centralized repository.

Dissecting the Requirement Statement

The requirement as stated by Hearst “ Needed a solution that would tie these systems together that would orchestrate which offers were available to which users, based on their subscription status or purchase.” <<Response> this sounds like a requirement of CRM>

“The system needed to ensure that the CMS made the right content available to each user.” <<Response> this sounds like a requirement of AUDIENCE targeting>

“And it would support new content systems and content delivery networks as we decided to include them” <<Response> this requirement is about 1 million feet high>

The Business Requirement

“Scalable Solution to meet the increasing demands of a phased rollout across our news properties and could handle traffic spikes due to news events. Given the competitive pressures we also decided that fast time to market was crucial. And we wanted our solution to be cost-effective with low capital and operating costs consistent with razor thin margins of media business”

<Response>

Scalable Solutions will depend a lot on how your systems are architected with multiple CMS and multiple news sites & multiple other application one cannot scale to infinity . Scalability largely depends on what applications have moved to cloud with a lot of applications build with third party controls and software PaaS is not a direct fitment, a hybrid looks obvious.
Fast Time to Market: This is a misnomer your ability to market to fast depends
- Ability to integrate all CMS and use intelligent CRM to value added services for your customer
- Premium content – Cross Sell, Up Sell , this is derived benefit.
Cost Effective: This depends on how has the Azure components been structured and have we used the Calculator well enough to include all seasonality's.
Low Capital and Operating Cost—Don’t know but from what I seeing Azure has a lot of hidden costs in a real deployment. The budgets are generally 30% higher than the estimated cost. >

The Azure Value Proposition as stated

To avoid the delays inherent in deploying infrastructure, we focused on cloud-based solutions. When I think of the cloud, I really think of abstracting the management of the infrastructure, so we can focus on our application, our added value. We looked at Amazon, but we would still have had to manage our servers, install updates, and do everything we wanted to avoid. Windows Azure is more than infrastructure-as-a-service; it’s a fully managed platform-as-a-service. That’s the way we wanted to go.

“When I think of the cloud, I really think of abstracting the management of the infrastructure, so we can focus on our application, our added value”-

<Response> What I’m seeing is an increasing trend in adding monitoring functionality to the application on cloud example Signal R, New Relic- This is an additional compute cost attached to this. Abstracting the management of the infrastructure cannot be achieved completely if there exists a hybrid cloud model i.e on premise, cloud (PaaS), Azure VM’s </Response>

Windows Azure is more than infrastructure-as-a-service; it’s a fully managed platform-as-a-service. That’s the way we wanted to go.

<Response> Windows Azure has a lot good features and lot is evolving a classic is mobile services. The roadmap of Azure is required and MSFT needs to be more transparent to customers, this if MSFT wants to have more customers moving to Azure.</Response>

Existing Technology Stack

“Much of the technology we use runs on the LAMP stack: Linux, Apache, MySQL, and PHP. Much of the rest runs on Microsoft technologies. For the integration solution, we chose .NET, and Windows Azure SQL Database . For a solution that runs on Windows Azure, it only makes sense to use a consistent, end-to-end technology stack.”

<Response> The integration story is far more than .NET & Windows Azure SQL Database, its likely to be Windows Service Bus as that is the single most important component which will help integrate. In addition to this, Service Bus Queues, Topic & Subscription and transaction will be a candidate. Sql Azure may perhaps work as a storage that’s kind of expensive.</Response>

Time to get the Solution Out

Built in a month, the solution is a token-based entitlement database (EDB) that interacts with the mobile app, the subscription database, and the iTunes Store.

Figure 1. The entitlement database hosted in Windows Azure mediates between Hearst’s subscription and content systems and Apple iTunes.

When the Hearst mobile app connects to the CMS to download premium content, the CMS first determines if the app is entitled to that content. If the app cannot present a token confirming a consumer’s existing print or digital subscription, or content purchase, the CMS redirects the app to the EDB to obtain one. The EDB checks the subscription system. If the consumer has a subscription that includes digital content, the EDB provides a token that the app then uses to obtain that content from the CMS. The token is secured with common secret key and hash procedures, and is completely compatible across platforms including the Microsoft .NET Framework, .PHP and Perl, as well as Hearst’s content delivery network.
If the consumer does not have a subscription, the EDB offers the options of obtaining a trial subscription through an “in-app” transaction, or purchasing content from iTunes. If the consumer chooses the former, the EDB updates the subscription system and issues the token. If the consumer chooses the latter, the EDB redirects the app to the iTunes Store to make a purchase. The app then presents the iTunes receipt to the EDB; the EDB confirms the transaction with iTunes and issues the token. Once the app has the token, it presents it to the CMS to obtain content. This process makes it possible for the EDB to incorporate the security policies of external subscription systems, should we wish to add them.

<Response> The token based entitlement database is extension to the Identity Management & Federation Services. It allows the users to access premium content via mobile based on the token. The token acts as form of authorization i.e what content is the user authorized too. A lot of the Identity & Access Management is available out of box in Azure. It would more palatable if Service Bus Integration the value prop of Cross Sell and Up Sell via intelligent CRM will help realize the goal targeted content for audience.

What has been achieved in 1 month is more of a prototype of mobile services this is very typical of DPE division in MSFT to do this, The true value add is in doing something bigger.

</Response>

The technical and business benefits you’ve seen from using Windows Azure as stated by Hearst

First, Windows Azure enabled us to meet our goals to quickly create a high-quality solution to support our first mobile app, while preserving our flexibility to include additional devices, apps, and storefronts to expand our digital market share. Instead of the time—anywhere from a week to a month—that it would have taken to deploy servers at a traditional hosting provider, we deployed its Windows Azure instances in less than a day, a time savings of between 80 and 95 percent.

<Response> Given this is a prototype a day looks good. I have personally tried deploying HPC 20 instances over the internet on Azure it takes sometime. Ideally there should be a staging environment and the swap needs to be done to move into production.

</Response>

Faster testing enabled by Windows Azure helped the quality of the solution.

The tight interoperability between Windows Azure and the development process enabled us to do more testing than we normally do and a more thoroughly tested application is a more robust application. We also benefitted from the Pariveda Continuous Integration Server, which resulted in faster development and more stable code. This enables us to maintain a stronger focus on the application and on maximizing business value. You can maintain a greater development velocity. That’s what we gained by using Windows Azure.

<Response> Greater development velocity is something which needs more information.From live experience testing and debugging on windows azure is a new learning and slightly complex. The on premise emulator is good for unit testing only, any other form of testing better on staging cloud as the performance and other elements vary</Response>

Financial benefits have you seen with Windows Azure

Hosting our EDB in Windows Azure, we avoided the hundreds of thousands of dollars in capital costs and operating expenses associated with building an on-premise solution and operating it over a three-year period. We also avoided the high fees associated with a traditional hosting provider. I estimate those charges would likely have come to $10,000 per month—compared to the $2,000 per month that we incur for Windows Azure—a savings of 80 percent and, over three years, of $288,000. For Hearst subscribers, the news has to flow as readily as water or electricity. Anything less could cripple adoption and our move into mobile apps. That puts a heavy responsibility on the EDB—one that it is meeting successfully.

<Response>EDB is a very small part of the solution. Hosting EBD alone in the cloud doesn’t benefits in the long run, The long term strategy to move bigger parts to cloud would be a Benchmark statement for Microsoft </Response>

What’s next?

Our strategy is to preserve our options so we can continually explore what works best in the digital marketplace. We’re using Windows Azure as part of that strategy, and we expect to move more of our systems to it. We can grow our environment quickly and at low cost. We can swap in various subscription offers without having to put new apps through iTunes. We can add apps for other mobile devices and add storefronts—including our own websites—and just plug them all into Windows Azure. We can’t know where the future is going, but we’re sure we’re ready for it.

<Response>The customer is not too sure about Azure will they have taken there initial bets on mobile services and EDB in cloud there are not sure if they plan to use the Media Services provided by Azure and more over the architecture doesn’t looks to be well thought At the end of the day it looks like a good Prototype…….</Response>

Tuesday 28 August 2012

Windows Azure Queues–The Complete Works

My last post had concentrated on Service Bus Queues, I do get a lot of questions from the customer when to use Azure Queues vs. Service Bus Queues. This post I try to establish the decision which help one make better choices between the two.

Digging deeper into Azure Queues: Windows Azure Queue the expectation is it will be lower cost alternative to the Service Bus Queue’s. In principle Windows Azure Queue going ahead referred to as WAQ

Are asynchronous reliable delivery messaging construct.
Highly available , durable and performance efficient. The performance numbers to which the WAQ can manage is area of some research.
Ideally they are process At Least Once.
REST based interface support.
WAQ doesn’t have a limit on the number of messages stored in queue.
TTL for WAQ is 1 week , post that they will be garbage collected.
Meta data support in form name value pair exists
Maximum message size is 64KB
Message inside WAQ can be put in binary when read back it comes as XML
No guaranty on sequencing of the message
No support for duplicate messages identification.
Parameters of WAQ include
- MessageID: GUID
- Visibility Timeout: Default is 30 seconds maximum is 2 hours, Ideally use for read and process and then issue a delete.
- PopReceipt: On reading the queue there is visibility timeout associated with it, the receiver reads the messages tries to complete some processing and then may decide to issue a delete. The message which is read has a PopReceipt associated with it. PopReceipt is used while issuing a delete it goes with the MessageId.

PopReceipt, is

Property of CloudQueueMessage
Set every time a message is popped from the queue (GetMessage or GetMessages)
Used to identify the last consumer to pop the message
A valid pop receipt is required to delete a message
An exception is thrown if an invalid pop receipt is passed
PopReceipt is used in conjuction with the message id to issue a Delete of a message , for which a visibility timeout is set. We have the following scenarios
A Delete is issued within the visibility timeout the Delete the message is deleted from the queue, the assumption here is the message has been read and processing required has been done term it the happy path.

A Delete is issued post expiry of the visibility time, this assumed to be exception flow “ ex: the receiver process has crashed” and message is available in queue for re-processing. This failure recovery process rarely happens, and it is there for your protection. But it can lead to a message being picked up more than once. Each message has a property, DequeueCount, that tells you how many times this message has been picked up for processing. For example above, when receiver A first received the message, the dequeuecount would be 0. When receiver B picked up the message, after server A’s tardiness, the dequeuecount would be 1. This becomes a strategy to detect problem or poison message and route it to a log,repair and resubmit process.

Poison message is a message that is somehow continually failing to be processed correctly. This is usually caused by some data in the contents that causes the processing code to fail. Since the processing fails, the messages timeout expires and it reappears on the queue. The repair and resubmit process is sometimes a queue that is managed by a system management software. There is a need to check for and set a threshold for this dequeuecount for messages.

MessageTTL : This specifies the time-to-live interval for the message, in seconds. The maximum time-to-live allowed is 7 days. If this parameter is omitted, the default time-to-live is 7 days. If a message is not deleted from a queue within its time-to-live, then it will be garbage collected and deleted by the storage sytem.

Notes: It is important to note that all queue names must be lower case. The CreateIfNotExist() method will see if the queue really does exist in Windows Azure, and if it doesn’t it will create it for you.

Comparison of Azure Queues with Service Queues

A good post which covers that can be found here -http://preps2.wordpress.com/2011/09/17/comparison-of-windows-azure-storage-queues-and-service-bus-queues/

Design Consideration for Azure Queues

The messages are pushed into the queues the receiver will read the message process & delete. The general technique for reading messages from a queue used is Polling. The use of a classic queue listener with a polling mechanism may not be the optimal choice when using Windows Azure queues because the Windows Azure pricing model measures storage transactions in terms of application requests performed against the queue, regardless of if the queue is empty or not. If the number of messages increase in the queue “load leveling” will kick in and more receivers roles will spin off. These receivers will continue to run and accrue cost.

The costing of a single queue listener using polling mechanism

Assuming a hypothetical situation there is a single queue listener constantly polling for messages in the queue. The business transaction data arrives at regular intervals. However, let’s assume

The solution is busy processing workload just 25% of the time during a standard 8-hour business day.
That results in 6 hours (8 hours * 75%) of “idle time” when there may not be any transactions coming through the system.
Furthermore, the solution will not receive any data at all during the 16 non-business hours every day.

Total Idle time= 22 hours, there is dequeue work i.e GetMessage() called from Polling function that amounts

22 hrs X 60 min X 60 transaction/min – assuming polling at 1 second= 79,200 transaction/day

Cost of 100,000 transactions = $0.01

The storage transactions generated by a single dequeue thread in the above scenario will add approximately = 79,200 / 100,000 * $0.01 * 30 days = $0.238/ month for 1 queue listener in polling mode.

Architects will not plan for a single queue listener for the entire application and chances are number queue listeners will be high & there are going to different queues for different requirements. I’m assuming a total 200 queues used in an application with polling

200 queues X $0.238 $45. 720 per month - is the cost incurred when the solution was not performing any computations at all, just checking on the queues to see if any work items are available

Addressing The Polling Hell…

To address the polling hell following techniques can be used

Back off polling, a method to lessen the number of transactions in your queue and therefore reduce the bandwidth used. A good implementation can be found here http://www.wadewegner.com/2012/04/simple-capped-exponential-back-off-for-queues/
Triggering (push-based model): A listener subscribes to an event that is triggered (either by the publisher itself or by a queue service manager) whenever a message arrives on a queue. The listener in turn can initiate message processing thus not having to poll the queue in order to determine whether or not any new work is available. The implementation specifics of a Push Based Model is made easier with introduction of internal IP addresses for roles. An internal endpoint in the Windows Azure roles is essentially the internal IP address automatically assigned to a role instance by the Windows Azure fabric. This IP address along with a dynamically allocated port creates an endpoint that is only accessible from within a hosting datacenter with some further visibility restrictions. Once registered in the service configuration, the internal endpoint can be used for spinning off a WCF service host in order to make a communication contract accessible by the other role instances. A Publish Subscriber implementation based on this straightforward. The limitations of this approach are.

Note: Given that application is not a large scale application spreading across geo location the pub sub model can still be implemented using the above approach. The limitation hit hard in large scale geo distributed applications. In case we are to look at a large scale geo distributed application the idea would be go for service bus.

Look at Service Bus Queues as alternative after a complete cost analysis as the Pub Sub implementation on Service Bus is out of box.

Dynamic Scaling

Dynamic scaling is the technical capability of a given solution to adapt to fluctuating workloads by increasing and reducing working capacity and processing power at runtime. The Windows Azure platform natively supports dynamic scaling through the provisioning of a distributed computing infrastructure on which compute hours can be purchased as needed.

It is important to differentiate between the following 2 types of dynamic scaling on the Windows Azure platform:

Role instance scaling refers to adding and removing additional web or worker role instances to handle the point-in-time workload. This often includes changing the instance count in the service configuration. Increasing the instance count will cause Windows Azure runtime to start new instances whereas decreasing the instance count will in turn cause it to shut down running instances. It takes 10 minutes to add a new instance.

Process (thread) scaling refers to maintaining sufficient capacity in terms of processing threads in a given role instance by tuning the number of threads up and down depending on the current workload.

Dynamic scaling in a queue-based messaging solution would attract a combination of the following general recommendations:

Monitor key performance indicators including CPU utilization, queue depth, response times and message processing latency.
Dynamically increase or decrease the number of role instances to cope with the spikes in workload, either predictable or unpredictable.
Programmatically expand and trim down the number of processing threads to adapt to variable load conditions handled by a given role instance.
Partition and process fine-grained workloads concurrently using the Task Parallel Library in the .NET Framework 4.
Maintain a viable capacity in solutions with highly volatile workload in anticipation of sudden spikes to be able to handle them without the overhead of setting up additional instances.

Note: To implement a dynamic scaling capability, consider the use of the Microsoft Enterprise Library Autoscaling Application Block that enables automatic scaling behavior in the solutions running on Windows Azure. The Autoscaling Application Block provides all of the functionality needed to define and monitor autoscaling in a Windows Azure application. It covers the latency impact, storage transaction costs and dynamic scale requirements.

Additional Consideration for Queues

HTTP 503 Server Busy on Queue Operations

At present, the scalability target for a single Windows Azure queue is “constrained” at 500 transactions/sec. If an application attempts to exceed this target, for example, through performing queue operations from multiple role instance running hundreds of dequeue threads, it may result in HTTP 503 “Server Busy” response from the storage service. I have found Transient Fault Handling Application Block pretty handy in retry mechanism - http://msdn.microsoft.com/en-us/library/hh680905(v=pandp.50).aspx

Important References

Understanding Windows Azure Storage Billing – Bandwidth, Transactions, and Capacity post on the Windows Azure Storage team blog.

Queue Read/Write Throughput study published by eXtreme Computing Group at Microsoft Research.

The Transient Fault Handling Framework for Azure Storage, Service Bus & Windows Azure SQL Database project on the MSDN Code Gallery.

The Autoscaling Application Block in the MSDN library.

Windows Azure Storage Transaction - Unveiling the Unforeseen Cost and Tips to Cost Effective Usage post on Wely Lau’s blog.

Saturday 25 August 2012

Windows Azure Service Bus- Messaging Features

The Service Bus is single most important component be it an Enterprise Integration scenario or a cloud (which by the way happens to be mass scale integration of massive number of applications). The expectation from Service Bus in the cloud are very many, when compared to an Enterprise scenario the Enterprise Service Bus does cater to a bare minimum of the following features

Messaging Services:
Management Services
Security Services
Metadata Services
Mediation Services
Interface Service

ESB is a messaging expert not to get into history of Traditional EAI, EAI broker, MOM architecture. Messaging is a feature which has seen significant improvement in past 2 decades in ESB. This post I’m specifically concentrating on Windows Azure Service Bus – Messaging capabilities and compare it with what is the standard ESB implementation. Before dwelling into the details of the Azure Service Bus Messaging setting the context on Messaging features on a standard ESB.

ESB – Messaging features – What to expect

The Message: A message is typically composed of 3 basic parts: the header, the properties and the message payload. The header is used by the messaging system and application developer to provide information such as the destination, reply to destination, message type & message expiration time. Properties section is generally a name-value pair. These properties are essentially a part of the message payload or body that get promoted to a special section of the message so that filtering can be applied to the message by consumer or specialized routers. The format of the message payload can vary across messaging implementation example plain text, binary or xml.

ESB is a messaging expert so that it can manage whatever type of messaging you can throw at it. The types of messaging which can potentially be exchanged in mid sized organization between different business and support application can be very many and cloud scale can be a very different playground. Obvious to the fact there will a standard set of messaging which will be supported by an ESB below are the following

Point to point Messaging : P2P messages can also be marked as persistent or non-persistent
Point to point request/response: Request/ Response Messaging Pattern for most ESB is synchronous , asynchronous in nature. Applications and services in fire and forget mode which allows an application to go about its business once a message is asynchronously delivered. A variant of this is the Reply Forward Pattern where by response of the message is send to another destination.
Broadcast message
Broadcast request/response
Publish subscribe: Pub Sub is self explanatory a common misconception regarding Pub Sub is lightweight compared to point to point. A pub sub message can be delivered just as reliably as a point to point message can. A message delivered on a point to point queue can be delivered with little additional overhead if it is not marked persistent. A reliable pub sub message is delivered using a combination of persistent message and durable subscriptions. When an application register to receiving message of a specific topic it can specify that the subscription is durable. A durable subscription will survive is the subscribing client fails. This means that if that intended receiver of a message becomes unavailable for any reason, the message server will continue to store the messages on behalf of the receiver until the receiver becomes available again.

Store and forward: ESB provides message queuing and guaranteed delivery semantics which ensure that “unavailable” application will get their data queued and delivered at a later time. The message delivery semantics can cover a range of options from exactly-once delivery to at-least once to at most once delivery. Message when marked as persistent will utilize store and forward mechanism.

In a ESB the concept of store and forward should be capable of being repeated across multiple servers that are chained together. In this scenario each message server uses store and forward and message acknowledgements to get the message to the next server in the chain. Each server to server handoff maintains minimum reliability and the QoS that are specified by the sender. It would be interesting to understand how Azure really manages this internally. MSFT has not given out the details on the same. This is were the idea of dynamic routing comes into play.

Transacted Messages an important aspect of messaging in simpler words “transactional messaging”. ESB is predominantly built around “loose coupled architecture”, introducing an idea of producers and consumers of message participate in one global transaction is defeating the purpose of purpose of loosely coupled architecture. What is effective in the ESB scenario is local transaction. The local transaction is in context of an individual sender or an individual receiver where multiple operations are grouped as a single transaction. An example is the grouping together of multiple messages in all or nothing fashion. The transaction follows the convention of separating send and receive operations. From a sender’s perspective the message are held by the message server until a commit command is issued in which case the messages are sent to the receiver. In case of a rollback the messages are discarded.

There are specific situation where sending or receiving of a local transaction with the update of another transactional resource, such as a database or transactional completion of workflow code. This typically involves an underlying transaction manager that takes care of coordinating the prepare commit or rollback operation, each resource participating in the transaction. ESB in general provides interfaces for accomplishing this, allowing a message producer or a consumer to participate in a transaction with any other resource that is compliant with the XOpen/ XA two phased commit transactional protocol. This ideally becomes a distributed transaction.

Having covered enough on standard ESB messaging dwelling into what Azure Service Bus has to offer is next.

Azure Service Bus Messaging

Azure Service Bus consist of a bare minimum of following features. The focus of this post is Service Bus Messaging

On July 16, 2012 Microsoft released the beta of Microsoft Service Bus 1.0 for Windows Server. This release has been tightly kept under wraps for several months and my team was fortunate enough to have the opportunity to evaluate the early bits and help shape this release. A separate blog post on the same will out soon.

The service bus server component mentioned above is clear replacement to MSMQ.

Azure Service Bus supports the following Messaging Patterns, not getting too overwhelmed with earlier discussion of messaging types , there is a direct comparison of the same towards the end of this post.

At a high level Azure Service supports the following types of Messaging Patterns

Relayed Messaging: Message Session Relay Protocol used in the computer networking world is a protocol for transmitting a series of related messages in the context of a communications session. MSRP messages can also be transmitted by using intermediaries. Relay Messaging Pattern is similar in many ways to MSRP . Service Bus in Windows Azure provides a highly load balanced relay service that supports a variety of transport protocols and WS standards. This includes SOAP, WS-* and even REST. The relay service supports the following messaging types

One way messaging
Request/ Response
Point to Point
Publish / Subscribe scenarios
Bidirectional socket communication for increased point to point efficiency.

In a relay messaging pattern an on premise service connects to the relay service through an outbound port and creates a bidirectional socket for communication tied to a particular rendezvous address. The client can then communicate to the on premise service by sending messages to the relay service targeting the rendezvous address. The relay service will relay messages to the on premise service through the bidirectional socket already in place. The client does not need a direct connection to the on premise service nor is it required to know where the service resides and the on premise service does not need any inbound ports open on the firewall. To support this at a code level the .NET framework WCF supports relay bindings. Relay Service require a server and client components to be online at the same time. So essentially the persistent and durable messaging is not something which the relay can outright support in its vanilla form. Looking the Jan 2012 release of Azure “it supported only relay messaging” which in my personal opinion was “half baked ESB messaging”. HTTP style communications in which the requests may not be typically long lived, the clients that connect only occasionally , such as browsers, mobile applications don’t fit the bill for relay messaging.

Does relay messaging only support synchronous behavior is something which needs more discussion?

July 2012 MSFT decided correct the mistake with introduction of brokered messaging.

Brokered Messaging

Brokered message is the asynchronous option for messaging or temporal decoupled. Producers(senders) and consumers(receivers) do not have to be online at the same time. The messaging infrastructure reliably stores messages until the consuming party is ready to receive. This allows the components of distributed applications to be disconnected, and connect whenever desired and download the messages. The core components of Service Bus brokered infrastructure are Queues, Topics, Subscription. These components enable new asynchronous messaging scenarios such s

Temporal decoupling
Publish/Subscribe.

Brokered Messaging essentially filled the gap for persistent durable messaging.

Service Bus Queues

Service Bus Queues are decoupled messaging construct. In the service bus they have the following characteristics

FIFO- delivery of messages to one of more consumers in a sequenced order.

Load leveling is a perceived benefit which is standard benefits of using a queue. Since the sender and receivers are decoupled the message sending and consumption strategies can be many offline receive. Fan out receivers in case of messages in the queue are too many.

Note: Rolling out more receivers instances on Windows Azure will take the order to 10 minutes.

At a feature level Queue has the following functionality

Receive and Delete- allows the options for the receivers to receive and message and later issue a delete.

PeekLock- receive operation is two stage which makes it possible to support application that cannot tolerate missing messages. When Service Bus receives the request it finds the next message to be consumed, locks it to prevent other consumer from receiving it and then returns it to the application. After the application is finishes processing the message it completes the second stage of receive process by calling Complete on the received message, this will mark the message as being consumed. In cases where the application is unable process the message it can call abandon the Service Bus will unlock the message and make it available to be received by other applications. Usually a timeout is associated with PeekLock beyond which the Service Bus unlocks the message.

In case of message being read and no Complete issued the Service Bus considers this as Abandon situation. Going back the standard implementation of Store and Forward it supports all At Least Once.

If the scenario cannot tolerate duplicate processing, then additional logic is required in the application to detect duplicates which can be achieved based upon the MessageId property of the message which will remain constant across delivery attempts. This is known as Exactly Once processing.

Topics & Subscription
A deliberate move to support Publish Subscribe in more structured manner topics had been introduced. In normal queue based communication we see a single sender and a single receivers, Topics and subscription provide one to many communication in a pure pub sub manner. Useful for scaling to very large numbers recipients, each published message is made available to each subscription registered within the topic. Messages are sent to a topic and delivered to one or more associated subscription depending on the filter rules that can be set on a per subscription basis. The subscription can use additional filters to restrict that they want to receive. Message are sent to topic the same manner as the queue apparently received from subscription.

While the topics receives all the messages , each subscription picks a subset of the messages based on the subscription need. There is still a requirement filter the messages coming down to the subscription. The volumes of the messages can be large so filters ideally give you another chance to apply a where cause and have more targeted messaging. The filter expression is where clause on one of the properties and based on Sql 92 standards example given below
namespaceManager.CreateSubscription(("Dashboard", new SqlFilter("StoreName = 'Store1'");
Important notes

Filters are SQL 92 expressions , Correlation filter & Tagging Filter
Support for 2000 rules per subscription.
Each matched rule yields a message copy

What is additional overhead of having subscription and filter from a compute standpoint is something which one needs to understand?

Partitioning is one more targeted messaging construct which allows an additional rule to the filter by which the incoming message can be logically sub divided.

Example below

Composite Patterns of Messaging on Service Bus
CQRS have written about this in one of earlier post , In relation to messaging it makes perfect sense what they call it in the Messaging World is “Update Read Separation”.

· Reads on partitioned stores
· All writes through messages
· Distribution via fan-out
· Trades timeliness and instant feedback for robustness and scale

Diagnostics and Statistics

In the cloud world diagnostics and statistics is pretty much at a reset with new tools and new challenges. If one were to use messaging in service bus “diagnostic” deserve a special mention where the messaging can be some assistance.

The strategy for this could be to have

Flow diagnostics events from backend services to the diagnostic queues.
Vary the TTL by the severity, verbose errors short lived, fatal error reports long lived.
Filter by severity or needs of different audience

Correlation Pattern

If there is need to set the reply paths between a sender and receiver. A sender needs to receive back a response on a different queue. The Sender sends in a correlation id with the Queue Name (Response Queue) where it wishes to receive the response. The receivers queue which receives the message gets picked up by an application which in turn post processing sends a responds to senders correlated information queue.

3 correlation models supported in Service Bus are

Message Correlation
Subscription Correlation
Session Correlation

N to 1 Correlation : This is a scenario where multiple Senders will send in the same correlation id or queue. What this ideally means multiple senders and the response to that needs to go back to a single response queue.

N to M Correlation : This is a scenario where multiple Senders will send in the different correlation id or queue. What this ideally means multiple senders and the response to that needs to go back to a multiple response queue.

Correlation in Service Bus

Message Correlation (Queues)

Originator sets Message or CorrelationId, Receiver copies to the reply
Reply sent to Originator owned Queue indicated by ReplyTo
Originator receives and dispatches on CorrelationId

Subscription Correlation (Topics)

Originator sets Message or CorrelationId, Receiver copies to the reply
Originator has Subscription on shared reply Topic w/ rule covering Id
Originator receives and dispatches on CorrelationId

Session Correlation

Originator sets some SessionId, on outbound session
Receivers reuses SessionId for reply session
Originator filters on known SessionId using session receiver.

Additional features

Local Transaction support exists in Service Bus Messaging
Message Scheduling
Dead Lettering
Duplicate Detection
Prefetching

Summary

In principle the Service Bus Messaging supports pretty much all messaging type via relay or brokered messaging. In addition to it has lot more. Next Post comparing Azure Queues to Azure Service Bus Queue.

Codebase for All Supported Messaging on Services Bus can be found at my github here – still working - https://github.com/ajayso/Azure-Service-Bus---Messaging-Samples.git