Monday, April 27, 2015

Introduction to Windows Azure Automation and Azure Runbooks

One of the cool things Microsoft recently introduced to the PowerShell and automation community is Azure Automation, which is nothing but the infrastructure or framework required to develop and execute PowerShell scripts (cmdlets). In a nutshell, we can define Azure Automation as a "Task Automation and Configuration Management Framework on the Windows Azure Cloud".

Why do we need Azure Automation?
To automate the activities which we perform from the management portal, like creating, upgrading, removing, monitoring and backing up resources in an Azure environment. With Azure Automation, Dev/Ops teams can now run their Windows PowerShell workflows to integrate, orchestrate, and automate IT business processes.

What is a Runbook?
A runbook is a PowerShell workflow (a set of cmdlets) which can be scheduled and executed as a workflow. To give an example, when you create a Runbook in Azure Automation you will see the following areas:
1>Author: this is the PowerShell workflow script which does some specific task, such as taking a backup of a database, and to which you can pass parameters.
2>Schedule: this is a scheduler which helps to run the PowerShell script we created under Author on a specific date and time.
3>Jobs: these are the scheduled tasks, which are either completed, queued, running or failed.

Note that Runbooks in Azure Automation cannot access resources in your data center that are not reachable from the public cloud. They also have no way to access Orchestrator or SMA runbooks.

Azure Automation runbooks can access any external resources that can be accessed from a Windows PowerShell Workflow.

The following example will give a fair idea of how to create and run a "Hello World" Runbook from the Azure Automation gallery.

http://azure.microsoft.com/en-us/documentation/articles/automation-create-runbook-from-samples/


Sunday, January 4, 2015

The Anatomy of "No SQL"

Continuing from where I stopped about NoSQL in my previous post, in this article we will have a closer look at the options we have when it comes to storing data for Big Data storage/processing. Well, the preference is obviously for NoSQL databases.

Definition of NoSQL:
The following five characteristics of NoSQL give a better understanding before the definition:

"Non-Relational", "Open-Source", "Cluster-Friendly", "21st Century Web" and "Schema-Less"

NoSQL databases are a way of persisting data in a non-relational way. Here, the data is no longer stored in rigid schemas of tables and columns distributed across various tables. Instead, related data is stored together in a fluid, schema-less fashion. NoSQL databases tend to be schema-less (key-value stores) or have structured content without a formal schema (document stores).

How to Data Model in NoSQL:
Yet another complicated question asked by people who have worked on RDBMS technologies. In a broad sense, NoSQL data modeling falls under four categories to choose from. This also helps us to choose the underlying NoSQL technology.

Different Types of NoSQL Data Modeling
Broadly there are four types of data modeling:
1>Document - MongoDB, CouchDB and RavenDB
2>Column family - Apache Cassandra, Google Bigtable, HBase
3>Graph - Neo4j
4>Key-Value - Redis and Riak

In this article we will focus in detail on the key-value and document-oriented databases, as these are the most commonly used ones.

Cassandra
Used by Netflix, eBay, Twitter, Reddit and many others, Cassandra is one of today’s most popular NoSQL databases. According to the project website, the largest known Cassandra setup involves over 300 TB of data on over 400 machines. Cassandra provides a scalable, high-availability data store with no single point of failure. Interestingly, Cassandra forgoes the widely used master-slave setup in favor of a peer-to-peer cluster. This contributes to Cassandra having no single point of failure, as there is no master server which, when faced with lots of requests or when breaking, would render all of its slaves useless. Any number of commodity servers can be grouped into a Cassandra cluster. There are only two ways to query: by key or by key-range.
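To make the "query by key" idea concrete, here is a minimal sketch using the DataStax C# driver (the CassandraCSharpDriver package); the contact point, keyspace, table and column names are placeholders of my own, not from the post.

using System;
using Cassandra;   // DataStax C# driver (CassandraCSharpDriver NuGet package)

class CassandraKeyLookup
{
    static void Main()
    {
        // Connect to an assumed local node and keyspace.
        Cluster cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
        ISession session = cluster.Connect("demo");

        // The row key is the only way to address a row, so the WHERE clause targets the key column.
        RowSet rows = session.Execute("SELECT user_id, name FROM users WHERE user_id = 'u1001'");
        foreach (Row row in rows)
        {
            Console.WriteLine("{0} -> {1}", row.GetValue<string>("user_id"), row.GetValue<string>("name"));
        }

        cluster.Shutdown();
    }
}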
Data Modeling in Cassandra
Data storage in Cassandra is row-oriented, meaning that all content of a row is serialized together on disk. Every row of columns has its unique key. Each row can hold up to 2 billion columns [²]. Furthermore, each row must fit onto a single server, because data is partitioned solely by row-key.
  • The following layout represents a row in a Column Family (CF). [Figure: Column Family row layout]
  • The following layout represents a row in a Super Column Family (SCF). [Figure: Super Column Family row layout]
  • The following layout represents a row in a Column Family with composite columns. [Figure: Column Family with composite columns] Parts of a composite column are separated by '|'. Note that this is just a representation convention; Cassandra’s built-in composite type encodes differently, not using '|'. (By the way, this post doesn’t require you to have detailed knowledge of super columns and composite columns.)
Use cases
Now, to quickly discuss the use cases: a key-value kind of database fits where you would only ever query based on the key. The database does not care what is stored as the value; the index is only on the key, and you always retrieve and insert the value as one big opaque chunk.
- See more at: http://blog.aditi.com/data/what-why-how-of-nosql-databases/
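To show what "retrieve and insert values as one big chunk" looks like in code, here is a small hedged sketch using the StackExchange.Redis client against Redis (one of the key-value stores listed earlier); the server address and key names are made up for illustration.

using System;
using StackExchange.Redis;   // popular .NET client for Redis

class KeyValueDemo
{
    static void Main()
    {
        // Connect to an assumed local Redis instance.
        ConnectionMultiplexer redis = ConnectionMultiplexer.Connect("localhost");
        IDatabase db = redis.GetDatabase();

        // The store only indexes the key; the value is an opaque blob as far as it is concerned.
        db.StringSet("user:1001", "{ \"name\": \"alex\", \"country\": \"IN\" }");

        // The only way to get the value back is by the exact same key.
        string value = db.StringGet("user:1001");
        Console.WriteLine(value);
    }
}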

MongoDB

This is a NoSQL database which supports the notion of documents. Documents are JSON structures; to be precise, in the case of MongoDB it is BSON (the binary equivalent of JSON).

Below is the terminology used in MongoDB and its analogy with respect to a normal RDBMS:

TABLE --> Collection
ROW --> Document
Primary Key --> _id

A sample document looks like the one below, which is nothing but key-value pairs; but unlike a key-value database, here you can index and query individual keys within the document.

{ "item": "pencil", "qty": 500, "type": "no.2" }

For document stores, the structure and content of each “document” are independent of other documents in the same “collection”. Adding a field is usually a code change rather than a database change: new documents get an entry for the new field, while older documents are considered to have a null value for the non-existent field. Similarly, “removing” a field could mean that you simply stop referring to it in your code rather than going through the trouble of deleting it from each document (unless space is at a premium, and then you have the option of removing only those with the largest content). Contrast this to how an entire table must be changed to add or remove a column in a traditional row/column database.

Documents can also hold lists as well as other nested documents. Here’s a sample document from MongoDB (a post from a blog or other forum), represented as JSON:

{
  _id : ObjectId("4e77bb3b8a3e000000004f7a"),
  when : Date("2011-09-19T02:10:11.3Z"),
  author : "alex",
  title : "No Free Lunch",
  text : "This is the text of the post. It could be very long.",
  tags : [ "business", "ramblings" ],
  votes : 5,
  voters : [ "jane", "joe", "spencer", "phyllis", "li" ],
  comments : [
    { who : "jane", when : Date("2011-09-19T04:00:10.112Z"), comment : "I agree." },
    { who : "meghan", when : Date("2011-09-20T14:36:06.958Z"), comment : "You must be joking. etc etc ..." }
  ]
}

Note how “comments” is a list of nested documents with their own independent structure. Queries can “reach into” these documents from the outer document, for example to find posts that have comments by Jane, or posts with comments from a certain date range.
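As an illustrative sketch (not from the original post) of such a query with the MongoDB C# driver (2.x API assumed); the connection string, database and collection names are placeholders.

using System;
using MongoDB.Bson;
using MongoDB.Driver;

class NestedQueryDemo
{
    static void Main()
    {
        // Connect to an assumed local MongoDB instance.
        var client = new MongoClient("mongodb://localhost:27017");
        var posts = client.GetDatabase("blog").GetCollection<BsonDocument>("posts");

        // Find posts that have at least one comment made by "jane";
        // the dotted path reaches into the nested comment documents.
        var byJane = Builders<BsonDocument>.Filter.Eq("comments.who", "jane");

        foreach (var post in posts.Find(byJane).ToList())
        {
            Console.WriteLine(post["title"].AsString);
        }
    }
}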

Some of the notable advanced features of MongoDB include automatic master-slave replication, auto-sharding of data, a very rich query language, support for secondary indexes on documents ensuring efficient retrieval, and built-in support for Map-Reduce. It also offers very fine-grained control over reliability and durability for someone who does not like the auto-pilot mode.



Saturday, December 15, 2012

Difference between Windows Azure Service Bus Queue Messaging and Service Bus Topics

Windows Azure Service Bus is an ESB-kind of solution which provides connectivity between disparate components within a distributed application. Azure Service Bus comes with two offerings: Service Bus Relay, which enables clients and services to communicate independently of network address translation (NAT) and firewall obstacles via a cloud-hosted relay service, and Service Bus Brokered Messaging, which provides a robust messaging infrastructure that enables communication via durable queues.
Let us move on to the original topic of what a Service Bus Queue and a Service Bus Topic are.

Service Bus Queue:
 A Service Bus queue provides a first-in-first-out (FIFO) structure for transmitting messages between one (or more) message Producers and one or more message receivers (Clients). Figure 2 shows a single Producer transmitting a number of messages to a single queue. Each message is retrieved at most once, by a single receiver -- shown as Client 1 getting Msg 1, Client 2 getting Msg 2, and so on. The messages themselves contain both headers (also called message properties) and a body, where the body is simply a UTF-8 encoded byte array.
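As a hedged sketch of the send/receive flow described above (not code from the original post), using the brokered messaging client from the WindowsAzure.ServiceBus NuGet package; the connection string and queue name are placeholders, and the queue is assumed to already exist.

using System;
using Microsoft.ServiceBus.Messaging;   // WindowsAzure.ServiceBus NuGet package

class QueueDemo
{
    static void Main()
    {
        // Placeholder connection string; the "orders" queue is assumed to exist.
        string connectionString = "Endpoint=sb://yournamespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...";
        QueueClient queueClient = QueueClient.CreateFromConnectionString(connectionString, "orders");

        // Producer: a message carries properties (headers) plus a body.
        var message = new BrokeredMessage("Msg 1");
        message.Properties["Priority"] = "High";
        queueClient.Send(message);

        // Receiver: in the default Peek/Lock mode the message must be completed,
        // otherwise it reappears on the queue after the lock expires.
        BrokeredMessage received = queueClient.Receive();
        if (received != null)
        {
            Console.WriteLine(received.GetBody<string>());
            received.Complete();
        }
    }
}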

Service Bus Topic:


Topics are the heavy-duty big brothers of queues. They receive messages from producers just as queues do; where topics differ is in how clients receive messages from them. Instead of receiving messages directly from the topic, clients receive messages from something akin to a virtual queue (or -- for readers with database-inclined minds -- a "view" on top of the topic) known as a subscription. Just as for a queue, a single subscription can have multiple clients retrieving messages from it, as shown in Figure 2. In addition, subscriptions support both the Receive and Delete and the Peek/Lock and Complete retrieval patterns.
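The sketch below (again hedged, with placeholder names) shows a producer publishing to a topic while a client reads from a filtered subscription using Peek/Lock and Complete; it assumes the "orders" topic already exists and uses the same WindowsAzure.ServiceBus client library as the queue example.

using System;
using Microsoft.ServiceBus;             // NamespaceManager
using Microsoft.ServiceBus.Messaging;   // TopicClient, SubscriptionClient, SqlFilter

class TopicDemo
{
    static void Main()
    {
        string connectionString = "Endpoint=sb://yournamespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...";

        // A subscription acts like a filtered "view" on the topic.
        var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);
        if (!namespaceManager.SubscriptionExists("orders", "highpriority"))
        {
            namespaceManager.CreateSubscription("orders", "highpriority", new SqlFilter("Priority = 'High'"));
        }

        // The producer sends to the topic, not to any particular subscription.
        TopicClient topicClient = TopicClient.CreateFromConnectionString(connectionString, "orders");
        var message = new BrokeredMessage("Msg 1");
        message.Properties["Priority"] = "High";
        topicClient.Send(message);

        // The client receives from the subscription, then completes the message (Peek/Lock pattern).
        SubscriptionClient subscriptionClient = SubscriptionClient.CreateFromConnectionString(connectionString, "orders", "highpriority", ReceiveMode.PeekLock);
        BrokeredMessage received = subscriptionClient.Receive();
        if (received != null)
        {
            Console.WriteLine(received.GetBody<string>());
            received.Complete();
        }
    }
}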

Your comments are most welcome.

Friday, August 3, 2012

Read and Write on RESTful Services using AtomPub



The Atom Publishing Protocol (AtomPub) is the most popular model for read/write RESTful services. If you have developed WCF Data Services, then you know they support OData, which builds on AtomPub. It is used for publishing and editing web resources using HTTP and XML 1.0. The protocol supports the creation of web resources and provides facilities for:
  • Collections: Sets of Resources, which can be retrieved in whole or in part.
  • Services: Discovery and description of Collections.
  • Editing: Creating, editing, and deleting Resources.
Protocol Model
The Atom Protocol specifies operations for publishing and editing Resources using HTTP. It uses Atom-formatted representations to describe the state and metadata of those Resources. It defines how Collections of Resources can be organized, and it specifies formats to support their discovery, grouping and categorization.

There are many popular AtomPub protocol extensions available in the market, out of which GData by Google and OData by Microsoft stand tall.

Many of Google’s web sites expose an APP extension called GData (http://code.google.com/apis/gdata/). Blogger, Google Calendar, Google Notebook, and other web applications also expose resources that conform to the GData protocol. RESTful services developed using Microsoft WCF Data Services by default support the AtomPub extension called OData, which can feed applications like Excel, Silverlight, LightSwitch and other Microsoft tools.
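To make the read side concrete, here is a minimal hedged sketch that fetches an OData feed as an AtomPub document over plain HTTP; the URL points at the public Northwind sample OData service and is used purely for illustration.

using System;
using System.IO;
using System.Net;

class ODataAtomRead
{
    static void Main()
    {
        // WCF Data Services expose OData feeds as AtomPub by default; ask for the Atom representation.
        var request = (HttpWebRequest)WebRequest.Create(
            "http://services.odata.org/Northwind/Northwind.svc/Customers?$top=2");
        request.Accept = "application/atom+xml";

        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            // The body is an Atom <feed> whose <entry> elements are the customer resources.
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}

Writing works the same way in reverse: creating a resource is an HTTP POST of an Atom entry to the collection URL, and editing or deleting uses PUT/MERGE or DELETE against the entry's edit link.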

Microsoft LightSwitch, a Business-Centric Application Development Tool

LightSwitch is another tool under the umbrella of Visual Studio, used for building business applications rapidly. In short, it automatically generates the user interface (Silverlight) for a data source without you having to write a single line of code.

Why LightSwitch?
  1. LightSwitch is a perfect way of creating the MVVM pattern, which separates the presentation and business logic.
  2. Creating standard or complicated forms with custom validations or custom rules, which can be handled by the designer.
  3. It provides the ability to write more complex business logic for various scenarios.
Business Scenarios

Visual Studio LightSwitch comes with pre-defined Form Templates and Data Object Templates, which means you can literally generate a UI form with basic validations for any entity. It is just a matter of drag and drop.

If you are looking for more advanced customized UI forms, advanced validations for data objects and other advanced features, then you have to create custom LightSwitch templates; in fact, most of the time you end up working on creating these custom templates, which can be reused at the organization level with a customized look and feel.

You can download LightSwitch and get more information about creating a sample application, plus a few more advanced topics, here: http://msdn.microsoft.com/en-us/library/ff851953

Sunday, July 15, 2012

The Rise of "No SQL"

One of the most discussed topics every architect participates in nowadays is "No SQL"; the views expressed during these discussions remain in favor of using RDBMS, and Big Data still remains a somewhat unclear technology for many of them. In fact, I had a similar argument with my team, and it took a hard effort on my part to convince them of the advantages of using Big Data.

The Computing Layer (Application Layer) is getting transformed, so why not database technology?
The application computing layer has changed in fundamental ways over the last 10 years; in fact this transformation has happened rapidly, from mainframe systems to desktop applications to web technologies to the current mobile application trend. One of the main reasons for this rapid transformation is growing online business needs, with millions and millions of users moving to the web and mobile. A modern web application can support millions of concurrent users by spreading load across a collection of application servers behind a load balancer. Changes in application behavior can be rolled out incrementally without requiring application downtime by gradually replacing the software on individual servers. Adjustments to application capacity are easily made by changing the number of application servers.

Now comes database technology, which has not kept pace. This age-old "scale up" technology, still in widespread use today, was optimized for the applications, users and infrastructure of that era. Because it is a technology designed for the centralized computing model, to handle more users one must get a bigger server (increasing CPU, memory and I/O capacity). Big servers tend to be highly complex, proprietary, and disproportionately expensive pieces of engineered machinery, unlike the low-cost, commodity hardware typically deployed in web- and cloud-based architectures. And, ultimately, there is a limit to how big a server one can purchase, even given an unlimited willingness and ability to pay.

Upgrading a server is an exercise that requires planning, acquisition and application downtime to complete. Given the relatively unpredictable user growth rate of modern software systems, inevitably there is either over- or under-provisioning of resources. Too much and you’ve overspent; too little and users can have a bad application experience or the application can outright fail. And with all the eggs in a single basket, fault tolerance and high-availability strategies are critically important to get right.

Also, not to forget how rigid the database schema is, and how difficult it is to change the schema after inserting records. Want to start capturing new information you didn’t previously consider? Want to make rapid changes to application behavior requiring changes to data formats and content? With RDBMS technology, changes like these are extremely disruptive and therefore are frequently avoided -- the opposite of the behavior desired in a rapidly evolving business and market environment.

Some ways to fool around saying "RDBMS still works!"
In an effort to argue that RDBMS still works when used with the current application layer, here are a few tactics.

Sharding
One technique where we split data across servers by doing horizontal partitioning. For example, we store 1 lakh records belonging to users from India on server 1 and the remaining records (rows) on server 2, so whenever there is a need to fetch records which belong to India, we get them from server 1.
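A minimal sketch of what such application-level shard routing might look like; the shard map, country codes and connection strings below are entirely hypothetical.

using System;
using System.Collections.Generic;

class ShardRouter
{
    // Hypothetical shard map: Indian users live on server 1, everyone else on server 2.
    static readonly Dictionary<string, string> Shards = new Dictionary<string, string>
    {
        { "IN",      "Server=shard1;Database=Users;..." },
        { "DEFAULT", "Server=shard2;Database=Users;..." }
    };

    static string GetConnectionString(string country)
    {
        // The application, not the database, decides which server holds the rows.
        string connectionString;
        if (!Shards.TryGetValue(country, out connectionString))
        {
            connectionString = Shards["DEFAULT"];
        }
        return connectionString;
    }

    static void Main()
    {
        Console.WriteLine(GetConnectionString("IN"));   // routed to server 1
        Console.WriteLine(GetConnectionString("US"));   // routed to server 2
    }
}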

Well, this approach has serious problems when it comes to joins and normalization techniques. Also, you have to create and maintain a schema on every server. If there is new information you want to collect, you must modify the database schema on every server, then normalize, retune and rebuild the tables. What was hard with one server is a nightmare across many. For this reason, the default behavior is to minimize the collection of new information.

Denormalizing
A normalized database farm is hard to implement. That is, if you are planning to "scale out", it is close to impossible to achieve this on a normalized database, which also results in a lot of concurrency issues.

To support concurrency and sharding, data is frequently stored in a denormalized form when an RDBMS is used behind web applications. This approach potentially duplicates data in the database, requiring updates to multiple tables when a duplicated data item is changed, but it reduces the amount of locking required and thus improves concurrency.

Denormalizing a database, of course, defeats the purpose of an RDBMS.

Distributed caching
Another tactic used to extend the useful scope of RDBMS technology has been to employ distributed caching technologies, such as Memcached.

Although performance-wise this technique works well, it falls flat cost-wise. Also, to me this looks like yet another tier to manage.
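To illustrate why this becomes "another tier to manage", here is a hedged sketch of the usual cache-aside pattern; it uses the in-process System.Runtime.Caching.MemoryCache purely as a stand-in for a distributed cache such as Memcached, and LoadUserFromDatabase is a hypothetical data-access call.

using System;
using System.Runtime.Caching;   // in-process stand-in for a distributed cache tier

class CacheAsideDemo
{
    static readonly MemoryCache Cache = MemoryCache.Default;

    static string GetUser(string userId)
    {
        // 1. Try the cache first.
        var cached = Cache.Get(userId) as string;
        if (cached != null)
        {
            return cached;
        }

        // 2. On a miss, fall back to the relational database (the expensive path)...
        string user = LoadUserFromDatabase(userId);   // hypothetical DB call

        // 3. ...and populate the cache so the next read is cheap.
        Cache.Set(userId, user, DateTimeOffset.Now.AddMinutes(5));
        return user;
    }

    static string LoadUserFromDatabase(string userId)
    {
        return "user record for " + userId;   // placeholder instead of a real SQL query
    }

    static void Main()
    {
        Console.WriteLine(GetUser("1001"));   // first call hits the "database"
        Console.WriteLine(GetUser("1001"));   // second call is served from the cache
    }
}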

Now comes the rise of "No SQL"
The techniques used to extend the useful scope of RDBMS technology fight the symptoms but not the disease itself. Sharding, denormalizing, distributed caching and other tactics all attempt to paper over one simple fact: RDBMS technology is a forced fit for modern interactive software systems. Technology giants like Google, Facebook and Amazon are already moving away from RDBMS. Also, with Windows Azure Table Storage, Microsoft is serious about Big Data as well.

Although implementation differs in a big way compared to an RDBMS, NoSQL database management systems offer this common set of characteristics:

1>No Schema: Data can be inserted in a NoSQL database without first defining a rigid database schema. As a corollary, the format of the data being inserted can be changed at any time, without application disruption. This provides immense application flexibility, which ultimately delivers substantial business flexibility.

2>Auto-sharding (also known as "elasticity"). A NoSQL database automatically spreads data across servers, without requiring applications to participate. Servers can be added or removed from the data layer without application downtime, with data (and I/O) automatically spread across the servers.

3>Distributed query support. “Sharding” an RDBMS can reduce, or eliminate in certain cases, the ability to perform complex data queries. NoSQL database systems retain their full query expressive power.

4>Integrated caching. To reduce latency and increase sustained data throughput, advanced NoSQL database technologies transparently cache data in system memory. This behavior is transparent to the application developer and the operations team, in contrast to RDBMS technology, where a caching tier is usually a separate infrastructure tier that must be developed against, deployed on separate servers, and explicitly managed by a separate team.

What’s more, it is also available free of cost under open source licenses.
Unlike the proprietary systems at Google, Amazon and Microsoft, a number of commercial and open source database technologies such as Couchbase (a database combining the leading NoSQL data management technologies CouchDB, Membase and Memcached), MongoDB, Cassandra, Riak and others are now generally available and increasingly represent the "go to" data management technology behind new interactive web applications.


Saturday, June 30, 2012

Working with Custom Identity Provider (Intuit OpenId Provider) for ACS

Before we get into the details about the various ways of adding a custom Identity Provider into Azure ACS, let me give some background about OpenID and the Intuit OpenID Provider. I will be using the Intuit OpenID Provider for this demonstration.

What is OpenID?
OpenID is a decentralized authentication protocol that makes it easy for people to sign up for and access web accounts; in a simple sentence, it is an open standard for logging onto various web accounts with a single digital identity.

Intuit OpenID Provider
Intuit uses the OpenID standard for authenticating users accessing any of their services across all their web sites. In fact, this can be used by any web site, including Intuit partner websites, to access this user information, as it is decentralized and doesn't require approval from any authority. A user can freely choose his provider of choice to access any website, including the Intuit provider.
Intuit Provider URL: https://openid.intuit.com/OpenId/Provider

Now let us come back to the original topic of working with a custom Identity Provider for ACS. By default, Azure ACS provides Google and Microsoft Live ID as identity providers, which can be used the majority of the time for learning and demonstration purposes; for all practical scenarios, users have to add their own using the ACS Management Service. The Management Service makes it possible for you to manage ACS programmatically; in fact, after getting the Management Service namespace and access certificates you hardly need the ACS portal for any management -- you can manage everything through a simple console application.

Now let us see how to add the Intuit Identity Provider using the Management Service. Before we start, you need to note the ACS Management Service namespace and access certificate details. By default, Azure ACS gives you "ManagementClient" as the service identity name, along with a generated identity key.

Having the above details from the ACS portal, you can add a custom Identity Provider in two ways:
1>Using PowerShell scripts (which is the best approach on any given day), and
2>Programmatically (by writing a small C# program and executing it on your local box).

1>Using PowerShell Scripts
This is a very simple and easy approach: just 2 to 3 PowerShell commands and we are done with adding the custom Identity Provider, adding identity rules, adding the relying party, etc.

The following command will add the Intuit Identity Provider into ACS:

PS C:\Users\Manj\Desktop> Add-IdentityProvider -Namespace "intuitopenid" -ManagementKey "XXXXXXXX" -Type "Manual" -Name "Intuit" -Protocol OpenId -SignInAddress "https://openid.intuit.com/OpenId/Provider"

The management key is the important one here; I first obtained it with Get-AcsManagementToken, assigned it to a variable and passed it along to all subsequent commands.

The next step will create rules that will add some claims; otherwise, ACS won’t even send a token back. Luckily, that’s just another line of PowerShell code:

 PS C:\Users\Manj\Desktop> Add-Rule -MgmtToken $mgmtToken -GroupName "Default Rule Group for Intuit" -IdentityProviderName "IntuitOpenID"

That's it, we are ready to use ACS as an identity federation website for the Intuit OpenID Provider.

2>Programmatically (create a sample console application and write some code)
This has 4 steps: first, collect the configuration settings; second, collect details about the Management Service (web service) endpoints; third, implement the Management Service client; and fourth, create the OpenID provider. Looks simple, right? Let us see each step in detail.

Step 1>
Change the App.Config file to have the following settings in your console application.

<appSettings>
    <add key="ServiceNamespace" value="intuitopenid"/>
    <add key="AcsHostUrl" value="accesscontrol.windows.net"/>
    <add key="AcsManagementServicesRelativeUrl" value="v2/mgmt/service/"/>
    <add key="ManagementServiceIdentityName" value="ManagementClient"/>
    <add key="ManagementServiceIdentityKey" value="AddyourKeyhere"/>
    <add key="RelyingPartyName" value="Intuit Provider Sample Application"/>
    <add key="RelyingPartyRealm" value="http://localhost:62000/"/>
    <add key="ReplyTo" value="http://localhost:62000/"/>
    <add key="RuleGroupName" value="Default rule group for ASPNET Simple Forms Sample"/>
    <add key="IntuitProviderName" value="Intuit"/>
    <add key="OpenIDProviderURL" value="https://openid.intuit.com/OpenId/Provider"/>
  </appSettings>

Step 2>
Add a reference to System.Web.Extensions and a service reference to the Management Service. Remember the ACS Management Service URL; it should look similar to the one in your configuration file, like
https://intuitopenid.accesscontrol.windows.net/v2/mgmt/service

Now your code file should have these declarations:
using System.Web;
.......
using ConsoleApplication1.ServiceReference1;

Step 3>
We need to implement the Management Service client; the following code will do this.
        /// <summary>
        /// Creates and returns a ManagementService object.
        /// </summary>
        /// <returns>An instance of the ManagementService.</returns>
        internal static ManagementService CreateManagementServiceClient()
        {
            serviceNamespace = ConfigurationManager.AppSettings["ServiceNamespace"];
            acsHostUrl = ConfigurationManager.AppSettings["AcsHostUrl"];
            acsManagementServicesRelativeUrl = ConfigurationManager.AppSettings["AcsManagementServicesRelativeUrl"];
            managementServiceIdentityName = ConfigurationManager.AppSettings["ManagementServiceIdentityName"];
            managementServiceIdentityKey = ConfigurationManager.AppSettings["ManagementServiceIdentityKey"];
            string managementServiceEndpoint = String.Format(CultureInfo.InvariantCulture, "https://{0}.{1}/{2}", serviceNamespace, acsHostUrl, acsManagementServicesRelativeUrl);
            ManagementService managementService = new ManagementService(new Uri(managementServiceEndpoint));
            managementService.SendingRequest += GetTokenWithWritePermission;
            return managementService;
        }

Step 4>
Add this piece of code to create the Intuit Identity Provider in ACS and create the rules.

        string relyingPartyName = ConfigurationManager.AppSettings["RelyingPartyName"];
        string relyingPartyRealm = ConfigurationManager.AppSettings["RelyingPartyRealm"];
        string replyTo = ConfigurationManager.AppSettings["ReplyTo"];
        string intuitProviderName = ConfigurationManager.AppSettings["IntuitProviderName"];
        string openIDProviderURL = ConfigurationManager.AppSettings["OpenIDProviderURL"];
        Uri openIDProviderUri = new Uri(openIDProviderURL);
        string ruleGroupName = ConfigurationManager.AppSettings["RuleGroupName"];
        ManagementService svc = AcsManagementServiceHelper.CreateManagementServiceClient();
        svc.DeleteRelyingPartyByRealmIfExists(relyingPartyRealm);
        svc.DeleteRuleGroupByNameIfExists(ruleGroupName);
        svc.DeleteIdentityProviderIfExists(intuitProviderName);
        svc.SaveChangesBatch();
        IdentityProvider intuit = svc.CreateOpenIdIdentityProvider(intuitProviderName, openIDProviderUri);
        IdentityProvider[] associatedProviders = new[] { intuit };

        // Create the relying party. In this case, the Realm and the ReplyTo are the same address.
        RelyingParty relyingParty = svc.CreateRelyingParty(relyingPartyName, relyingPartyRealm, replyTo, RelyingPartyTokenType.SAML_2_0, false);
        svc.AssociateIdentityProvidersWithRelyingParties(associatedProviders, new[] { relyingParty });
        RuleGroup ruleGroup = svc.CreateRuleGroup(ruleGroupName);
        svc.AssignRuleGroupToRelyingParty(ruleGroup, relyingParty);

        // Create simple rules to pass through all claims from each issuer.
        foreach (IdentityProvider identityProvider in associatedProviders)
        {
            string ruleDescription = String.Format(CultureInfo.InvariantCulture, "Pass through all claims from '{0}'", identityProvider.Issuer.Name);
            svc.CreateRule(identityProvider.Issuer, null, null, null, null, ruleGroup, ruleDescription);
        }

        svc.SaveChangesBatch();
        Console.WriteLine("Intuit Provider Successfully Configured. Press ENTER to continue ...");
        Console.ReadLine();

This should add the Intuit Identity Provider into ACS, ready to be used by relying parties in their websites.

Now the Intuit Identity Provider is ready to be consumed. In my next blog, I will demonstrate how to use it from a Node.js website.