Sunday, July 15, 2012

The Rise of "No SQL"

One of the most discussed topics among architects nowadays is "No SQL". The views expressed in these discussions usually remain in favor of using an RDBMS, and Big Data is still a technology that is unclear to many of them. In fact, I had a similar argument with my team, and it took real effort to convince them of the advantages of using Big Data.

The computing layer (application layer) is being transformed, so why not database technology?
The application computing layer has changed in fundamental ways over the last 10 years. The transformation has been rapid: from mainframe systems to desktop applications to Web technologies to the current mobile application trend. One of the main reasons for this is the growing needs of online business, with millions and millions of users moving to the Web and to mobile. A modern Web application can support millions of concurrent users by spreading load across a collection of application servers behind a load balancer. Changes in application behavior can be rolled out incrementally, without requiring application downtime, by gradually replacing the software on individual servers. Adjustments to application capacity are easily made by changing the number of application servers.

Now comes database technology, which has not kept pace. The age-old "scale up" technology, still in widespread use today, was optimized for the applications, users and infrastructure of its era. Because it was designed for the centralized computing model, to handle more users one must get a bigger server (increasing CPU, memory and I/O capacity). Big servers tend to be highly complex, proprietary, and disproportionately expensive pieces of engineered machinery, unlike the low-cost, commodity hardware typically deployed in Web- and cloud-based architectures. And, ultimately, there is a limit to how big a server one can purchase, even given an unlimited willingness and ability to pay.

Upgrading a server is an exercise that requires planning, acquisition and application
downtime to complete. Given the relatively unpredictable user growth rate of modern
software systems, inevitably there is either over- or under-provisioning of resources. Too
much and you’ve overspent, too little and users can have a bad application experience or
the application can outright fail. And with all the eggs in a single basket, fault tolerance and
high-availability strategies are critically important to get right.

Also, do not forget how rigid the database schema is and how difficult it is to change once records have been inserted. Want to start capturing new information you didn’t previously consider? Want to make rapid changes to application behavior requiring changes to data formats and content? With RDBMS technology, changes like these are extremely disruptive and are therefore frequently avoided, which is the opposite of the behavior desired in a rapidly evolving business and market environment.

Some ways to fool around saying RDBMS still works!
Here are a few tactics used to argue that an RDBMS still works with the current application layer.

Sharding
This is a technique where we split data across servers by horizontal partitioning. For example, we store the 1 lakh (100,000) records of users who belong to India on server 1 and the remaining records (rows) on server 2. Whenever there is a need to fetch records that belong to India, we get them from server 1.
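
The split by region described above can be sketched in a few lines of Python. This is only an illustration, not how any particular database does it: the `SHARDS` map, the function names and the sample users are all hypothetical, and plain lists stand in for the two servers.

```python
# Hypothetical shard map: shard name -> rows held by that "server".
# Plain Python lists stand in for real database servers here.
SHARDS = {"india": [], "other": []}


def shard_for(country):
    """Pick a shard from the partition key (the user's country)."""
    return "india" if country.strip().lower() == "india" else "other"


def insert_user(user):
    """Route the row to the single shard that owns its country."""
    SHARDS[shard_for(user["country"])].append(user)


def users_in(country):
    """A lookup on the partition key only has to visit one shard."""
    wanted = country.strip().lower()
    rows = SHARDS[shard_for(country)]
    return [u for u in rows if u["country"].strip().lower() == wanted]


insert_user({"name": "Asha", "country": "India"})
insert_user({"name": "Bob", "country": "USA"})
insert_user({"name": "Ravi", "country": "India"})

print([u["name"] for u in users_in("India")])
```

Note that the routing logic lives in the application. Any query that is not keyed on country, or that needs to join rows held on different servers, has to visit every shard and combine the results itself.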

Well, this approach has serious problems when it comes to joins and normalization. Also, you have to create and maintain a schema on every server. If you have new information you want to collect, you must modify the database schema on every server, then normalize, retune and rebuild the tables. What was hard with one server is a nightmare across many. For this reason, the default behavior is to minimize the collection of new information.

Denormalizing
A normalized database farm is hard to implement. That is, if you are planning to "scale out", it is very difficult to achieve on a normalized database, and trying to do so also results in a lot of concurrency issues.

To support concurrency and sharding, data is frequently stored in a denormalized form when
an RDBMS is used behind Web applications. This approach potentially duplicates data in the
database, requiring updates to multiple tables when a duplicated data item is changed, but it
reduces the amount of locking required and thus improves concurrency.
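
A minimal sketch of that trade-off, using Python's built-in `sqlite3` module as a stand-in for the RDBMS. The `users` and `posts` tables, their columns and the `rename_user` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    -- Denormalized: the author's name is copied into every post row,
    -- so reading a post needs no join against users.
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER,
                        author_name TEXT, title TEXT);
    INSERT INTO users VALUES (1, 'asha');
    INSERT INTO posts VALUES (10, 1, 'asha', 'First post');
    INSERT INTO posts VALUES (11, 1, 'asha', 'Second post');
""")


def rename_user(conn, user_id, new_name):
    """Changing one fact means updating every table that holds a copy."""
    with conn:  # one transaction, so the copies cannot drift apart
        conn.execute("UPDATE users SET name = ? WHERE id = ?",
                     (new_name, user_id))
        conn.execute("UPDATE posts SET author_name = ? WHERE user_id = ?",
                     (new_name, user_id))


rename_user(conn, 1, "asha_k")
names = [row[0] for row in
         conn.execute("SELECT author_name FROM posts ORDER BY id")]
print(names)
```

Reads become cheap single-table lookups, but every write to a duplicated value now has to know about all of its copies.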

But denormalizing a database defeats the purpose of using an RDBMS.

Distributed caching
Another tactic used to extend the useful scope of RDBMS technology has been to employ
distributed caching technologies, such as Memcached.

Although this technique works well performance-wise, it falls flat cost-wise. To me, it also looks like yet another tier to manage.
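
To show what "another tier to manage" means in application code, here is a rough sketch of the common cache-aside pattern. It assumes nothing about any real cache client: two plain dictionaries stand in for the cache tier and the database, and `get`, `put` and `stats` are hypothetical names.

```python
cache = {}                                  # stand-in for the cache tier
database = {"user:1": {"name": "asha"}}     # stand-in for the RDBMS
stats = {"db_reads": 0}                     # counts trips to the database


def get(key):
    """Cache-aside read: try the cache first, fall back to the database."""
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    value = database.get(key)
    if value is not None:
        cache[key] = value                  # populate the cache for next time
    return value


def put(key, value):
    """Write to the database, then invalidate the cached copy."""
    database[key] = value
    cache.pop(key, None)


get("user:1")                               # miss: reads the database
get("user:1")                               # hit: served from the cache
put("user:1", {"name": "asha_k"})           # invalidates the cached entry
print(get("user:1"), stats["db_reads"])
```

The application has to handle misses, population and invalidation itself, which is the extra development and operational cost referred to above.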

Now comes the rise of "No SQL"
The techniques used to extend the useful scope of RDBMS technology fight the symptoms but not the disease itself. Sharding, denormalizing, distributed caching and other tactics all attempt to paper over one simple fact: RDBMS technology is a forced fit for modern interactive software systems. Technology giants like Google, Facebook and Amazon are already moving away from RDBMS. With Windows Azure Table Storage, Microsoft is also serious about Big Data.

Although their implementations differ greatly from that of an RDBMS, NoSQL database management systems offer this common set of characteristics:

1>No Schema: Data can be inserted in a NoSQL database without first defining a rigid database schema. As a corollary, the format of the data being inserted can be changed at any time, without application disruption. This provides immense application flexibility, which ultimately delivers substantial business flexibility.

2>Auto-sharding (also called “elasticity”): A NoSQL database automatically spreads data across servers, without requiring applications to participate. Servers can be added or removed from the data layer without application downtime, with data (and I/O) automatically spread across the servers.

3>Distributed query support. “Sharding” an RDBMS can reduce, or eliminate in certain cases, the ability to perform complex data queries. NoSQL database systems retain their full query expressive power.

4>Integrated caching. To reduce latency and increase sustained data throughput, advanced NoSQL database technologies transparently cache data in system memory. This behavior is transparent to the application developer and the operations team, in contrast to RDBMS technology, where the cache is usually a separate infrastructure tier that must be developed to, deployed on separate servers, and explicitly managed by a separate team.
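
The first three characteristics can be illustrated with a toy in-memory store. This is a sketch under loud assumptions, not the design of any of the products named in this post: `TinyDocumentStore` and its methods are hypothetical, dictionaries stand in for servers, and the simple hash-modulo placement is chosen for brevity (real systems typically use schemes that move far less data when a server is added).

```python
import hashlib


class TinyDocumentStore:
    """Toy key-document store: no schema, keys hashed across nodes."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _node_index(self, key):
        # Stable hash of the key decides which node owns the document.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key, document):
        # No schema: any dict is accepted, whatever fields it has.
        self.nodes[self._node_index(key)][key] = document

    def get(self, key):
        return self.nodes[self._node_index(key)].get(key)

    def find(self, predicate):
        # Distributed query: ask every node and merge the answers.
        return [doc for node in self.nodes
                for doc in node.values() if predicate(doc)]

    def add_node(self):
        # Grow the cluster and re-place every document; callers of
        # put/get/find do not need to change anything.
        items = [(k, d) for node in self.nodes for k, d in node.items()]
        self.nodes = [{} for _ in range(len(self.nodes) + 1)]
        for key, document in items:
            self.put(key, document)


store = TinyDocumentStore(node_count=2)
store.put("user:1", {"name": "asha"})
store.put("user:2", {"name": "bob", "twitter": "@bob"})  # new field, no migration
store.add_node()
print(store.get("user:2"), len(store.find(lambda d: "twitter" in d)))
```

The second document carries a field the first one lacks, and adding a node changes where documents live without changing how the application reads or writes them.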

What's more, many of these systems are available free of cost under open source licenses.
Beyond the in-house systems built by Google, Amazon and Microsoft, a number of commercial and open source database technologies such as Couchbase (a database combining the leading NoSQL data management technologies CouchDB, Membase and Memcached), MongoDB, Cassandra, Riak and others are now available and increasingly represent the “go to” data management technology behind new interactive Web applications.

