Comparison of Popular NoSql databases (MongoDb,CouchDb,Hbase,Neo4j,Cassandra)

There are many SQL databases so far.But i personally feel the 15 years history of SQL coming to an end as everyone is moving to an era of BigData. As experts say SQL databases are not a best fit for Big Data No Sql databases came into picture as a best fit for this which provides more flexibility in storing data.
I just want to compare few popular NoSql databases that are available at this point of time.Few well known NoSql databases are

NoSql databases differ each other more than the way Sql databases differ from each other.I think its one's responsibility to choose the appropriate NoSql database for their application based on their use case.Lets do a quick comparison of these databases.

MongoDb

Written in : c++
Main point : Retains some friendly properties of SQL (Query, Index)
Licence : AGPL(Drivers : Apache)
Protocol : BSON (Binary JSON)
Replication : Master/Slave Replication and automatic failover via Replica Sets
Sharding : Built-in
Queries are javascript expressions.
Runs arbitary javascript function server side.
Better Update-in-place than CouchDb.
Uses memory mapped files for data storage.
Performance over features.
Journaling (with --journal ) option turned on starting th mongod server.
Has Geospatial Indexing.
On 32-bit systems limited to 2.5GB.
Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.
For example: For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.

Cassandra

Written in: Java
Main point: Best of BigTable and Dynamo
License: Apache
Protocol: Custom, binary (Thrift)
Tunable trade-offs for distribution and replication (N, R, W)
Querying by column, range of keys
BigTable-like features: columns, column families
Has secondary indices
Writes are much faster than reads (!)
Map/reduce possible with Apache Hadoop
All nodes are similar, as opposed to Hadoop/HBase
Best used: When you write more than you read (logging). If every component of the system must be in Java. ("No one gets fired for choosing Apache's stuff.")
For example: Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that.) Writes are faster than reads, so one natural niche is real time data analysis.

HBase

Written in: Java
Main point: Billions of rows X millions of columns
License: Apache
Protocol: HTTP/REST (also Thrift)
Modeled after Google's BigTable
Uses Hadoop's HDFS as storage
Map/reduce with Hadoop
Query predicate push down via server side scan and get filters
Optimizations for real time queries
A high performance Thrift gateway
HTTP supports XML, Protobuf, and binary
Cascading, hive, and pig source and sink modules
Jruby-based (JIRB) shell
Rolling restart for configuration changes and minor upgrades
Random access performance is like MySQL
A cluster consists of several different types of nodes
Best used: Hadoop is probably still the best way to run Map/Reduce jobs on huge datasets. Best if you use the Hadoop/HDFS stack already.
For example: Analysing log data.

CouchDB

Written in: Erlang
Main point: DB consistency, ease of use
License: Apache
Protocol: HTTP/REST
Bi-directional (!) replication,
continuous or ad-hoc,
with conflict detection,
thus, master-master replication. (!)
MVCC - write operations do not block reads
Previous versions of documents are available
Crash-only (reliable) design
Needs compacting from time to time
Views: embedded map/reduce
Formatting views: lists & shows
Server-side document validation possible
Authentication possible
Real-time updates via _changes (!)
Attachment handling
thus, CouchApps (standalone js apps)
jQuery library included
Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.
For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.

Neo4j

Written in: Java
Main point: Graph database - connected data
License: GPL, some features AGPL/commercial
Protocol: HTTP/REST (or embedding in Java)
Standalone, or embeddable into Java applications
Full ACID conformity (including durable data)
Both nodes and relationships can have metadata
Integrated pattern-matching-based query language ("Cypher")
Also the "Gremlin" graph traversal language can be used
Indexing of nodes and relationships
Nice self-contained web admin
Advanced path-finding with multiple algorithms
Indexing of keys and relationships
Optimized for reads
Has transactions (in the Java API)
Scriptable in Groovy
Online backup, advanced monitoring and High Availability is AGPL/commercial licensed
Best used: For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense.
For example: Social relations, public transport links, road maps, network topologies

Reffered Sources : kkovacs , wikipedia

Noooooo...SQL ...!!

Search This Blog

Comparison of Popular NoSql databases (MongoDb,CouchDb,Hbase,Neo4j,Cassandra)

Labels

Comments

Post a Comment

Popular posts from this blog

How MongoDB survives From SQL or Query Injection

Three Database Revolutions

GraphDatabase - The future for Facebook Recommendations