Category: NoSQL Database – HBase, Kafka, Cassandra, Elasticsearch, Solr

What is Elasticsearch?

Elasticsearch is a popular open-source, cross-platform, distributed, and scalable search engine based on Lucene. It is written in Java and was originally released under the terms of the Apache License. Elasticsearch is developed alongside Logstash and Kibana. Logstash is a data-collection and log-parsing engine, while Kibana is an analytics and visualization platform. The three products combined are referred to as the Elastic Stack (formerly known as the ELK stack). They are..

How to set up an Apache Hadoop/Apache HBase cluster in Amazon Web Services (AWS)?

In HBase, tables are split into regions, which are served by the region servers. Regions are vertically divided by column families into “Stores”. Stores are saved as files in HDFS. Shown below is the architecture of HBase. Note: The term ‘store’ is used for regions to explain the storage structure. HBase has three major components:..
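
The layout described in this excerpt (a table split into regions by row key, each region divided by column family into stores) can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `ToyTable`, `Region`, `Store`, and the split key `"m"` are hypothetical names chosen for illustration, not part of the HBase client API, and region servers and HDFS files are not modeled.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Store:
    """One column family's data inside a region (hypothetical toy type)."""
    family: str
    cells: dict = field(default_factory=dict)  # (row_key, qualifier) -> value


@dataclass
class Region:
    """A contiguous row-key range; start_key is inclusive (hypothetical toy type)."""
    start_key: str
    stores: dict = field(default_factory=dict)  # family name -> Store


class ToyTable:
    """Toy table: regions split by row key, each region split by column family."""

    def __init__(self, split_keys, families):
        # The first region starts at the empty key, so every row key maps to a region.
        self.starts = [""] + sorted(split_keys)
        self.regions = [
            Region(start, {f: Store(f) for f in families}) for start in self.starts
        ]

    def region_for(self, row_key):
        # Pick the last region whose start key is <= row_key.
        return self.regions[bisect_right(self.starts, row_key) - 1]

    def put(self, row_key, family, qualifier, value):
        store = self.region_for(row_key).stores[family]
        store.cells[(row_key, qualifier)] = value

    def get(self, row_key, family, qualifier):
        store = self.region_for(row_key).stores[family]
        return store.cells.get((row_key, qualifier))


# Assumed example data: one split point and two column families.
table = ToyTable(split_keys=["m"], families=["info", "stats"])
table.put("alice", "info", "email", "alice@example.com")
table.put("zoe", "stats", "logins", 7)
```

Here a row key alone selects the region, and the column family then selects the store within it, which mirrors the horizontal-then-vertical division the excerpt describes.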

Why do we need HBase when we have Hive?

Thanks for the A2A. This is analogous to the question “Why do we need a database when we have a data warehouse?” HBase is for OLTP and Hive is for OLAP. Let me give you a simple example: have you ever edited or updated a Facebook comment on a particular post? This update is done in real time; that’s where HBase comes in to..

What is Elasticsearch used for?

In short, search and log analytics. Elasticsearch has been deployed in many settings that require very fast, large-scale search, such as Stack Overflow, SoundCloud, the Rijksmuseum in Amsterdam, Fog Creek (40 billion lines of code), Verizon Business (50 billion documents), etc. The ELK stack (Elasticsearch, Logstash, Kibana) is now developed enough to use it for..

Apache Solr – Overview

Solr is an open-source search platform used to build search applications. It is built on top of Lucene (a full-text search engine). Solr is enterprise-ready, fast, and highly scalable. Applications built using Solr are sophisticated and deliver high performance. Yonik Seeley created Solr in 2004 in order to add search capabilities to the company website..

What is Apache Cassandra?

Apache Cassandra is a highly scalable, high-performance distributed database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It is a type of NoSQL database. A NoSQL database (sometimes called Not Only SQL) is a database that provides a mechanism to store and retrieve..
