
Saturday, April 7, 2018


Apache ZooKeeper is a software project of the Apache Software Foundation. It is essentially a centralized service that gives distributed systems a hierarchical key-value store, which is used to provide a distributed configuration service, synchronization service, and naming registry for large distributed systems. ZooKeeper was a sub-project of Hadoop but is now a top-level Apache project in its own right.

ZooKeeper's architecture supports high availability through redundant services. Clients can thus ask another ZooKeeper server if the first fails to answer. ZooKeeper nodes store their data in a hierarchical name space, much like a file system or a tree data structure. Clients can read from and write to the nodes and in this way have a shared configuration service. ZooKeeper can be viewed as an atomic broadcast system, through which updates are totally ordered. The ZooKeeper Atomic Broadcast (ZAB) protocol is the core of the system.
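
The hierarchical name space and the total ordering of updates can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class ToyZNodeTree, its method names, and the integer transaction counter are hypothetical and are not ZooKeeper's client API or the ZAB protocol. It shows path-addressed nodes, the rule that a parent must exist before a child is created, and one global sequence number per write.

```python
class ToyZNodeTree:
    """Toy in-memory namespace (hypothetical, not the ZooKeeper API)."""

    def __init__(self):
        self._data = {"/": b""}      # path -> stored bytes
        self._next_txn_id = 1        # one counter gives a total order of writes
        self.log = []                # (txn_id, path, data) in the order applied

    def _apply(self, path, data):
        txn_id = self._next_txn_id
        self._next_txn_id += 1
        self._data[path] = data
        self.log.append((txn_id, path, data))
        return txn_id

    def create(self, path, data=b""):
        # As in a file system, a node can only be created under an existing parent.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._data:
            raise KeyError(f"parent {parent} does not exist")
        if path in self._data:
            raise KeyError(f"{path} already exists")
        return self._apply(path, data)

    def set(self, path, data):
        if path not in self._data:
            raise KeyError(f"{path} does not exist")
        return self._apply(path, data)

    def get(self, path):
        return self._data[path]

    def children(self, path):
        # Direct children only: names with no further "/" after the prefix.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self._data
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )
```

A client sharing configuration might create "/config", then "/config/db_url", and later overwrite it with set; every one of those writes receives the next transaction number, so all readers of the log see the updates in the same order.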

ZooKeeper is used by companies including Rackspace, Yahoo!, Odnoklassniki, Reddit, NetApp SolidFire and eBay as well as open source enterprise search systems like Solr.

ZooKeeper was originally developed at Yahoo! to streamline the processes running on big data clusters by storing their status in local log files on the ZooKeeper servers. These servers communicate with the client machines to provide them with this information. ZooKeeper was developed to fix the bugs that occurred when deploying distributed big data applications. Some of the prime features of Apache ZooKeeper are:

  • Reliable System: The system keeps working even if a node fails.
  • Simple Architecture: The architecture of ZooKeeper is quite simple; a shared hierarchical namespace helps coordinate the processes.
  • Fast Processing: ZooKeeper is especially fast in "read-dominant" workloads (i.e. workloads in which reads are much more common than writes).
  • Scalable: The read performance of ZooKeeper can be improved by adding nodes.





Apache ZooKeeper Architecture

The basic terms to know before looking at the architecture are:

  • Node: A system installed on the cluster
  • ZNode: A node in ZooKeeper's hierarchical namespace, where status is stored and updated by the nodes in the cluster
  • Client Applications: The tools that interact with the distributed applications
  • Server Applications: The applications that allow the client applications to interact through a common interface

The services in the cluster are replicated across a set of machines, and each machine maintains an in-memory image of the data tree along with transaction logs. Each server is contacted by several client applications, which establish a TCP connection to it for sending requests, receiving responses, and monitoring events.
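
The relationship between a transaction log and the in-memory image can be sketched as follows. This is an illustration only, under assumptions that are not from the article: the tuple layout (txn_id, op, path, data), the two operation names, and the function replay are all hypothetical, and real ZooKeeper servers also use snapshots and a far richer log format. The point is that any server applying the same ordered log ends up with the same image.

```python
def replay(log):
    """Rebuild a path -> data image by applying logged transactions in order.

    Each entry is a hypothetical (txn_id, op, path, data) tuple.
    """
    image = {}
    # Apply strictly by transaction id so every replica reaches the same state.
    for txn_id, op, path, data in sorted(log, key=lambda entry: entry[0]):
        if op == "set":
            image[path] = data
        elif op == "delete":
            image.pop(path, None)
        else:
            raise ValueError(f"unknown operation {op!r} in txn {txn_id}")
    return image
```

Two servers that receive the same entries in a different arrival order still build identical images, because replay sorts by transaction id before applying anything.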





Typical use cases

  • Naming service
  • Configuration management
  • Synchronization
  • Leader election
  • Message queue
  • Notification system
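
Of the use cases above, leader election is commonly built on sequence numbers: each candidate registers under an increasing sequence number, the candidate holding the lowest number acts as leader, and when it goes away the next lowest takes over. The sketch below models that idea in memory only. The class ToyElection and its methods are hypothetical names for illustration; it does not talk to a ZooKeeper ensemble, and a real deployment would rely on ZooKeeper's own nodes and session handling rather than a local dictionary.

```python
class ToyElection:
    """Toy lowest-sequence-number election (hypothetical, in-memory only)."""

    def __init__(self):
        self._next_seq = 0
        self._candidates = {}        # sequence number -> candidate name

    def join(self, name):
        # Each candidate gets the next sequence number, never reused.
        seq = self._next_seq
        self._next_seq += 1
        self._candidates[seq] = name
        return seq

    def leave(self, seq):
        # Models a candidate failing or disconnecting.
        self._candidates.pop(seq, None)

    def leader(self):
        # The candidate with the lowest live sequence number leads.
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]
```

Because sequence numbers are never reused, a candidate that rejoins after a failure goes to the back of the line instead of reclaiming leadership immediately.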



See also

  • Hadoop



External links

  • Official website
  • Article in highscalability.com
  • Software Development Times article on ZooKeeper moving to Apache
  • Eclipse ECF Discovery based on Apache ZooKeeper

Source of the article: Wikipedia
