Cheat Sheet

Recommended AMI / EC2 image

ami-0affd4508a5d2481b
https://aws.amazon.com/marketplace/pp/B00O7WM7QW

AWS regions valid for pbao.de users

pbao.de AWS users can only use the following two regions:

  • us-east-1
  • us-east-2
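
The restriction above can be checked in a script before any AWS call is made. The following is a minimal sketch, not an official tool: the names `ALLOWED_REGIONS` and `check_region` are hypothetical and introduced here only for illustration; the two region names are the ones listed above.

```python
# Regions listed in this cheat sheet as usable by pbao.de users.
ALLOWED_REGIONS = frozenset({"us-east-1", "us-east-2"})


def check_region(region: str) -> str:
    """Return the normalized region name, or raise ValueError if it is not allowed.

    Hypothetical helper: it only compares against the list above and does not
    contact AWS.
    """
    normalized = region.strip().lower()
    if normalized not in ALLOWED_REGIONS:
        allowed = ", ".join(sorted(ALLOWED_REGIONS))
        raise ValueError(f"region {region!r} is not allowed; use one of: {allowed}")
    return normalized
```

Calling `check_region("us-east-1")` returns the name unchanged, while any other region raises a `ValueError` that names the two permitted regions.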

ZooKeeper

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented, a lot of work goes into fixing the bugs and race conditions that are inevitable. Because these kinds of services are difficult to implement, applications usually skimp on them at first, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.

https://zookeeper.apache.org/

HUE

Hue is an open-source SQL assistant for querying databases and data warehouses and for collaborating. Its goal is to make self-service data querying more widespread in organizations.
Hue is also present in the Cloudera Data Platform and in the Hadoop services of the cloud providers Amazon AWS, Google Cloud Platform, and Microsoft Azure.
Hue connects to any database or warehouse via native or SQLAlchemy connectors.

https://docs.gethue.com/administrator/configuration/connectors/

YARN

YARN stands for "Yet Another Resource Negotiator".
YARN is essentially a system for managing distributed applications. It consists of a central ResourceManager, which arbitrates all available cluster resources, and a per-node NodeManager, which takes direction from the ResourceManager and is responsible for managing resources available on a single node.

https://blog.cloudera.com/apache-hadoop-yarn-concepts-and-applications/

YARN makes it possible to manage a cluster's resources dynamically across different jobs. For example, queues let YARN define how the cluster's capacity is allocated to individual jobs. In addition to CPU and memory, version 3.1.0 and later also support managing GPU and FPGA resources, which are mainly relevant for machine learning. This can be configured per application and per user.
https://de.wikipedia.org/wiki/Apache_Hadoop
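
To make the idea of queue-based capacity allocation concrete, here is a toy model in Python. It is only a sketch under stated assumptions: it does not use any YARN API, the queue names (`etl`, `adhoc`, `ml`), the cluster size, and the helper `split_by_queue` are all hypothetical, and a real cluster sets these shares in its scheduler configuration rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Resources of a cluster or of one queue (toy model, CPU and memory only)."""
    memory_mb: int
    vcores: int


def split_by_queue(cluster: Resources, capacities: dict[str, float]) -> dict[str, Resources]:
    """Split cluster resources among queues by percentage share.

    `capacities` maps a queue name to its share in percent; the shares
    must add up to 100. Fractional results are rounded down.
    """
    total = sum(capacities.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"queue capacities must sum to 100, got {total}")
    return {
        name: Resources(
            memory_mb=int(cluster.memory_mb * pct / 100),
            vcores=int(cluster.vcores * pct / 100),
        )
        for name, pct in capacities.items()
    }


# Hypothetical cluster and queue layout, chosen only for illustration.
cluster = Resources(memory_mb=64_000, vcores=32)
shares = split_by_queue(cluster, {"etl": 50.0, "adhoc": 25.0, "ml": 25.0})
```

With these made-up numbers, the `etl` queue is entitled to half of the memory and vcores, and the other two queues to a quarter each. The model deliberately ignores everything else a real scheduler does, such as borrowing idle capacity from other queues.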