Hadoop Architecture - YARN and MapReduce
Last modified: 20 November 2017
Here is a non-exhaustive list of things about Hadoop YARN and MapReduce that I’ve found useful to know about:

Here’s a link to the original HadoopArchitectureYARN.dia file in case you would like to modify it. The file is licensed under GPLv3.
Architecture of YARN
- ClientRMService (Github), NMClientImpl (Github) (ContainerLaunchContext), ResourceTrackerService (Github)
- NodeHealthChecker
- ApplicationMaster
- ResourceRequests
- How an application starts
- ContainersMonitorImpl
- Log Aggregation (Hortonworks Blog)
- DeletionService
- Container Executor
- Distributed Cache + Localization: Public / Private / Application resource visibility, handled by ResourceLocalizationService
- State Machine (Ravi’s Blog)
- AsyncDispatcher
- CapacityScheduler (Hadoop doc): ParentQueue (Github), LeafQueue (Github)
- FairScheduler
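- Reservation. Be aware that the term is overloaded: it can refer to the ReservationSystem, which books cluster resources ahead of time, or to a scheduler reserving a node for a container request it cannot satisfy yet.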
- NodeLabels
- TimelineServer
- RMStateStore, NMStateStore
- AMRecovery
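The State Machine and AsyncDispatcher items above form the event-driven core of both the ResourceManager and the NodeManager. The following is a minimal, self-contained sketch of that pattern, not Hadoop code: `MiniDispatcher`, `Container` and the four-state lifecycle are hypothetical simplifications. The real classes are `org.apache.hadoop.yarn.event.AsyncDispatcher` and `org.apache.hadoop.yarn.state.StateMachineFactory`, and a real NodeManager container has many more states and transitions. In the sketch, senders enqueue events without blocking, a single thread delivers them in order, and the receiving component looks up the transition for its current state, rejecting events that are not valid there.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DispatcherSketch {
    // Hypothetical, simplified container lifecycle (the real NodeManager has more states).
    enum ContainerState { NEW, LOCALIZING, RUNNING, DONE }
    enum ContainerEventType { LOCALIZE, LAUNCH, FINISH }

    static final class ContainerEvent {
        final ContainerEventType type;
        ContainerEvent(ContainerEventType type) { this.type = type; }
    }

    interface EventHandler { void handle(ContainerEvent event); }

    // Stand-in for the AsyncDispatcher idea: callers enqueue events without blocking,
    // and one worker thread delivers them in FIFO order to the handler for each event type.
    static final class MiniDispatcher {
        private static final ContainerEvent POISON = new ContainerEvent(null); // shutdown marker
        private final BlockingQueue<ContainerEvent> queue = new LinkedBlockingQueue<>();
        private final Map<ContainerEventType, EventHandler> handlers = new EnumMap<>(ContainerEventType.class);
        private final Thread worker = new Thread(this::loop, "dispatcher");

        void register(ContainerEventType type, EventHandler handler) { handlers.put(type, handler); }
        void start() { worker.start(); }
        void dispatch(ContainerEvent event) { queue.add(event); }

        private void loop() {
            try {
                while (true) {
                    ContainerEvent event = queue.take();
                    if (event == POISON) return;
                    EventHandler handler = handlers.get(event.type);
                    if (handler != null) handler.handle(event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Lets the worker drain everything already queued, then waits for it to exit.
        void stop() {
            queue.add(POISON);
            try { worker.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }

    // Table-driven state machine, loosely in the spirit of YARN's StateMachineFactory.
    static final class Container implements EventHandler {
        private static final Map<ContainerState, Map<ContainerEventType, ContainerState>> TRANSITIONS =
                new EnumMap<>(ContainerState.class);
        static {
            TRANSITIONS.put(ContainerState.NEW, Map.of(ContainerEventType.LOCALIZE, ContainerState.LOCALIZING));
            TRANSITIONS.put(ContainerState.LOCALIZING, Map.of(ContainerEventType.LAUNCH, ContainerState.RUNNING));
            TRANSITIONS.put(ContainerState.RUNNING, Map.of(ContainerEventType.FINISH, ContainerState.DONE));
            TRANSITIONS.put(ContainerState.DONE, Map.of());
        }

        // Written only by the dispatcher thread; read by others after stop() has joined it.
        private ContainerState state = ContainerState.NEW;
        final List<String> log = new ArrayList<>();

        @Override
        public void handle(ContainerEvent event) {
            ContainerState next = TRANSITIONS.get(state).get(event.type);
            if (next == null) { // event is not valid in the current state
                log.add("invalid " + event.type + " in " + state);
                return;
            }
            log.add(state + " -> " + next);
            state = next;
        }

        ContainerState state() { return state; }
    }

    // Drives one container through its lifecycle, including one out-of-order event.
    static Container runScenario() {
        MiniDispatcher dispatcher = new MiniDispatcher();
        Container container = new Container();
        for (ContainerEventType type : ContainerEventType.values()) dispatcher.register(type, container);
        dispatcher.start();
        dispatcher.dispatch(new ContainerEvent(ContainerEventType.LOCALIZE));
        dispatcher.dispatch(new ContainerEvent(ContainerEventType.FINISH)); // rejected: still LOCALIZING
        dispatcher.dispatch(new ContainerEvent(ContainerEventType.LAUNCH));
        dispatcher.dispatch(new ContainerEvent(ContainerEventType.FINISH));
        dispatcher.stop();
        return container;
    }

    public static void main(String[] args) {
        Container c = runScenario();
        System.out.println(c.log + " final=" + c.state());
    }
}
```

Running `main` prints `[NEW -> LOCALIZING, invalid FINISH in LOCALIZING, LOCALIZING -> RUNNING, RUNNING -> DONE] final=DONE`. Because a single thread consumes the queue, a component never handles two of its events concurrently, which keeps the transition code free of locking.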
Architecture of MapReduce
- MRv1 vs. MRv2
- Intermediate data
- Shuffle and sort
- AuxiliaryService
- Counters
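The MapReduce items above (intermediate data, shuffle, sort and counters) can be illustrated with an in-memory word count. This is a hedged, single-process simulation, not the Hadoop MapReduce API: `ShuffleSketch`, `JobResult` and `wordCount` are hypothetical names, and the counter names only mirror Hadoop's `MAP_OUTPUT_RECORDS` and `REDUCE_INPUT_GROUPS` task counters as plain strings. The `partition` method uses the same formula as Hadoop's default `HashPartitioner`, but Hadoop applies it to `Writable` keys such as `Text`, whose `hashCode` differs from `String.hashCode`, so a real job may place keys on different reducers. The sketch also sorts only on the reduce side; in Hadoop, map output is sorted by partition and key before it is spilled to local disk, and reducers fetch and merge those sorted runs from the ShuffleHandler auxiliary service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {

    // Same formula as Hadoop's default HashPartitioner: a non-negative hash modulo the reducer count.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    static final class JobResult {
        final List<TreeMap<String, Integer>> outputs = new ArrayList<>(); // one sorted output per reducer
        final Map<String, Long> counters = new TreeMap<>();
        void incr(String counter, long by) { counters.merge(counter, by, Long::sum); }
    }

    // In-memory word count that mimics map -> partition -> shuffle/sort -> reduce.
    static JobResult wordCount(List<String> lines, int numReduceTasks) {
        JobResult result = new JobResult();

        // Map phase: emit (word, 1) pairs, buffered per reducer partition (the intermediate data).
        List<List<Map.Entry<String, Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) partitions.add(new ArrayList<>());
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (word.isEmpty()) continue;
                partitions.get(partition(word, numReduceTasks)).add(Map.entry(word, 1));
                result.incr("MAP_OUTPUT_RECORDS", 1);
            }
        }

        // Shuffle and sort: each reducer takes only its own partition and groups
        // the values by key, visiting keys in sorted order (TreeMap).
        for (List<Map.Entry<String, Integer>> bucket : partitions) {
            TreeMap<String, List<Integer>> grouped = new TreeMap<>();
            for (Map.Entry<String, Integer> kv : bucket) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
            result.incr("REDUCE_INPUT_GROUPS", grouped.size());

            // Reduce phase: sum the values for each key.
            TreeMap<String, Integer> out = new TreeMap<>();
            grouped.forEach((key, values) -> out.put(key, values.stream().mapToInt(Integer::intValue).sum()));
            result.outputs.add(out);
        }
        return result;
    }

    public static void main(String[] args) {
        JobResult r = wordCount(List.of("the cat sat", "the cat ran"), 2);
        for (int i = 0; i < r.outputs.size(); i++) {
            System.out.println("reducer " + i + ": " + r.outputs.get(i));
        }
        System.out.println(r.counters); // {MAP_OUTPUT_RECORDS=6, REDUCE_INPUT_GROUPS=4}
    }
}
```

Every occurrence of a key lands in the same partition, so each reducer sees all values for its keys and no key is reduced twice; which reducer gets which word depends only on the hash.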
Project split / unsplit
All content on this website is licensed under the Creative Commons Attribution-ShareAlike 4.0 License. Opinions expressed are solely my own.