Hadoop Architecture - HDFS
Last modified : 20 November, 2017
Here is a non-exhaustive list of things about HDFS that I’ve found useful to know:

Here’s a link to the original HadoopArchitectureHDFS.dia file in case you would like to modify it. GPLv3 licensed.
Architecture of HDFS
- Namenode fsimage (Github:FSImage.java) and edit log (Github:FSEditLog.java). Hadoop doc: The Persistence of File System Metadata
- Heartbeats (Github:NameNodeRpcServer.java)
- Block reports (Github:NameNodeRpcServer.java): full block reports and incremental block reports, IBRs (Apache JIRA HDFS-395)
- FSNamesystem (Github:FSNamesystem.java)
- BlockManager (Github:BlockManager.java)
- Secondary Namenode (Hadoop doc: Secondary Namenode)
- How a file write happens: create (Github:NameNodeRpcServer::create()), addBlock (Github:NameNodeRpcServer::addBlock()), complete (Github:NameNodeRpcServer::complete())
- Recovery (Yongjun’s great Blog)
- Write pipeline (Github:DataStreamer.java), DataXceiver (Github:DataXceiver.java)
- Block vs Replica
- Leases, LeaseManager (Github: LeaseManager.java)
- Federation (Hadoop docs: ViewFs) + Block Pools (Github:ViewFileSystem.java)
- hflush and hsync: Mailing List
- HA (High-Availability). QJM (Quorum Journal Manager). (Hadoop docs: HA with QJM)
- Erasure Coding (Hadoop doc: HDFSErasureCoding)
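
The create / addBlock / complete sequence from the file write item above can be sketched as a toy in-memory model. This is only an illustration under simplifying assumptions: `ToyWritePath`, `ToyNameNode` and `FileEntry` are hypothetical names, not Hadoop classes, and the real NameNodeRpcServer methods take more parameters (permissions, replication, previous block, excluded nodes and so on) and persist each change to the edit log. The model keeps just two ideas: the writer holds a lease on the file, and blocks are allocated one at a time until the file is completed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyWritePath {

    /** Hypothetical per-file state; a stand-in for the namespace entry, not a Hadoop class. */
    static final class FileEntry {
        final String leaseHolder;                 // client currently allowed to write
        final List<Long> blockIds = new ArrayList<>();
        boolean complete = false;                 // true once the file is closed

        FileEntry(String leaseHolder) {
            this.leaseHolder = leaseHolder;
        }
    }

    /** Toy namenode: tracks files, leases and block allocation in memory only. */
    static final class ToyNameNode {
        private final Map<String, FileEntry> files = new HashMap<>();
        private long nextBlockId = 1;

        /** Step 1: create the file entry and grant the lease to the client. */
        void create(String path, String client) {
            if (files.containsKey(path)) {
                throw new IllegalStateException("already exists: " + path);
            }
            files.put(path, new FileEntry(client));
        }

        /** Step 2: allocate the next block; only the lease holder may ask. */
        long addBlock(String path, String client) {
            FileEntry f = checkLease(path, client);
            long id = nextBlockId++;
            f.blockIds.add(id);
            return id;
        }

        /** Step 3: close the file, after which no more blocks can be added. */
        boolean complete(String path, String client) {
            FileEntry f = checkLease(path, client);
            f.complete = true;
            return true;
        }

        private FileEntry checkLease(String path, String client) {
            FileEntry f = files.get(path);
            if (f == null) {
                throw new IllegalStateException("no such file: " + path);
            }
            if (f.complete) {
                throw new IllegalStateException("file already closed: " + path);
            }
            if (!f.leaseHolder.equals(client)) {
                throw new IllegalStateException("lease held by " + f.leaseHolder);
            }
            return f;
        }

        List<Long> blocksOf(String path) {
            return new ArrayList<>(files.get(path).blockIds);
        }

        boolean isComplete(String path) {
            return files.get(path).complete;
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.create("/logs/a.txt", "client-1");
        nn.addBlock("/logs/a.txt", "client-1");   // first block
        nn.addBlock("/logs/a.txt", "client-1");   // second block
        nn.complete("/logs/a.txt", "client-1");
        System.out.println(nn.blocksOf("/logs/a.txt")
                + " complete=" + nn.isComplete("/logs/a.txt"));
    }
}
```

In the real system the data itself never passes through the namenode: after each addBlock the client streams packets to the datanode pipeline (DataStreamer on the client side, DataXceiver on the datanode side), which this sketch leaves out entirely.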
All content on this website is licensed as Creative Commons-Attribution-ShareAlike 4.0 License. Opinions expressed are solely my own.