Q2. What does the statement "HDFS is a block-structured file system" mean? It means that in HDFS, individual files are broken into blocks of a fixed size. These blocks are stored across a cluster of one or more machines with data storage capacity.
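As a rough illustration of the block splitting described above, this Python sketch computes the byte ranges a file would be cut into. The function name `split_into_blocks` and the 128 MB block size are assumptions chosen for the example; this is not an HDFS API.

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size=128 * MB):
    """Return a list of (offset, length) pairs, one per block.

    Every block is block_size bytes long except possibly the last one,
    which only holds the remaining bytes.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 300 MB file with 128 MB blocks: two full blocks plus one shorter block.
for offset, length in split_into_blocks(300 * MB):
    print(f"offset={offset // MB} MB, length={length // MB} MB")
```

Note that the last block only covers the bytes that are left over, so the file length does not have to be a multiple of the block size.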
Q3. What does the term "replication factor" mean? The replication factor is the number of copies of each block of a file that HDFS keeps across the cluster.
Q4. What is the default replication factor in HDFS? 3
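To make the effect of a replication factor of 3 concrete, here is a toy Python sketch that assigns each block to three distinct machines and then simulates losing one machine. The round-robin placement, the function name `place_replicas`, and the node names are all assumptions for illustration; real HDFS uses its own rack-aware placement policy, which this does not model.

```python
def place_replicas(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct DataNodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + i) % len(datanodes)]
            for i in range(replication)
        ]
    return placement

nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical node names
placement = place_replicas(num_blocks=3, datanodes=nodes)

# Simulate losing dn1 and keep only the replicas on the remaining nodes.
surviving = {
    block: [node for node in locations if node != "dn1"]
    for block, locations in placement.items()
}
print(placement)
print(surviving)
```

With three copies of every block, the loss of a single machine still leaves at least two copies of each block available.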
Q5. What is the typical block size of an HDFS block? 64 MB to 128 MB
Q6. What is the benefit of having such a big block size (compared to the block size of a Linux file system like ext)? It allows HDFS to decrease the amount of metadata storage required per file (the list of blocks per file gets shorter as the size of individual blocks increases). Furthermore, it allows for fast streaming reads of data, by keeping large amounts of data laid out sequentially on the disk.
Q7. Why is it recommended to have a few very large files instead of a lot of small files in HDFS? Because the NameNode holds the metadata of every file in HDFS, more files mean more metadata. The NameNode loads all of this metadata into memory for speed, so a very large number of files can make the metadata grow beyond the memory available on the NameNode.
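The small-files argument above can be put into numbers with a back-of-the-envelope Python sketch. The figure of roughly 150 bytes of NameNode heap per file or block entry is a commonly quoted rule of thumb and is treated here purely as an assumption, as are the function name and the 128 MB block size.

```python
MB = 1024 * 1024
GB = 1024 * MB
BYTES_PER_OBJECT = 150  # assumed rule-of-thumb heap cost per file or block entry

def namenode_heap_estimate(num_files, file_size, block_size=128 * MB):
    """Estimate NameNode heap use: one entry per file plus one per block."""
    blocks_per_file = (file_size + block_size - 1) // block_size  # ceiling division
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

total_data = 10 * 1024 * GB  # the same 10 TB of data in both scenarios

# Scenario A: the data is stored as 1 MB files.
small = namenode_heap_estimate(total_data // MB, MB)
# Scenario B: the data is stored as 1 GB files.
large = namenode_heap_estimate(total_data // GB, GB)

print(f"1 MB files: about {small / MB:.0f} MB of NameNode heap")
print(f"1 GB files: about {large / MB:.1f} MB of NameNode heap")
```

Under these assumptions the same amount of data costs a few hundred times more NameNode memory when it is stored as small files, which is the reason for the recommendation.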
Q8. True/false question. What is the lowest granularity at which you can apply a replication factor in HDFS?
- You can choose the replication factor per directory
- You can choose the replication factor per file in a directory
- You can choose the replication factor per block of a file
Answers, in the same order:
- True (setting it on a directory applies it to the files under that directory)
- True
- False
Q9. What is a DataNode in HDFS? The individual machines in the HDFS cluster that hold blocks of data are called DataNodes.
Q10. What is a NameNode in HDFS? The NameNode stores all the metadata for the file system.
Q11. What alternative way does HDFS provide to recover data if a NameNode without a backup fails and cannot be recovered? There is none. If the NameNode dies and there is no backup, the data cannot be recovered.
Q12. Describe how an HDFS client reads a file in HDFS: does it talk to the DataNodes or the NameNode, and how does the data flow? To open a file, a client contacts the NameNode and retrieves a list of locations for the blocks that comprise the file. These locations identify the DataNodes which hold each block. Clients then read file data directly from the DataNode servers, possibly in parallel. The NameNode is not directly involved in this bulk data transfer, keeping its overhead to a minimum.
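The read path described above can be sketched as a small in-memory model in Python. The classes `ToyNameNode` and `ToyDataNode`, the function `read_file`, the block IDs, and the path are all hypothetical names invented for this example; they only mimic the division of labour and are not the HDFS client API.

```python
class ToyNameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self):
        self.block_map = {}  # path -> list of (block_id, [DataNode names])

    def get_block_locations(self, path):
        return self.block_map[path]


class ToyDataNode:
    """Holds block contents only."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


def read_file(path, namenode, datanodes):
    """Client-side read: one metadata lookup, then direct block reads."""
    data = b""
    for block_id, locations in namenode.get_block_locations(path):
        # Use the first listed replica; this toy model ignores network distance.
        datanode = datanodes[locations[0]]
        data += datanode.read_block(block_id)
    return data


# Set up two DataNodes; blk_1 is replicated on both, blk_2 lives on dn2 only.
dn1, dn2 = ToyDataNode("dn1"), ToyDataNode("dn2")
dn1.blocks["blk_1"] = b"hello "
dn2.blocks["blk_1"] = b"hello "
dn2.blocks["blk_2"] = b"world"

nn = ToyNameNode()
nn.block_map["/demo.txt"] = [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2"])]

content = read_file("/demo.txt", nn, {"dn1": dn1, "dn2": dn2})
print(content)
```

The NameNode object is consulted once for the block list and never sees the file contents, which mirrors why it stays out of the bulk data transfer.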
Q13. Using the Linux command line, how will you
- List the files in an HDFS directory
- Create a directory in HDFS
- Copy a file from your local directory to HDFS
Answers, in the same order:
- hadoop fs -ls hdfsdir
- hadoop fs -mkdir hdfsdir
- hadoop fs -put localfile hdfsfile
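If these commands need to be driven from a script, one possible approach is to build the argument lists in Python and hand them to `subprocess`. This is a minimal sketch: the helper names `hadoop_fs_command` and `run` and the paths under `/user/demo` are hypothetical, and actually running the commands assumes a `hadoop` binary on the PATH with a configured cluster.

```python
import subprocess

def hadoop_fs_command(*args):
    """Build the argument list for a `hadoop fs` invocation."""
    return ["hadoop", "fs", *args]

def run(cmd):
    """Run a command and return its standard output.

    Requires the `hadoop` binary on PATH; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout

# Hypothetical paths, for illustration only.
ls_cmd = hadoop_fs_command("-ls", "/user/demo")
mkdir_cmd = hadoop_fs_command("-mkdir", "/user/demo/input")
put_cmd = hadoop_fs_command("-put", "localfile", "/user/demo/input/hdfsfile")

print(" ".join(put_cmd))
# run(mkdir_cmd)  # uncomment on a machine with a configured Hadoop client
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.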