Hadoop Interview Questions and Answers for beginners
Which type of hardware is best for Hadoop?
A Hadoop can run on dual processor or dual core machines with 4-8 GB RAM utilizing ECC memory. It depends on the workflow needs.
Which platform and Java version is required to run Hadoop?
The Java 1.6.x or higher version is best for Hadoop technology, preferably from the Sun. The Windows and Linux are the supported operating system for Hadoop, but BSD, Mac OS/X and Solaris and so on.
Who is invented by Hadoop?
Hadoop invented by Doug and Mike Cafarella.
How well does Hadoop scale?
A Hadoop has been considered in clusters of up to 4000 nodes. Sort performance on 900 nodes is best and developing using these non-default configuration values:
dfs.block.size = 839810502 dfs.namenode.handler.count = 50 mapred.reduce.parallel.copies = 40 mapred.child.java.opts = -Xmx512m fs.inmemory.size.mb = 300 io.sort.factor = 200 io.sort.mb = 300 io.file.buffer.size = 170441603
Mapred.job.tracker.handler.count = 70The sort performance on 1400nodes and 2000 nodes are good too-sorting 14TB of data on a 1400-node cluster takes 3.2 hours; sorting 20TB on a 2000 node cluster takes 3.5 hours. The post to the above configuration being:
Mapred.reduce.parallel.copies = 60 Tasktracker.http.threads = 60 Mapred.child.java.opts = -Xmx1024m
Suppose you have a Research Marketing and Finance teams funding 70%, 40% and 20% continuously of your Hadoop Cluster. How will you define only 70% of cluster resources to Research, 40% to Finance and 20% to Marketing during peak load?
Given the properties will be assigned in capacity-scheduler.xml
<property> <name>meraj.scheduler.capacity.root.queuse</name> <value>research, finance,marketing</value> </property> <property> <name>meraj.scheduler.capacity.research.capacity</name> <value>70</value> </property> <property> <name>meraj.scheduler.capacity.finance.capacity</name> <value>40</value> </property> <property> <name>meraj.scheduler.capacity.marketing.capacity</name> <value>20</value> </property>
What is the main benefit of using counters in Hadoop?
The counters are a useful for collecting statistics about the job.