Consider the following two relations, A and B.
A Pig JOIN statement that joins relation A on its first field and relation B on its second field would produce what output?
A. 2 Jim Chris 2 3 Terry 3 4 Brian 4
B. 2 cherry 2 cherry 3 orange 4 peach
C. 2 cherry Jim, Chris 3 orange Terry 4 peach Brian
D. 2 cherry Jim 2 2 cherry Chris 2 3 orange Terry 3 4 peach Brian 4
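Pig's default JOIN is an inner join. Each output tuple is a tuple from the first relation followed by a matching tuple from the second, so a key that matches several tuples yields several output rows. The Python sketch below models that behavior on plain tuples. It only illustrates the semantics and is not Pig itself. The function name inner_join and the relations X and Y are hypothetical and are not the data from the question.

```python
# Toy model of an inner join in the style of Pig's JOIN ... BY ...
# Relations are lists of tuples; the data below is hypothetical.

def inner_join(left, left_idx, right, right_idx):
    """Return every concatenated pair (l + r) where l[left_idx] == r[right_idx]."""
    # Index the right relation by its join field.
    index = {}
    for r in right:
        index.setdefault(r[right_idx], []).append(r)
    joined = []
    for l in left:
        # One output tuple per matching right tuple; unmatched rows are dropped.
        for r in index.get(l[left_idx], []):
            joined.append(l + r)
    return joined

# Hypothetical relations (not the question's A and B).
X = [(1, "apple"), (5, "kiwi"), (7, "lime")]
Y = [("Ann", 1), ("Bob", 1), ("Cy", 7), ("Di", 9)]

print(inner_join(X, 0, Y, 1))
# [(1, 'apple', 'Ann', 1), (1, 'apple', 'Bob', 1), (7, 'lime', 'Cy', 7)]
```

Pig itself does not guarantee output order. The sketch returns rows in input order only to keep the example predictable.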
Review the following data and Pig code. Which command to define B would produce the output (M,62,95102) when the DUMP operator is invoked on B?
A. B = FILTER A BY (zip == '95102' AND gender == 'M');
B. B = FOREACH A BY (gender == 'M' AND zip == '95102');
C. B = JOIN A BY (gender == 'M' AND zip == '95102');
D. B = GROUP A BY (zip == '95102' AND gender == 'M');
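FILTER is the Pig operator that keeps or drops whole tuples based on a boolean condition. FOREACH transforms each tuple, GROUP collects tuples by a key, and JOIN needs at least two relations. The Python sketch below models FILTER with a list comprehension. The field order (gender, age, zip) and the rows are assumptions made for illustration, because the question's data is not reproduced here. The helper name pig_filter is hypothetical.

```python
# Toy model of: B = FILTER A BY (zip == '95102' AND gender == 'M');
# Field layout (gender, age, zip) and the rows are hypothetical.

rows = [
    ("F", 41, "95102"),
    ("M", 62, "95102"),
    ("M", 35, "94040"),
]

def pig_filter(relation, predicate):
    """Return the tuples of `relation` that satisfy `predicate`, in input order."""
    return [t for t in relation if predicate(t)]

B = pig_filter(rows, lambda t: t[2] == "95102" and t[0] == "M")
print(B)  # [('M', 62, '95102')]
```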
In Hadoop 2.2, which TWO of the following processes work together to provide automatic failover of the NameNode? Choose 2 answers
A. ZKFailoverController
B. ZooKeeper
C. QuorumManager
D. JournalNode
To process input key-value pairs, your mapper needs to load a 512 MB data file into memory. What is the best way to accomplish this?
A. Serialize the data file, insert it in the JobConf object, and read the data into memory in the configure method of the mapper.
B. Place the data file in the DistributedCache and read the data into memory in the map method of the mapper.
C. Place the data file in the DataCache and read the data into memory in the configure method of the mapper.
D. Place the data file in the DistributedCache and read the data into memory in the configure method of the mapper.
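The difference between options B and D is cost. configure() (setup() in the newer org.apache.hadoop.mapreduce API) runs once per task, while map() runs once per input record, so loading a large file inside map() would repeat the load for every record. The Python sketch below imitates that lifecycle. It is not Hadoop code: the LookupMapper class and its tab-separated side file are hypothetical stand-ins for a mapper and a file shipped through the DistributedCache.

```python
import os
import tempfile

class LookupMapper:
    """Hypothetical mapper that needs a side-data file in memory.

    Mirrors the pattern of loading a cached file once in configure()
    rather than once per record inside map().
    """

    def __init__(self, side_file):
        self.side_file = side_file
        self.table = None
        self.loads = 0  # counts how many times the file was read

    def configure(self):
        # Runs once per task: read "key<TAB>value" lines into a dict.
        self.table = {}
        with open(self.side_file, encoding="utf-8") as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t", 1)
                self.table[key] = value
        self.loads += 1

    def map(self, key, value):
        # Runs once per input record: only an in-memory lookup, no file I/O.
        return key, self.table.get(value, "UNKNOWN")


# Usage with a small temporary file standing in for the cached file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lookup.tsv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("95102\tSan Jose\n94040\tMountain View\n")

    mapper = LookupMapper(path)
    mapper.configure()
    results = [mapper.map(i, z) for i, z in enumerate(["95102", "94040", "10001"])]

print(results)  # [(0, 'San Jose'), (1, 'Mountain View'), (2, 'UNKNOWN')]
```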
What data does a Reducer's reduce method process?
A. All the data in a single input file.
B. All data produced by a single mapper.
C. All data for a given key, regardless of which mapper(s) produced it.
D. All data for a given value, regardless of which mapper(s) produced it.
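Each call to reduce() receives one key together with every value emitted for that key, no matter which map tasks emitted them. The Python sketch below imitates that contract with a dictionary. The function names and the word-count data are hypothetical. Real Hadoop also sorts keys and may spread them across several reducers, which this sketch leaves out.

```python
from collections import defaultdict

def shuffle(mapper_outputs):
    """Merge (key, value) pairs from several mappers into key -> [values]."""
    grouped = defaultdict(list)
    for output in mapper_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped

def reduce_sum(key, values):
    # One call per key, seeing that key's values from every mapper.
    return key, sum(values)

# Hypothetical word-count output from two different mappers.
mapper1 = [("cat", 1), ("dog", 1), ("cat", 1)]
mapper2 = [("dog", 1), ("cat", 1)]

grouped = shuffle([mapper1, mapper2])
result = [reduce_sum(k, vs) for k, vs in sorted(grouped.items())]
print(result)  # [('cat', 3), ('dog', 2)]
```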
Identify the utility that allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.
A. Oozie
B. Sqoop
C. Flume
D. Hadoop Streaming
E. mapred
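Hadoop Streaming runs any executable that reads input records as lines on stdin and writes key/value lines to stdout. By default, the text before the first tab of each output line is treated as the key and the rest as the value. A minimal word-count mapper in that style might look like the sketch below. The word-count logic and the function names streaming_mapper and main are illustrative assumptions.

```python
import sys

def streaming_mapper(lines):
    """Emit "word<TAB>1" for every whitespace-separated word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def main():
    # Under Streaming, the script reads its input split from stdin
    # and writes tab-separated key/value lines to stdout.
    for out in streaming_mapper(sys.stdin):
        sys.stdout.write(out + "\n")
```

Saved as a script that calls main() at the bottom, it would be passed to the streaming jar through the -mapper option alongside -input, -output and -reducer. The jar's exact path depends on the Hadoop installation.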
Identify which statement best defines a SequenceFile.
A. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects
B. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects
C. A SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.
D. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.
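In Hadoop, a SequenceFile writer is created with a single key class and a single value class, and both class names are recorded in the file header. That is why every key shares one type and every value shares one type. The Python sketch below is a toy container that enforces the same rule with length-prefixed binary records. It is not the real SequenceFile on-disk format, which also carries a header, sync markers and optional compression. The class name ToyKVFile is hypothetical.

```python
import struct

class ToyKVFile:
    """Toy length-prefixed key/value container with one key type and one value type.

    Not Hadoop's SequenceFile format: it only imitates the rule that every key
    shares one class and every value shares one class.
    """

    def __init__(self, key_type, value_type):
        self.key_type = key_type
        self.value_type = value_type
        self.buf = bytearray()

    def append(self, key, value):
        if not isinstance(key, self.key_type) or not isinstance(value, self.value_type):
            raise TypeError("key/value type does not match the file's declared types")
        # Each field is stored as a 4-byte big-endian length followed by UTF-8 bytes.
        for field in (str(key).encode("utf-8"), str(value).encode("utf-8")):
            self.buf += struct.pack(">I", len(field)) + field

    def records(self):
        """Decode the buffer back into (key, value) pairs of the declared types."""
        out, pos = [], 0
        while pos < len(self.buf):
            fields = []
            for _ in range(2):
                (n,) = struct.unpack_from(">I", self.buf, pos)
                pos += 4
                fields.append(self.buf[pos:pos + n].decode("utf-8"))
                pos += n
            out.append((self.key_type(fields[0]), self.value_type(fields[1])))
        return out


f = ToyKVFile(str, int)
f.append("apple", 3)
f.append("kiwi", 5)
print(f.records())  # [('apple', 3), ('kiwi', 5)]
```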
You want to run Hadoop jobs on your development workstation for testing before you submit them to your production cluster. Which mode of operation in Hadoop allows you to most closely simulate a production cluster while using a single machine?
A. Run all the nodes in your production cluster as virtual machines on your development workstation.
B. Run the hadoop command with the -jt local and the -fs file:/// options.
C. Run the DataNode, TaskTracker, NameNode and JobTracker daemons on a single machine.
D. Run simldooop, the Apache open-source software for simulating Hadoop clusters.
What is the term for the process of moving map outputs to the reducers?
A. Reducing
B. Combining
C. Partitioning
D. Shuffling and sorting
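During the shuffle, the partitioner assigns each map output pair to a reducer, and each reducer's input is then merged and sorted by key before reduce() runs. The Python sketch below imitates both steps. zlib.crc32 is a deterministic stand-in chosen for illustration, because Python's built-in string hash is randomized per process. It is not Hadoop's hash function. The function names and sample pairs are hypothetical.

```python
import zlib
from collections import defaultdict

def partition(key, num_reducers):
    # Stand-in for Hadoop's HashPartitioner, which computes
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle_and_sort(map_outputs, num_reducers):
    """Route every (key, value) pair to a reducer, then sort each reducer's input by key."""
    per_reducer = defaultdict(list)
    for key, value in map_outputs:
        per_reducer[partition(key, num_reducers)].append((key, value))
    # Sort by key only; the stable sort keeps values in arrival order.
    return {r: sorted(p, key=lambda kv: kv[0]) for r, p in per_reducer.items()}

pairs = [("dog", 1), ("cat", 1), ("emu", 1), ("cat", 1)]
for reducer, inputs in sorted(shuffle_and_sort(pairs, 2).items()):
    print(reducer, inputs)
```

Because all pairs with the same key land on the same reducer, that reducer sees every value for the key in one contiguous, sorted run.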
Identify the MapReduce v2 (MRv2 / YARN) daemon responsible for launching application containers and monitoring application resource usage.
A. ResourceManager
B. NodeManager
C. ApplicationMaster
D. ApplicationMasterService
E. TaskTracker
F. JobTracker