Installing the Hadoop Client Libraries

Hadoop client libraries are required to process Hadoop-related DataMover, Hive, MapReduce, and Sqoop jobs.

Note: As of TA 6.3, Hadoop libraries are not included with TA. Instead, we provide a Maven script (POM.xml) to install the required libraries.

If you do not already have Maven, download and install it. Then take the POM.xml file for your Hadoop distribution from the Hadoop directory on the TA CD and run it with Maven to download the required Hadoop client libraries.

Note: The instructions here are for Windows.

Installing Maven

Note: If you do not have Maven installed, follow the instructions below.

Maven Prerequisites

  • A JDK is installed.

  • The JAVA_HOME environment variable is set to your JDK installation directory.
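If you want to confirm these prerequisites before installing Maven, the following is a minimal sketch, assuming Python 3 is available. It is not part of TA, and the function name check_java_home is hypothetical. It treats JAVA_HOME as pointing to a JDK only if a javac compiler exists in its bin directory, because a JRE does not include javac.

```python
import os
from pathlib import Path


def check_java_home(env=None):
    """Return a list of problems with JAVA_HOME; an empty list means it looks like a JDK."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]
    home = Path(java_home)
    if not home.is_dir():
        return [f"JAVA_HOME points to a missing directory: {home}"]
    # A JDK ships the javac compiler (javac.exe on Windows); a JRE does not.
    bin_dir = home / "bin"
    if not any((bin_dir / name).is_file() for name in ("javac.exe", "javac")):
        return [f"No javac found in {bin_dir}; JAVA_HOME may point to a JRE, not a JDK"]
    return []


if __name__ == "__main__":
    problems = check_java_home()
    for problem in problems:
        print("Problem:", problem)
    if not problems:
        print("JAVA_HOME looks like a JDK:", os.environ["JAVA_HOME"])
```

Run it from the same command prompt you will use for Maven, since it reads that window's environment.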

To download and install Maven:

  1. Download Maven 3 or later from https://maven.apache.org/download.cgi.

  2. Unzip the downloaded apache-maven-<version>-bin.zip file.

  3. Add the bin directory of the extracted folder (for example, apache-maven-3.3.9\bin) to the PATH environment variable.

  4. Confirm that Maven is installed by running the mvn -v command in a new command prompt window.
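Step 4 can also be scripted. This minimal sketch assumes Python 3.7 or later; the helper names find_mvn and maven_version are hypothetical. It searches PATH for mvn (on Windows, shutil.which also tries the PATHEXT extensions, so it finds mvn.cmd) and then runs mvn -v. Like mvn -v itself, it only sees the PATH change if it is started from a new command prompt.

```python
import shutil
import subprocess


def find_mvn(path=None):
    """Return the full path of the mvn launcher found on PATH (or on `path`), else None."""
    return shutil.which("mvn", path=path)


def maven_version(mvn_path):
    """Run 'mvn -v' and return (exit code, combined stdout and stderr)."""
    result = subprocess.run([mvn_path, "-v"], capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


if __name__ == "__main__":
    mvn = find_mvn()
    if mvn is None:
        print("mvn not found; check that the Maven bin directory is on PATH")
    else:
        code, output = maven_version(mvn)
        print(output)
        if code != 0:
            # mvn -v fails, for example, when JAVA_HOME is missing or wrong.
            print("mvn -v exited with code", code)
```

A non-zero exit code usually points back to the JAVA_HOME prerequisite rather than to the PATH setting.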

Downloading the Hadoop Client Library

With Maven installed, you can now download the Hadoop client library.

Maven scripts (POM.xml) are provided for these distributions of Hadoop:

Hadoop Distribution    Versions
Cloudera               CDH5
Hortonworks            HDP 2.4.x
MapR                   5.1.0

Note: Refer to the TA Compatibility Matrix for the most current version information.

To download and install the Hadoop client library:

  1. Download the POM.zip file. This file is provided in the /Hadoop directory in the TA 6.3 distribution package.

  2. Unzip POM.zip.

    The POM.xml files needed by Maven are extracted into a separate directory for each Hadoop distribution (for example, CDH for Cloudera).

  3. Open a Windows command prompt and navigate to the directory for the Hadoop distribution you use.

    Example: Navigate to the CDH directory if you want to download Hadoop client libraries for Cloudera.

  4. Edit the POM.xml file to specify the exact versions of Hadoop, Hive, and Sqoop that you are using (and, for MapR, the MapR version). For example, for Cloudera the properties could be set as follows:

    <properties>
      <Hadoop.version>2.6.0-cdh5.6.0</Hadoop.version>
      <Hive.version>1.1.0-cdh5.7.0</Hive.version>
      <Sqoop.version>1.4.6-cdh5.6.0</Sqoop.version>
    </properties>

    For MapR, you must also specify the MapR version, as shown here:

    <properties>
      <Hadoop.version>2.7.0-mapr-1602</Hadoop.version>
      <Hive.version>1.2.0-mapr-1605</Hive.version>
      <Sqoop.version>1.4.6-mapr-1601</Sqoop.version>
      <Mapr.version>5.1.0-mapr</Mapr.version>
    </properties>

  5. From the directory of the Hadoop distribution you chose, run this command:

    mvn dependency:copy-dependencies -DoutputDirectory=<directory to which you want to download the jars>

    Example: Running mvn dependency:copy-dependencies -DoutputDirectory=C:\CDHlib from the CDH directory copies the Cloudera Hadoop client libraries into the C:\CDHlib directory.
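After step 5, you can optionally confirm that POM.xml carries the versions you intended and that the JARs were copied. The Python sketch below does both. It assumes Python 3 and a POM that declares Maven's standard namespace (a POM without a namespace also works); the script name check_hadoop_libs.py and its helper functions are hypothetical and are not shipped with TA.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# Maven POM files normally declare this default XML namespace.
POM_NS = "{http://maven.apache.org/POM/4.0.0}"


def version_properties(pom_path):
    """Return the *.version entries from the <properties> section of a POM file."""
    root = ET.parse(pom_path).getroot()
    # Use the namespace prefix only if the POM actually declares it.
    ns = POM_NS if root.tag.startswith(POM_NS) else ""
    props = root.find(f"{ns}properties")
    if props is None:
        return {}
    versions = {}
    for prop in props:
        name = prop.tag[len(ns):]
        if name.endswith(".version"):
            versions[name] = (prop.text or "").strip()
    return versions


def list_jars(output_dir):
    """Return the sorted file names of the JARs in the -DoutputDirectory folder."""
    return sorted(p.name for p in Path(output_dir).glob("*.jar"))


if __name__ == "__main__" and len(sys.argv) == 3:
    pom_file, output_dir = sys.argv[1], sys.argv[2]
    for name, value in version_properties(pom_file).items():
        print(f"{name} = {value}")
    jars = list_jars(output_dir)
    print(f"{len(jars)} JAR files in {output_dir}")
    for jar in jars:
        print("  " + jar)
elif __name__ == "__main__":
    print("usage: python check_hadoop_libs.py <POM.xml> <output directory>")
```

For example, from the directory where you unzipped POM.zip: python check_hadoop_libs.py CDH\POM.xml C:\CDHlib. Reading the properties with an XML parser, rather than searching the file as text, ignores any values that are commented out.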