Installing the Hadoop Client Libraries

Hadoop client libraries are required to process Hadoop-related DataMover, Hive, MapReduce, and Sqoop jobs. As of TA 6.5.5, the Hadoop libraries are not included with TA. Instead, we provide Maven scripts (POM.xml files) that download the required libraries.

If you do not already have Maven, download and install it. Then obtain the POM.xml file from the directory named Hadoop on the TA CD and run it with Maven to download the required Hadoop client libraries.

Note: The instructions here are for Windows.

Installing Maven

If you do not have Maven installed, follow the instructions below.

Maven Prerequisites:

  • A JDK must be installed.

  • The JAVA_HOME environment variable must be set to the JDK installation directory.
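You can check both prerequisites before installing Maven. The following Python sketch is for illustration only (Python is not part of these instructions), and the function name check_java_home is hypothetical. It assumes that a JDK can be recognized by a bin\javac or bin\javac.exe file under JAVA_HOME, which a JRE-only installation does not contain.

```python
import os
from pathlib import Path


def check_java_home(env=None):
    """Return a list of problems with JAVA_HOME; an empty list means it looks like a JDK.

    Hypothetical helper. Assumes a JDK has bin/javac (bin\\javac.exe on Windows),
    which a JRE-only installation lacks.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]
    home = Path(java_home)
    if not home.is_dir():
        return [f"JAVA_HOME points to a missing directory: {home}"]
    if not any((home / "bin" / name).is_file() for name in ("javac", "javac.exe")):
        return [f"No javac under {home / 'bin'}; JAVA_HOME may point to a JRE, not a JDK"]
    return []


if __name__ == "__main__":
    problems = check_java_home()
    print("\n".join(problems) if problems else "JAVA_HOME looks like a JDK")
```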

To download and install Maven:

  1. Download Maven 3 or later from https://maven.apache.org/download.cgi.

  2. Unzip apache-maven-<version>-bin.zip.

  3. Add the bin directory of the unzipped directory (for example, apache-maven-3.3.9\bin) to the PATH environment variable.

  4. Confirm that Maven is installed by running the mvn -v command in a new command prompt. The output should report the Maven version, the Maven home directory, and the Java version and Java home that Maven uses.
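To check the Maven installation from a script rather than by reading the mvn -v output, the following Python sketch runs mvn -v and checks that the reported version is 3 or later. Python is used only for illustration, and the function names are hypothetical. The sketch assumes the output contains a line beginning with "Apache Maven X.Y.Z", which is how Maven reports its version.

```python
import re
import shutil
import subprocess


def parse_maven_version(output):
    """Extract (major, minor, patch) from the 'Apache Maven X.Y.Z' line of `mvn -v` output."""
    match = re.search(r"Apache Maven (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no 'Apache Maven X.Y.Z' line found")
    return tuple(int(part) for part in match.groups())


def maven_is_supported(output):
    """True if the reported Maven version is 3 or later, as these instructions require."""
    return parse_maven_version(output)[0] >= 3


if __name__ == "__main__":
    mvn = shutil.which("mvn")  # on Windows this resolves mvn.cmd through PATHEXT
    if mvn is None:
        raise SystemExit("mvn was not found; check that Maven's bin directory is on PATH")
    result = subprocess.run([mvn, "-v"], capture_output=True, text=True, check=True)
    print("Maven OK" if maven_is_supported(result.stdout) else "Maven 3 or later is required")
```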

Downloading the Hadoop Client Library

With Maven installed, you can now download the Hadoop client library.

Maven scripts (POM.xml) are provided for these Hadoop distributions and versions:

  • Cloudera: CDH5

  • Hortonworks: HDP 2.4.x

  • MapR: 5.1.0

Note: See the TA Compatibility Matrix for the most current list of supported distributions and versions.

To download and install the Hadoop client library:

  1. Download the POM.zip file. This file is in the /Hadoop directory in the TA 6.5.5 distribution package.

  2. Unzip POM.zip. This extracts the POM.xml files needed by Maven into a directory structure that contains a directory for each supported Hadoop distribution.

  3. Open a Windows command prompt and navigate to the directory for the Hadoop distribution you are interested in. For example, navigate to the CDH directory to download Hadoop client libraries for Cloudera.

  4. Edit the POM.xml file to specify the exact versions of Hadoop, Hive, and Sqoop (and, for MapR, MapR itself) that you are using. For example, for Cloudera you could set the properties as follows:

    <properties>
    <Hadoop.version>2.6.0-cdh5.6.0</Hadoop.version>
    <Hive.version>1.1.0-cdh5.7.0</Hive.version>
    <Sqoop.version>1.4.6-cdh5.6.0</Sqoop.version>
    </properties>

    For MapR, you must also specify the MapR version, as shown:

    <properties>
    <Hadoop.version>2.7.0-mapr-1602</Hadoop.version>
    <Hive.version>1.2.0-mapr-1605</Hive.version>
    <Sqoop.version>1.4.6-mapr-1601</Sqoop.version>
    <Mapr.version>5.1.0-mapr</Mapr.version>
    </properties>

  5. Run this command from the directory of the Hadoop distribution you want:

    mvn dependency:copy-dependencies -DoutputDirectory=<directory to which you want to download the jars>

    For example, running this command from the CDH directory:

    mvn dependency:copy-dependencies -DoutputDirectory=C:\CDHlib

    copies the Cloudera Hadoop client libraries (JAR files) into the C:\CDHlib directory.
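The POM editing and mvn steps above can also be scripted so that a forgotten version property is caught before Maven runs. The following Python sketch is an illustration under stated assumptions, not part of the TA package: the script name fetch_hadoop_libs.py, its --mapr flag, and the helper functions are hypothetical. It reads the top-level properties section of the POM.xml in the current directory, reports which of the version properties from the examples above (Hadoop.version, Hive.version, Sqoop.version, plus Mapr.version for MapR) are missing or empty, and then runs the same mvn dependency:copy-dependencies command.

```python
import shutil
import subprocess
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# Property names as used in the example POM.xml properties; MapR also needs Mapr.version.
REQUIRED_VERSIONS = ["Hadoop.version", "Hive.version", "Sqoop.version"]


def local_name(tag):
    """Strip a '{namespace}' prefix, such as the standard Maven POM namespace."""
    return tag.split("}")[-1]


def read_properties(pom_path):
    """Return the top-level <properties> of a POM as a {name: value} dict."""
    root = ET.parse(pom_path).getroot()
    props = {}
    for section in root:
        if local_name(section.tag) == "properties":
            for prop in section:
                props[local_name(prop.tag)] = (prop.text or "").strip()
    return props


def missing_versions(props, mapr=False):
    """List the required version properties that are absent or empty."""
    required = REQUIRED_VERSIONS + (["Mapr.version"] if mapr else [])
    return [name for name in required if not props.get(name)]


def copy_dependencies_command(output_dir, mvn="mvn"):
    """Build the same mvn dependency:copy-dependencies command used in the steps above."""
    return [mvn, "dependency:copy-dependencies", f"-DoutputDirectory={output_dir}"]


if __name__ == "__main__":
    # Hypothetical usage, from a distribution directory such as CDH:
    #   python fetch_hadoop_libs.py C:\CDHlib        (add --mapr in the MapR directory)
    if len(sys.argv) < 2:
        sys.exit("usage: fetch_hadoop_libs.py <output directory> [--mapr]")
    output_dir = sys.argv[1]
    missing = missing_versions(read_properties("POM.xml"), mapr="--mapr" in sys.argv)
    if missing:
        sys.exit("Set these properties in POM.xml first: " + ", ".join(missing))
    mvn = shutil.which("mvn")  # finds mvn.cmd on Windows
    if mvn is None:
        sys.exit("mvn was not found on PATH")
    subprocess.run(copy_dependencies_command(output_dir, mvn), check=True)
    jar_count = len(list(Path(output_dir).glob("*.jar")))
    print(f"{jar_count} JAR files in {output_dir}")
```

Checking the properties first keeps the failure message specific to the POM, rather than relying on Maven to report an unresolved version.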