Installing Hadoop Client Libraries

Hadoop client libraries are required for processing the Hadoop-related DataMover jobs. As of TA 6.5.5, Hadoop libraries are not included with TA. Instead, Tidal provides a Maven script (POM.xml) to install the required libraries.

The overall procedure is as follows:

  • If you do not already have Maven, download and install it.

  • Obtain the POM.xml file from the directory named Hadoop.

  • Run the Maven script to download the required Hadoop client libraries.

Installing Maven

Before installing Maven, your system must meet these prerequisites:

  • The Java Development Kit (JDK) must be installed.

  • The JAVA_HOME environment variable must be set and point to your JDK.
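The two prerequisites above can be checked from a terminal before you install Maven. The following is a minimal sketch for a Unix-like shell, not part of the TA distribution; the function name check_jdk is hypothetical, and the check assumes that a JDK (unlike a JRE) ships the javac compiler under its bin directory.

```shell
#!/bin/sh
# check_jdk (hypothetical helper): prints one status line and returns 0
# only if JAVA_HOME is set and contains an executable bin/javac.
check_jdk() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    # javac is used as the marker that JAVA_HOME points to a JDK.
    if [ ! -x "$JAVA_HOME/bin/javac" ]; then
        echo "JAVA_HOME does not point to a JDK: $JAVA_HOME"
        return 1
    fi
    echo "JDK found at $JAVA_HOME"
    return 0
}
```

After defining the function, run check_jdk in the shell you intend to use for Maven. A non-zero return status means one of the prerequisites is probably not met.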

To download and install Maven:

  1. Download Maven 3 or later.

  2. Unzip apache-maven-<version>-bin.zip, where <version> is 3 or later.

  3. Add the bin directory of the extracted directory (for example, apache-maven-3.3.9) to the PATH environment variable.

  4. Confirm a successful Maven installation by running the mvn -v command in a new shell.
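On a Unix-like system, the PATH step above might look like the following sketch. The function name add_maven_to_path is hypothetical, and the directory you pass to it depends on where you unzipped Maven. The function changes PATH for the current shell only; to make the change visible in new shells, the same export would need to go into your shell profile.

```shell
#!/bin/sh
# add_maven_to_path DIR (hypothetical helper): prepend DIR/bin to PATH,
# but only if DIR/bin/mvn exists and is executable.
add_maven_to_path() {
    maven_home=$1
    if [ ! -x "$maven_home/bin/mvn" ]; then
        echo "no mvn launcher under $maven_home/bin" >&2
        return 1
    fi
    PATH="$maven_home/bin:$PATH"
    export PATH
}
```

For example, if Maven was unzipped into your home directory, you could run add_maven_to_path "$HOME/apache-maven-3.3.9" and then mvn -v to confirm that the launcher is found.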

Downloading the Hadoop Client Library

With Maven installed, you can now download the Hadoop client library. Maven scripts (POM.xml) are provided for the following Hadoop distributions:

  Hadoop Distribution Type    Versions
  Cloudera                    CDH5
  Hortonworks                 HDP 2.4.x
  MapR                        5.1.0

Note: The Tidal Automation Compatibility Matrix contains the most current version information.

To download and install the Hadoop client library:

  1. Download the POM.zip file. This file is provided in the /Hadoop directory in the TA 6.5.6 (or later) distribution package.

  2. Unzip the POM.zip file.

  3. Open a terminal window and navigate to the directory for the Hadoop distribution that you want to use. For example, navigate to the CDH directory if you want to download Hadoop client libraries for Cloudera.

  4. Edit the POM.xml file to specify the exact versions of Hadoop, Hive, and Sqoop that you are using. For example, for Cloudera the required properties could be edited as shown below:

    <properties>
      <Hadoop.version>2.6.0-cdh5.6.0</Hadoop.version>
      <Hive.version>1.1.0-cdh5.7.0</Hive.version>
      <Sqoop.version>1.4.6-cdh5.6.0</Sqoop.version>
    </properties>

    For MapR, you must also include the version of MapR used, as shown in the example:

    <properties>
      <Hadoop.version>2.7.0-mapr-1602</Hadoop.version>
      <Hive.version>1.2.0-mapr-1605</Hive.version>
      <Sqoop.version>1.4.6-mapr-1601</Sqoop.version>
      <Mapr.version>5.1.0-mapr</Mapr.version>
    </properties>

  5. Run the following command from the directory for the Hadoop distribution that you want:

    mvn dependency:copy-dependencies -DoutputDirectory=<jar-download-directory>

    Example: Running the command from the CDH directory as shown here copies the Cloudera Hadoop client libraries into the /CDHlib directory:

    mvn dependency:copy-dependencies -DoutputDirectory=/CDHlib
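The last two steps can also be scripted. The sketch below is illustrative only and assumes a Unix-like shell. The function names show_pom_versions and fetch_hadoop_libs are hypothetical and are not shipped with TA. The first function lists the *.version properties found in a POM file so that you can review your edits. The second runs the documented mvn command from a distribution directory and then counts the JAR files that arrived in the output directory.

```shell
#!/bin/sh
# show_pom_versions FILE (hypothetical helper): print each
# <Name.version>value</Name.version> property in FILE as Name.version=value.
show_pom_versions() {
    sed -n 's/.*<\([A-Za-z]*\.version\)>\([^<]*\)<.*/\1=\2/p' "$1"
}

# fetch_hadoop_libs DIST_DIR OUT_DIR (hypothetical helper): run the
# documented Maven command from DIST_DIR, then report how many JAR files
# are in OUT_DIR. Pass OUT_DIR as an absolute path, because Maven runs
# from DIST_DIR while the count runs from the calling directory.
fetch_hadoop_libs() {
    dist_dir=$1
    out_dir=$2
    ( cd "$dist_dir" && mvn dependency:copy-dependencies "-DoutputDirectory=$out_dir" ) || return 1
    count=0
    for jar in "$out_dir"/*.jar; do
        # The glob stays unexpanded when nothing matches, so test each entry.
        if [ -e "$jar" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count JAR file(s) in $out_dir"
}
```

For example, from the directory where POM.zip was unzipped, you might run show_pom_versions CDH/POM.xml to review the versions and then fetch_hadoop_libs "$PWD/CDH" /CDHlib to download the Cloudera libraries. A count of 0 suggests that the download did not complete.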