Installing the Hadoop Client Libraries
Hadoop client libraries are required for processing the Hadoop-related DataMover, Hive, MapReduce, and Sqoop jobs.
Note: As of TA 6.3, Hadoop libraries are not included with TA. Instead, we provide a Maven script (POM.xml) to install the required libraries.
If you do not already have Maven, you must download and install it. Obtain the POM.xml file from the "Hadoop" directory on the CD, then run it with Maven to download the required Hadoop client libraries.
Note: The instructions here are for Windows.
Installing Maven
Note: If you do not have Maven installed, follow the instructions below.
Maven Prerequisites
- A JDK is installed.
- The JAVA_HOME environment variable is set and points to your JDK.
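The prerequisites above can be checked with a short script before installing Maven. This is a minimal sketch, not part of the TA distribution: the function name check_jdk is hypothetical, and it assumes that a JDK can be recognized by the presence of the javac compiler in its bin directory.

```python
import os
from pathlib import Path


def check_jdk(env=None):
    """Return a list of problems with the JDK prerequisites (empty if none)."""
    # Hypothetical helper: accepts a mapping so it can be tested without
    # touching the real environment; defaults to os.environ.
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]
    bin_dir = Path(java_home) / "bin"
    # Assumption: a JDK (unlike a JRE) ships javac (javac.exe on Windows).
    if not any((bin_dir / name).is_file() for name in ("javac.exe", "javac")):
        return ["JAVA_HOME=" + java_home + " does not look like a JDK (no javac in bin)"]
    return []
```

Calling check_jdk() with no arguments inspects the current environment; an empty list means both prerequisites appear to be met.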
To download and install Maven:
- Download Maven 3 or later from https://maven.apache.org/download.cgi.
- Unzip apache-maven-<3 or above>-bin.zip.
- Add the bin directory of the extracted directory (for example, apache-maven-3.3.9) to the PATH environment variable.
- Confirm that Maven installed successfully by running the mvn -v command in a new shell.
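The final confirmation step can also be scripted. The sketch below assumes that the output of mvn -v contains a line of the form "Apache Maven 3.3.9 ..."; the function names parse_maven_version and installed_maven_version are illustrative only.

```python
import re
import shutil
import subprocess


def parse_maven_version(output):
    """Extract (major, minor, patch) from the text printed by `mvn -v`."""
    match = re.search(r"Apache Maven (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("could not find a Maven version in the output")
    return tuple(int(part) for part in match.groups())


def installed_maven_version():
    """Run `mvn -v`; return the parsed version, or None if mvn is not on PATH."""
    # shutil.which honors PATHEXT on Windows, so it can resolve mvn.cmd.
    mvn = shutil.which("mvn")
    if mvn is None:
        return None
    result = subprocess.run([mvn, "-v"], capture_output=True, text=True, check=True)
    return parse_maven_version(result.stdout)
```

A result of None suggests that the Maven bin directory is not yet on PATH (or that the shell was opened before PATH was changed); a tuple such as (3, 3, 9) can be compared against (3, 0, 0) to confirm that Maven 3 or later is installed.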
Downloading the Hadoop Client Library
With Maven installed, you can now download the Hadoop client library.
Maven scripts (POM.xml) are provided for these distributions of Hadoop:
| Hadoop Distribution Type | Versions |
|---|---|
| Cloudera | CDH5 |
| Hortonworks | HDP 2.4.x |
| MapR | 5.1.0 |
Note: Refer to the TA Compatibility Matrix for the most current version information.
To download and install the Hadoop client library:
- Download the POM.zip file. This file is provided in the /Hadoop directory in the TA 6.3 distribution package.
- Unzip the POM.zip file.
The POM.xml files needed by Maven are extracted into one directory per Hadoop distribution (for example, CDH for Cloudera).
- Open a Windows command prompt and navigate to the directory for the Hadoop distribution that you want.
Example: Navigate to the CDH directory if you want to download Hadoop client libraries for Cloudera.
- Edit the POM.xml file to specify the exact versions of Hadoop, Hive, and Sqoop (and, for MapR, the MapR version) that you are using. For Cloudera, edit the required properties as shown below:
<properties>
    <Hadoop.version>2.6.0-cdh5.6.0</Hadoop.version>
    <Hive.version>1.1.0-cdh5.7.0</Hive.version>
    <Sqoop.version>1.4.6-cdh5.6.0</Sqoop.version>
</properties>
For MapR, you must also specify the MapR version, as shown here:
<properties>
    <Hadoop.version>2.7.0-mapr-1602</Hadoop.version>
    <Hive.version>1.2.0-mapr-1605</Hive.version>
    <Sqoop.version>1.4.6-mapr-1601</Sqoop.version>
    <Mapr.version>5.1.0-mapr</Mapr.version>
</properties>
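If you need to set these properties on several machines, the edit can be scripted instead of done by hand. The following is a rough sketch using Python's standard xml.etree.ElementTree module; the function name set_pom_properties is hypothetical, and the sketch assumes that the version properties are direct children of a top-level properties element, as in the fragments above.

```python
import xml.etree.ElementTree as ET


def set_pom_properties(pom_xml, versions):
    """Return pom_xml with the given <properties> children set to new values."""
    root = ET.fromstring(pom_xml)
    # POM files usually declare a default namespace; reuse it if present.
    namespace = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""
    prefix = "{" + namespace + "}" if namespace else ""
    if namespace:
        # Keep the namespace as the default one when serializing.
        ET.register_namespace("", namespace)
    properties = root.find(prefix + "properties")
    if properties is None:
        raise ValueError("the POM has no <properties> element")
    for name, value in versions.items():
        element = properties.find(prefix + name)
        if element is None:
            # Refuse to invent properties that the supplied POM does not define.
            raise KeyError(name + " is not defined in <properties>")
        element.text = value
    return ET.tostring(root, encoding="unicode")
```

For example, set_pom_properties(text, {"Hadoop.version": "2.6.0-cdh5.6.0"}) returns the POM text with only that property changed. ElementTree discards XML comments when parsing, so keep a backup of the original POM.xml if its comments matter.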
- Run this command from the directory for the Hadoop distribution that you want:
mvn dependency:copy-dependencies -DoutputDirectory=<directory to which you want to download the jars>
Example: Running mvn dependency:copy-dependencies -DoutputDirectory=C:\CDHlib from the CDH directory copies the Cloudera Hadoop client libraries into the C:\CDHlib directory.
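The download command can be wrapped in a small script, for instance to repeat it for more than one distribution. This is a sketch under stated assumptions: the function names are hypothetical, mvn must already be on PATH, and the script only builds and runs the same mvn dependency:copy-dependencies command shown above.

```python
import shutil
import subprocess
from pathlib import Path


def copy_dependencies_command(output_dir, mvn="mvn"):
    """Build the argument list for the copy-dependencies command."""
    return [mvn, "dependency:copy-dependencies", "-DoutputDirectory=" + str(output_dir)]


def download_client_libraries(distribution_dir, output_dir):
    """Run Maven in distribution_dir and return the names of the copied jars."""
    # shutil.which honors PATHEXT on Windows, so it can resolve mvn.cmd.
    mvn = shutil.which("mvn")
    if mvn is None:
        raise FileNotFoundError("mvn was not found on PATH")
    # cwd must be the directory that holds the POM.xml for the distribution.
    subprocess.run(copy_dependencies_command(output_dir, mvn), cwd=distribution_dir, check=True)
    return sorted(path.name for path in Path(output_dir).glob("*.jar"))
```

Calling download_client_libraries("CDH", r"C:\CDHlib") from the directory where POM.zip was unzipped corresponds to the Cloudera example above; the returned list gives a quick way to confirm that jars were actually copied.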