Hadoop Big Data Environments
Hadoop is an ecosystem of storage and processing components that together provide a scalable, fault-tolerant software framework for the distributed storage and processing of very large datasets on computer clusters.
WPS can operate with third-party Hadoop big data environments, including the major distributions (Cloudera, Hortonworks and MapR) and native Apache Hadoop. WPS is certified for use with Cloudera version 5 and later.
The WPS Engine for Hadoop provides access to Hive and Impala data sources in a Hadoop environment using either standard SQL or pass-through SQL.
|Type of Access|Supported?|
|---|---|
|Creating New Tables| |
|Implicit Pass-Through Support| |
|Explicit Pass-Through Support| |
The WPS Engine for Hadoop connects to a Hadoop cluster through the JDBC interface.
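WPS configures its JDBC connection internally, so the following is only an outside illustration of what a JDBC connection to Hive involves. It is a minimal Java sketch, assuming the Apache Hive JDBC driver (`hive-jdbc`) is on the classpath and a HiveServer2 endpoint is reachable. The host name `hive.example.com`, the user name `analyst`, and the helper names `hiveUrl` and `printFirstColumn` are hypothetical. HiveServer2 usually listens on port 10000, and Impala daemons usually accept HiveServer2-protocol connections on port 21050. The query string is sent to the server unchanged, which mirrors the idea behind explicit pass-through: the SQL is written in the target's native dialect (HiveQL here) rather than being translated by the client.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcSketch {

    // Hypothetical helper: builds a HiveServer2-style JDBC URL of the form
    // jdbc:hive2://host:port/database (10000 is HiveServer2's usual default port;
    // Impala typically accepts the same protocol on 21050).
    public static String hiveUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Hypothetical helper: opens a connection and runs a native HiveQL query verbatim,
    // printing the first column of every row. Assumes a Hive JDBC driver is on the
    // classpath (JDBC 4 drivers register themselves with DriverManager).
    public static void printFirstColumn(String url, String user, String password, String nativeSql)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(nativeSql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical host; replace with your cluster's HiveServer2 node.
        String url = hiveUrl("hive.example.com", 10000, "default");
        System.out.println(url);

        // Only contact a server when explicitly asked to.
        if (args.length > 0 && args[0].equals("--connect")) {
            printFirstColumn(url, "analyst", "", "SHOW TABLES");
        }
    }
}
```

Run without arguments, the program only prints `jdbc:hive2://hive.example.com:10000/default` and contacts no server; passing `--connect` attempts a real connection and runs `SHOW TABLES`.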
Interoperating with Hadoop Big Data Environments
The WPS Interop for Hadoop module provides additional language support for interoperating with a Hadoop environment. This includes a FILENAME statement for direct connections to HDFS (the Hadoop Distributed File System) and a HADOOP procedure for executing Pig and MapReduce commands directly within a Hadoop cluster.
Dependencies and Usage
The WPS Engine for Hadoop can only be used on the supported platforms indicated in the table below.
|Platform|Supported?|
|---|---|
|AIX on IBM Power| |
|Linux on ARM| |
|Linux on IBM Power LE (Little Endian)| |
|Linux on x86| |
|macOS on x86| |
|Windows on x86| |
|z/OS on an architecture 7 machine| |