Hadoop Big Data Environments
Hadoop is an ecosystem of storage and processing components that provide a scalable, fault-tolerant, software framework for the distributed storage and processing of very large datasets on computer clusters.
There are many different third party Hadoop environments available ranging from the native Apache open source version up to different commercial variants of the Apache version. WPS is capable of operating with native Apache Hadoop and commercial variants that remain close to the Apache standard, including Hortonworks, MapR and in particular, WPS is certified by Cloudera for use with their Hadoop environment version 5 and above.
Supported Hadoop Features
The WPS Interop for Hadoop module provides language support to interoperate with third party Hadoop big data environments.
- HDFS, Pig, MapReduce: the HADOOP procedure provides support for HDFS commands, executing Pig Scripts and MapReduce commands.
- File Types: the FILENAME statement provides support for the Hadoop file access method.
Additional support is provided by the Hadoop data engine module.
- Hive, Impala: the WPS engine for Hadoop provides access to Hive and Impala data sources via standard or pass through SQL.
Dependencies and Usage
WPS Interop for Hadoop is supported in WPS version 3.2 and above.
A third party Hadoop environment needs to be installed, configured and fully operational before considering the installation and use of WPS with that environment.
WPS Interop for Hadoop can be used on platforms where third party Hadoop environments are supported including Windows and UNIX.
The document listed below will provide you with details about configuring and using WPS and Hadoop.
|WPS-Configuration-for-Hadoop-Syntax-Diagram.pdf (989 KB)||User guide and lookup for the language support in the WPS Interop for Hadoop module (SYNTAX DIAGRAM version)|
|WPS-Configuration-for-Hadoop.pdf (935 KB)||User guide and lookup for the language support in the WPS Interop for Hadoop module (TEXTUAL SYNTAX version)|