WPS Interop For Hadoop

SAS Language Modules

Hadoop Big Data Environments

Hadoop is an ecosystem of storage and processing components that together form a scalable, fault-tolerant software framework for distributing the storage and processing of very large datasets across computer clusters.

Many third-party Hadoop environments are available, ranging from the native Apache open-source version to various commercial variants of it. WPS can operate with native Apache Hadoop and with commercial variants that remain close to the Apache standard, including Hortonworks and MapR. In particular, WPS is certified by Cloudera for use with its Hadoop environment, version 5 and above.

Supported Hadoop Features

The WPS Interop for Hadoop module provides language support for interoperating with third-party Hadoop big data environments.

  • HDFS, Pig, MapReduce: the HADOOP procedure supports HDFS commands and the execution of Pig scripts and MapReduce commands.
  • File access: the FILENAME statement supports the Hadoop file access method.

Additional support is provided by the Hadoop data engine module.

  • Hive, Impala: the WPS engine for Hadoop provides access to Hive and Impala data sources through standard or pass-through SQL.

Dependencies and Usage

WPS Interop for Hadoop is supported in WPS version 3.2 and above.

A third-party Hadoop environment must be installed, configured and fully operational before WPS is installed and used with that environment.

WPS Interop for Hadoop can be used on platforms that support third-party Hadoop environments, including Windows and UNIX.

More Information

The document listed below provides details about configuring and using WPS with Hadoop.

Language Syntax Description
WPS-Configuration-for-Hadoop-Syntax-Diagram.pdf (990 KB): user guide and reference for the language support in the WPS Interop for Hadoop module (syntax diagram version).

Other SAS language modules

WPS Core

Support for the core language, macros, output and standard data file formats (datasets, sequential files, transport files)

WPS Graphing

Graphing and charting language support

WPS Statistics

Statistical analysis language support

WPS Time Series

Time series analysis language support

WPS Matrix Programming

Language syntax for advanced matrix manipulation

WPS Machine Learning

Language support for machine learning algorithms

WPS Interop For R

R language support

WPS Interop For Python

Python language support

WPS Interop For Hadoop

Language support to interact with Hadoop big data environments

WPS Communicate

Programmatically execute parts of a script on remote server installations of WPS, and transfer data to and from the remote servers

WPS Language SDK

Develop your own custom SAS language items
