HDInsight

Azure HDInsight is a service that deploys and provisions Apache™ Hadoop™ clusters in the cloud, providing a software framework designed to manage, analyze, and report on big data. It makes the HDFS/MapReduce software framework and related projects such as Pig, Hive, and Sqoop available in a simpler, more scalable, and cost-efficient environment. The HDInsight SDK also provides the Microsoft Avro Library for data serialization.

The main conceptual documentation that outlines how to get started with the Azure HDInsight Service is available at Azure HDInsight Documentation.

Azure HDInsight PowerShell

The HDInsight Service uses Azure PowerShell to configure, run, and post-process Hadoop jobs. The documentation for the Azure PowerShell Management cmdlets used to manage HDInsight is available at Azure HDInsight Cmdlets.

Azure HDInsight .NET SDK

The HDInsight Service has a .NET SDK that provides classes for creating, configuring, submitting, and monitoring Hadoop jobs managed by an Azure HDInsight Service. It also provides classes for managing Azure subscriptions that use the HDInsight Service and for configuring the clusters, storage accounts, MapReduce programs, and Hive and Oozie components associated with the HDInsight clusters in an Azure subscription. HDInsight provides two versions of the .NET SDK:

  1. HDInsight SDK Reference (Service Management): based on the Azure Service Management model.

  2. HDInsight SDK Reference (Resource Manager): based on the Azure Resource Manager (ARM) model.

The HDInsight .NET SDK also provides the Microsoft Avro Library, an implementation of the Avro data serialization system, which employs rich, JSON-defined data structures and an object container to store persistent data. The Avro data format can be processed by many languages; C, C++, C#, Java, PHP, Python, and Ruby are currently supported. For instructions on how to use the Microsoft Avro Library to serialize objects and other data structures into streams, see Serialize data with the Microsoft Avro Library.
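
As a rough illustration of what a JSON-defined Avro schema looks like and how it drives serialization, the following Python sketch hand-encodes one record using the Avro specification's binary rules for long values (zig-zag, variable-length) and string values (length-prefixed UTF-8). It does not use the Microsoft Avro Library. The SensorReading schema, its field names, and the helper functions are hypothetical, and only the two primitive types shown are handled.

```python
import json

# Hypothetical schema, for illustration only (not taken from the HDInsight SDK).
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "SensorReading",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "location", "type": "string"}
  ]
}
""")


def encode_long(n):
    """Encode a 64-bit integer as an Avro long: zig-zag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag maps small negatives to small positives
    out = bytearray()
    while z & ~0x7F:  # more than 7 bits remain: emit a continuation byte
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s):
    """Encode a string as an Avro string: byte length as a long, then UTF-8."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Only the primitive types used by the hypothetical schema are supported.
ENCODERS = {"long": encode_long, "string": encode_string}


def encode_record(schema, datum):
    """Encode a record by concatenating its fields in schema order."""
    return b"".join(
        ENCODERS[field["type"]](datum[field["name"]])
        for field in schema["fields"]
    )


encoded = encode_record(SCHEMA, {"id": 64, "location": "foo"})
print(encoded.hex())
```

Because the encoded bytes carry no field names or type tags, a reader needs the same schema to decode them, which is why an Avro object container stores the schema alongside the data.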

The documentation for the .NET SDK, including the Avro Library, is available at HDInsight SDK Reference Documentation.

See Also

Other Resources

Welcome to Apache Hadoop!
Apache Avro™ 1.7.6 Documentation