Event Tracing

Improve Debugging And Performance Tuning With ETW

Dr. Insung Park  and Ricky Buch

This article discusses:

  • Understanding the ETW architecture
  • What's new for ETW in Windows Vista
  • Programming with the event provider APIs
This article uses the following technologies:
Windows Vista

Contents

Event Tracing for Windows
The Unified Event Provider Model and APIs
Design and Implementation Guidelines
Tools and Support
Conclusion

The increasing size and complexity of today's software systems make software development and management a tough challenge. Accounting for all execution states is nearly impossible, and applications often exhibit behaviors developers did not expect. Furthermore, an enormous number of hardware combinations and constantly varying workload characteristics add to the difficulty of diagnosing various software problems. It is not surprising then that reliability and manageability are becoming important features themselves. These features, in turn, create a need for instrumentation.

Clever instrumentation added for a few key error states in software execution can drastically reduce the time spent debugging problems. Instrumentation is also useful in other cases. Certain undesirable situations, such as software and hardware failures and low resource conditions, must be monitored and acted upon in a managed business environment for a large group of computers. Moreover, instrumentation can be very helpful for resolving performance problems, which can be hard to diagnose since they are sensitive to external workload, configuration parameters, and underlying hardware and software status. Traces from a production machine during a period of performance degradation may enable developers and administrators to pinpoint poorly performing components or services or identify bottlenecks that were not foreseen during development. Finally, IT professionals use various management tools to compute resource usage statistics from transaction traces for capacity planning and trend analysis.

Event Tracing for Windows® (ETW) is a general-purpose, high-speed tracing facility provided by the operating system. Using a buffering and logging mechanism implemented in the kernel, ETW provides a tracing mechanism for events raised by both user-mode applications and kernel-mode device drivers. Additionally, ETW gives you the ability to enable and disable logging dynamically, making it easy to perform detailed tracing in production environments without requiring reboots or application restarts. The logging mechanism uses per-processor buffers that are written to disk by an asynchronous writer thread. This allows large-scale server applications to write events with minimum disturbance.

ETW was first introduced on Windows 2000. Since then, various core OS and server components have adopted ETW to instrument their activities, and it's now one of the key instrumentation technologies on Windows platforms. A growing number of third-party applications are using ETW for instrumentation as well, and some take advantage of the events provided by Windows itself. ETW has also been abstracted into the Windows preprocessor (WPP) software tracing technology, which provides a set of easy-to-use macros for tracing "printf" style messages for debugging during development.

On Windows Vista™, ETW has gone through a major upgrade, and one of the most significant changes is the introduction of the unified event provider model and APIs. In short, the new unified APIs combine logging traces and writing to the Event Viewer into one consistent, easy-to-use mechanism for event providers. At the same time, several new features have been added to improve developer and end user experience. In this article, we will introduce the new ETW provider model and describe how developers can adopt the new model in their Windows Vista-based applications.

We start by presenting an overview of ETW architecture and usage model, then describe the new event model and the APIs. We'll follow with a short guide to event instrumentation design and implementation and, finally, look at in-box tools for controlling ETW sessions, processing logged events, and analyzing them for higher-level report generation.

Event Tracing for Windows

The core architecture of ETW is illustrated in Figure 1. As shown, there are four main types of components in ETW: event providers, controllers, consumers, and event trace sessions. Buffering and logging take place in event tracing sessions, which accept events and create a trace file. There are a number of logging modes available for ETW sessions. For instance, a session can be configured to deliver events directly to consumer applications or to overwrite old events in a file by wrapping around when a certain size is reached. A separate writer thread created for each session flushes them to a file or to real-time consumer applications. To enable high-performance, per-processor buffers are used to eliminate the need for a lock in the logging path.

Figure 1 ETW Architecture

Figure 1** ETW Architecture **(Click the image for a larger view)

An event provider is a logical entity that writes events to ETW sessions. Any recordable activity of significance can be an event, and each is represented by an event logged to ETW. An event provider can be a user-mode application, a managed application, a driver, or any other software entity. The only requirement is that the event provider must register a provider ID with ETW through the registration API. A provider first registers with ETW and writes events from various points in the code by invoking the ETW logging API. When a provider is enabled dynamically by the ETW controller application, calls to the logging API send events to a specific trace session designated by the controller. Each event sent by the event provider to the trace session consists of a fixed header that includes event metadata and additional variable user-context data. Due to the growing event instrumentation in many OS components, even a simple application for Windows Vista will already contain several components that are event providers.

When an event is logged to a session, ETW adds a few extra data items along with the user-provided data. They include timestamp, process and thread ID, processor number, and CPU usage data of the logging thread. These data items are recorded in the ETW event header and passed on to event consumers along with the variable event content given by the provider. Many trace consumers find these data fields to be essential in their analysis.

A controller starts and stops ETW sessions and enables providers to them. In some scenarios, such as debugging and diagnosis, a controller tool is invoked as needed to collect in-depth traces. In contrast, for events such as admin-targeted events (we will define these in a later section) that need to flow to the Event Viewer at all times, providers are enabled automatically by the event log service when they register. A controller must have ETW permission on Windows Vista to control sessions, which is given only to a small group of privileged users by default.

Lastly, a consumer is an application that reads log files or listens to a session for real time events and processes them. Event consumption is callback-based; a consumer registers an event callback, which ETW calls with one event at a time. Events are delivered to the ETW consumer in chronological order. There are general-purpose event consumer tools that dump the events into various formats. Figure 2 shows an XML dump of a "Process" event logged by the kernel provider as generated by the tracerpt.exe tool on Windows Vista. This event indicates the start of a Notepad process. Since events contain custom user content logged by the provider, some type of metadata is needed to decode them correctly. The providers using the new APIs are expected to supply an event manifest-an XML file-that defines all events that providers write along with their layout information. A general-purpose consumer application uses Trace Data Helper (TDH) APIs to retrieve the event metadata, decode the events, and display them.

Figure 2 XML Dump of Process Start Event

<Event xmlns="https://schemas.microsoft.com/win/2004/08/events/event">
    <System>
        <Provider Guid="{9e814aad-3204-11d2-9a82-006008a86939}" />
        <EventID>0</EventID>
        <Version>2</Version>
        <Level>0</Level>
        <Task>0</Task>
        <Opcode>1</Opcode>
        <Keywords>0x0</Keywords>
        <TimeCreated SystemTime="2006-12-18T12:26:27.887309500Z" />
        <Correlation 
            ActivityID="{00000000-0000-0000-0000-000000000000}" />
        <Execution ProcessID="3396" ThreadID="3260" ProcessorID="0" 
            KernelTime="390" UserTime="195" />
        <Channel />
        <Computer />
    </System>
    <EventData>
        <Data Name="UniqueProcessKey">0xFFFFFA800143FA80</Data>
        <Data Name="ProcessId">0x10EC</Data>
        <Data Name="ParentId">0xD44</Data>
        <Data Name="SessionId">1</Data>
        <Data Name="ExitStatus">0</Data>
        <Data Name="UserSID">guest</Data>
        <Data Name="ImageFileName">notepad.exe</Data>
        <Data Name="CommandLine">notepad</Data>
    </EventData>
    <RenderingInfo Culture="en-US">
        <Opcode>Start</Opcode>
        <Provider>MSNT_SystemTrace</Provider>
        <EventName xmlns=
            "https://schemas.microsoft.com/win/2004/08/events/trace">
            Process</EventName>
        </RenderingInfo>
        <ExtendedTracingInfo xmlns="
            https://schemas.microsoft.com/win/2004/08/events/trace">
        <EventGuid>{3d6fa8d0-fe05-11d0-9dda-00c04fd7ba7c}</EventGuid>
        </ExtendedTracingInfo>
</Event>

To many, tracing means collecting events from certain providers of interest. In this way of thinking, an event trace session is tied to one or more providers in a conceptual collection as a whole, and a session itself (the logging engine) is often overlooked. The ETW architecture allows for more dynamic and flexible trace and event management. Here, sessions and providers exist in different spaces. A controller is the one that starts and stops ETW sessions and enables providers to sessions dynamically. Thus, a controller can choose to enable a group of providers to a session, disable some of them after a while, and enable another provider to that same session later. Sessions operate in the kernel and are not statically tied to providers. Likewise, providers typically are not aware of which sessions their events are being logged to. There are large scale applications and services that are providers, controllers, and consumers all at the same time. APIs are provided for all operations for providers, controllers, and consumers, and applications may assume any combination of roles. In general, however, developers implement only event providers and use in-the-box tools to collect traces and view them.

One of the advantages of the separation of providers and trace sessions is that tracing becomes immune to application problems such as crashes or hangs. Events logged by providers before a crash will be in the kernel memory, if not in a trace file already, which makes this particularly useful in debugging application anomalies.

As mentioned earlier, events are used by developers, IT administrators, and management tool developers for debugging, monitoring, diagnosis, and capacity planning. The usual analysis methodologies based on events can be categorized into the following techniques.

Scanning Users scan through the event dump to find a single important event or a small pattern of known events. This is typically done when events are used to debug for failure cases corresponding to end-user problems or to search for significant failures in the event log.

Delta Analysis Since ETW captures timestamp and CPU usage numbers for each event, such as simple delta analysis of the form

Property (Event B)-Property (Event A)

allows for response time and CPU usage statistics of application activities. If two events mark a beginning and end of an activity, a large set of events collected from production mode applications can be processed in this manner to produce a summary of response time and CPU usage statistics.

Statistical Analysis Sometimes, just counting certain events provides extra insight into software behavior.

State Machine and Resource Tracking A sufficient set of events enables the construction of a state machine and in turn, a simulation based on traces. For instance, since the majority of core OS activities are instrumented with ETW events, OS traces can be used to build a state machine that keeps track of scheduler, memory, I/O activities, and so forth.

End-to-End Tracing Large applications often consist of a number of distributed components integrated via complicated interconnections. They frequently encompass multiple machines, each serving a different role. Instrumentation geared toward request tracking is one of the approaches to overcoming the increasing difficulty of diagnosing problems in such environments. In this framework, instrumentation points are added throughout applications that record activities along with the unique ID for the request currently being served. After traces are collected, events that correspond to the same request are correlated during event consumption such that its activity and progress can be tracked. Later, specific requests of interest can be looked at individually for problems in different service stages, or a group of requests can be summarized through a statistical analysis. Server Performance Advisor (SPA), available from the Microsoft Download Center, is a server management and diagnosis tool that employs server instrumentation designed for end-to-end tracing.

The Unified Event Provider Model and APIs

On Windows Vista, we introduce a new set of event provider APIs that are easier to use while offering more features and enhanced security options. The new APIs are also used to write to the Event Viewer, uniting trace and event logs into one consistent API set. This section describes the model and the APIs in more detail. The differences between the new APIs and the existing ones will be pointed out as necessary.

In order to become an ETW provider, a software component needs to register with ETW through the EventRegister API. EventRegister requires a GUID called ProviderId that uniquely identifies the provider. Any software entity (application, shared DLL, or driver) can register as a provider since a provider does not have to be bound to any OS entity. Registration is typically done at the entry point into the component: a DLL attach routine or a driver entry, for example. A registration handle is returned to be used in subsequent logging API calls. Finally, an EventUnregister call is made at the end of the provider execution.

With the existing provider APIs, providers must supply a callback for enable/disable notification. That is, if a controller enables the provider, the registered callback function is invoked with enable settings. One of the parameters to the callback is a handle to the session that the provider is enabled to. Upon receiving an enable callback, the provider sets a global variable (such as TracingOn) to indicate whether tracing is on or off and stores the session handle as well. Then it uses this session handle, which it obtained from the callback, in logging API calls (conditionally based on the value of TracingOn).

In the new ETW provider model, ETW remembers the enable settings on behalf of providers. In other words, providers register and invoke logging calls without having to check whether they are currently enabled or not. Within the logging API, ETW quickly looks at the enable settings and sends the events to sessions only if they are enabled. If they're not, the logging call is discarded. Hence, enable/disable callback is optional in the new model. However, an enable callback may still be necessary in some scenarios. For instance, instrumentation intended for a state machine construction often requires a snapshot or state rundown events at the beginning and end of traces.

The logging API, EventWrite, takes a registration handle (as opposed to a session handle in the old model). Using a registration handle in the logging calls allows enable settings to be transparent to providers. Since the handle used in logging is an opaque handle in the new model, ETW is now capable of multicasting events. That is, a provider can be enabled to more than one ETW session, which was not possible in the old model where a session handle is used.

ETW provides separate APIs, EventEnabled and EventProviderEnabled, that test if the provider is enabled. Although the logging APIs already check the enable setting before writing events, sometimes providers may require additional work when tracing is enabled. One such scenario involves gathering and constructing informational event data that is not necessarily required for program execution. With these APIs, a provider can find out at any time whether tracing is enabled or not, if there is a need.

As stated previously, users can add variable context data to each event. The logging API employs a scatter/gather mechanism for picking up event-specific data items. Callers pass in additional event data items by constructing an array of data descriptors. A data descriptor is a struct with pointer and size fields. Thus, users add one data descriptor for each data item to log. A macro (EventDataDescCreate) is provided for easy construction of a data descriptor. ETW then copies user-provided content into its session buffers during logging.

In the corresponding event manifest, the layout of an event should be specified through a <Template> tag. A template describes user-specified context data that each event includes. The template can define the layout, which may contain individual data fields, such as integers and strings, or complex data structures such as an array of structs. A template is not required for all events; if no template is specified, that event is expected to have no user-provided data. In the manifest, a template may be shared by multiple events, such as a Start and Stop event having the same context information. When a consumer application encounters an event, it locates an event template through the TDH APIs and decodes the variable event data accordingly. Providers using the old APIs supply layout information through Windows Management Instrumentation (WMI) Managed Object Formats (MOFs).

Figure 3 contains sample provider code that uses the new event APIs for registration and logging. This is a user-mode provider that employs user-mode provider APIs, but a corresponding set of kernel-mode provider APIs are also available. The first event in Figure 3 writes two user-provided data items. One is of type ULONG and the other a NULL terminated WCHAR string. EventDataDescCreate is invoked to construct the appropriate data descriptor array. In addition to the EventWrite API shown here, there are two more logging APIs: EventWriteString and EventWriteTransfer. EventWriteString allows simple logging of a non-manifested string. When EventWriteString is called, ETW marks the header denoting that the event data is a single NULL terminated WCHAR string. If a consumer sees this in the header, it treats the user data as a string, without having to search for the event schema through TDH. EventWriteString enables a quick logging of strings without changing the manifest.

Figure 3 ETW Provider

#include <myevents.h>   // Header generated from manifest. 
                        // Contains MyProviderId and event descriptors.

REGHANDLE MyProvRegHandle;
ULONG MyInteger;
PWCHAR MyString;
ULONG MyStringLength;
EVENT_DATA_DESCRIPTOR DataDescriptor[2];

...

// Register the ETW provider.
Status = EventRegister(&MyProviderId,      // ProviderId (GUID)
                       NULL,               // Optional Callback 
                       NULL,               // OPtioanl Callback Context
                       &MyProvRegHandle);  // Registration Handle

...

// Construct DataDescriptor and write an event with 
// MyInteger and MyString.
EventDataDescCreate(&DataDescriptor[0],    // DataDescriptor
                    &MyInteger,            // Pointer to the data
                    sizeof(ULONG));        // Size of data
EventDataDescCreate(&DataDescriptor[1], &MyString, MyStringLength);

Status = EventWrite(MyProvRegHandle,       // Registration Handle
                    MyEventDescriptor1,    // EventDescriptor
                    2,                     // DataDescriptor array size
                    DataDescriptor);       // DataDescriptor array

...

// Write another event with no user data.
if (EventEnabled(MyProvRegHandle, MyEventDescriptor2)) {
    // Do extra work if enabled and write event.
    ...

    Status = EventWrite(MyProvRegHandle, MyEventDescriptor2, 0, NULL);
}

...

// Unregister the ETW provider. 
Status = EventUnregister(MyProvRegHandle);  

The EventWriteTransfer and EventActivityIdControl APIs cater to the need of end-to-end tracing instrumentation. As described earlier, end-to-end tracing is an instrumentation methodology targeting server applications that perform different activities simultaneously for a number of user requests. For instance, a request for a script execution on a Web page leaves a client machine and arrives at the network layer on a server. Then it travels through the HTTP driver, IIS, the ASP.NET engine, and maybe an Exchange Server on another machine. The goal of end-to-end tracing is to record all the relevant activities regarding this request through ETW events for later debugging and performance analysis. This requires a unique ID that can identify individual requests. Correlation can be performed during consumption with the help of this unique activity ID.

ETW addresses this need by introducing an ActivityId in each event. Every event logged with the new API automatically picks up the current activity ID stored in the executing thread. The activity ID is displayed in the XML dump in <Correlation ActivityId> tag in the <System> section. The provider can get, set, and create an activity ID for the executing thread using the EventActivityIdControl API. The activity ID can travel with the request across multiple components. Unfortunately, it may not be possible to propagate the activity ID in some cases due to public protocol restriction, design restriction, and so on. The EventWriteTransfer API writes a transfer event, indicating a transfer of activity IDs. In addition to all the parameters of EventWrite, EventWriteTransfer takes two more arguments-ActivityId and RelatedActivityId.

Every event is stamped with the provider ID and assigned an entity called the event descriptor, which defines standard event information and provides further identification and semantics to it. Developers determine the event descriptors for instrumentation points during the instrumentation design phase and write corresponding entries in the event manifest. Then the Message Compiler in a development environment generates event descriptors in a header file from the given event manifest, which in turn are included and used in source files. Programmatically, event descriptor is a struct consisting of the following fields: Id, Version, Channel, Level, Opcode, Task, and Keywords:

typedef struct _EVENT_DESCRIPTOR {
    USHORT Id;
    UCHAR Version;
    UCHAR Channel;
    UCHAR Level;
    UCHAR Opcode;
    USHORT Task;
    ULONGLONG Keyword;
} EVENT_DESCRIPTOR, *PEVENT_DESCRIPTOR;

An event ID is used to uniquely identify an event in a provider. When an event is defined in the manifest, the event ID is the only mandatory entry. Upon encountering an event, a consumer uses its provider ID (GUID) and event ID (USHORT) to locate the manifest for it. Likewise, a version offers events to be changed and augmented in later releases while retaining the same semantics and event ID. Therefore, the event ID and version together with the provider ID can uniquely identify an event.

A channel defines a group of events for a target audience. A channel belongs to one of the four types: admin, operational, analytic, and debug. Events raised into an admin channel are actionable events; upon receiving an event, the administrator should immediately know why the event was raised and what to do about it. Events raised into an operational channel are targeted at high-level monitoring tools and support staff; they offer more detailed context and are more frequent than admin channel events. Events grouped in admin and operational channels are sent to the event log automatically and displayed in the Event Viewer. An analytic channel is for traditional traces that are targeted at expert-level support professionals or detailed diagnosis and troubleshooting tools. A debug channel is to be used for debug messages and contains events that are meant to be consumed by developers. Analytic and debug channel events are not enabled by default. Through channels, events intended for different purposes and audiences can be added with one set of APIs.

When enabling a provider, a controller can specify a level (1-byte integer) and keywords (8-byte bit masks). The level and the keywords are used to add dimensions to ETW instrumentation. The level is designed to enable filtering based on the severity or verbosity of events. The keyword is designed to indicate sub-components in a provider. For instance, developers may divide events into informational events and critical-error events. Also, they can assign different keywords to sub-components of the application. By enabling selectively with different level and keywords, the trace controller can enable providers to log only error events from sub-component B, or all events from sub-components A and C, and so forth. For providers using the old APIs, a keyword is 4 bytes, and filtering by level and keyword has to be done explicitly in the provider code. It is important to note that when a controller enables a particular level, all events with a level value less than or equal (higher or equal severity) to what the controller specified are also enabled. Although levels and keywords can be custom designed and assigned to events by developers, there are predefined levels, as shown in Figure 4.

Figure 4 Severity Levels

<levels>
    <level name="win:LogAlways" symbol="WINEVENT_LEVEL_LOG_ALWAYS" 
        value="0" message="$(string.level.LogAlways)"> Log Always 
    </level>
    <level name="win:Critical" symbol="WINEVENT_LEVEL_CRITICAL" value="1" 
        message="$(string.level.Critical)"> Only critical errors </level>
    <level name="win:Error" symbol="WINEVENT_LEVEL_ERROR" value="2" 
        message="$(string.level.Error)"> All errors, includes win:
        Critical </level>
    <level name="win:Warning" symbol="WINEVENT_LEVEL_WARNING" value="3" 
        message="$(string.level.Warning)"> All warnings, includes
        win:Error </level>
    <level name="win:Informational" symbol="WINEVENT_LEVEL_INFO"
        value="4" 
        message="$(string.level.Informational)"> All informational 
        content, including win:Warning </level>
    <level name="win:Verbose" symbol="WINEVENT_LEVEL_VERBOSE" value="5" 
        message="$(string.level.Verbose)"> All tracing, including 
        previous levels </level>
</levels>

Task and opcode are used to attach additional information to each event. A task specifies a common logical component or task being instrumented. The task often represents the key high-level steps that a component takes to achieve its purpose. An opcode signifies the specific operation being performed at the time the event is written. For example, the Windows kernel provider groups all file I/O operation events into a "FileIO" task. Opcode indicates what the operation was, such as Create, Open, Read, and Write. Unlike ID, version, channel, level, and keyword, task and opcode are only used for adding information; they do not have any impact on controlling instrumentation or locating metadata.

The event descriptors and layouts are specified in the event manifest. Developers write the event manifest when instrumentation is designed. An event manifest is written in XML, in which user-defined channels, tasks, opcodes, levels, and keywords are specified in appropriate XML tags. ETW predefined channels, levels, and opcodes can also be used; the predefined channels, however, refer to global channels and should only be used for admin-targeted events. The different metadata fields are combined for each event and defined in an <Event> tag that is uniquely associated with an event ID. An event can also have a message string that will be displayed with substituted values from the event data when the event is being read by a consumer. Figure 5 shows an XML fragment from a sample event manifest. Later we'll present a step-by-step guideline to event instrumentation design and instrumentation.

Figure 5 Event Manifest Fragment

<provider name="Microsoft-Windows-Kernel-Registry" 
    guid="{70eb4f03-c1de-4f73-a051-33d13d5413bd}" 
    symbol="RegistryProvGuid"
    resourceFileName="%SystemRoot%\System32\advapi32.dll" 
    messageFileName="%SystemRoot%\System32\advapi32.dll">
  <channels>
    <channel name="Microsoft-Windows-Kernel-Registry/Analytic" 
        chid="RegistryEvents" symbol="REG_Events" type="Analytic" 
        isolation="System">This channel contains registry 
        events.</channel>
  </channels>
  <opcodes>
    <opcode value="32" name="CreateKey" symbol="" />

    ...

  </opcodes>
  <keywords>
    <keyword name="CreateKey" symbol="" mask="0x1000" />

    ...

  </keywords>
  <templates>
    <template tid="tid_RegOpenCreate">
      <data name="BaseObject" inType="win:Pointer" 
         outType="win:HexInt64" />
      <data name="KeyObject" inType="win:Pointer" 
          outType="win:HexInt64" />
      <data name="Status" inType="win:UInt32" 
          outType="win:HexInt32" />
      <data name="Disposition" inType="win:UInt32" />
      <data name="BaseName" inType="win:UnicodeString" 
          outType="xs:string" />
      <data name="RelativeName" inType="win:UnicodeString" 
          outType="xs:string" />
    </template>

    ...

  </templates>
  <events>
    <event value="1" symbol="ETW_REGISTRY_EVENT_CREATE_KEY" 
        template="tid_RegOpenCreate" opcode="CreateKey" 
        channel="RegistryEvents" level="win:Informational" 
        keywords="CreateKey" 
        message="$(string.Registry.RegOpenCreate)"/>

        ...

      </events>
    </provider>

      ...

  <localization>
    <resources culture="en-US">
      <stringTable>
        <string id="Registry.RegOpenCreate" 
            value="Registry key %6 was created with status %3." />
      </stringTable>
    </resources>
  </localization>

Managed ETW provider APIs will also be available in the Microsoft® .NET Framework 3.5, code-named "Orcas." Managed consumer and controller APIs are currently in the planning stage. System.Diagnostics.Eventing contains the EventProvider class. EventProvider offers methods for all the functionality described above for the native applications. Users need to instantiate the class with a provider GUID and use the class instance to log events. EventDescriptor is a struct equivalent to the event descriptor in the native APIs. Unlike the native case, EventDescriptors are not generated for managed applications. However, a tool capable of generating code aimed at better performance and validation is under consideration. A sample usage of EventProvider is shown in Figure 6.

Figure 6 Managed ETW Provider

using System.Diagnostics.Eventing;
...

static void Main(string[] ArgRead)
{
    int MyInteger;
    string MyString;

    ...

    // Construct event descriptor.
    EventDescriptor Event1 = new EventDescriptor(5, 0, 0, 2, 0, 0, 0);

    // Instantiate event provider.
    EventProvider etwProvider = new EventProvider(
        new Guid("d58c126f-b309-11d1-969e-0000f875a5bc"));

    ...

    // Write an event with MyInteger and MyString.
    etwProvider.WriteEvent(ref Event1, MyInteger, MyString);

    ...
}

In addition, ETW on Windows Vista offers improved security options for providers. By default, any provider can register and write events. However, developers can place restrictions on a GUID so that only authorized users can register a provider with that GUID. A provider can also specify who can enable it. Furthermore, ETW allows a controller to declare a session to be secure, meaning that it receives events only from a certain group of users.

Once the binary with the event provider instrumentation is compiled, the provider is installed. Then users can collect events from the provider using the logman tool, an in-the-box controller application. The logman.exe command below issues two ETW control API calls that start an ETW session called mysession and enable a provider:

> logman start mysession -p <provider name> -o mytest.etl -ets

Session mysession writes events to a file called mytest.etl. There are a number of logman.exe options that customize logging modes, buffer configuration, and so on. Here, <provider name> can be the provider name in the manifest or the GUID that the provider uses to register with using the EventRegister API. If the provider GUID is 11223344-5566-7788-99aa-bbccddeeff00, the actual command line becomes the following:

> logman start mysession 
-p {11223344-5566-7788-99aa-bbccddeeff00} 
-o mytest.etl -ets

ETW allows a controller to pre-enable a provider before it registers and executes. This lets users collect startup traces for dlls and drivers for which the loading of binary images is not easily controlled. A controller starts a session and enables a provider before the provider begins execution. As soon as the provider registers, it is enabled.

After events are collected as needed, the session can be stopped by using the stop option in logman.exe.

> logman stop mysession -ets

A trace file called mytest.etl should now exist in the same directory in which the logman command was executed. Tracerpt.exe can be used to get the dump of events from this file:

> tracerpt mytest.etl

This command creates a dump file (dumpfile.xml by default) and a text summary file (summary.txt by default). The summary file contains a short summary of event statistics. The dump file contains XML dumps of events, such as the one shown in Figure 2. The data fields from the event header are presented in the <System> section. Process ID, thread ID, processor ID, and CPU usage are given in the <Execution> tag within <System>. Also, the <System> section shows the values of the event descriptor fields of the event. Output file names can be specified by the user as options to the tracerpt.exe command. There are other GUI tools on Windows Vista capable of collecting and viewing traces, which we'll discuss later.

There are already hundreds of event providers installed on Windows Vista. Users can see the list of these providers from PerfMon or the Event Viewer or through a logman.exe command:

> logman query providers

Design and Implementation Guidelines

In this section, we discuss the steps to create the manifest file that defines the events, instrument the code with ETW API calls, and successfully build and install the event provider.

Design the Instrumentation Manifest The instrumentation manifest is an XML-format file that defines the event provider and the events that the event provider logs to ETW. Each event has a standard metadata and a variable event data section. The manifest can either be created by hand or using the manifest generator tool available in the SDK (ecmangen.exe). Here are the steps:

  1. Create an event provider and determine its provider ID, which is a GUID.
  2. Define the logical tasks of the component and the specific instrumentation points within those logical tasks. This defines the tasks and opcodes for the events that the provider writes.
  3. Define the subcomponents that will be logging events and assign a keyword for each subcomponent. This allows controllers to selectively enable events.
  4. Assess the target audiences of events being designed and define channels for each target audience.
  5. Create the different constructs that will define the variable user context data. This is done in the <templates> section.
  6. Put everything together and define events in the <events> section of the manifest. Assign to each event the symbolic identifier, severity, keywords, channel, task, opcode, and template. A localizable event-specific message can be added as well.

Instrument the Code with the ETW API After the events have been defined in the manifest, use the unified APIs to write events to ETW. For native applications, first generate a header file that contains the event definitions from the manifest so it can be included in the provider code. Then use the ETW APIs to write the events designed previously. This is the process:

  1. Run the Message Compiler (mc.exe) on the manifest created above. This generates the header file that contains event descriptor structs for the events, with each struct instance having the same name as the event symbolic identifier. The Message Compiler also produces the resource string file. The generated header file needs to be included in the provider code.
  2. Before writing code to raise events, a provider needs to register with the provider ID defined in the manifest by calling the EventRegister function. A registration handle returned from EventRegister should be stored in order to be used for logging API calls and the corresponding EventUnregister call.
  3. Call EventWrite to log the events defined in the manifest. The event-specific data is passed in through a data descriptor array as depicted in Figure 3. (Developers must make sure that the data fields exactly match the corresponding event templates specified in the manifest.) Pass in the right event descriptor struct defined in the header file with the event.
  4. After logging all the necessary events, call EventUnregister and pass in the registration handle.

Compiling the Instrumented Binary The manifest information that defines the events should be compiled during the build process and attached as a resource into the provider executable binary. All the localizable strings will be taken out of this compiled manifest and placed in a separate resource file that must be compiled and linked separately. When the provider binary is deployed, the manifest information needs to be installed on the system so that both controllers and consumers can find the event provider and event definitions. All the required tools are available natively on Windows Vista or from the SDK. The steps involved follow:

  1. If there are localizable message strings for the events, it may be necessary to run rc.exe to compile the resource file into a .res.
  2. If Visual Studio® is used, the developer can include a pre-build command to run rc.exe to generate the .res compiled resource and include this compiled resource file in the project.
  3. Link the compiled resource file using either cl.exe or link.exe (this is done in Step 2 for users of Visual Studio) and compile the application or driver.
  4. When installing the provider, run the wevtutil.exe tool with the -im option to install the manifest that defines the events. This allows controllers to discover the event providers and also lets consuming tools find the information needed to decode events properly.

Tools and Support

So far we have used logman.exe, a command-line controller, and tracerpt.exe, a general consumer tool. Logman.exe also has an option to schedule collections on remote machines. And Tracerpt.exe is capable of dumping events in CSV format, and it can generate resource-consumption reports using the -report option for some of the known core OS events. Figure 7 shows the hot file table generated from tracerpt.exe after processing and correlating some known core OS events based on a state machine-building technique.

Figure 7 Hot File Report Generated by Tracerpt

Figure 7** Hot File Report Generated by Tracerpt **(Click the image for a larger view)

Reliability and Performance Monitor (RPM) (which includes PerfMon on Windows Vista) offers a graphical interface for collecting traces. Using the Event Trace Sessions interface, users can set up a trace session, pick providers to enable, and start and stop ETW sessions. Figure 8 shows RPM when a user is looking at the list of providers to enable. RPM introduces the Data Collector Set concept, in which all the necessary information regarding event traces, performance counters, and configuration (registry and WMI classes) data collection is combined into a single collection set. Users can create a data collection set with providers enabled for trace and counter data. Once created, the collection set settings are stored in RPM so that users can easily start and stop data collection without having to specify sources every time. This feature supports common diagnostic scenarios, where users can specify the providers to collect traces and counters from for some known diagnostic scenarios.

Figure 8 Event Trace Providers in RPM

Figure 8** Event Trace Providers in RPM **(Click the image for a larger view)

The Event Viewer offers another means for collecting and viewing events. The new event APIs were designed for events directed to the event log as well, and admin and operational channel events are automatically routed to and displayed in the Event Viewer, to be monitored for important software status. The Event Viewer also expands its functionality for analytic and debug events. The "View" option allows displaying all available analytic and debug channels, and users can right-click to enable trace collection. In Figure 9, a user right-clicked an analytic channel of the registry event provider to enable trace collection.

Sample code is available in software development kits for provider, controller, and consumer APIs. Sample providers using the new provider APIs include manifests for the events that they log. A driver sample is available as well.

Figure 9 Event Viewer Interface for Analytic Event Collection

Figure 9** Event Viewer Interface for Analytic Event Collection **(Click the image for a larger view)

Conclusion

We have presented the unified event provider model and APIs for ETW newly available on Windows Vista. Our goal is to introduce the APIs to developers and provide them with a usage example and general guidelines. Tracing is still an unfamiliar technology to many developers, and we hope this article provides insight into one of the key instrumentation and diagnostic infrastructures on Windows platforms. Within a few years since its introduction as a general tracing technology on the Windows 2000 operating system, a number of components and applications have employed ETW as a basis for their tracing and manageability features, and this trend is likely to continue.

We anticipate a greater number of developers will begin to adopt event instrumentation technology in order to deliver more reliable, better performing, and more easily manageable applications. Instrumentation will not only help these developers debug their applications, but will also help the people who deploy and manage the applications. Furthermore, instrumentation will aid in capacity planning, trend analysis, and finding performance bottlenecks as well. ETW provides a useful tracing infrastructure to enable applications to provide instrumentation to address all of the above scenarios, and we expect this technology to play a fundamental role as the core building block in the endeavor to make applications more manageable and more diagnosable.

Dr. Insung Park is a Development Lead for the Windows Instrumentation Platform Team. He has published a dozen papers on performance analysis, request tracking, instrumentation technology, and programming methodology and support. His e-mail address is insungp@microsoft.com.

Ricky Buch is a Program Manager for the Windows Instrumentation Platform Team. He works on both Event Tracing for Windows and the Performance Counter Library technologies. He can be reached at ricky.buch@microsoft.com.