Update: This post now includes information on the Microsoft Kinect for Windows SDK beta below.
Installing Drivers and Middleware Properly
People have been asking me more about my demonstration of the Kinect and how to get started. Many are having problems installing drivers and getting them working, so here’s my easy how-to guide to getting started.
Many of the code examples out there lack proper installation instructions. I have mentioned Brekel Kinect before, a motion capture tool based on OpenNI.
Brekel has come up with a tool that auto-updates your system and also fetches the latest OpenNI drivers from their archive location and installs them. Before running it, I highly recommend installing the Code Laboratories NUI package, keeping its Kinect Motor Driver and Kinect NUI Audio device, and uninstalling the rest of its NUI drivers. Then install the OpenNI drivers using the Brekel OpenNI Auto Installer. That way you have full access to the Kinect positioning motor and the Code Laboratories Kinect audio device driver. If you do it in this order, the Code Laboratories and OpenNI drivers will play nicely together.

These can be downloaded at:
Brekel OpenNI Kinect Auto Installer.exe
Brekel OpenNI Kinect Auto Installer – Developer Edition.exe
Another product you may need to install is the Code Laboratories Kinect Motor Driver. See "Kinect Motor Driver not found. LED & Motor control disabled" below for installation instructions.
Frequent Error Messages
"Failed to open XML, Error: Unknown USB Device Speed"
You need to use the Kinect's power adapter when connecting the camera to a regular USB port on a PC.
"error initializing NITE"
NITE isn't installed properly. Make sure you used the correct license key. When the installer asks for it, use the key provided by PrimeSense: 0KOIk2JeIBYClPWVnMoRKn5cdY4=
"Can't create any node of the requested type!"
NITE isn't installed properly. Make sure you used the correct license key provided by PrimeSense.
I keep getting a message telling me I need .NET Framework 4.0.
This error can be fixed by installing with the .exe installer first, then reinstalling with the .msi installer. If you use the auto-installer above, you should not see this error.
"Kinect Motor Driver not found. LED & Motor control disabled"
You need to manually install the motor/LED driver for the Kinect:
1. Remove OpenNI's PrimeSense driver.
2. Install the CL-NUI platform and driver from http://codelaboratories.com/downloads/. Device manager will show the camera, motor, and audio components as NUI components.
3. Remove all the NUI devices in device manager.
4. Re-install the PrimeSense drivers. Device manager will show a camera and motor driver, but no audio component.
5. Select the PrimeSense motor device in device manager and update its driver, but point it to the CL-NUI driver, which is located in the CL-NUI install directory (..\Code Laboratories\CL NUI Platform\Driver). The NUI driver for the motor will now be installed and device manager will reflect this.
6. Repeat step 5 for the audio component too. Device manager will show the PrimeSense camera driver and the NUI motor & audio drivers side by side.
How the Kinect works with OpenNI

The Kinect sensor is a streaming device. Over a high-speed USB connection, it sends the following types of data:
- RGB Data
- Depth Data
- Audio Capture
Why isn’t it wireless? Probably because the amount of data it has to stream at high speed to do its work is monumental.
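To put a rough number on "monumental", here is a back-of-the-envelope calculation. It assumes 640x480 color and depth frames at 30 frames per second, 3 bytes per color pixel and 2 bytes per depth pixel (an 11-bit value stored in 16 bits), with no compression. The real device packs its data more tightly, so treat this as an upper-bound sketch, not the sensor's measured data rate.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Uncompressed bytes per second for one video stream.
constexpr std::uint64_t streamBytesPerSecond(std::uint64_t width, std::uint64_t height,
                                             std::uint64_t bytesPerPixel,
                                             std::uint64_t framesPerSecond) {
    return width * height * bytesPerPixel * framesPerSecond;
}

// Assumed stream formats, not measured values: 640x480 at 30 fps,
// 3 bytes per RGB pixel, 2 bytes per depth pixel (11-bit value in 16 bits).
constexpr std::uint64_t kRgbBytesPerSec   = streamBytesPerSecond(640, 480, 3, 30);
constexpr std::uint64_t kDepthBytesPerSec = streamBytesPerSecond(640, 480, 2, 30);
constexpr std::uint64_t kVideoBitsPerSec  = (kRgbBytesPerSec + kDepthBytesPerSec) * 8;

// Prints the estimate; audio is left out of this calculation.
void printBandwidth() {
    std::printf("RGB:   %.1f MB/s\n", kRgbBytesPerSec / 1e6);
    std::printf("Depth: %.1f MB/s\n", kDepthBytesPerSec / 1e6);
    std::printf("Video total: %.1f Mbit/s\n", kVideoBitsPerSec / 1e6);
}
```

Under these assumptions, `printBandwidth()` reports about 27.6 MB/s for color, 18.4 MB/s for depth and 368.6 Mbit/s for the two video streams combined, a large share of USB 2.0's nominal 480 Mbit/s.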
What you need to know about OpenNI to get started
Using OpenNI means you will be accessing PrimeSense's NITE middleware to get meaningful information from the device, including motion tracking, user recognition and gesture support.
The middleware contains the following components that allow the device to make sense of the scene the sensor is observing. Note that this is not like a mouse, where you have a limited amount of directional information to read and a fixed set of events to fire. You have to create your own “events” so your applications can understand and use what the Kinect is sending you.
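As an illustration of building your own "event", here is a minimal sketch of a "push" gesture built from nothing more than one hand position per frame. It assumes the middleware already reports the hand position in millimetres, with z as the distance from the sensor. `HandSample`, `PushDetector` and the 100 mm threshold are hypothetical names and values, not part of OpenNI or NITE.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <optional>
#include <utility>

// Hypothetical per-frame hand position in millimetres; z = distance from the sensor.
struct HandSample {
    float x, y, z;
};

// Turns a raw stream of hand positions into a discrete "push" event.
// The event fires when the hand moves toward the sensor by more than
// pushDistanceMm, measured from the farthest z seen since the last push.
class PushDetector {
public:
    PushDetector(float pushDistanceMm, std::function<void()> onPush)
        : pushDistanceMm_(pushDistanceMm), onPush_(std::move(onPush)) {
        assert(pushDistanceMm > 0.0f);
    }

    void update(const HandSample& hand) {
        if (!farthestZ_ || hand.z > *farthestZ_) {
            farthestZ_ = hand.z;            // hand pulled back: new reference point
        }
        if (*farthestZ_ - hand.z > pushDistanceMm_) {
            onPush_();                      // our own application-level "event"
            farthestZ_ = hand.z;            // re-arm from the current position
        }
    }

private:
    float pushDistanceMm_;
    std::function<void()> onPush_;
    std::optional<float> farthestZ_;        // empty until the first sample arrives
};
```

Feeding it z values of 1000, 980, 950, 890 and 880 mm fires the callback once, when the hand has come 110 mm closer than its farthest point.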
NITE MIDDLEWARE *KEY* COMPONENTS
Full body component: A software component that processes sensory data and generates body-related information (typically a data structure that describes joints, orientation, center of mass, and so on).
Hand point analysis component: A software component that processes sensory data and generates the location of a hand point.
Gesture detection component: A software component that identifies predefined gestures (for example, a waving hand) and alerts the application.
Scene Analyzer component: A software component that analyzes the image of the scene in order to produce such information as:
- The separation between the foreground of the scene (meaning, the figures) and the background
- The coordinates of the floor plane
- The individual identification of figures in the scene.
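The first item, separating figures from the background, can be approximated with simple background subtraction on the depth map. This is a swapped-in technique for illustration, not how NITE's scene analyzer actually works: it assumes you captured a depth frame of the empty room beforehand, and it gives every foreground pixel the same label instead of a separate label per figure. `labelForeground` and its threshold are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Produces a labeled depth map: 0 = background, 1 = foreground (a "figure").
// depthMm and backgroundMm are row-major depth frames in millimetres, where
// 0 means "no reading". A pixel is foreground if it is at least minDeltaMm
// closer to the sensor than the empty-room background at the same pixel.
std::vector<std::uint8_t> labelForeground(const std::vector<std::uint16_t>& depthMm,
                                          const std::vector<std::uint16_t>& backgroundMm,
                                          int minDeltaMm) {
    assert(depthMm.size() == backgroundMm.size());
    std::vector<std::uint8_t> labels(depthMm.size(), 0);   // start as all background
    for (std::size_t i = 0; i < depthMm.size(); ++i) {
        const int d = depthMm[i];
        const int bg = backgroundMm[i];
        if (d == 0 || bg == 0) continue;                     // no reading: leave as background
        if (bg - d >= minDeltaMm) labels[i] = 1;             // clearly in front of the room
    }
    return labels;
}
```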
A Top Down View of the Sensor and Interaction with the Middleware

Thanks to PrimeSense for this graphic.
As you can see, the sensor would be of little use without the middleware working between it and your application. You can write your own highly specialized middleware components, depending on the specifics of what you need to track.
Why OpenNI is a very well designed standard
OpenNI supports sensors and natural interaction devices from multiple vendors.

Thanks to PrimeSense for this graphic.
In fact, with OpenNI you could have multiple sensors from different vendors working within your application, along with extended middleware to provide enhanced tracking. There are other natural UI devices out there, such as cameras and other devices including the Asus Xtion Pro.
OpenNI also supports sensors that produce different types of data, not just 3D sensors.
Teaching the Sensor how to see and what to observe
How do we use the NITE middleware to get meaningful information back about what our sensor or sensors are observing?
Meaningful Data Points
"Meaningful" data is defined as data that lets an application comprehend, understand and interpret the scene. Creating meaningful 3D data is a complex task. Typically, it begins with a sensor device that produces some form of raw output data. Often this data is a depth map, in which each pixel holds its distance from the sensor. NITE middleware processes this raw output and produces a higher-level output that the application can understand and use.
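To make the raw-versus-higher-level distinction concrete, here is a sketch that turns a raw depth map (one distance per pixel, row-major) into a single higher-level fact: where the point closest to the sensor is, a crude stand-in for "where is the user's outstretched hand?". It assumes distances in millimetres with 0 meaning "no reading"; `findNearestPoint` is a hypothetical helper, and real middleware produces far richer output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Higher-level output: pixel coordinates and depth of the closest valid point.
struct NearestPoint {
    int x, y;
    std::uint16_t depthMm;
};

// Scans a row-major depth map of width x height pixels.
// Returns std::nullopt if no pixel has a valid (non-zero) reading.
std::optional<NearestPoint> findNearestPoint(const std::vector<std::uint16_t>& depthMm,
                                             int width, int height) {
    assert(static_cast<int>(depthMm.size()) == width * height);
    std::optional<NearestPoint> best;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::uint16_t d = depthMm[static_cast<std::size_t>(y) * width + x];
            if (d == 0) continue;                        // no reading for this pixel
            if (!best || d < best->depthMm) best = NearestPoint{x, y, d};
        }
    }
    return best;
}
```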
Production Nodes
OpenNI defines Production Nodes, which are a set of components that have a productive role in the data creation process required for Natural Interaction based applications. Each production node encapsulates the functionality that relates to the generation of the specific data type. These production nodes are the fundamental elements of the OpenNI interface provided for the applications. However, the API of the production nodes only defines the language. The logic of data generation must be implemented by the modules that plug into OpenNI.
For example, there is a production node that represents the functionality of generating hand-point data. The logic of hand-point data generation must come from an external middleware component that is both plugged into OpenNI, and also has the knowledge of how to produce such data.
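The split between the interface (the "language") and the plug-in module (the "logic") can be sketched as an abstract class plus a registry of factories. Every name here (`HandPointGenerator`, `FakeMiddlewareHands`, `Registry`) is invented for illustration; this shows the shape of the idea, not OpenNI's actual API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct HandPoint { float x, y, z; };

// The "language": the interface the application programs against.
class HandPointGenerator {
public:
    virtual ~HandPointGenerator() = default;
    virtual std::vector<HandPoint> generate() = 0;
};

// The "logic": a plug-in module that knows how to produce hand points.
// Here it simply returns a fixed point in place of real tracking.
class FakeMiddlewareHands : public HandPointGenerator {
public:
    std::vector<HandPoint> generate() override { return {{10.0f, 20.0f, 800.0f}}; }
};

// Stand-in for the framework: modules register factories under a node type
// name, and applications ask for a node by type without knowing the module.
class Registry {
public:
    using Factory = std::unique_ptr<HandPointGenerator> (*)();

    void registerModule(const std::string& type, Factory f) {
        assert(f != nullptr);
        factories_[type] = f;
    }

    std::unique_ptr<HandPointGenerator> create(const std::string& type) const {
        auto it = factories_.find(type);
        if (it == factories_.end()) return nullptr;   // no module provides this type
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};
```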
Sensor-Specific Production Node Types
Device: A node that represents a physical device (for example, a depth sensor, or an RGB camera). The main role of this node is to enable device configuration.
Depth Generator: A node that generates a depth-map. This node should be implemented by any 3D sensor that wishes to be certified as OpenNI compliant.
Image Generator: A node that generates colored image-maps. This node should be implemented by any color sensor that wishes to be certified as OpenNI compliant.
IR Generator: A node that generates IR image-maps. This node should be implemented by any IR sensor that wishes to be certified as OpenNI compliant.
Audio Generator: A node that generates an audio stream. This node should be implemented by any audio device that wishes to be certified as OpenNI compliant.
Middleware-Specific Production Node Types

Gestures Alert Generator: Generates callbacks to the application when specific gestures are identified.
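A gesture alert generator boils down to watching a stream of positions and calling back when a pattern appears. The sketch below detects a wave by counting left/right swings of the hand's x coordinate. `WaveDetector`, the 50 mm swing size and the four-swing requirement are illustrative assumptions, not NITE's parameters.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <utility>

class WaveDetector {
public:
    WaveDetector(float minSwingMm, int swingsNeeded, std::function<void()> onWave)
        : minSwingMm_(minSwingMm), swingsNeeded_(swingsNeeded), onWave_(std::move(onWave)) {
        assert(minSwingMm > 0.0f && swingsNeeded > 0);
    }

    // Call once per frame with the tracked hand's x coordinate in millimetres.
    void update(float handXmm) {
        if (!started_) { anchorX_ = handXmm; started_ = true; return; }
        const float delta = handXmm - anchorX_;
        const int dir = delta > minSwingMm_ ? 1 : (delta < -minSwingMm_ ? -1 : 0);
        if (dir == 0) return;        // not far enough from the anchor yet: ignore jitter
        if (dir != lastDir_) {       // first swing, or the hand reversed direction
            lastDir_ = dir;
            if (++swings_ >= swingsNeeded_) {
                onWave_();           // gesture recognized: alert the application
                swings_ = 0;
                lastDir_ = 0;
            }
        }
        anchorX_ = handXmm;          // measure the next movement from here
    }

private:
    float minSwingMm_;
    int swingsNeeded_;
    std::function<void()> onWave_;
    bool started_ = false;
    float anchorX_ = 0.0f;
    int lastDir_ = 0;
    int swings_ = 0;
};
```

Moving the hand between x = 0 and x = 100 mm twice (0, 100, 0, 100, 0) produces four swings and one wave alert; small jitter that never exceeds 50 mm produces none.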
Scene Analyzer: Analyzes a scene, including the separation of the foreground from the background, identification of figures in the scene, and detection of the floor plane. The Scene Analyzer’s main output is a labeled depth map, in which each pixel holds a label that states whether it belongs to a figure or to the background. The scene analyzer is probably the busiest of all the middleware components.
Hand Point Generator: Supports hand detection and tracking. This node generates callbacks that provide alerts when a hand point (meaning, a palm) is detected, and when a hand point currently being tracked changes its location.
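The two callbacks described for the hand point generator (hand detected, tracked hand moved) map naturally onto a small tracker class. This is a hypothetical sketch: `HandTracker`, its member names and the minimum-movement filter that suppresses jitter are assumptions, and the per-frame detection result is assumed to come from elsewhere.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <optional>

struct Point3 { float x, y, z; };

class HandTracker {
public:
    std::function<void(const Point3&)> onHandCreated;   // hand point first detected
    std::function<void(const Point3&)> onHandUpdated;   // tracked hand moved

    explicit HandTracker(float minMoveMm) : minMoveMm_(minMoveMm) {
        assert(minMoveMm >= 0.0f);
    }

    // Feed one detection result per frame (std::nullopt = no hand this frame).
    void processFrame(const std::optional<Point3>& detected) {
        if (!detected) { current_.reset(); return; }     // hand lost
        if (!current_) {
            current_ = detected;
            if (onHandCreated) onHandCreated(*detected);
            return;
        }
        if (distance(*current_, *detected) >= minMoveMm_) {
            current_ = detected;                         // only report real movement
            if (onHandUpdated) onHandUpdated(*detected);
        }
    }

private:
    static float distance(const Point3& a, const Point3& b) {
        return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y) +
                         (a.z - b.z) * (a.z - b.z));
    }

    float minMoveMm_;
    std::optional<Point3> current_;
};
```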
User Generator: Generates a representation of a (full or partial) body in the 3D scene. Remember that more than one user can be recognized and logged in at one time. For a user to be recognized and calibrated, they normally must first appear in the “PSI” pose, extending their limbs like the Greek letter psi (Ψ). In other software, such as the Kinect software on the Xbox, a hand-waving gesture is used to recognize the user.
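A calibration pose check like PSI can be expressed as a few geometric tests on joint positions: upper arms held out to the side at shoulder height, forearms pointing straight up. The sketch assumes joints in millimetres with y pointing up, and the 100 mm tolerance is a guess. NITE's real calibration criteria are not described here, so treat `isPsiPose` as a hypothetical illustration.

```cpp
#include <cassert>
#include <cmath>

struct P3 { float x, y, z; };

struct ArmJoints { P3 shoulder, elbow, hand; };

// One arm in the PSI shape: elbow out to the side at shoulder height,
// hand roughly straight above the elbow.
bool armInPsiPose(const ArmJoints& a, float tolMm) {
    const bool elbowAtShoulderHeight = std::fabs(a.elbow.y - a.shoulder.y) < tolMm;
    const bool upperArmOutward       = std::fabs(a.elbow.x - a.shoulder.x) > 2 * tolMm;
    const bool forearmVertical       = std::fabs(a.hand.x - a.elbow.x) < tolMm;
    const bool handAboveElbow        = a.hand.y - a.elbow.y > 2 * tolMm;
    return elbowAtShoulderHeight && upperArmOutward && forearmVertical && handAboveElbow;
}

bool isPsiPose(const ArmJoints& left, const ArmJoints& right, float tolMm = 100.0f) {
    assert(tolMm > 0.0f);
    return armInPsiPose(left, tolMm) && armInPsiPose(right, tolMm);
}
```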
Recording Production Node Types
Recorder: Implements data recordings
Player: Reads data from a recording and plays it
Codec: Used to compress and decompress data in recordings
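The recorder, player and codec roles can be shown in miniature. As a stand-in codec, this sketch uses run-length encoding, which suits depth maps because they often contain long runs of identical values such as 0 ("no reading"). It is a swapped-in technique, not the codec OpenNI actually uses, and `Recording`, `rleEncode` and `rleDecode` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Run = std::pair<std::uint16_t, std::uint32_t>;  // (depth value, repeat count)

// Codec, compress: collapse repeated consecutive values into runs.
std::vector<Run> rleEncode(const std::vector<std::uint16_t>& frame) {
    std::vector<Run> runs;
    for (std::uint16_t v : frame) {
        if (!runs.empty() && runs.back().first == v) {
            ++runs.back().second;
        } else {
            runs.emplace_back(v, 1u);
        }
    }
    return runs;
}

// Codec, decompress: expand each run back into repeated values.
std::vector<std::uint16_t> rleDecode(const std::vector<Run>& runs) {
    std::vector<std::uint16_t> frame;
    for (const Run& r : runs) frame.insert(frame.end(), r.second, r.first);
    return frame;
}

// Recorder and player: store encoded frames, then replay them by index.
class Recording {
public:
    void record(const std::vector<std::uint16_t>& frame) { frames_.push_back(rleEncode(frame)); }
    std::size_t frameCount() const { return frames_.size(); }
    std::vector<std::uint16_t> play(std::size_t index) const { return rleDecode(frames_.at(index)); }

private:
    std::vector<std::vector<Run>> frames_;
};
```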
What is a Production Chain?

To produce body data, the user generator production node uses a lower-level depth generator, which reads raw data from a sensor.
This sequence of nodes (user generator => depth generator), in which each node relies on the one beneath it to produce the required body data, is called a production chain.
Typically, an application is only interested in the top production node of each chain, for example a hand point generator. This is the node that outputs the required data on a practical level. OpenNI enables the application to use a single node without being aware of the production chain beneath it. For advanced tweaking, there is an option to access this chain and configure each of the nodes.
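The idea of an application holding only the top node of a chain, while still being able to reach down and tweak the nodes beneath it, can be sketched with simple composition. The class names mirror the node types described above, but the code, the `maxDepthMm` setting and the crude "closer than 2 m is a user" rule are all hypothetical, not OpenNI's API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Lower node: reads raw depth from the sensor (faked here with fixed values).
class DepthGenerator {
public:
    int maxDepthMm = 4000;  // a tweakable setting on the lower node (assumed)

    std::vector<std::uint16_t> readFrame() const {
        return {0, 1200, 1250, 3900, 4500};  // 0 = no reading
    }
};

// Top node: the only node the application normally talks to.
class UserGenerator {
public:
    // Counts pixels that belong to a user, using a deliberately crude rule:
    // a valid depth within the lower node's range and closer than 2 m.
    int countUserPixels() const {
        int count = 0;
        for (std::uint16_t d : depth_.readFrame())
            if (d != 0 && d <= depth_.maxDepthMm && d < 2000) ++count;
        return count;
    }

    // "Advanced tweaking": expose the node beneath this one in the chain.
    DepthGenerator& depthNode() { return depth_; }

private:
    DepthGenerator depth_;
};
```

The application calls `countUserPixels()` without knowing a depth generator exists, yet can still reach `depthNode()` when it needs to adjust the lower node.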
Speech Recognition
One of the cool things you can do if you don’t have access to Microsoft Research’s own SDK (which was supposed to arrive in the spring, but which we haven’t seen in the wild yet) is to use the audio driver to record audio and send it over the internet to a web service that does speech recognition for your device. AT&T offers Watson for speech recognition apps and web mashups, and you could use it with your own applications today. Many iPhone apps use speech recognition like this; it’s also available as a service called Vlingo, which is popular on BlackBerry phones as well. As far as I know, as of this writing TellMe isn’t publicly available.
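Before recorded audio can go to a web service, it usually has to be packaged in a format the service accepts. A common choice is a WAV file: raw 16-bit PCM samples behind a 44-byte RIFF header. The sketch below builds that header by hand. It assumes mono audio at 16 kHz, which you should check against what your speech service expects, and `makeWav` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace {
// WAV stores multi-byte integers little-endian.
void putU32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}
void putU16(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
}
void putTag(std::vector<std::uint8_t>& out, const char* tag) {
    out.insert(out.end(), tag, tag + 4);  // four-character chunk ID
}
}  // namespace

// Wraps 16-bit PCM samples in a RIFF/WAVE container.
std::vector<std::uint8_t> makeWav(const std::vector<std::int16_t>& samples,
                                  std::uint32_t sampleRate = 16000,
                                  std::uint16_t channels = 1) {
    assert(channels > 0);
    const std::uint16_t bitsPerSample = 16;
    const auto blockAlign = static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    const auto dataBytes = static_cast<std::uint32_t>(samples.size() * sizeof(std::int16_t));

    std::vector<std::uint8_t> out;
    putTag(out, "RIFF");
    putU32(out, 36 + dataBytes);           // size of everything after this field
    putTag(out, "WAVE");
    putTag(out, "fmt ");
    putU32(out, 16);                       // PCM format chunk size
    putU16(out, 1);                        // audio format 1 = PCM
    putU16(out, channels);
    putU32(out, sampleRate);
    putU32(out, sampleRate * blockAlign);  // byte rate
    putU16(out, blockAlign);
    putU16(out, bitsPerSample);
    putTag(out, "data");
    putU32(out, dataBytes);
    for (std::int16_t s : samples) putU16(out, static_cast<std::uint16_t>(s));
    return out;
}
```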

How it works with OpenNI Today
This example diagram works with both Silverlight and WPF.


Here the speech recognition engine is called as a cloud service, and it uses VoiceXML to do all of the processing.
To the right is a simple example of using VoiceXML to create your own “vocabulary” for your application with simple scripting.
You can visit my blog entry on VoiceXML for more information on using it in your own applications.
http://uxmagic.com/blog/post/2010/11/02/And-now-for-something-completely-different.aspx
Getting Started with AT&T Speech Recognition with your Kinect
For more information on getting started today with using AT&T’s Watson Service for speech apps/web mashups check out:
http://www.research.att.com/projects/SpeechMashup/index.html
Speech mashups provide an easy way for web developers to incorporate a speech interface into their web apps so their users can use voice commands and receive back spoken responses. All speech and language processing, including automatic speech recognition, text-to-speech conversion, and natural language processing, is performed on AT&T servers, so web apps get the advantage of AT&T's expertise in speech and language processing.
Speech mash-ups work as follows: audio or text from a mobile device or a web browser is relayed over the cell network to the speech mashup manager, which manages the entire process by accessing AT&T servers where the speech and language processing takes place, and then relaying the result (interpreted into programming language) to the web application. If the application result is to be spoken, the speech mashup manager sends it for TTS conversion before relaying the spoken response back to the user.
All processing steps are tightly integrated to minimize the number of round trips in the mobile network and reduce latency to achieve a better user experience.
Building a speech mashup for any network-enabled device with audio input requires the following:
- Registering at the speech mashup portal (http://service.research.att.com/smm/) for an account on AT&T servers, and creating a directory for the web app and related files (grammars, log files, etc.).
- Creating and uploading grammars or using a built-in or shared grammar (ASR applications only).
- Building a speech mash-up client in any suitable programming language (Java, JavaScript, .Net, etc.).
- A developer's guide with instructions and examples is available from the portal.
MIX 11 Kinect Audio Presentation

For more information on Microsoft Research’s work on audio, check out this video above from MIX 11. While it’s a great technical overview, I didn’t find it very helpful for you and me. It seems like stuff that’s already been done (maybe that was the point).
What is Code Laboratories, and what has it contributed to NUI?
Code Laboratories is a research & development firm focused on NUI-based solutions for real-time interactive systems.
Their current focus is on emerging computer interaction technologies and their applications in the fields of digital security, surveillance, signal processing and forensics.
They were one of the first companies out there to hack the Kinect and provide drivers and a working solution. They also have experience with other NUI devices, including the PlayStation 3 Eye camera, which they had working before the Kinect was made available.

Code Laboratories products include:
- CL Eye Platform: A multi-camera development framework for high performance sensor arrays.
- CL Studio Live: A real-time GPU accelerated visual creation and streaming suite.
- CL NUI Platform: A stable platform for the NUI audio, camera and motor devices, with useful samples and documentation. The potential uses are numerous, including HCI, robotics, educational use, surveillance, motion capture, people/object tracking, 3D scanning, etc.

Kinect and Natural UI Education Offerings
Business-to-Business Consulting
If you are interested in mentoring or consulting services in these areas for your own business, I offer individualized assistance to both teams and individuals developing their own applications. Contact UXMagic for further details and availability.

Community Based Education Opportunities
In association with Washtenaw Community College’s Lifelong Learning Program (non-credit continuing education and community outreach courses), training on WPF/Silverlight, Expression Blend and creating Kinect-enabled kiosks is currently available for everyone from complete beginners to the extremely skilled. The course is the WCC Lifelong Learning Program’s WPF Design Challenge Course. Courses are over for the summer but will begin again in the fall semester. You can check out the courses at http://tinyurl.com/wpfatwcc.
The course teams up students as user experience teams and goes through all aspects of user experience design (including client interviews), introduces them to XAML and Expression Blend, and, as a class project, creates a Kinect-enabled kiosk.

Our first course has just finished, and we will hopefully (fingers crossed) be showing off the students' work at Maker Faire at the Henry Ford Museum, if we get accepted. The class was small, with five students in total.
Team “Design Challenge” did a wonderful job of putting this together. Stop by Maker Faire and, if our project gets accepted (still pending as of this writing), you can play with the results and talk to the team, a group of very talented people. Greg, Larry, Joanie, Robert and Matt, you folks did a great job.

http://makerfaire.com/detroit/2011/
Michigan Silverlight Designers and Developers NUI SIG

http://michiganinteractivedesigners.org/
The NUI SIG meets Saturday mornings.
See our website for details.
UPDATE: Microsoft Kinect for Windows SDK beta Released
Download Link:
http://research.microsoft.com/en-us/um/redmond/projects/kinectsdk/download.aspx
How the Kinect works from Windows with the SDK (from the Microsoft Documentation)

Kinect for Windows SDK Beta includes:
· Drivers, for using Kinect sensor devices on a computer that is running Windows 7.
· APIs and device interfaces, together with technical documentation for developers.
· Source code samples.
Supported Languages/IDEs
· Visual Studio 2010
To work with the SDK samples and to develop your own applications with the SDK, you can use Visual Studio C# 2010 Express, Visual Studio C++ 2010 Express, or any other version of Visual Studio 2010. Visual Studio Express is available at no cost from the Microsoft website. If you are new to Visual Studio, see the Visual Studio 2010 Getting Started Guide.
· C# or C++ languages
Samples are available in both C# and C++. For information about programming in these languages, see Visual C# Developer Center and Visual C++ Developer Center.
To create a C# application
1. Reference Microsoft.Research.Kinect.dll.
This assembly is in the global assembly cache (GAC) and appears on the .NET tab of the Add Reference dialog box. This DLL calls unmanaged functions from managed code.
2. Include using directives for the following namespaces:
For the NUI API, include:
using Microsoft.Research.Kinect.Nui;
For the Audio API, include:
using Microsoft.Research.Kinect.Audio;
Samples with the SDK
- SkeletalViewer Walkthrough—Capturing Data with the NUI API (C++ and C#)
Uses the NUI API to access and render data from the Kinect sensor’s depth and video cameras as depth, video, and skeletal images on the screen. The managed SkeletalViewer sample uses Windows Presentation Foundation (WPF) to render captured images, and the native application uses DirectX and the Windows Graphics Device Interface (GDI).
- AudioCaptureRaw Walkthrough—Capturing the Raw Audio Stream (C++)
Uses the Windows Audio Session API (WASAPI) to capture the raw audio stream from the Kinect sensor’s four-element microphone array and write it to a .wav file.
- MFAudioFilter Walkthrough—Capturing Streams with a Media Foundation Audio Filter (C++)
Shows how to capture an audio stream from the Kinect sensor’s microphone array by using the KinectAudio DirectX Media Object (DMO) in filter mode in a Windows Media® Foundation topology.
- MicArrayEchoCancellation Walkthrough—Capturing Audio Streams with Acoustic Echo Cancellation and Beamforming (C++)
Shows how to use the KinectAudio DMO as a DirectShow® source to access the Kinect sensor’s microphone array. This sample uses acoustic echo cancellation to record a high-quality audio stream, and beamforming and source localization to determine the selected beam and the direction of the sound source.
- RecordAudio Walkthrough—Recording an Audio Stream and Monitoring Direction (C#)
Demonstrates how to capture an audio stream from the Kinect sensor’s microphone array and how to monitor the selected beam and the direction of the sound source.
- Speech Walkthrough—Recognizing Voice Commands (C#)
Demonstrates how to use the Kinect sensor’s microphone array with the Microsoft.Speech API to recognize voice commands.
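The "direction of the sound source" that several of these samples report can be illustrated with the simplest possible model: two microphones a known distance apart and the time difference of arrival (TDOA) between them. Under a far-field assumption, sin θ = c·Δt / d, where c is the speed of sound and d is the microphone spacing. This is a simplified textbook model for illustration, not the beamforming and source localization the SDK actually performs; `directionDegrees` and its default of 343 m/s are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Far-field direction estimate from the time difference of arrival between
// two microphones: sin(theta) = c * dt / d. theta is measured from straight
// ahead (broadside); its sign follows the sign of dt, i.e. which microphone
// heard the sound first.
double directionDegrees(double dtSeconds, double micSpacingMeters,
                        double speedOfSound = 343.0) {
    assert(micSpacingMeters > 0.0);
    double s = speedOfSound * dtSeconds / micSpacingMeters;
    s = std::clamp(s, -1.0, 1.0);  // noisy measurements can push |s| past 1
    const double pi = std::acos(-1.0);
    return std::asin(s) * 180.0 / pi;
}
```

For microphones 0.2 m apart, a delay of 0.1 m / 343 m/s gives sin θ = 0.5, so the sound arrives from about 30 degrees off center; a delay of zero means it came from straight ahead.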