
I have been experimenting with the first beta of the Microsoft Research SDK for a while now, and I am really caught between using it and OpenNI.
Frankly, I think OpenNI is a better standard and a better way to go at this time. I am happy that Microsoft is finally building “non-commercial” support for Windows, but I have many concerns, and the community apps and OpenNI work built so far are superior to what Microsoft Research is showing.
Beta 1, a year late?
Remember, the Microsoft Research SDK isn’t the commercial support that Microsoft may be planning for the future. I am left with the feeling that Microsoft is playing catch-up here, and it really doesn’t need to be if it would be smart and extend its SDK to include some things from the community standards that are already available. I hope the good stuff is being built into the commercial Windows release, because they aren’t showing it yet.
Fractured and it didn’t need to be..
Why? When are they going to learn to embrace what was “not invented here”? I see this as the major weakness of what is going on: Microsoft tries to create its own stuff instead of embracing what its own user community is already doing.
The community work built on the OpenNI standard for NUI is almost a year ahead of what Microsoft has been showing, even with academia. So why has it taken so long for Microsoft to release, and what do I see as the advantages of using OpenNI-based devices and support over Microsoft’s official academic research SDK for Windows?
Let’s compare the official SDK with the OpenNI SDK and see where the advantages lie in EACH of them.
CAVEAT: The official SDK is still at Beta 1, so Microsoft could add more support in another beta, but with Microsoft “beta” usually means feature complete.
Notable products out there using OpenNI already..
BrekelKinect

http://www.brekel.com
BrekelKinect is an amazing motion capture app that in many cases can replace studio-quality motion capture (literally millions of dollars’ worth of equipment) and interfaces directly with 3D motion capture standards such as BVH files, which import into and are editable in Autodesk’s MotionBuilder. BrekelKinect currently supports BVH file generation for 3ds Max’s Biped and for Second Life. Its option for saving positions (rotations are always saved) usually improves quality because it allows joints to scale (which is what happens with the PrimeSense NITE middleware algorithm).
You probably couldn’t support the more than two users that this app handles today with the Microsoft Research SDK as it currently stands. Commercials (like the one where angels fall to earth, which is part CG) are already being made with BrekelKinect and Autodesk MotionBuilder. So why the two-user limitation?
Also, after checking out the KinectEmote product, I am left wondering what Microsoft is really bringing to the party with this new SDK that PrimeSense/OpenNI doesn’t already provide more robustly.

http://kinemote.net/
Unity 3D *REAL-TIME* support with OpenNI
Another example of a Unity3D project with Kinect and the OpenNI drivers and standard.
Minority Report Style Interfaces with OpenNI
X-BOX MEDIA CENTER with Kinect and Voice Control (More Kinect E-Mote support)
Angry Birds with Kinect E-mote
<Channel 9 SDK Presentation Fact Check>
Having seen the BrekelKinect and KinectEmote apps in action, and knowing about the HumanIK support built into the NITE middleware (from PrimeSense) that OpenNI uses, I strongly suspect the claim Microsoft’s folks made in the Channel 9 presentation, that they have a better “algorithm” for motion tracking built into their official SDK, was a bit optimistic. Even if they do, OpenNI still wins, and here’s why.
From the Microsoft Documentation..

Kernel Mode Drivers vs Socket Servers. Advantage: OpenNI
Microsoft Official API Block Diagram

Multi-User/Multi-Device and Finger Tracking Support
OpenNI’s approach is different from Microsoft’s. Microsoft’s SDK supports only its own brand of device (just the Kinect) and only two users.
Quoting the official Microsoft Kinect SDK documentation:
“The NUI Skeleton API provides information about the location of up to two players standing in front of the Kinect sensor array, with detailed position and orientation information.
The data is provided to application code as a set of points, called skeleton positions, that compose a skeleton, as shown in Figure 3. This skeleton represents a user’s current position and pose. Applications that use skeleton data must indicate this at NUI initialization and must enable skeleton tracking.”
Why OpenNI is Better:
More than TWO USERS and MORE Than One Device tracking a “Scene”
By contrast, the OpenNI standard supports more information in larger scenes and more than two users.
OpenNI: Multiple Devices from Multiple Vendors and More Scene complexity
It supports multiple devices, even from different vendors, should something better than the Kinect, or something to augment it, show up.

Brand Two: Asus
No, it doesn’t plug into your Xbox; however, it does work great with your PC.
Fact Check Continued
Frankly, even if their claim that tracking is better with their SDK is true, they have limited it to two users (and humans at that) and to a less complex world. That seems to me to be going down the wrong path.
OpenNI supports multiple NUI devices of a diverse nature, including non-Microsoft devices and ones we haven’t thought of yet, which overcomes limits on depth, on the size and richness of the scene, and on tracking more than two users. That gives you a more complex scene that can be analyzed in a more complex world. Which works better? That depends on your use case. It seems Microsoft’s solution is only for the one device and for a small, non-complex world, like a human playing a game.
Another Point of Contention: HAND TRACKING
Frankly, I don’t see anything anywhere about HAND TRACKING support in the Microsoft SDK right now, just skeletal tracking. Microsoft, why are you RE-INVENTING the wheel here? Why not just support OpenNI? It’s all there. You guys worked with PrimeSense to make your Kinect sensor. Why not now? They were open with you. Don’t split the community on this. We all deserve a unified standard. We deserve not to have to keep installing and uninstalling drivers.

It’s the right thing to do. You would support bigger scenes and more professional functionality with multiple devices, along with cross-platform behaviors for gestures on more than just Windows.
Hopes Dashed..
I really hoped one of the hallmarks of the Microsoft Research SDK would be support for third-party extensions for gestures, non-human tracking and so on, like OpenNI already supports.
This is important if, for instance, you were developing a security application for your home, not just to see whether anyone was there, but to recognize non-biped animals and the like. Where is the support for this in your SDK, Microsoft Research? This seems to be more of a “game-centric” UI, and that even slips out in the documentation with its use of the word “player”, rather than the professional API you’d expect from the science-based approach Microsoft Research usually takes. With OpenNI, a “user”, not a “player”, is “recognized and logged into as a session” and can be recognized again if they walk back into a scene after leaving it.
OpenNI Middleware which works with *MULTIPLE VENDORS* already has built in:
From the OpenNI documentation found at http://www.openni.org/documentation:
- Gestures Alert Generator: Generates callbacks to the application when specific gestures are identified.
- Scene Analyzer: Analyzes a scene, including the separation of the foreground from the background, identification of figures in the scene, and detection of the floor plane. The Scene Analyzer’s main output is a labeled depth map, in which each pixel holds a label that states whether it represents a figure, or it is part of the background.
- Hand Point Generator: Supports hand detection and tracking. This node generates callbacks that provide alerts when a hand point (meaning, a palm) is detected, and when a hand point currently being tracked, changes its location.
- User Generator: Generates a representation of a (full or partial) body in the 3D scene.
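To make the Scene Analyzer description a little more concrete, here is a minimal Python sketch of what working with a labeled depth map might look like. This is not OpenNI's actual API: the `pixels_by_user` function, the `BACKGROUND` value of 0 and the tiny hand-written label map are all assumptions made up for illustration. The only idea taken from the documentation is that every pixel carries a label saying which figure it belongs to, or that it is background. Note that nothing in this layout stops at two users.

```python
from collections import defaultdict

# Assumption for this sketch: label 0 means "background",
# any other integer is the id of a figure found in the scene.
BACKGROUND = 0


def pixels_by_user(label_map):
    """Group (x, y) pixel coordinates by figure label, skipping background."""
    users = defaultdict(list)
    for y, row in enumerate(label_map):
        for x, label in enumerate(row):
            if label != BACKGROUND:
                users[label].append((x, y))
    return dict(users)


# A made-up 4x6 labeled depth map containing three figures.
label_map = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 2, 3],
    [0, 0, 0, 0, 0, 3],
]

users = pixels_by_user(label_map)
for label, pixels in sorted(users.items()):
    print(f"user {label}: {len(pixels)} pixels")
```

With real sensor data the map would be the size of the depth frame, and the labels would come from the middleware rather than being typed in by hand.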
Pan User Support:
OpenNI supports multiple users and very large scenes across multiple devices (even non-Kinect ones) over a socket server interface (not kernel mode drivers). Right away this makes OpenNI scenes richer in the information they gather about the world around them: a scene can include data from multiple devices across one environment. The middleware also supports third-party add-ons, making it a WAY more extensible way to work with Kinect and NaturalUI than kernel mode drivers with enumerated devices. To me, OpenNI’s socket server approach really is the better solution from a coding and architectural perspective.
I am left with the question of WHY bother with this SDK at all ??
The speech support is easily duplicated elsewhere, and frankly it’s another “me too” effort. When will Microsoft actually learn to embrace the community’s work on this? The only time I will probably bother to start supporting this as a developer is when Microsoft ships a “commercial SDK”, not this Microsoft Research thing. It’s really time to be open and embrace the community, which is obviously ahead, with an open standard that takes advantage of more scenarios.
NaturalUI or NUI is *TOO* important to be closed about. After all, we live in an open world. I hope the computer’s vision of the world is not as narrowly focused as the Microsoft SDK is at this point. But these are first steps in a new (NUI) world, so I can’t be too harsh here. I was just expecting something more.
Bottom line: there is plenty of community support for the things you want to do without this Microsoft SDK. Why fracture this community? Isn’t it time you folks were more open and brought this *TOGETHER*? I see no advantages (besides the easy driver install) to using this at all, and there are things SERIOUSLY missing from the Microsoft Research SDK (for a beta 1, in my humble opinion).
Why doesn’t Microsoft embrace its own community more? “Because it’s their product” isn’t a good enough answer anymore, especially *IF* they want to be seen as ahead of Google and others who already seem to embrace this philosophy. “Invented here” isn’t always necessary, especially with such an opportunity in NaturalUI. NUI enthusiasts and the whole community would perceive this very differently had management taken a different, less old-school approach.
After all, they own the patents. All this can do is open Windows to markets they don’t already have, beyond just the .NET programmer “wolfpack”. Microsoft needs to be a pragmatic leader and open this architecture instead of being left behind with a proprietary NUI solution. Subscribing to a better standard that is already more capable counts for more here than just saying “we open sourced it”.
More folks outside the Microsoft camp would see Microsoft as a leader for doing this, instead of as the company that some market folks perceive as being “behind” Apple and Google. Get with the program, Microsoft: support OpenNI with your own implementation, just like you do with your web browser and HTML5. This is too important to the future to try to be “exclusive”.
If you want to understand a bit more of what you get with the OpenNI middleware, check out:
Also I suggest you check out Vangos’ OpenNI library and Skeletal Tracking Example using OpenNI:
http://www.studentguru.gr/blogs/vangos/archive/2011/03/15/kinect-and-wpf-complete-body-tracking.aspx
It supports the following body parts:
- Head
- Neck
- LeftShoulder
- LeftElbow
- LeftHand
- RightShoulder
- RightElbow
- RightHand
- Torso
- LeftKnee
- LeftHip
- LeftFoot
- RightKnee
- RightHip
- RightFoot
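For a feel of how that joint list might be used in code, here is a small Python sketch. It is not the API of Vangos’ library or of OpenNI: the `Joint` enum, the `missing_joints` helper and the sample coordinates are all hypothetical names and values of my own. Only the fifteen joint names come from the list above. The helper reports which joints are absent, which matters when only part of a body is in view.

```python
from enum import Enum


class Joint(Enum):
    """The fifteen body parts from the list above (names are the only real data)."""
    HEAD = "Head"
    NECK = "Neck"
    LEFT_SHOULDER = "LeftShoulder"
    LEFT_ELBOW = "LeftElbow"
    LEFT_HAND = "LeftHand"
    RIGHT_SHOULDER = "RightShoulder"
    RIGHT_ELBOW = "RightElbow"
    RIGHT_HAND = "RightHand"
    TORSO = "Torso"
    LEFT_KNEE = "LeftKnee"
    LEFT_HIP = "LeftHip"
    LEFT_FOOT = "LeftFoot"
    RIGHT_KNEE = "RightKnee"
    RIGHT_HIP = "RightHip"
    RIGHT_FOOT = "RightFoot"


def missing_joints(positions):
    """Return the joints that have no tracked position in this frame."""
    return [joint for joint in Joint if joint not in positions]


# Hypothetical frame: someone sitting behind a desk, so only the
# upper body is visible. The (x, y, z) values are invented.
upper_body = {
    Joint.HEAD: (0.0, 0.6, 2.0),
    Joint.NECK: (0.0, 0.4, 2.0),
    Joint.TORSO: (0.0, 0.1, 2.0),
    Joint.LEFT_HAND: (-0.3, 0.2, 1.8),
    Joint.RIGHT_HAND: (0.3, 0.2, 1.8),
}

missing = missing_joints(upper_body)
print(f"{len(upper_body)} joints tracked, {len(missing)} missing")
```

Using an enum rather than bare strings means a typo such as `"LeftHnad"` fails loudly instead of silently tracking nothing.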
</Channel 9 Fact Check>
<CONCLUSIONS>
<BRING COMMUNITY TOGETHER/>
<EXTEND, EMBRACE, GIVE US A BETTER NON DEVICE SPECIFIC SOLUTION/>
</CONCLUSIONS>
We live in a complex world. Shouldn’t Kinect officially support that complexity too? It’s more than a game controller and should be looked at from that perspective.