visionOS 2 Hand Gesture Matcher HandVector Updates to v2 and Adds a New FingerShape Feature
After the release of Vision Pro, its elegant gesture interaction drew a lot of attention, but unfortunately Apple did not provide a 3D gesture recognition API, not even for static gestures. So I used the cosine similarity algorithm to write my own framework, HandVector, to fill that gap: for example, making a ✌️ gesture with your hand to trigger a certain effect. I also wrote an article introducing it: “I open-sourced a gesture matching framework, allowing you to debug visionOS hand tracking in the Simulator!”
HandVector not only calculates the similarity between different static gestures, but also comes with a macOS utility class that lets you use hand tracking in the visionOS Simulator for basic gesture testing.
HandVector 2.0 is a major update, bringing better cosine similarity matching and a new finger shape parameter feature, FingerShape, which makes customization more convenient.
Note: HandVector 2.0 has significant API changes and is not compatible with older versions.
Cosine Similarity Calculation Update
You may wonder: how can the cosine similarity algorithm be improved? Indeed, the algorithm itself is hard to improve, but I changed the reference coordinate system used during calculation, which brought a huge overall improvement.
Let’s first look at the biggest problem with the old algorithm: all of the vectors used in the calculation were expressed in the Anchor space of the hand (the wrist). As a gesture changes, a finger joint’s direction can swing through a large angle in that space, which distorts the similarity calculation.
For example, the figure above shows two gestures, one with a half-bent finger and one with a fully bent finger. Although they are fairly similar, their fingertip vectors are nearly perpendicular in wrist space, so their cosine similarity is close to 0, which is clearly unreasonable.
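A quick numeric illustration (the directions here are made up for demonstration): two fingertip directions that are nearly perpendicular in wrist space score close to 0, no matter how similar the gestures look.

```swift
import simd

// Made-up fingertip directions, measured in wrist (anchor) space:
let halfBentTip = normalize(SIMD3<Float>(0, 0, -1))   // half-bent: pointing forward
let fullyBentTip = normalize(SIMD3<Float>(0, -1, 0))  // fully bent: pointing down

// Cosine similarity is just the dot product of the normalized vectors.
let similarity = dot(halfBentTip, fullyBentTip)       // ≈ 0, despite similar gestures
```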
So in version 2.0 I improved this: each vector’s direction is now calculated in the matrix space of its parent joint, which greatly improves the similarity of the fingertip vectors.
With this update, the fingertip poses of the two gestures in the picture above come out highly similar. But it raises a new question: coordinate conversion requires computing an inverse matrix; does that hurt performance? The answer is no, because ARKit already provides this matrix, as shown in the figure below.
`parentFromJointTransform` is the pose matrix of the current joint relative to its parent joint, and `anchorFromJointTransform` is the pose matrix relative to the Anchor (wrist). We can save and use it directly, with no extra calculation cost.
Naming convention: A From B Transform = B relative to A
- `parentFromJointTransform` = Joint relative to Parent
- `anchorFromJointTransform` = Joint relative to Anchor
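A minimal sketch of how this can be used (my own illustrative helper, not HandVector’s actual API): ARKit’s `HandSkeleton.Joint` exposes `parentFromJointTransform` directly, so a joint’s direction in its parent’s space can be compared between two skeletons without computing any inverse matrix.

```swift
import ARKit
import simd

extension SIMD4 where Scalar == Float {
    var xyz: SIMD3<Float> { SIMD3(x, y, z) }
}

/// Illustrative helper: compare one joint's local x axis, expressed in its
/// parent joint's space, between two hand skeletons via cosine similarity.
func jointSimilarity(_ a: HandSkeleton, _ b: HandSkeleton,
                     joint name: HandSkeleton.JointName) -> Float {
    // parentFromJointTransform comes straight from ARKit: no inverse needed.
    let va = a.joint(name).parentFromJointTransform.columns.0.xyz
    let vb = b.joint(name).parentFromJointTransform.columns.0.xyz
    return dot(normalize(va), normalize(vb))
}
```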
Add FingerShape
The cosine similarity algorithm is accurate, but its parameters are hard to understand and almost impossible to tune by hand. So, taking Unity’s XR Hands package as a reference, I added the FingerShape feature.
It simplifies the shape of a finger into 5 easy-to-understand parameters, such as curl and spread, which are easy to control and adjust:
- Finger base curl `baseCurl`: the curl of the finger’s base joint; for the thumb this is the `IntermediateBase` joint, for the other fingers the `Knuckle` joint; range 0~1
- Fingertip curl `tipCurl`: the curl of the finger’s upper joints; for the thumb this is the `IntermediateTip` joint, for the other fingers the average of the `IntermediateBase` and `IntermediateTip` joints; range 0~1
- Overall curl `fullCurl`: the average of `baseCurl` and `tipCurl`; range 0~1
- Pinch with the thumb `pinch`: calculated from the distance to the thumb tip; range 0~1; `nil` for the thumb itself
- Spread from the adjacent outer finger `spread`: only the horizontal angle is measured; range 0~1; `nil` for the little finger
For the differences between the three curl values, please refer to the figure below.
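As a rough illustration of how such a curl value can be computed (my own sketch, not HandVector’s exact implementation), a joint’s bend angle can be mapped into the 0~1 range:

```swift
import simd

/// Sketch: map the bend angle between two adjacent bone directions to a
/// curl value in 0...1, where a straight joint (0°) gives 0 and a fully
/// bent joint (assumed here to be 180°) gives 1.
func curl(boneA: SIMD3<Float>, boneB: SIMD3<Float>) -> Float {
    let cosAngle = dot(normalize(boneA), normalize(boneB))
    let angle = acos(min(max(cosAngle, -1), 1)) // clamp for numeric safety
    return min(max(angle / .pi, 0), 1)
}
```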
SimHand Update
The Simulator debugging feature has always been a highlight of HandVector, but changing the vector coordinate system in 2.0 created huge trouble: Google’s MediaPipe implements 3D hand tracking with joint positions only, and no matrix information!
This meant I had to use a lot of math “black magic” to conjure matrix information out of nothing. This complex code took nearly a week to write and debug, and I was ready to give up several times along the way…
```swift
// The hair-tearing "black magic" math code
private static func calculateJointTransform(jointDict: [HandSkeleton.JointName: Joint], rootTransform: simd_float4x4, isLeft: Bool) -> [HandSkeleton.JointName: simd_float4x4] {
    var worldTransforms: [HandSkeleton.JointName: simd_float4x4] = [:]
    // The thumb transforms need to be referenced to the palm center:
    // average the 4 knuckles plus the wrist (hence the two /5.0 terms).
    let knuckleCenter = (jointDict[.indexFingerKnuckle]!.position + jointDict[.middleFingerKnuckle]!.position + jointDict[.ringFingerKnuckle]!.position + jointDict[.littleFingerKnuckle]!.position) / 5.0
    let palmCenter = knuckleCenter + jointDict[.wrist]!.position / 5.0
    let palmOffset = palmCenter + rootTransform.columns.1.xyz * (isLeft ? 0.035 : -0.035)
    // thumbIntermediateBase: build a basis from neighboring joint positions
    let thumbInterBaseX = normalize(jointDict[.thumbIntermediateTip]!.position - jointDict[.thumbIntermediateBase]!.position) * (isLeft ? 1 : -1)
    let thumbInterBaseY = normalize(palmOffset - jointDict[.thumbIntermediateBase]!.position) * (isLeft ? 1 : -1)
    let thumbInterBaseZ = cross(thumbInterBaseX, thumbInterBaseY)
    worldTransforms[.thumbIntermediateBase] = simd_float4x4(SIMD4(thumbInterBaseX, 0), SIMD4(thumbInterBaseY, 0), SIMD4(thumbInterBaseZ, 0), SIMD4(jointDict[.thumbIntermediateBase]!.position, 1))
    // ……
    let rootZ = rootTransform.columns.2.xyz
    // indexFingerMetacarpal: x axis points along the metacarpal bone
    let indexMetaX = normalize(jointDict[.indexFingerKnuckle]!.position - jointDict[.indexFingerMetacarpal]!.position) * (isLeft ? 1 : -1)
    let indexMetaY = cross(rootZ, indexMetaX)
    let indexMetaZ = cross(indexMetaX, indexMetaY)
    worldTransforms[.indexFingerMetacarpal] = simd_float4x4(SIMD4(indexMetaX, 0), SIMD4(indexMetaY, 0), SIMD4(indexMetaZ, 0), SIMD4(jointDict[.indexFingerMetacarpal]!.position, 1))
    // ……
    return worldTransforms
}
```
Fortunately, with a solid grasp of geometry, I managed to reconstruct the matrices from the joint positions alone. The result is not perfectly precise, but it is good enough for debugging, and one of HandVector’s signature features has been preserved.
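The core trick, sketched below with illustrative names of my own (not the framework’s exact code), is to build an orthonormal basis for each joint from nothing but positions: take the bone direction as one axis, borrow a rough reference direction, and derive the remaining axes with cross products.

```swift
import simd

/// Sketch: construct a joint's world transform from positions alone,
/// since MediaPipe provides no rotation data. `towardNext` is the
/// direction toward the next joint and becomes the local x axis;
/// `roughUp` is any reference direction not parallel to it.
func makeJointTransform(position: SIMD3<Float>,
                        towardNext: SIMD3<Float>,
                        roughUp: SIMD3<Float>) -> simd_float4x4 {
    let x = normalize(towardNext)
    let z = normalize(cross(x, roughUp)) // perpendicular to both inputs
    let y = cross(z, x)                  // re-derived so the basis is orthonormal
    return simd_float4x4(SIMD4(x, 0), SIMD4(y, 0), SIMD4(z, 0), SIMD4(position, 1))
}
```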
Other
In addition to the three major changes above, I have also adjusted the project structure and the 3D joint style, making the code structure clearer.
For joint visualization, switching from small spheres to small cubes makes differences in orientation much easier to see: the positive X/Y/Z axes are colored red/green/blue, and the negative directions use darker shades of the same colors.
Finally, version 2.0 is still in beta; you are welcome to try it out and submit feedback.