In a new study, Apple taught its artificial intelligence model to recognize hand gestures that were not part of the original training dataset. Here are the details.
What is EMG?
Apple published a new study titled "EMBridge: Cross-Modal Representation Learning to Enhance Hand Gesture Generalization from EMG Signals" on its Machine Learning Research blog. This study will be presented at the ICLR 2026 Conference in April.
In the study, the researchers explain how an AI model can recognize hand gestures even when those specific movements were not part of its original training dataset.
To achieve this, they developed EMBridge, a cross-modal representation learning framework that bridges the modality gap between EMG and pose.
EMG, or Electromyography, measures the electrical activity produced by muscles during contraction. Its practical applications range from medical diagnosis to physical therapy and prosthetic limb control.
More recently (EMG is by no means a new field), it has drawn growing research interest for wearable devices and AR/VR systems.
For instance, Meta's Ray-Ban Display glasses use EMG through a wrist-worn device Meta calls the Neural Band. Meta describes it as "allowing you to navigate the features of the Meta Ray-Ban Display by interpreting your muscle signals."
In Apple's study, the EMG signals used for training were not captured by a wrist device. Instead, researchers used two datasets:
- emg2pose: "A large-scale open-source EMG dataset containing 370 hours of sEMG and synchronized hand pose data from 193 participants. It includes various discrete and continuous hand movements across 29 different behavior groups, such as making a fist or counting to five. Hand pose labels were generated using a high-resolution motion capture system. The full dataset contains over 80 million pose labels and is comparable in scale to the largest computer vision equivalents. Each user performed four recording sessions for each hand movement category, each with a different EMG band placement. Each session lasted 45–120 seconds, and users performed 3–5 similar movements or freeform gestures. We use non-overlapping 2-second windows as input sequences. EMG is normalized, band-pass filtered (2–250 Hz), and notch filtered at 60 Hz."
- NinaPro DB2: "For a more comprehensive EMBridge evaluation, we used two NinaPro EMG datasets. Specifically, NinaPro DB2 contains matched EMG-pose data from 40 participants. It includes 49 hand movements (including basic finger flexions, functional grasps, and combined movements) performed by 40 healthy participants. EMG signals are recorded with 12 electrodes placed on the forearm at a sampling rate of 2 kHz, and hand kinematic data is captured using a data glove. For future hand movement classification, we use NinaPro DB7, which contains data from 20 non-amputee participants collected using the same EMG device and movement set as DB2."
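The preprocessing both datasets describe (non-overlapping 2-second windows of multi-channel EMG, normalized per channel) can be sketched as follows. This is a minimal illustration, not the paper's code: the function names are assumptions, and the band-pass/notch filtering steps mentioned above are omitted for brevity.

```python
import numpy as np

def window_emg(emg, fs=2000, win_sec=2.0):
    """Split a multi-channel EMG recording into non-overlapping windows.

    emg: array of shape (channels, samples). The datasets above also apply
    a 2-250 Hz band-pass and a 60 Hz notch filter before windowing; those
    steps are left out of this sketch.
    """
    win = int(fs * win_sec)
    n = emg.shape[1] // win  # drop any trailing partial window
    windows = emg[:, : n * win].reshape(emg.shape[0], n, win)
    return np.transpose(windows, (1, 0, 2))  # (n_windows, channels, win)

def normalize(windows, eps=1e-8):
    """Per-window, per-channel z-score normalization (one common choice)."""
    mean = windows.mean(axis=-1, keepdims=True)
    std = windows.std(axis=-1, keepdims=True)
    return (windows - mean) / (std + eps)

# Example: 12 channels (as in NinaPro DB2), 10 s of signal at 2 kHz
x = np.random.randn(12, 20000)
w = normalize(window_emg(x))
print(w.shape)  # (5, 12, 4000)
```

Non-overlapping windows keep the evaluation honest: no test window shares samples with a training window from the same recording.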
Considering all this, it is easy to see how Apple's EMBridge could pave the way for a future Apple Watch (or other wearable) to control the Apple Vision Pro, Macs, iPhones, and other devices, especially the rumored upcoming smart glasses.
In practice, the possibilities could be significant, ranging from new interaction methods to accessibility improvements.
Of course, the study itself does not specify a particular Apple product or application, but it does state:
"One potential practical application of our framework is wearable Human-Computer Interaction. In scenarios like VR/AR and prosthetic control applications, a device worn on the wrist needs to continuously extract hand movements from EMG."
What is EMBridge?
EMBridge is the researchers' way of bridging the gap between raw EMG muscle signals and structured hand pose data.
Training began with pre-training the EMG and hand pose encoders separately, each on its own modality.
Then, researchers enabled the EMG encoder to learn from the pose encoder by aligning the two representations. This allowed EMBridge to learn to recognize hand movement patterns from EMG signals.
After this process was completed, the system was trained by hiding some parts of the pose data and asking the model to reconstruct them using only the information derived from EMG signals.
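The two-stage recipe above (align EMG and pose embeddings, then reconstruct masked pose data from EMG-derived features) can be sketched with toy losses. The InfoNCE-style contrastive formulation and all names here are illustrative assumptions, not the paper's exact objectives.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def alignment_loss(emg_emb, pose_emb, temp=0.07):
    """InfoNCE-style contrastive loss pulling each EMG embedding toward its
    paired pose embedding (illustrative; the paper's objective may differ)."""
    e = l2_normalize(emg_emb)
    p = l2_normalize(pose_emb)
    logits = e @ p.T / temp                       # (B, B) similarity matrix
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()             # positives on the diagonal

def masked_pose_loss(pose_true, pose_pred, mask):
    """MSE computed only on the hidden pose entries, which the model must
    reconstruct from EMG-derived features."""
    return float(((pose_pred - pose_true) ** 2 * mask).sum() / mask.sum())

rng = np.random.default_rng(0)
emg_emb = rng.normal(size=(8, 64))
pose_emb = emg_emb + 0.1 * rng.normal(size=(8, 64))   # well-aligned pairs
loss_aligned = alignment_loss(emg_emb, pose_emb)
loss_random = alignment_loss(emg_emb, rng.normal(size=(8, 64)))
print(loss_aligned < loss_random)  # aligned pairs score a lower loss
```

The masked-reconstruction objective forces the EMG encoder to carry enough information to fill in hand pose on its own, which is what makes pose labels dispensable at inference time.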
The result was explained by the researchers as follows:
"To our knowledge, EMBridge is the first cross-modal representation learning framework that performs zero-shot hand gesture classification from wearable EMG signals and shows potential for real-world hand gesture recognition in wearable devices."
To reduce training errors caused by penalizing near-identical movements as negatives, the researchers taught the model to account for how closely different poses' hand configurations resemble each other. Rather than treating every pose as a fully independent class, the model builds soft targets that reflect this similarity.
This helped structure the model's representation space and increased its ability to generalize to movements it had never seen before.
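One common way to build such soft targets is to convert pairwise pose similarity into a probability distribution over classes, so similar gestures share probability mass instead of acting as hard negatives. This is an illustrative reading of the idea, not the paper's exact mechanism; the prototype vectors and temperature below are made up.

```python
import numpy as np

def soft_targets(pose_protos, temp=0.5):
    """Turn pairwise cosine similarity between pose prototypes into soft
    class targets (illustrative sketch of similarity-aware labeling)."""
    protos = pose_protos / np.linalg.norm(pose_protos, axis=1, keepdims=True)
    sim = protos @ protos.T                      # (C, C) cosine similarity
    logits = sim / temp
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Three toy gesture "prototypes": two similar fist variants, one open hand
protos = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
T = soft_targets(protos)
print(np.round(T, 2))  # row 0 shares mass between the two fist variants
```

Each row sums to 1, and similar gestures receive a larger share of each other's target mass, which is what structures the representation space.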
The authors evaluated EMBridge on two benchmarks, emg2pose and NinaPro, and found that it consistently outperformed existing methods, particularly in zero-shot (i.e., unseen) hand gesture recognition. Importantly, it achieved this with only 40% of the training data.
A significant limitation is that the model relies on datasets containing both EMG signals and synchronized hand pose data, meaning training still depends on specialized datasets that are difficult to collect.
Nevertheless, the study is intriguing, especially during a time when EMG-based device control is on the rise.
For full technical details regarding EMBridge, including components like Q-Former, MPRL, and CASCLe, follow this link.