Self-Supervised Learning Improves Accuracy and Data Efficiency for IMU-Based Ground Reaction Force Estimation
Tan, T.; Shull, P.; Hicks, J.; Uhlrich, S.; Chaudhari, A.
Abstract
Objective: Recent deep learning techniques hold promise to enable IMU-driven gait assessment; however, they require large amounts of marker-based motion capture and ground reaction force (GRF) data to serve as labels for supervised model training. We thus propose a self-supervised learning (SSL) framework that leverages large IMU datasets to pre-train deep learning models, which can improve the accuracy and data efficiency of IMU-based vertical GRF (vGRF) estimation. Methods: To pre-train the models, we performed SSL by masking a random portion of the input IMU data and training a transformer model to reconstruct the masked portion. We systematically compared a series of masking ratios across three pre-training datasets that included real IMU data, synthetic IMU data, or a combination of the two. Finally, we built models that used pre-training and labeled data to estimate vGRF in three prediction tasks: overground walking, treadmill walking, and drop landing. Results: When using the same amount of labeled data, SSL pre-training improved the accuracy of vGRF estimation during walking compared to baseline models trained by conventional supervised learning. The correlation coefficients for vGRF estimation improved from 0.92 to 0.95 for overground walking and from 0.94 to 0.97 for treadmill walking. In addition, fine-tuning pre-trained models on 1-10% of the walking data yielded accuracy comparable to that of the baseline model trained on 100% of the walking data. Conclusion: The proposed SSL framework leveraged large real and synthetic IMU datasets to increase the accuracy and data efficiency of deep-learning-based vGRF estimation, reducing the need for labels. Significance: This work may unlock broader use cases of IMU-driven assessment where only small labeled datasets are available.
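The masking step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract only states that a random portion of the IMU input is masked and then reconstructed. Masking in contiguous patches, the patch length, zero-filling, and a mean-squared-error loss restricted to masked samples are all assumptions made here for illustration, and the names `mask_imu_window` and `masked_mse` are hypothetical.

```python
import numpy as np


def mask_imu_window(x, mask_ratio, patch_len=8, rng=None):
    """Hide a random fraction of time patches in one IMU window.

    x: array of shape (time_steps, channels).
    Returns (x_masked, mask), where mask is a boolean vector over time steps
    and True marks a hidden step. Patch-wise masking and zero-filling are
    illustrative assumptions, not details taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_steps = x.shape[0]
    n_patches = n_steps // patch_len
    n_masked = int(round(mask_ratio * n_patches))

    # Choose which patches to hide, without repetition.
    chosen = rng.choice(n_patches, size=n_masked, replace=False)

    mask = np.zeros(n_steps, dtype=bool)
    for p in chosen:
        mask[p * patch_len:(p + 1) * patch_len] = True

    x_masked = x.copy()
    x_masked[mask] = 0.0  # assumed fill value for hidden samples
    return x_masked, mask


def masked_mse(pred, target, mask):
    """Mean squared reconstruction error over the hidden time steps only."""
    if not mask.any():
        return 0.0
    diff = pred[mask] - target[mask]
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    window = rng.standard_normal((128, 6))  # e.g. 128 samples, 6 IMU channels
    masked, mask = mask_imu_window(window, mask_ratio=0.25, rng=rng)
    # A model would map `masked` to a reconstruction; here the masked input
    # itself stands in as a trivial "prediction" to show the loss call.
    print(mask.sum(), masked_mse(masked, window, mask))
```

In a pre-training loop under these assumptions, the reconstruction network would receive `masked` and be optimized on `masked_mse`, so that only the hidden samples contribute to the loss. Sweeping `mask_ratio` corresponds to the comparison of masking ratios mentioned in the Methods.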