Computer Vision Assistive Technologies
for the Visually Impaired

Overview:

Our long-term goal is to reverse the increased morbidity and mortality resulting from low vision and blindness by providing wearable technology solutions with pan-applicability to visual deficits from ALL etiologies, leading to true functional independence. Our proposed solution, the Visually Impaired “Smart” Service System for Spatial Intel & On-board Navigation (VIS4ION), provides real-time situational and obstacle awareness of one’s immediate environment, allowing the individual to travel more safely in three-dimensional (3D) space, with particular attention to low-body, mid-body, and high-body/head hazards. The device remedies the cane’s shortcomings and, in addition, further augments the ability of visually impaired persons both to maintain balance and to localize objects in their environment.

The core of this technology is based on four components (see Figure below): (1) a wearable vest with several distinct range and image sensors embedded; these sensors extract pertinent information about obstacles and the environment, which is conveyed to (2) a haptic interface (belt) that communicates this spatial information to the end user in real time via an intuitive, ergonomic, and personalized vibrotactile re-display along the torso; (3) a smartphone that serves as a connectivity gateway and coordinates the core components through WiFi, Bluetooth, and/or 4G LTE; and (4) a headset that contains both binaural bone-conduction speakers (leaving the ear canal open for ambient sounds) and a microphone for voice recognition when using a virtual personal assistant (VPA) [under development].

[Figure: The four core components of the VIS4ION platform]
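
As a rough illustration of the data flow among these components, the minimal Python sketch below shows a gateway fanning sensor readings out to the haptic belt and audio headset interfaces. The message fields, routing rule, and distance threshold are illustrative assumptions, not the actual VIS4ION firmware.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical message carrying one obstacle reading from a vest sensor.
@dataclass
class ObstacleReading:
    sensor: str         # e.g. "stereo", "ultrasonic", "infrared"
    bearing_deg: float  # angle relative to the user's heading
    distance_m: float   # estimated range to the obstacle
    height_zone: str    # "low", "mid", or "high" body hazard

class Gateway:
    """Toy stand-in for the smartphone gateway: fans sensor readings
    out to the haptic belt and the audio headset interfaces."""

    def __init__(self) -> None:
        self.haptic_sinks: List[Callable[[ObstacleReading], None]] = []
        self.audio_sinks: List[Callable[[ObstacleReading], None]] = []

    def publish(self, reading: ObstacleReading) -> None:
        # All readings go to the haptic belt; only near obstacles
        # additionally trigger an audio alert (illustrative policy).
        for sink in self.haptic_sinks:
            sink(reading)
        if reading.distance_m < 1.0:
            for sink in self.audio_sinks:
                sink(reading)

if __name__ == "__main__":
    gw = Gateway()
    gw.haptic_sinks.append(lambda r: print(f"belt buzz @ {r.bearing_deg:+.0f} deg"))
    gw.audio_sinks.append(lambda r: print(f"audio alert: {r.height_zone}-body obstacle"))
    gw.publish(ObstacleReading("ultrasonic", -20.0, 0.8, "mid"))
```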

Research Plan

Our plan comprises three tasks: 1) optimize the design of the platform (hardware) with human-based experimental testing; 2) enhance the 3D environmental awareness (software); and 3) perform human-centered efficacy studies/testing in ecologically valid 3D environments. Each task is elaborated below.

Task 1: Optimize the design of the platform (hardware) with human-based experimental testing

The VIS4ION platform consists of four main elements or element sets: sensors, microcontroller, human-machine interfaces, and user (as illustrated in the figure below).

The first (sensory) element set includes a collection of sensors (infrared, stereo camera, and ultrasonic, as shown in the figure). The sensors are ergonomically mounted on the vest scaffold to consistently and reliably acquire scene data from the immediate surrounding environment. Specifically, the stereo cameras capture 3D scene information from the user’s viewpoint. The accuracy of the estimated depth information, which is indispensable for 3D scene reconstruction, can be further enhanced by fusing it with data acquired from the infrared and ultrasonic sensors in a multi-modal data fusion process.

The second (processing) element, a microcontroller unit, is responsible for coordinating the communication flows among the different functional system units. Along with a portable GPU unit, the platform is able to carry out a range of 3D scene-understanding tasks, including 3D scene reconstruction, scene parsing, object detection, and 3D active scene exploration. An NVIDIA Pascal GPU, shown in the figure below, was chosen for the VIS4ION platform because of its high computing performance for deep learning and computer vision algorithms, its power efficiency, and its portability.

The third (feedback) element set, the haptic and audio interfaces, displays processed and filtered environmental information to the end user in real time via an intuitive, ergonomic, and personalized vibrotactile re-display along the torso (a belt-based system retrofitted into a lumbar back support) and via audio feedback delivered through bone-conduction transducers in a paired headset.

The end user, as the fourth (user) element, receives live scene information and dynamic alerts in an ongoing fashion once the core system is initiated, and responds accordingly. In addition, the end user can actively explore the immediate surrounding environment to satisfy their needs and safeguard their personal welfare. The platform will be designed in an on-demand fashion, offering à la carte feature sets based on the end user’s needs and intentions that operate above and beyond the core system. Additional feature sets will be selected via speech-to-text through the headset microphone and/or predefined actions such as pre-specified hand movements.

[Figure: The four main elements of the VIS4ION platform, including the NVIDIA Pascal GPU unit]
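
The vibrotactile re-display can be pictured as a mapping from per-sector obstacle distances to tactor intensities along the belt. The sketch below is purely illustrative: the number of motors, the distance thresholds, and the linear ramp are assumptions, not the platform’s actual parameters.

```python
import numpy as np

def vibration_levels(sector_distances_m, n_motors=8, near_m=0.5, far_m=3.0):
    """Map per-sector obstacle distances (metres) to vibration intensities
    in [0, 1] for a belt with n_motors tactors: closer obstacles vibrate
    harder, anything beyond far_m stays silent. Thresholds are illustrative."""
    d = np.asarray(sector_distances_m, dtype=float)
    assert d.shape == (n_motors,), "one distance per tactor expected"
    # Linear ramp: 1.0 at near_m (or closer), 0.0 at far_m (or farther).
    intensity = (far_m - d) / (far_m - near_m)
    return np.clip(intensity, 0.0, 1.0)

if __name__ == "__main__":
    # Eight sectors around the torso; an obstacle 0.7 m away in sector 2.
    distances = [5.0, 5.0, 0.7, 2.0, 5.0, 5.0, 5.0, 5.0]
    print(vibration_levels(distances).round(2))
```

A linear ramp is used here purely for simplicity; the actual distance-to-intensity mapping would presumably be tuned through the human-based testing described in this task.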

Task 2: Enhance the 3D environmental awareness (software)

There is an inherent trade-off between the speed of processing scene information and the granularity with which elements of the scene can be identified, categorized, and understood. However, when the same geographic locations are revisited, it is possible to re-use stable elements of the previously acquired scene information, both to speed up the construction of a new (updated) map of the revisited environment and to enhance the precision of the current identification by allocating relatively more processing resources to elements of the scene that are new. Therefore, to enhance the localization, mapping, and environmental awareness provided by the VIS4ION platform, we propose a new paradigm for dynamic environment perception and awareness. The proposed project leverages state-of-the-art techniques from multiple research domains, including computational geometry, computer vision, deep learning, and image processing, to cope with the challenges posed by complex and fast-changing surroundings. The key development efforts include novel real-time 3D dynamic scene reconstruction for indoor and outdoor navigation, innovative deep-learning-based algorithms for 3D scene parsing via object detection and semantic labeling, and a 3D user-scene interaction interface. The figure below illustrates the pipeline.

[Figure: 3D scene-understanding pipeline]
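
To make the map re-use idea concrete, here is a minimal sketch that caches a coarse depth grid per location and, on a revisit, flags only the cells that changed for full re-processing. The grid representation, location keys, and tolerance are illustrative assumptions; the actual pipeline performs full 3D reconstruction and deep-learning scene parsing.

```python
import numpy as np

class SceneCache:
    """Toy illustration of re-using stable scene elements on revisits.

    A coarse depth grid is stored per location; when the same place is
    revisited, only cells whose new measurements disagree with the cached
    map beyond a tolerance are marked for (more expensive) re-processing.
    Grid size and tolerance are illustrative."""

    def __init__(self, tol_m=0.2):
        self.maps = {}      # location id -> cached depth grid
        self.tol_m = tol_m

    def update(self, location_id, depth_grid):
        depth_grid = np.asarray(depth_grid, dtype=float)
        cached = self.maps.get(location_id)
        if cached is None or cached.shape != depth_grid.shape:
            self.maps[location_id] = depth_grid.copy()
            return np.ones_like(depth_grid, dtype=bool)   # everything is new
        changed = np.abs(depth_grid - cached) > self.tol_m
        # Refresh only the cells that changed; stable cells are re-used.
        cached[changed] = depth_grid[changed]
        return changed      # mask of cells needing full re-processing

if __name__ == "__main__":
    cache = SceneCache()
    first = cache.update("hallway-3", np.full((4, 4), 2.0))
    second = cache.update("hallway-3", np.full((4, 4), 2.0) + np.eye(4))
    print(first.sum(), second.sum())   # 16 new cells, then 4 changed cells
```

The same change mask could also be used to direct relatively more processing resources toward the new elements of the scene, as described above.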

We also highlight user-scene interaction as a key part of building the user experience. Communication between the user and the scene is bidirectional: the user constantly receives live scene information and dynamic alerts from the environment and responds accordingly, and the end user can also explore the 3D scene by sending instructional operations to the system. The results of 3D scene reconstruction and 3D scene parsing are applied in this interaction to enhance the user’s perception of the scene. The proposed platform is designed to operate in two modes according to the user’s needs: a default mode and a user-selective mode.

Default Mode: In the default mode, the platform automatically sends alerts to the visually impaired user via the haptic and/or audio interface about hazards and obstacles in the surrounding environment that could compromise safety and health. For example, drop-off detection is crucial for the safety of visually impaired travelers, since a missed drop-off such as a stairway can easily result in a fall or collision. This project therefore plans to conduct research on drop-off detection and alerting.
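
Below is a minimal sketch of one way drop-off detection could operate on a ground-height profile extracted from the depth data; the step threshold and the 1D profile representation are illustrative assumptions, not the algorithm under development.

```python
import numpy as np

def detect_drop_off(ground_heights_m, step_threshold_m=0.25):
    """Flag potential drop-offs along a single scanline of ground heights.

    ground_heights_m: estimated ground height (metres, relative to the
    walking surface) sampled at increasing distance in front of the user.
    A sudden downward step larger than step_threshold_m is treated as a
    potential drop-off (e.g. a descending stairway or curb).
    The threshold is illustrative only."""
    heights = np.asarray(ground_heights_m, dtype=float)
    steps = np.diff(heights)                     # height change between samples
    drop_idx = np.where(steps < -step_threshold_m)[0]
    return drop_idx                              # indices where a drop begins

if __name__ == "__main__":
    # Flat floor for 1.5 m, then a 0.4 m drop (e.g. the first stair step).
    profile = [0.0] * 15 + [-0.4] * 5
    print(detect_drop_off(profile))              # -> [14]
```

In practice, the profile would come from the reconstructed 3D ground plane rather than a single scanline, and alerts would be issued through the haptic and/or audio interfaces.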

User-selective Mode: In the user-selective mode, end users can actively explore the scene by sending instructional operations that engage a specialized feature set. For example, we propose a ‘Point-to-Tell’ unit, allowing users to query information about the region they are pointing to. Some ‘Point-to-Tell’ experiment results are shown below.

[Figure: ‘Point-to-Tell’ experiment results]
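
The sketch below illustrates the general ‘Point-to-Tell’ flow, assuming a fingertip location has already been detected in the camera frame; `classify_region` stands in for an arbitrary image classifier. Both are placeholders rather than the project’s actual models.

```python
import numpy as np

def point_to_tell(image, fingertip_xy, classify_region, box_px=96):
    """Illustrative 'Point-to-Tell' flow: crop a square region around the
    detected fingertip location and hand it to an object classifier whose
    label can then be reported back to the user.

    `classify_region` is a placeholder for any image classifier
    (e.g. a CNN); it is not part of the actual VIS4ION codebase."""
    h, w = image.shape[:2]
    x, y = fingertip_xy
    half = box_px // 2
    x0, x1 = max(0, x - half), min(w, x + half)
    y0, y1 = max(0, y - half), min(h, y + half)
    crop = image[y0:y1, x0:x1]
    return classify_region(crop)

if __name__ == "__main__":
    frame = np.zeros((480, 640, 3), dtype=np.uint8)    # dummy camera frame
    dummy_classifier = lambda crop: "door handle"      # stand-in model
    print(point_to_tell(frame, (320, 240), dummy_classifier))
```

In the full system, the returned label would be announced over the bone-conduction headset.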

Task 3: Perform human-centered efficacy studies/testing in ecologically valid 3D environments

A critical element of our plan to create an intuitive, human-user-centered platform is the application of psychophysical testing at each stage of development as the platform is refined. This will also be true of Task 3, which will involve real-world obstacle course testing. This testing will be performed once in full multi-sensory mode, using the improvements implemented in Task 2, and once in a reduced-sensing mode with the stereo cameras and/or LIDAR systems deactivated. Because only simulation testing will be used directly to improve the design of the platform during Task 2, an additional test of performance in a separate, real-world navigation protocol can serve both as an independent measure of overall improvements in the system and as a road map for future avenues to enhance performance. In an uncluttered indoor environment it is nearly always possible to navigate successfully, provided one can move at a slow enough pace. Therefore, Task 3 will examine speeded navigation with both blindfolded-sighted and blind subjects in a real-world, combined obstacle avoidance and navigation task. Some of the experimental settings are shown below.

[Figure: Real-world obstacle avoidance / navigation experiment settings]
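
For analysis, per-subject completion times under the two sensing conditions could be compared with a paired test, as in the sketch below; the metric, sample values, and choice of test are illustrative assumptions rather than a registered analysis plan.

```python
import numpy as np
from scipy import stats

def compare_conditions(full_mode_times_s, reduced_mode_times_s):
    """Paired comparison of course-completion times for the same subjects
    tested in full multi-sensory mode vs. reduced-sensing mode.
    Returns the mean difference and a paired t-test p-value."""
    full = np.asarray(full_mode_times_s, dtype=float)
    reduced = np.asarray(reduced_mode_times_s, dtype=float)
    t_stat, p_val = stats.ttest_rel(reduced, full)
    return reduced.mean() - full.mean(), p_val

if __name__ == "__main__":
    # Made-up times (seconds per trial), for illustration only.
    full = [62, 58, 71, 65, 60]
    reduced = [75, 70, 82, 77, 73]
    diff, p = compare_conditions(full, reduced)
    print(f"mean slowdown without full sensing: {diff:.1f} s (p = {p:.3f})")
```

Collision counts and path efficiency could be analyzed in the same paired fashion.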