Furfaro Abstract: Deep Learning Algorithms for Autonomous Guidance: Applications to Hypersonics and Planetary Landing Guidance

Autonomy is a critical component of the next generation of hypersonic vehicles. Indeed, effective implementation of the closed-loop “Sense-Think-Act” cycle requires a new generation of intelligent algorithms that can 1) adapt to elusive targets and 2) be robust against unknown environments. However, the current generation of autonomous hypersonic systems relies on rule-based systems, which dramatically limits overall performance. One of the major challenges is how to devise a guidance, navigation, and control (GNC) system that effectively guides a hypersonic vehicle to mission success with guaranteed performance in a highly uncertain and changing environment. Similarly, autonomous and unconstrained exploration of small and large bodies of the solar system requires the development of a new class of intelligent systems capable of integrating streams of sensor data in real time and autonomously making optimal decisions, i.e., deciding the best course of action.

Over the past few years, enabled by large data availability and advancements in computing hardware (e.g., GPUs), there has been an explosion of intelligent systems based on deep learning that enable adaptive and fast reasoning over data. One can naturally ask the following: how can such techniques help the development of the next generation of robust and adaptive algorithms for both hypersonics and space guidance that can learn optimal actions during the course of specified flight missions? In this talk, I will address this question by presenting a set of deep learning models recently developed by my research team for direct application to hypersonics and space exploration. The methodologies include the use of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) within the framework of deep reinforcement learning and meta-learning (or “learn-to-learn”). The proposed framework enables learning a closed-loop guidance policy from simulated experience. Such a policy (e.g., bank angle and angle of attack as a function of the current position and velocity) is parameterized via a deep network, and its parameters (weights) are learned from experience, i.e., by letting the agent interact with the environment in an attempt to maximize a reward signal. Examples for both hypersonic re-entry and planetary landing are presented to demonstrate the performance of the proposed approach.
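
As a rough illustration of what "learning a closed-loop policy from simulated experience" can mean, the sketch below trains a state-feedback policy with a basic policy-gradient (REINFORCE) update. It is not the method presented in the talk. Everything in it is an assumption made for illustration: the 1-D point-mass dynamics, the quadratic reward, the constants, and the linear policy that stands in for a deep network.

```python
import math
import random

DT = 0.1       # integration step in seconds (hypothetical)
HORIZON = 30   # steps per simulated episode (hypothetical)
SIGMA = 0.3    # std dev of the Gaussian exploration noise (hypothetical)


def features(state):
    """Policy inputs: current position, current velocity, and a bias term."""
    pos, vel = state
    return [pos, vel, 1.0]


def policy_mean(weights, state):
    """Linear stand-in for the deep network: maps the state to a commanded acceleration."""
    return sum(w * f for w, f in zip(weights, features(state)))


def rollout(weights, rng):
    """Simulate one episode; return total reward and the summed gradient of log pi(a|s)."""
    state = (rng.uniform(-1.0, 1.0), 0.0)
    grad_logp = [0.0, 0.0, 0.0]
    total_reward = 0.0
    for _ in range(HORIZON):
        mean = policy_mean(weights, state)
        action = rng.gauss(mean, SIGMA)
        # Gradient of log N(action | mean, SIGMA^2) with respect to the weights.
        scale = (action - mean) / SIGMA ** 2
        for i, f in enumerate(features(state)):
            grad_logp[i] += scale * f
        # Toy point-mass dynamics: the action is an acceleration.
        pos, vel = state
        vel = vel + action * DT
        pos = pos + vel * DT
        state = (pos, vel)
        # Reward penalizes distance from the origin and residual velocity.
        total_reward -= (pos ** 2 + 0.1 * vel ** 2) * DT
    return total_reward, grad_logp


def train(iterations=200, batch=16, lr=1e-3, seed=0):
    """REINFORCE with a mean-reward baseline; returns the learned policy weights."""
    rng = random.Random(seed)
    weights = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        episodes = [rollout(weights, rng) for _ in range(batch)]
        baseline = sum(r for r, _ in episodes) / batch
        for i in range(len(weights)):
            grad = sum((r - baseline) * g[i] for r, g in episodes) / batch
            weights[i] += lr * grad  # gradient ascent on expected reward
    return weights


if __name__ == "__main__":
    print(train())
```

In a realistic setting the linear map would be replaced by a neural network, the toy dynamics by a flight simulator, and the commanded acceleration by quantities such as bank angle and angle of attack. The loop structure (simulate, score with a reward, adjust the weights) is the part the example is meant to convey.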