In this presentation, an actor-critic algorithm that uses extreme learning machines in place of deep neural networks is applied to generate a control policy that guides a hypersonic reentry vehicle to a target state while respecting path constraints. The proposed algorithm comprises three major blocks: (1) a generator of sample trajectories based on an initial control policy and randomized initial conditions, (2) an actor that maps the vehicle's current state to control actions via a single-layer feedforward neural network (SLFNN), and (3) a critic that predicts the value of each control action via an extreme learning machine (ELM). At each iteration of the algorithm, the actor uses the sample trajectories to construct a control policy based on the critic's value assessments. The weights of the actor's SLFNN define the control policy and are updated by stochastic gradient ascent at each iteration. As the algorithm iterates, it converges to a closed-loop control policy that guides the reentry vehicle toward the target state while avoiding the path constraints.
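The interplay of the three blocks can be sketched in code. The following is a minimal illustrative sketch on a toy one-dimensional tracking problem, not the vehicle model from the presentation: the dynamics, reward, network sizes, and learning rates are all assumptions chosen only to show how an ELM critic (random hidden layer, least-squares output weights) and an SLFNN actor (updated by stochastic gradient ascent on an advantage signal) fit together.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(actor, s0, steps=25, noise=0.3):
    """Block 1: sample a trajectory from a randomized initial state s0."""
    states, actions, rewards = [], [], []
    s = s0
    for _ in range(steps):
        a = actor.mean_action(s) + noise * rng.standard_normal()  # exploratory action
        states.append(s)
        actions.append(a)
        s = s + 0.1 * np.tanh(a)   # toy stand-in dynamics (assumption, not the reentry model)
        rewards.append(-s * s)     # penalize distance from the target state 0
    return np.array(states), np.array(actions), np.array(rewards)

class ELMCritic:
    """Block 3: ELM value function -- fixed random hidden layer, output weights by least squares."""
    def __init__(self, n_hidden=40):
        self.W = rng.normal(size=(1, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.beta = np.zeros(n_hidden)
    def _h(self, s):
        return np.tanh(np.atleast_2d(s).T @ self.W + self.b)
    def fit(self, states, returns):
        # ELM "training" is a single closed-form least-squares solve
        self.beta, *_ = np.linalg.lstsq(self._h(states), returns, rcond=None)
    def value(self, states):
        return self._h(states) @ self.beta

class SLFNNActor:
    """Block 2: single-layer feedforward policy whose weights define the control policy."""
    def __init__(self, n_hidden=8, lr=0.02):
        self.W1 = rng.normal(scale=0.5, size=(1, n_hidden))
        self.w2 = rng.normal(scale=0.5, size=n_hidden)
        self.lr = lr
    def mean_action(self, s):
        return float(np.tanh(np.atleast_1d(s) @ self.W1) @ self.w2)
    def update(self, s, a, advantage):
        # stochastic gradient ascent: move the mean action toward high-advantage actions
        h = np.tanh(np.atleast_1d(s) @ self.W1)
        delta = (a - h @ self.w2) * advantage
        self.w2 += self.lr * delta * h
        self.W1 += self.lr * delta * np.outer(np.atleast_1d(s), self.w2 * (1 - h * h))

def discounted_returns(rewards, gamma=0.95):
    G, out = 0.0, np.zeros_like(rewards)
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        out[t] = G
    return out

actor, critic = SLFNNActor(), ELMCritic()
for it in range(60):
    s0 = rng.uniform(1.0, 2.0)                 # randomized initial condition
    S, A, R = rollout(actor, s0)
    G = discounted_returns(R)
    adv = G - critic.value(S)                  # previous critic serves as the baseline
    adv /= np.abs(adv).max() + 1e-8            # normalize for stable gradient steps
    for s, a, ad in zip(S, A, adv):
        actor.update(s, a, ad)                 # actor improves against critic's assessment
    critic.fit(S, G)                           # refit ELM critic on the fresh returns
```

Note the division of labor the abstract describes: the critic never iterates gradient steps (its output weights come from one least-squares solve per iteration, the key speed advantage of ELMs), while only the actor's SLFNN weights are adjusted by gradient ascent. Path constraints would enter through additional penalty terms in the reward, which this sketch omits.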