OPTIMIZATION OF THE LAUNCHER ASCENT TRAJECTORY LEADING TO THE GLOBAL OPTIMUM WITHOUT ANY INITIALIZATION: THE BREAKTHROUGH OF THE HAMILTONJACOBIBELLMAN APPROACH

The resolution of the launcher ascent trajectory problem by the so-called HamiltonJacobiBellman (HJB) approach, relying on the Dynamic Programming Principle, has been investigated. The method gives a global optimum and does not need any initialization procedure. Despite these advantages, this approach is seldom used because of the di©culties of computing the solution of the HJB equation for high dimension problems. The present study shows that an e©cient resolution is found. An illustration of the method is proposed on a heavy class launcher, for a typical GEO (Geostationary Earth Orbit) mission. This study has been performed in the frame of the Centre National d£Etudes Spatiales (CNES) Launchers Research & Technology Program.


INTRODUCTION
A brief explanation of the HJB approach is proposed, associated assets justifying the motivation to use this method to optimize a launcher trajectory are described. An implementation and associated challenges are then presented. In the end, the HJB algorithm is operated to optimize the trajectory of a heavy class launcher, for a typical GEO mission; results are analyzed with regard to a solution obtained through a shooting method.
For any acronym in the following text, please refer to ACRONYMS at the beginning of the paper. For any mathematical notation in the following text, please refer to NOMENCLATURE at the beginning of the paper.

Mapping of Optimization Methods and Associated Issues
Most of optimization methods rely on necessary condition of optimality. In the ¦eld of control optimization, let us quote the Pontryagin Maximum Prin- Figure 1 Illustration of classes of optimization approaches ciple (PMP) [1], which relies on the use of relationship between real and dual states, and Direct Approach [2], which can be declined in various discretization strategies of the control and of the state. Approximate approaches can also be considered to simplify the algorithm implementation by looking for the optimal solution in a restrained control shape family but leading to a suboptimal solution (Fig. 1). Algorithms derived from exact optimization approaches previously mentioned can be used to ¦nd an optimal solution but it should be emphasized that there is no guaranty that this solution is the global optimal one because these approaches deal only with necessary conditions of optimality.
Besides, most of these algorithms are typically operated through iterative processes; as a consequence, it raises the question of initialization and convergence. Finding a good initial guess (in the sense that it would lead to a solution of good quality at the end of the iteration process) can be quite tricky, in particular, when dealing with duality and when the area of convergence is restricted. E¨orts have also to be put on the iterative process, so as to get an optimal solution in a reasonable time. A detailed presentation of optimization algorithm families can be found in [3].
With regard to local methods, the HJB approach has two major assets. First, the background theory assures to obtain the global solution when it exists. Second, the implementation of the HJB approach does not require any iterative process, freeing the engineer from tricky tasks that are initialization and convergence. These great assets justify the motivation for the e¨orts made jointly by CNES and Ecole Nationale Sup‚ erieure des Techniques Avanc‚ ees on applying HJB approach to optimize launcher trajectories.

Elements Related to the HamiltonJacobiBellman Approach: Discrete Problems
Let us consider the following problem: look for the path starting at point a and allowing reaching point d in minimum time (Fig. 2).
The dynamic is such that at any time, the time of arrival depends only on the current state and on the control we choose; in particular, it is considered that the history of dynamic state prior to the current state has no explicit in §uence on the future state. In this frame, the core idea at the origin of Dynamic Programming [4] may be expressed in this term: if the optimal trajectory from point a to point d goes through point b, the portion of this trajectory linking point b to point d is also the quickest path between these latter points. If it was not, there would exist a quicker trajectory between points a and d. Now considering the quickest trajectory between points b and d, this smaller optimization problem can be solved if we know that this trajectory goes through point c, and so on.
This simple-looking idea is at the origin of a powerful approach that can be used to solve an optimal control problem: starting from the target, we look for the points being reachable in a given time dt. The locus of this point may be understood as a level curve which is memorized. Starting from this curve, the location reachable in a given time dt is determined; this new level curve represents the locus of points reachable in 2dt and is memorized. This computation is running until the farthest level curve crosses point a. Then, this propagation phase, which is time reverse propagated, stops. Now, starting from point a, let us use the level curves which have been successively memorized during the propagation phase to determine the quickest path, the time §owing in direct sense from 0 to dt, then from dt to 2dt, and so on until reaching point d. This is the reconstruction phase. Let us notice that not only the best path starting from point a can be reconstructed, but all the paths starting from any point included inside the level curve crossing point a.

Elements Related to the HamiltonJacobiBellman Approach: Continuous Problems
As previously described, the Dynamic Programming approach proposes to solve an optimal control problem by considering it as a part of a more global family of optimal control problems: to ¦nd the optimal path from point a to point d, one may ¦nd the optimal path from point b to point d, and so on. The HJB equation extends this approach, expressed in discrete domain, to continuous optimal control problem. As explained hereafter, solving HJB equation leads to a ¤value function¥ that characterizes the optimal control problem family mentioned before and that can be determined as the solution of a ¦rst order nonlinear Partial Di¨erential Equation which dimension is related to the number of variables involved in the problem. Let us consider the following problem: where f : R d × A → R d is the dynamics; X 0 is the set of initial states; C i ⊂ R d is the target; t f is the combustion duration; y ux is the trajectory satisfying f ; t 1 is the beginning of the exo-atmospheric phase; and U is the set of admissible controls.
Let us recall some theoretical results concerning the HJB approach for problem (1). In this section, we assume that f satis¦es some classical assumptions: f is a continuous function and for every x ∈ R d , the set f (x, A) is closed and convex. There exists c 0 ≥ 0 so that sup a∈A | f (ξ, a) | ≤ c 0 (1 + |ξ|). Moreover, for every R > 0, there exists L R > 0, such that: Consider the minimal time function T which associates to any point x ∈ R d the minimal time needed to reach the target with an admissible trajectory y ux solution of (1) and satisfying y ux (θ) ∈ K (the set of admissible trajectories): Many works have been devoted to the regularity of the minimum time function T . When K ≡ R d and under some local metric properties around the target, the function T is the unique continuous viscosity solution of an HJB equation.
Here, without assuming any controllability assumption at the boundary of the target and neither at the boundary of K, the function T may be discontinuous. Indeed, if, for x ∈ R d , no trajectory y ux reaches the target C i or if any trajectory leaves K before reaching the target, we set T (x) = +∞. Nevertheless, it can be shown that T is low semicontinuous and its epigraph can be characterized by a Lipshitz function ϑ.
To do so, let us ¦rst consider a Lipschitz continuous function ϑ 0 : In particular, one has: Consider the value function u associated to the Mayer problem with ¦nal cost ϑ 0 : The capture basin is characterized by However, function u is a value function of a state constrained problem and we are still faced with the problem of characterizing this value function if no controllability assumption is made. To overcome this di©culty, let us consider another Lipschitz continuous function g: Then, let us consider the control problem: Problem (6) has no ¤explicit¥ state constraint. In fact, in this new setting, the term max θ∈[0,t] g(y ux (θ)) plays the role of a penalization that a trajectory y ux would pay if it violates the state constraints. In Theorem 3.2, it will be shown that the advantage of considering (6) is that ϑ can now be characterized as the unique continuous solution of an HJB equation. Let u and ϑ be the value functions de¦ned, respectively, by (4) and (6). Then, for every t ≥ 0, one has the following: (i) the capture basin is given by: exists an admissible trajectory y ux that never touches the boundary ∂K.
The use of a level-set approach is a standard way to determine the minimal time function of unconstrained control problems; we generalize it to the case when the time control problem is in the presence of state constraints. Our formulation also allows obtaining the capture basins. As mentioned before, the function ϑ can be characterized as the unique solution of a HamiltonJacobi equation. More precisely, let us consider the Hamiltonian: Theorem 3.2: assume (2) and that ϑ 0 and g are Lipschitz continuous. Then, ϑ is the unique continuous viscosity solution of the HJB variational inequality (obstacle problem): Here, g is the ¤¦ctitious cost¥ that a trajectory would pay if it leaves K; it comes from the presence of the sup-norm max θ∈[0,t] g(y ux (θ)) in the cost function which de¦nes ϑ (see (6)). The HJB equations (7) are set in the whole space R + × R d , with d = 6; nevertheless, in order to perform computations, a ¦nite domain is used on which the HJB equations (7) are discretized. For this, a uniform space grid is chosen; the time step is chosen constant for simplicity. Advanced numerical technics are then applied to e©ciently evaluate the approximation of the function ϑ which corresponds to the propagation phase previously described.
Once the function ϑ is computed everywhere, the minimal time function is computed as This corresponds to the reconstruction phase: it is now possible to get the optimal feedback control law and the corresponding optimal trajectory, typically by using a classical second-order RungeKutta scheme.

Challenges Related to the Implementation of the HamiltonJacobiBellman Approach
Main issues related to HJB approach are due to the strong dependence of computation e¨orts (in terms of memory size and computation time) to the number of dimensions of the optimization problem to solve: these issues are due to the fact that the HJB equation is a partial di¨erential equation which has to be solved in terms of state and time which all are discretized. Besides, in the frame of the HJB approach, parameters behave like additional states (associated to steady dynamics), which increase the number of dimensions to consider. When dimension of the problem increases (typically, above dimension 3), numerical computation becomes very challenging. This is what Robert Bellman describes as ¤the curse of dimensionality;¥ for a typical three-dimensional (3D) trajectory such as the one presented here after, 12 dimensions have to be considered (see section 4); considering a very rude discretization of 10 points in each dimension (decreasing the computation accuracy), we still get 10e 12 points to handle. In the frame of launcher trajectory optimizations, e¨orts have been pushed on dealing with high-dimension problems.

APPLICATION OF THE HAMILTONJACOBIBELLMAN APPROACH TO LAUNCHER TRAJECTORY OPTIMIZATION
Having in mind the issues previously mentioned, a formulation of the launcher trajectory optimization has been proposed in order to replace this high-dimension problem by a set of smaller ones, without losing the global optimality. The principle is to consider a cost function which is separable in time or, in other words, a cost function which value depends at each time only on the current state (and, naturally, on the control) and not on the history of previous states. In such a frame, if the problem to optimize relies on control and parameters occurring at di¨erent phases, it is possible to cut the global problem in smaller ones only ruled by current ¤active¥ control and/or parameters. Each subproblem can be solved sequentially, while the global optimality is maintained thanks to the combining e¨ort of memorization operated at the end of each subpropagation phase and to the matching operated between the subproblems during the global reconstruction phase.

Presentation of the Reference Problem
As a consequence, a formulation of the trajectory optimization problem suitable for HJB computations implies ¦rstly a precise characterization of the §ight sequential and of the variables to optimize (control, parameters). A mission toward GEO with an ¤Ariane5-like¥ launcher is considered. The choice of this mission is justi¦ed by satellite market considerations and also because it includes several elements which are challenging for the HJB approach implementation: the state domain is large (roughly 36 000 km around the Earth, required launcher increase of speed of more than 10 000 m/s), the mission duration is long (more than 5 h), and the number of parameters is also important as detailed hereafter.
The objective is to optimize the launcher trajectory so as to maximize the payload mass injected on the GEO. The con¦guration of the launcher relies on two solid propulsion boosters ¦red on ground burning simultaneously, a cryogenic lower stage ignited on ground and still operating after booster jettisoning and a cryogenic upper stage. The engine of the upper stage can be ignited several times in §ight; the total combustion duration is spread along among the di¨erent burns. In addition, the perigee altitude of the transfer orbit reached at the end of the ¦rst upper stage boost should be at least 180 km. For all the engines, the pro¦le of thrust magnitude is preset on ground; the trajectory is commanded through the orientation of thrust.
The launcher is operated from French Guyana; the §ight starts with a vertical takeo¨, goes on with a pitch over maneuver at a constant angle rate which intensity can be optimized, and is followed by a atmospheric gravity turn phase during which the launcher thrust is steered according to a null aerodynamic angle of attack path, so as to limit aerodynamic loads on launcher structures. The azimuth of launch characterizing roughly the geographical direction of the plane in which occurs the atmospheric §ight has also to be optimized.
The atmospheric phase ends with the booster jettisoning; above begins the exo-atmospheric phase during which the thrust orientation is fully optimized, under constraints. The upper stage will inject the satellite on the GEO thanks to two boosts; respective combustion durations and associated ballistic phase length have to be optimized.

Formulation of the HamiltonJacobiBellman Approach
From the above description of the launcher §ight sequential, we can remark that the atmospheric §ight is totally controlled by the initial parameters that are the pitch over rate and the launch azimuth, the orientation of the thrust being then constrained by the gravity turn phase. Naturally, other parameters such as the ballistic phase duration also have an e¨ect on the cost function but do not act explicitly during the atmospheric phase (recall that the combined contributions of these parameters will be taken into account during the reconstruction phase, assuring the global optimality of the HJB solution).
For the atmospheric §ight having low dimensions in terms of parameters, which totally determine the control, an e©cient way to reduce computation time is to manage atmospheric and exo-atmospheric separately. Taking also into account the fact that the atmospheric §ight is quite short, this part of the §ight can be handled by computing and storing in memory a set of atmospheric trajectories associated to a range of set of values for tilting angle rates and launch azimuths.
Exo-atmospheric §ight can also be subdivided into di¨erent phases, considering that the lower cryogenic stage exo-atmospheric §ight phase is explicitly controlled only by the orientation of the thrust. The same consideration can be made during the two propelled phases of the upper stage, while the ballistic phase is controlled by its duration, no engine thrust being available by de¦nition.
The management of di¨erent successive exo-atmospheric subproblems is handled by starting from the GEO, which is perfectly known, and then performing a time-reverse propagation.
Additionally, it is assumed that the second upper stage boost is an impulsive one; as it can be seen during numerical computation, this hypothesis is quite weak: considering the engine mass §ow rate and the amount of propellant determined thanks to HJB, the second boost lasts less than a minute, which is quite negligible with regards to the total upper stage combustion time and the duration of the ballistic phase.
Making this hypothesis allows to nicely simplify the management of di¨erent successive subproblems: the ¦nal boost being impulsive, the GTO necessarily intersects the GEO; consequently, starting from the ¦nal GEO, several sets of kinematic conditions corresponding to di¨erent orbit intersection con¦gurations are considered. To each of them corresponds an amount of propellant required for getting from the GTO to the GEO, this amount being determined thanks to space mechanics equations.
Then, the ¦rst HJB problem is introduced during which all the sets of kinematic conditions are time reverse propagated, the only control of this problem being the ballistic phase duration. The propagation from a discretize state to another is performed by computing the function ϑ for the nodes in the neighborhood of the current state, the values at the di¨erent nodes being then stored in memory. At the end of the propagation, the value of the function ϑ is known everywhere on the grid; that corresponds to get the level map drawn in Fig. 2. The second HJB problem is then considered from the end of the ¦rst upper stage boost to the beginning of the exo-atmospheric phase. In this subproblem, controls are successively the thrust orientation of the upper and lower stage. The dynamic states at the interfaces of all the subproblems are handled thanks to multidimensional interpolations minimizing the loss of information occurring at the encounter of the di¨erent dynamic state ¤wave fronts¥ generated during the propagation phases.
The optimal trajectory and associated control are ¦nally determined by performing the reconstruction among the trajectories resulting from the matching of di¨erent subproblems. The state being discretized, this reconstruction corresponds to follow the nodes which have the lowest value for the function ϑ.
From the above description of the launcher §ight sequential, we can remark that the payload mass is not only the criterion to maximize but it intervenes also as a parameter which in §uences the dynamics of all the §ight phases. In order to reduce the dimension of all the subproblems, the above process is run considering the payload mass as a constant. The criterion to be minimized in the HJB problem is the combustion duration which corresponds to the pro- Figure 3 Implementation of the HJB approach pellant consumption; this formulation allows expressing the launcher trajectory optimization problem as a minimum time problem which matches the notion of reachability inherent to the HJB approach. At the end of the reconstruction, if the optimal trajectory does not use all the available propellant, it means that the launcher could bring a heavier payload: the above process is run another time with an increased value of the payload mass. The process goes on until payload mass variations tend to be negligible. This scheme preserves the properties of the HJB approach assuring to reach the global optimal solution (Fig. 3).

NUMERICAL RESULTS
The launcher trajectory optimization toward GEO is solved by applying the HJB described previously. The characteristics of the launcher are given in Table 1.
Let us precise some elements about the §ight dynamics considered: the launcher is considered to respond instantaneously to the control (thrust orientation); in particular, inertia dynamics around the center of gravity is considered to be negligible, the launcher is a point on which are concentrated the weight, thrust, and aerodynamics forces. The trajectory is considered in 3D space: the state includes three positions, three velocities, and scalar parameters: pitch over rate, launch azimuth, ballistic phase duration, allocation of propellant consumption among the two upper stage boosts, and payload mass. This leads to 11 dimensions to which the time (which is also discretized) should be added.
The trajectory is optimized with two independent algorithms: one relying on a shooting method, the other corresponding to the HJB implementation strategy previously described. The trajectory optimized by the shooting method is named ¤referenced trajectory:¥ the associated numerical code is validated and representative of CNES trajectory/performance studies is performed in standard launcher engineering analysis.
Before optimizing, a trajectory is simulated using the §ight dynamics implemented in HJB algorithm, but with the control associated to the reference  Figure 4 Comparison of reference (1) and simulated (2) trajectories trajectory. This trajectory is compared to the reference trajectory (please refer to section NOMENCLATURE for the de¦nition of the plotted parameters) (Fig. 4). The two trajectories are quite close, which validate the §ight dynamics implemented in the HJB algorithm. The small gaps will be investigated in future activities. Now, the ¦rst HJB optimization is performed in order to minimize the upper stage propellant consumption, with payload mass ¦xed to the value maximized with the reference trajectory Table 3 Grid sizes considered and targeting the GTO of the reference trajectory. The ¦rst optimization is performed (hereafter, named ¤opt1¥) with launch azimuth and pitch over rate coming from the reference trajectory and then another optimization with these parameters optimized (hereafter, named ¤opt2¥) ( Table 2).
The comparison of the reference and opt1 trajectories shows that HJB algorithm allows identifying a more e©cient trajectory in the sense that it requires less propellant to reach the same GTO, with the same payload mass. In addition, the comparison of the opt1 and opt2 trajectories shows a moderate update of the optimal parameter values.
But it may be understood that HJB opt1 and opt2 trajectories are computed on a given grid (as previously explained) and that the accuracy of this grid is still not completely satisfying ( Table 3).
The size of Grid 0 is determined by putting a ¦ne discretization on parameters that show the quickest evolution along the trajectory; it is typically the case for the evolutions of the altitude and of the norm of the velocity vector. The total size of the grid remains limited in order to get ¦rst results quickly. For this ¦rst preliminary study, Grids 1 and 2 are re¦ned just by increasing arbitrarily the mesh density, the discretization remaining evenly distributed in each state dimension. For further studies, it could be worth trying to improve the distribution of the nodes in the state space in order to increase the numerical accuracy while keeping a reasonable computation time. It could be also use- Figure 5 Impact of the grid re¦nement on results and computation duration: 1 ¡ remaining propellant; and 2 ¡ computation duration ful to elaborate an assessment of the computational accuracy for a given grid (Fig. 5).
While re¦ning the grid from 0 to 2, it can be seen that propellant remaining in tanks after GEO injection tends to decrease. A more re¦ned grid would allow determining more accurate results but Grid 2 is a compromise between accuracy and computation time which increases quickly. Increasing the accuracy of the HJB algorithm is one axis of improvement expressed hereafter. Let us note also that a rough grid already gives preliminary tendencies for a very short computation time. For the moment, it should be kept in mind that the gaps between reference and HJB trajectories rely at least partly on the grid accuracy.
Finally, the last trajectory is optimized (named ¤opt3¥), corresponding to the initial problem presented before: to maximize the payload mass on the GEO, while optimizing all controls and parameters. The computations were performed on a 50 × 15 × 50 × 30 × 15 × 3 grid (computation lasted for 2 h 30 min for a ¦xed payload mass). To get the optimal trajectory, 14 h are required (Fig. 6).
The reference and opt3 trajectories have a global good consistence, the latter trajectory showing an extra amount of payload mass of 300 kg. Gaps between the two solutions may result from several causes (in addition to previous identi¦ed elements) that will be further investigated; in particular, the second boost in HJB computation being considered as impulsive, trajectory losses due to gravity and incidence are not taken into account, which tends to overestimate the launcher performance. The perturbation of Earth gravity ¦eld is not considered in the dynamics handled by HJB algorithm, which tends to overestimate the altitude of the apogee of the GTO and so the launcher performance. These points motivate some future improvements presented hereafter.

Figure 6
The opt3 trajectory pro¦les: 1 ¡ reference; and 2 ¡ HJB The HJB computation of trajectory opt3 lasts roughly 10 h. While being a nonnegligible ¦gure, it should be emphasized that the process is fully automatic, freeing engineers from time consuming e¨orts such as initialization and convergence tasks. Moreover, this duration is small enough to allow computation in ¤bash mode¥ during nonworking hours and to have results available at the beginning of the next working session.
In Fig. 7, there are given some illustrations of the trajectory pro¦le, from launch base to ¦nal GEO (in red are the propelled phases and in blue are the ballistic phases).

IMPROVEMENTS
In order to reduce the computation time, an implementation relying of parallel computation is going to be developed, taking bene¦t of the power o¨ered by a computer network organized through a cluster.
The corresponding increased calculation capacity may also be used to re¦ne the grid used to discretize the dynamic state. The postprocessing of the GEO trajectory shows that the dynamic evolution is not evenly distributed among the grid which suggests re¦ning the node density in some areas. An adaptive mesh may also be envisaged to perform automatically the grid improvement.
Another way of improvement would be to consider the HJB approach as a way to improve local indirect approach: indeed, the gradient of the level set function corresponds to the dual state considered in the PMP. The HJB approach could be used to provide a correct estimate of the dual state at the beginning of the trajectory, feeding the shooting method classically derived from the expression of transversality relationships. The HJB approach would be used with a not too much re¦ned grid to speed up calculations; then the local approach would only be used to converge quickly toward the global optimum.
Some e¨orts may also be made on introducing other trajectory constraints that should be useful for using HJB algorithm in engineering studies.