Reinforcement learning agent controls DIII-D plasma shape through sensor failures


Inside DIII-D – the tokamak where a reinforcement learning agent took direct command of the coils
(Image courtesy of General Atomics)
Researchers at Next Step Fusion and UC San Diego trained a reinforcement learning agent that tracks dynamic plasma shape targets on the DIII-D tokamak while tolerating the loss of nearly a third of its magnetic diagnostics. The policy commanded chopper power supplies directly during two experimental shape maneuvers, collapsing the classical reconstruction-then-control workflow into a single end-to-end policy with an auxiliary reconstruction head.
Magnetic probes and flux loops degrade or fail between shots through hardware faults, calibration drift, and deliberate exclusion. Classical control pipelines were designed for a full sensor set and require manual weight updates to handle each new failure pattern. A controller that operates across arbitrary sensor subsets without that retuning step removes a real operational burden.
A single policy for dynamic plasma confinement
The agent was trained in NSFsim, a physics-based simulator that models the full DIII-D power system including chopper circuits and coil current dynamics. Training drew on 120 experimental Lower Single Null shapes curated from more than 329,000 EFIT equilibria recorded between 2014 and 2020. Target shapes were resampled randomly every 0.25 seconds, forcing the agent to learn transitions across the operational envelope rather than memorising fixed setpoints.
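The target resampling described above can be sketched as a per-episode schedule. This is only an illustration: the helper name `sample_target_schedule`, the uniform sampling over the shape library, and the 1.0 s episode length are assumptions, not details taken from the published training code.

```python
import math
import random

NUM_SHAPES = 120          # shapes in the curated library (from the article)
RESAMPLE_PERIOD_S = 0.25  # seconds between target changes (from the article)


def sample_target_schedule(episode_length_s, rng):
    """Return [(start_time_s, shape_index), ...] for one training episode.

    Hypothetical helper: drawing shapes uniformly at random is an assumption.
    """
    n_segments = math.ceil(episode_length_s / RESAMPLE_PERIOD_S)
    return [
        (i * RESAMPLE_PERIOD_S, rng.randrange(NUM_SHAPES))
        for i in range(n_segments)
    ]


# Example: a 1.0 s episode (assumed length) yields four target segments.
schedule = sample_target_schedule(1.0, random.Random(0))
for start_s, shape_idx in schedule:
    print(f"t = {start_s:.2f} s -> target shape #{shape_idx}")
```

Because each segment draws a fresh target, consecutive segments form start-to-target transitions rather than fixed setpoints.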
Across roughly one million training steps the agent encountered approximately 4,000 start-to-target pairs out of 14,400 possible combinations. That coverage drove generalisation to unseen trajectories at inference time, with no fine-tuning required for specific shape sequences. The architecture pairs an asymmetric Truncated Quantile Critics learner with an auxiliary shape reconstruction head attached to the actor’s penultimate layer, mirroring the inductive structure of the classical two-stage pipeline within one network.
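The coverage figures follow from simple counting. The short check below assumes that the 14,400 combinations are ordered (start, target) pairs over the 120 shapes, including pairs where start and target coincide; the variable names are illustrative.

```python
NUM_SHAPES = 120                           # curated shapes (from the article)
possible_pairs = NUM_SHAPES * NUM_SHAPES   # ordered (start, target) pairs
seen_pairs = 4000                          # approximate count seen in training (from the article)

coverage = seen_pairs / possible_pairs
print(f"{possible_pairs} possible pairs, about {coverage:.1%} seen during training")
```

Under that assumption, a little over a quarter of the possible transitions were seen in training, so most trajectories requested at inference time are new to the policy.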
Diagnostic dropout and tokamak scaling robustness
Each training episode randomly zeroed 30 percent of the 71 magnetic probes and 43 flux loops, producing a controller policy that operates across arbitrary sensor subsets without backup logic or mode switching. The deployment mask at DIII-D disables 33 of 114 maskable channels, or 28.9 percent, closely matching the training distribution. The mask is fixed before each shot rather than adapted in real time.
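A minimal sketch of per-episode diagnostic dropout follows. It assumes that a fixed fraction of channels is dropped in every episode, rather than each channel being dropped independently, and that masked readings are set to zero. The function names `sample_dropout_mask` and `apply_mask` are hypothetical.

```python
import random

N_PROBES = 71                          # magnetic probes (from the article)
N_FLUX_LOOPS = 43                      # flux loops (from the article)
N_CHANNELS = N_PROBES + N_FLUX_LOOPS   # 114 maskable channels


def sample_dropout_mask(rng, n_channels=N_CHANNELS, dropout_rate=0.30):
    """Return a list of bools, True where the channel stays available.

    Assumption: exactly round(dropout_rate * n_channels) channels are dropped.
    """
    n_drop = round(dropout_rate * n_channels)
    dropped = set(rng.sample(range(n_channels), n_drop))
    return [i not in dropped for i in range(n_channels)]


def apply_mask(readings, mask, fill=0.0):
    """Replace masked readings with a fill value (zero, as in the article)."""
    return [x if keep else fill for x, keep in zip(readings, mask)]


mask = sample_dropout_mask(random.Random(0))
masked = apply_mask([1.0] * N_CHANNELS, mask)
print(f"training mask drops {mask.count(False)} of {N_CHANNELS} channels")
print(f"deployment mask drops 33 of {N_CHANNELS} = {33 / N_CHANNELS:.1%}")
```

Sampling a new mask at the start of every episode is what exposes the policy to many different sensor subsets, while the deployment mask is a single fixed instance drawn from roughly the same dropout rate.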
The team also swept dropout rates from 10 to 70 percent. A policy trained at 10 percent dropout proved fragile at the 28.9 percent failure rate of the deployment mask and degraded to a mean shape error of 5.4 cm across the 120-shape dataset. Policies trained at 50 and 70 percent occasionally lost the diverted configuration entirely, with the plasma transitioning from Lower Single Null to a limited shape on certain targets. The 30 percent policy struck the best balance, achieving 4.1 cm mean shape error against 3.4 cm for an oracle trained specifically on the deployment mask. That 0.7 cm gap buys generalisation across sensor subsets without advance knowledge of which diagnostics will fail.

Real device, real constraints – the x-point sweep and centroid shift data that classical isoflux control cannot replicate under partial sensor loss
(Image courtesy of Next Step Fusion)
DIII-D experiments and plasma-facing performance
Two physical experiments validated transfer to the device. In discharge 205580 the x-point radial coordinate moved from 1.36 m to 1.31 m, a 5 cm inward sweep, with the policy tracking the target throughout. A second maneuver shifted the plasma centroid by 2.5 cm between matched discharges 205576 and 205580.
The classical isoflux controller still achieved lower steady-state shape error in the independent GSevolve simulator on the same trajectories, but isoflux had been tuned for that specific operating point and offers no robustness to missing diagnostics. A further hardware factor shaped performance at boundary configurations. The DIII-D patch panel routes multiple coils through shared supply circuits, reducing actuator degrees of freedom and making the rightmost x-point the hardest target to reach.
Auxiliary head as a shape reconstruction module
The auxiliary shape head proved valuable beyond training. Ablation showed it reduces mean shape error from 4.8 to 4.0 cm and cuts episode-length standard deviation from 21.0 to 0.7 steps, acting as a training stabiliser rather than a reward maximiser. Deployed alongside the actor, the head runs at 4 kHz on the partially masked sensor set and reconstructs the plasma boundary to within 1.21 cm on discharge 205580 and 1.43 cm on discharge 205576, both inside typical EFIT uncertainty. The authors present it as a useful auxiliary reconstruction module rather than a replacement for the EFIT pipeline.
The work has clear limitations. It covers a single tokamak geometry. Absent sensors are handled by substituting the running mean rather than by explicit fault inference. The deployment mask is fixed before each shot, so robustness to diagnostics that fail mid-shot is not demonstrated. The authors identify multi-machine transfer, adaptive dropout scheduling, and dynamic mid-shot masking as the next steps.
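Running-mean substitution for absent sensors can be illustrated with a small imputer. This is a sketch under stated assumptions: the class name `RunningMeanImputer` is hypothetical, and the article does not say over which data the running mean is accumulated, so the per-channel incremental mean here is one plausible choice.

```python
class RunningMeanImputer:
    """Track a per-channel running mean and use it to fill masked channels.

    Hypothetical illustration; how the authors accumulate the mean is not
    stated in the article.
    """

    def __init__(self, n_channels):
        self.count = [0] * n_channels
        self.mean = [0.0] * n_channels

    def update(self, readings, mask):
        """Fold available readings into the running means."""
        for i, (x, available) in enumerate(zip(readings, mask)):
            if available:
                self.count[i] += 1
                self.mean[i] += (x - self.mean[i]) / self.count[i]

    def fill(self, readings, mask):
        """Return readings with masked channels replaced by their mean."""
        return [
            x if available else self.mean[i]
            for i, (x, available) in enumerate(zip(readings, mask))
        ]


imputer = RunningMeanImputer(n_channels=2)
imputer.update([1.0, 10.0], [True, True])
imputer.update([3.0, 20.0], [True, True])
# Channel 1 is masked, so its bogus reading is replaced by the mean 15.0.
filled = imputer.fill([5.0, 999.0], [True, False])
print(filled)
```

A filled value of this kind carries no information about the current plasma state, which is why the approach differs from explicit fault inference.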