Nova: A Policy-Based, Human-Like Chess Engine

April 2026 • 8 min read

Nova Chess is excited to release its new human-like chess engine, "Nova." Nova is a policy-only neural network trained on approximately 500M human chess positions sourced from Lichess. Further technical details, model weights, and validation results can be found on GitHub and Hugging Face. You can also play against Nova on the Nova Chess platform at strengths from 500 to 2500 Elo, both in normal games and in a training mode where you can practice common theoretical endgames, curated master-game positions, and auto-selected positions from your own games where you could have found a better move. You can also set up custom positions, or practice lines from specific opening variations.

A Single Forward Pass, No Search

Nova is a policy network. Given a chess position together with a target rating and two style scalars, the model produces a probability distribution over the legal moves. Move selection is one forward pass through the network followed by a sampling step. There is no Monte Carlo tree search, no alpha-beta pruning, no value head, and no lookahead of any kind. This is a deliberate departure from every other comparable model:

| Model | Heads at inference | Search | Notes |
| --- | --- | --- | --- |
| Nova | 1 (policy) | none | one forward pass, ~35-50 ms CPU |
| Maia-2 (NeurIPS 2024) | 3 (policy + value + auxiliary) | none | value head regresses W/D/L; auxiliary head predicts captures, checks |
| Maia-3 | 2 (policy + value) | none | drops Maia-2's auxiliary head |
| Allie (ICLR 2025) | 1 (policy) + value via search | adaptive MCTS at inference | provides per-position evaluation at runtime |
| Leela (LC0) | 2 (policy + value) | MCTS | engine-strength playing model |
| Stockfish | NNUE evaluation only | alpha-beta | not a human-move predictor |

The argument for pure policy is that humans do not run search trees during play. They pattern-match candidate moves and select among them. A model intended to imitate human play should reflect this constraint, not bypass it with mechanisms unavailable to the human it is trying to model. Pure policy also keeps inference cheap (around 35 to 50 ms on a single CPU core) and deployment simple (one ONNX file). Beyond rating, Nova additionally conditions on two style scalars, classical-versus-hypermodern preference and aggression, which influence opening choices and tactical tendencies respectively.
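Concretely, once the forward pass has produced logits over the legal moves, move selection reduces to a softmax and one sampling step. The sketch below assumes the logits have already been masked to legal moves; `sample_move` is a hypothetical helper, not part of the released API:

```python
import numpy as np

def sample_move(legal_moves, logits, temperature=1.0, rng=None):
    """Turn raw policy logits over legal moves into one sampled move.

    legal_moves: list of UCI strings for the current position
    logits: array of shape (len(legal_moves),) from the policy head
    temperature: >1 flattens the distribution, <1 sharpens it
    """
    rng = rng or np.random.default_rng()
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()  # shift for numerical stability before exponentiating
    probs = np.exp(z) / np.exp(z).sum()
    idx = rng.choice(len(legal_moves), p=probs)
    return legal_moves[idx], probs

moves = ["e2e4", "d2d4", "g1f3"]
move, probs = sample_move(moves, [2.0, 1.5, 0.5])
```

Sampling (rather than always taking the argmax) is what lets the bot reproduce the variety of human play rather than a single deterministic line.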

Benchmarking Against Maia

The Maia family from the University of Toronto is the established baseline for human chess move prediction. Maia-2 was published at NeurIPS 2024 with three output heads; Maia-3 was released by the same group earlier this year with the auxiliary head removed. Both are policy networks trained on Lichess game data and evaluated on hit-rate against actual human moves. We benchmarked Nova against the publicly available Maia-3 checkpoint and the Maia-2 rapid checkpoint on a held-out sample of 600,000 positions drawn from March 2026 Lichess rapid games, stratified at 100,000 positions per rating band.

| Metric | Maia-2 | Maia-3 | Nova | Winner (Nova vs Maia-3) |
| --- | --- | --- | --- | --- |
| Top-1 hit rate | 50.27% | 54.83% | 54.60% | Maia-3 by 0.23 pp |
| Top-5 hit rate | 88.38% | 91.23% | 91.10% | Maia-3 by 0.13 pp |
| Mean P(actual move) | 38.44% | 42.10% | 42.51% | Nova by 0.41 pp |
| Mean top-5 probability mass | 89.33% | 91.96% | 92.26% | Nova by 0.30 pp |

All four Nova-versus-Maia-3 deltas are statistically significant under paired McNemar tests (for hit rates) and paired t-tests (for probability mass), and both probability-mass deltas remain significant under a player-clustered bootstrap. Nova clearly improves on Maia-2 across every metric. Against Maia-3, the two models split the comparison: Maia-3 is marginally ahead on argmax accuracy, Nova is ahead on the probability mass it assigns to the move the human actually played.
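For reference, all four metrics can be computed directly from per-position predicted distributions. This is a minimal sketch, not the evaluation harness published on GitHub, which may differ in details:

```python
import numpy as np

def evaluate(predictions, actual_moves):
    """predictions: one {move: prob} dict per position.
    actual_moves: the move the human actually played in each position."""
    top1 = top5 = 0
    p_actual, top5_mass = [], []
    for probs, actual in zip(predictions, actual_moves):
        ranked = sorted(probs, key=probs.get, reverse=True)
        top1 += ranked[0] == actual            # argmax matched the human
        top5 += actual in ranked[:5]           # human move in model's top 5
        p_actual.append(probs.get(actual, 0.0))
        top5_mass.append(sum(probs[m] for m in ranked[:5]))
    n = len(actual_moves)
    return {"top1": top1 / n, "top5": top5 / n,
            "mean_p_actual": float(np.mean(p_actual)),
            "mean_top5_mass": float(np.mean(top5_mass))}
```

The hit rates reward only the ranking; the probability-mass metrics additionally reward calibration, which is why the two families of metrics can disagree about which model is ahead.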

The two models also exhibit different relative strengths across game phases, rating bands, and piece counts.

The full per-rating-band, per-Maia-tier, per-phase, and per-piece-count breakdown, along with paired significance tests and a clustered bootstrap, is published alongside the model on GitHub.

Applications and Open Questions

Beyond playing as an opponent, a calibrated human-move-probability model unlocks several downstream applications.

Brilliant move detection. Move quality has historically been classified using engine evaluation alone: a move is "brilliant" if it gains evaluation, "best" if it matches the engine's top choice, and so on. This conflates objective quality with human surprise. Combining Nova's predicted probability with Stockfish's PV-gap separates the two axes. A move that is both objectively strong and humanly unlikely to have been found is brilliant in a way that pure-engine evaluation cannot capture. This system is live on the analysis page of the Nova Chess platform.
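The two-axis idea can be sketched in a few lines. The thresholds here are made up for illustration; the production classifier on the Nova Chess analysis page is more nuanced:

```python
def is_brilliant(eval_gain_cp, p_human):
    """Illustrative two-axis rule: objectively strong AND humanly surprising.

    eval_gain_cp: engine evaluation gain (PV-gap) of the move, in centipawns
    p_human: Nova's predicted probability of the move at the player's rating
    """
    objectively_strong = eval_gain_cp >= 50   # illustrative threshold
    humanly_surprising = p_human <= 0.05      # illustrative threshold
    return objectively_strong and humanly_surprising
```

A move that gains evaluation but that most players would find anyway stays "best"; only the intersection of strength and surprise earns "brilliant".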

Move humanness scoring. For any position and rating, Nova produces P(actual_move | rating, style). This enables coaching feedback grounded in empirical human-play data: "only 4% of 1500-rated players would have found this move," or, when the user missed a strong move, "the best continuation was Bxf7, but only 8% of players at your rating would have found it." Aggregated across a game, the same per-move probabilities can characterize the demands the game placed on the player: how many critical moments required a response that most players at the user's rating would have missed, versus how many fell within typical patterns at that level.

Anti-cheat signal. A player whose engine-backed strong moves consistently carry very low Nova probability at their claimed rating is producing a statistical signature that diverges from the population at that level. This signal is most meaningful at lower and intermediate ratings, where finding many engine-best moves is genuinely uncommon; at master level, where strong players legitimately find the engine's top choice frequently, the signal carries less weight. Used alone it is not sufficient evidence, but as one input alongside engine-similarity and time-usage signals it can strengthen existing cheating-detection pipelines.
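One way to operationalize this signal is the mean Nova surprisal of a player's engine-matching moves. This is a sketch under assumed inputs, and, as noted above, it would only ever be one feature in a larger pipeline:

```python
import math

def surprisal_signal(played, engine_best, nova_probs, floor=1e-6):
    """Mean -log P(move | claimed rating) over moves matching the engine's best.

    played: the moves the player made; engine_best: engine's top choice per
    position; nova_probs: Nova's probability for each played move. A high
    value means the player keeps finding moves that Nova considers very
    unlikely at their claimed rating. Illustrative only; never sufficient
    evidence of cheating on its own.
    """
    s = [-math.log(max(p, floor))
         for m, best, p in zip(played, engine_best, nova_probs) if m == best]
    return sum(s) / len(s) if s else 0.0
```

Restricting the average to engine-matching moves is what makes the signal rating-sensitive: at master level those moves carry high Nova probability anyway, so the statistic stays low for legitimately strong players.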

The Nova-versus-Maia comparison also raises questions for the broader move-prediction research community. Each model wins on different axes, in different game phases, and in different rating cohorts. These splits are not accidents; they reflect distinct design choices interacting with shared training data. With both models now publicly available, direct ablation studies, paired error analysis, and head-by-head probing of where one model assigns probability that the other does not become tractable.

Train with Nova: The In-App Experience

Nova is integrated into the Nova Chess platform as both a play-against opponent and a training partner. The user-facing strength dial covers 21 calibrated tiers from 500 to 2500 on the chess.com scale, with per-tier calibration tuned to match human centipawn-loss profiles at each level. The result is that Nova 1500 plays like a genuine 1500-rated player rather than a uniformly weakened version of Nova 2500: blunder rates, mistake rates, and per-phase mistake distributions all match the empirical chess.com 1500 profile.

Train with Nova interface

The Train with Nova interface, showing rating selection, style controls, and training-mode launchers.

Users can also adjust an aggression dial. The aggression axis is a weighted composite of three measurable behaviors: tactical tendency, territorial control, and king-side pressure. Dialing it up biases Nova toward more forcing moves, more space-grabbing, and more direct attacks on the opposing king. The classical-versus-hypermodern axis is conditioned on internally but not exposed as a separate user lever, since it primarily affects opening choices.
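As a composite, the aggression axis is just a weighted sum of its three components. The equal weights below are an illustrative placeholder, since the production weighting is not published:

```python
def aggression_composite(tactical, territorial, kingside,
                         weights=(1/3, 1/3, 1/3)):
    """Weighted composite of the three aggression behaviors.

    Inputs are assumed normalized to [0, 1]; the weights here are an
    illustrative assumption, not the values used in the app.
    """
    w_t, w_s, w_k = weights
    return w_t * tactical + w_s * territorial + w_k * kingside
```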

Beyond playing full games, Nova powers three training modes: common theoretical endgames, curated master-game positions, and auto-selected positions from your own games where a better move was available.

The training environment includes hints, threat highlighting, an evaluation graph, and post-game review with classified move-quality icons.

Open Release Versus In-App Version

The model published on Hugging Face is the bare policy network: a single-head ONNX file that takes a position plus conditioning vector and returns a probability distribution over legal moves. Nothing else.

The in-app version of Nova additionally wraps that policy in a small calibration layer. The two main components are a per-tier temperature schedule (the primary lever for matching playing strength to a target chess.com rating) and an evaluation-only Stockfish filter at higher tiers. The Stockfish filter only ever evaluates moves Nova has already proposed; it never generates or recommends moves itself. If a Nova-sampled candidate falls below a tier-dependent quality threshold, the filter may replace it with a re-sample from Nova's own distribution, but replacement is not automatic. At lower tiers, sub-optimal candidates are usually kept, because players at that level genuinely make those mistakes; at higher tiers the replacement rate is substantially higher. Every move the in-app bot plays still originates from Nova's policy.
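The control flow of the wrapper can be sketched as follows. Every name here is an assumption made for illustration; only the structure (per-tier temperature, evaluation-only filter, re-sample from Nova's own distribution) comes from the description above:

```python
def pick_move(sample_from_nova, stockfish_eval_cp, tier_temperature,
              quality_floor_cp, replace_prob, rng):
    """Sketch of the in-app calibration wrapper (all names are assumptions).

    sample_from_nova(temperature) samples a move from Nova's policy at the
    tier's temperature; stockfish_eval_cp(move) is an evaluation-only score.
    The filter never proposes moves itself: a candidate below the tier's
    quality floor may only trigger a re-sample from Nova's own distribution,
    and replacement is not automatic (replace_prob rises with the tier).
    """
    move = sample_from_nova(tier_temperature)
    if stockfish_eval_cp(move) < quality_floor_cp and rng.random() < replace_prob:
        move = sample_from_nova(tier_temperature)  # re-sample from Nova, not Stockfish
    return move
```

Because both branches end in a sample from Nova's policy, every move the wrapper returns still originates from the model, matching the guarantee stated above.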

The open release stays pure-policy because that is the right surface for benchmarking, fine-tuning, and downstream research. Layering Stockfish into the released checkpoint would conflate the model with its production wrapper, which would not serve users who want to study or modify the model itself.

Try It, Build With It

Play Nova directly at novachess.ai. Model weights, evaluation set, and per-model prediction pickles are published on Hugging Face at huggingface.co/novachess/novachess-engine. Code, model card, and full results breakdown live on GitHub at github.com/novachessai/novachess-engine. The release license permits research, educational, and personal use.

We are particularly interested in fine-tunes and derivative work: player-specific bots trained on a single grandmaster's games, and archetype presets that condition on combinations of style axes. Please get in touch if you build something with the model that you'd like to share!

Benchmark methodology: 600,000 Lichess rapid positions from March 2026, held out from Nova's training set, stratified at 100,000 positions per rating band. Maia-3 evaluated using the publicly available maia3_simplified.onnx checkpoint; Maia-2 evaluated using the rapid_model.pt checkpoint. Significance tests use paired McNemar (hit rates) and paired t-tests (probability mass), with a player-clustered bootstrap on the probability-mass deltas to control for within-player correlation.