What if the most powerful AI model wasn't the biggest one but the smartest one? Right now, AI headlines are dominated by huge models that cost billions to train and pack trillions of parameters. OpenAI aims to change that with a deceptively simple challenge: create the best language model you can, but keep the entire artifact (weights and training code combined) under 16 megabytes, total.
Welcome to OpenAI Model Craft: Parameter Golf, an open research competition that is already drawing attention from machine learning researchers, engineers, and independent builders worldwide. The name is deliberate: just as a golfer aims for the lowest score, participants are optimizing for the lowest possible bits-per-byte (bpb) loss under absurdly tight constraints. It is not about who has the most resources. It is about who is the most creative.
What Is the Challenge?
The goal is to minimize held-out loss on a fixed FineWeb dataset while staying within a strict 16 MB artifact limit (weights and training code combined) and a 10-minute training budget on 8×H100 GPUs.
OpenAI has provided a GitHub repository with a baseline model, along with the fixed dataset and evaluation scripts. Participants fork it, improve the model within the constraints, and submit a pull request with their code, training logs, score, and a short write-up.
The challenge is heavily inspired by the NanoGPT Speedrunning challenge, where participants competed to train a model reaching 3.28 FineWeb validation loss as quickly as possible. Parameter Golf is a related but different problem: rather than optimizing for time, it asks how much intelligence can be compressed into a fixed number of bytes.
One important technical detail is that the 16 MB cap is decimal, that is, 16,000,000 total bytes, not 16 MiB (16,777,216 bytes). Every byte counts.
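The gap between the two conventions is easy to check in a few lines of Python. This is only an illustrative sketch: the file names and the idea of summing file sizes are assumptions, not the official repo's layout or checker.

```python
import os

# Decimal megabytes, per the challenge rules: 16 * 10**6 bytes, not 16 * 2**20.
CAP_BYTES = 16_000_000

# The binary cap would allow 777,216 more bytes -- real room when every byte counts.
SLACK_VS_MIB = 16 * 2**20 - CAP_BYTES  # 777,216

def artifact_size(paths):
    """Total size in bytes of every file in a submission artifact."""
    return sum(os.path.getsize(p) for p in paths)

def within_cap(paths):
    """True if the combined artifact fits under the decimal 16 MB limit."""
    return artifact_size(paths) <= CAP_BYTES
```

A submission that fits under 16 MiB but not under 16,000,000 bytes would still be over the limit.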
Why the Challenge Matters
The challenge is a live experiment in AI efficiency, and its implications stretch far beyond a leaderboard. Most discussions about AI focus on scale: having more parameters, more data, and more computing power. Parameter Golf deliberately asks a different question: what is the theoretical floor of intelligence you can fit into a tiny package?
This has enormous real-world significance:
- Edge and on-device AI: Smartphones, wearables, and IoT sensors cannot run 70B-parameter models.
- Cost efficiency at scale: Smaller, equally capable models mean dramatically lower inference and API costs.
- Architectural breakthroughs: Constraints historically drive innovation in the same way mobile hardware birthed efficient networks like MobileNet.
- Accessibility: A sub-16 MB model can be shared, deployed, and run by virtually anyone, anywhere.
The Rules and Technical Setup
The challenge pushes participants toward unconventional architectures (test-time compute, aggressive parameter tying, depth recurrence, low-rank training), novel compression schemes (low-precision storage, quantization-aware training (QAT), BitNets, custom tokenizers), and other creative directions such as test-time training, long context, and megakernels.
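To make the compression angle concrete, here is a minimal symmetric int8 quantization sketch in pure Python. It is not from the challenge repo and ignores everything a real quantization-aware training setup would do; it only shows the storage arithmetic: each weight drops from 4 bytes (fp32) to 1 byte, plus one shared scale.

```python
def quantize_int8(weights):
    """Map floats onto integers in [-127, 127] with one per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 codes."""
    return [q * scale for q in quantized]

weights = [0.81, -1.27, 0.05, 0.4]
q, scale = quantize_int8(weights)       # q = [81, -127, 5, 40], scale = 0.01
restored = dequantize(q, scale)
# 4x smaller storage, at the cost of up to scale/2 of error per weight.
```

In a 16,000,000-byte budget, that factor of four is the difference between fitting roughly 4 million fp32 parameters and roughly 16 million int8 ones.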
Key rules at a glance:
- 16,000,000 bytes total: Your model weights and train_gpt.py script combined.
- 10 minutes of wall-clock training: On 8×H100 GPUs (equivalent hardware accepted).
- Fixed dataset: A cached version of FineWeb with a 1,024-token SentencePiece BPE vocabulary.
- Evaluation metric: Tokenizer-agnostic bits-per-byte (bpb) on the frozen FineWeb validation set.
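Why bits per byte rather than per-token loss: dividing by raw bytes of text instead of token count keeps scores comparable even if a submission swaps in its own tokenizer. The conversion itself is one line; the per-token loss and bytes-per-token figures below are made up for illustration, not numbers from the challenge.

```python
import math

def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert a model's summed cross-entropy loss (in nats) over some
    validation text into bits per byte of that text's raw encoding."""
    return total_loss_nats / (math.log(2) * total_bytes)

# Illustrative: 3.4 nats/token on text averaging 4 bytes per token
# works out to roughly 1.23 bpb.
bpb = bits_per_byte(total_loss_nats=3.4 * 1_000_000, total_bytes=4 * 1_000_000)
```

A coarser tokenizer lowers loss per byte of text only if the model genuinely predicts better, which is the point of the metric.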
To submit your work for the leaderboard, you must beat the existing state-of-the-art by at least 0.005 nats. You also need to provide enough run logs to prove your results are statistically significant at a p-value of less than 0.01. During evaluation, the competition does not allow any external downloads or network calls. Your submission must be completely self-contained and reproducible.
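A rough sketch of what that significance requirement amounts to in practice: run training several times, then test whether the mean score clears the record by the required margin. Everything here is a simplifying assumption, not the official harness: a normal approximation stands in for a proper t-test, and the run scores are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def improvement_p_value(run_bpbs, record_bpb, margin=0.005):
    """One-sided test of H0: the true mean bpb is no better than
    record_bpb - margin. Normal approximation for illustration only."""
    n = len(run_bpbs)
    m, s = mean(run_bpbs), stdev(run_bpbs)
    z = ((record_bpb - margin) - m) / (s / sqrt(n))
    return 1 - NormalDist().cdf(z)

runs = [1.2140, 1.2155, 1.2148, 1.2151, 1.2144, 1.2150]  # hypothetical logs
significant = improvement_p_value(runs, record_bpb=1.2244) < 0.01
```

The practical takeaway: a submission needs consistently low scores across runs, not one lucky seed.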
The challenge runs from March 18 to April 30, 2026.
The current public leaderboard baseline, set by OpenAI itself, sits at 1.2244 bpb, using a 9-layer, 512-dimension model with 1,024-token vocabulary and tied embeddings. That is the bar every participant is trying to beat.
More Than a Competition: It Is a Talent Search
OpenAI chief research officer Mark Chen has described the Model Craft Challenge as a contest designed to test whether applicants can "come up with creative ideas in a sandbox setting." The competition is intentionally modeled on the spirit of elite mathematics and programming Olympiads, with less emphasis on credentials and more on demonstrable problem-solving ability.
The first challenge, Parameter Golf, recreates the kinds of problems OpenAI researchers face during pretraining, the initial process of building a model by having it ingest large amounts of training data.
In June, OpenAI plans to hire a small cohort of early-career researchers, targeting current undergraduate students and recent graduates, including Olympiad medalists and elite competitors.
The financial incentive is also real: OpenAI is sponsoring $1,000,000 in compute credits through RunPod to help participants get started training their models. Participants can request a compute grant directly on the challenge page, with credits available on a while-supplies-last basis.
Getting Started
The barrier to entry is intentionally low. Participants can begin locally:
- Apple Silicon users can run the included train_gpt_mlx.py script directly on their machine.
- Cloud GPU users can deploy via RunPod using an official Parameter Golf template pre-loaded with the repository and all dependencies.
- The dataset (FineWeb with the sp1024 vocabulary) is downloadable via a single Python script included in the repo.
- All submissions must be open-source, made as GitHub pull requests. Each submission should include a README, a training log, a submission.json file, and a working train_gpt.py script.
In Conclusion
Parameter Golf is more than a competition; it is a discussion about how the next breakthrough in AI might come from being smarter, not just bigger. It focuses on the core problem of L(N) optimization: finding the lowest possible loss with a fixed number of parameters, without being limited by data, computation steps, or architectural choices.
- For researchers, it is a career-defining showcase with a direct pipeline to OpenAI.
- For engineers, it is an open sandbox.
- For the industry at large, it is a pointed reminder that the most interesting problems in AI are still the ones where ingenuity beats infrastructure.
The tee is set. The leaderboard is live. How low can you go?