OpenAI Releases New Open-Weight Language Models: gpt-oss-120b and gpt-oss-20b Bring Strong Reasoning to Consumer GPUs with 16 GB of Memory
In a significant step toward broader adoption across sectors and emerging markets, OpenAI has announced its latest open-weight language models: gpt-oss-120b and gpt-oss-20b. Both are designed for strong reasoning, tool use, and efficient deployment, with notable differences in size and hardware requirements.
Model Sizes and Parameter Use
- gpt-oss-120b has 117 billion total parameters and activates approximately 5.1 billion parameters per token.
- gpt-oss-20b has 21 billion total parameters and activates around 3.6 billion per token; a quick arithmetic check of the active fraction follows below.
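As a quick sanity check on those numbers (plain arithmetic, not a figure from the announcement), the fraction of parameters active per token works out to roughly 4% for the larger model and 17% for the smaller one:

```python
# Active-parameter fraction per token, computed from the published counts.
# Illustrative arithmetic only.
models = {
    "gpt-oss-120b": (117e9, 5.1e9),  # (total params, active params per token)
    "gpt-oss-20b": (21e9, 3.6e9),
}

for name, (total, active) in models.items():
    print(f"{name}: {active / total:.1%} of parameters active per token")
# gpt-oss-120b: 4.4% active per token; gpt-oss-20b: 17.1% active per token
```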
Hardware Requirements
- gpt-oss-120b runs efficiently on a single 80 GB GPU, such as the NVIDIA H100.
- gpt-oss-20b is smaller and designed for edge devices or local setups, running within 16 GB of GPU memory, which makes it well suited to consumer hardware and on-device inference; a rough memory estimate follows below.
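A back-of-envelope estimate shows why these budgets are plausible. Assuming an average of about 4.25 bits per weight for MXFP4-quantized models (an assumption; the true footprint depends on which layers are quantized, plus activation and KV-cache overhead), the weights alone fit comfortably:

```python
# Rough weight-memory estimate at an assumed ~4.25 bits per parameter.
# Real inference also needs activation and KV-cache memory, so treat
# these numbers as a floor, not a complete requirement.
BITS_PER_PARAM = 4.25  # assumed average for MXFP4-quantized weights

for name, total_params in [("gpt-oss-120b", 117e9), ("gpt-oss-20b", 21e9)]:
    gib = total_params * BITS_PER_PARAM / 8 / 2**30
    print(f"{name}: ~{gib:.0f} GiB of weights")
# gpt-oss-120b: ~58 GiB (under 80 GB); gpt-oss-20b: ~10 GiB (under 16 GB)
```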
Performance on Benchmarks
- gpt-oss-120b approaches the accuracy of OpenAI's proprietary o4-mini model on core reasoning benchmarks and surpasses o3-mini. It also outperforms several proprietary models on HealthBench, which evaluates domain-specific reasoning in health contexts.
- gpt-oss-20b delivers results close to OpenAI's o3-mini despite being roughly six times smaller.
- Both models excel at chain-of-thought (CoT) reasoning and few-shot function calling, and support adjustable reasoning effort levels (a usage sketch follows this list).
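As a sketch of what adjustable reasoning effort can look like in practice, the snippet below sends a request to a locally hosted, OpenAI-compatible endpoint. The base URL, model tag, and the "Reasoning: high" system-prompt convention are assumptions for illustration; consult your serving stack's documentation for the exact mechanism:

```python
# Minimal sketch: requesting higher reasoning effort from a locally served
# gpt-oss model. The base_url, model tag, and system-prompt convention are
# placeholders/assumptions, not a documented contract.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "Reasoning: high"},  # assumed effort knob
        {"role": "user", "content": "How many primes are there below 100?"},
    ],
)
print(response.choices[0].message.content)
```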
Optimization and Architecture
- Both models use a mixture-of-experts (MoE) architecture for parameter efficiency and controllable computation per token (a toy routing sketch appears after this list).
- They employ MXFP4 quantization to shrink the weight footprint, letting gpt-oss-120b fit on a single 80 GB GPU and gpt-oss-20b on 16 GB consumer hardware.
- They are trained with reinforcement learning and techniques informed by OpenAI's most advanced internal models, such as o3 and other frontier systems.
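To make the MoE point concrete, here is a toy top-k routing layer in PyTorch. This is not the gpt-oss architecture, just a minimal illustration of why only a fraction of the total parameters does work on any given token: a router scores the experts, and only the top-k are evaluated.

```python
# Toy mixture-of-experts layer: a router picks top-k experts per token,
# so most parameters sit idle for any single token. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(scores, dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):          # evaluate only the chosen experts
            for slot in range(self.k):
                expert = self.experts[idx[t, slot].item()]
                out[t] += weights[t, slot] * expert(x[t])
        return out

x = torch.randn(4, 64)        # 4 tokens of width 64
print(ToyMoE()(x).shape)      # torch.Size([4, 64])
```

The per-token loop is written for readability; production MoE kernels batch tokens by expert instead.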
Tool Use and Function Calling
- Both models demonstrate strong tool-use capabilities and support few-shot function-calling APIs, making them well suited to interactive agentic applications; a sketch of a tool-call request follows.
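Here is a hedged sketch of a function-calling request against an OpenAI-compatible endpoint serving a gpt-oss model. The tool schema follows the standard Chat Completions convention, while the endpoint, model tag, and get_weather tool are placeholders:

```python
# Sketch: declare a tool and let the model decide whether to call it.
# base_url, model tag, and the get_weather tool are hypothetical.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)  # populated if the model called the tool
```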
Deployment and Licensing
- Released under the Apache 2.0 license, the gpt-oss models are fully open-weight and permissively licensed for broad use.
- They are accessible through various SDKs and platforms such as Ollama, LM Studio, and vLLM, as well as a free web interface that requires a Hugging Face login.
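For local experimentation, the Ollama Python client is one of the shortest paths, as sketched below. The gpt-oss:20b tag is the one Ollama advertises for this model, but verify it against your installed catalog:

```python
# Minimal local-inference sketch via the Ollama Python client.
# Assumes the Ollama server is running and the model has been pulled,
# e.g. with: ollama pull gpt-oss:20b
import ollama

reply = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Explain MoE routing in one sentence."}],
)
print(reply["message"]["content"])
```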
Limitations
- Coding performance is weaker than that of some contemporaries such as GLM 4.5 Air, particularly on complex programming challenges and 3D rendering tasks.
- GLM 4.5 Air notably outperforms the gpt-oss models on practical coding benchmarks despite similar hardware demands.
Summary Table
| Feature | gpt-oss-120b | gpt-oss-20b |
|----------------------------|--------------------------------------|-----------------------------------------|
| Total parameters | 117 billion | 21 billion |
| Params activated per token | 5.1 billion | 3.6 billion |
| Hardware requirement | Single 80 GB GPU (e.g., NVIDIA H100) | Edge devices with 16 GB memory |
| Benchmark performance | Near OpenAI o4-mini; surpasses o3-mini | Comparable to OpenAI o3-mini |
| Key strengths | Strong reasoning, tool use, CoT reasoning | Competitive reasoning, lightweight deployment |
| Coding tasks performance | Disappointing vs. GLM 4.5 Air | Similar limitations |
| Licensing and availability | Apache 2.0 open license; multiple deployment platforms | Same |
These models mark OpenAI's first large-scale open-weight release since GPT-2, emphasizing transparent, efficient, and accessible reasoning-focused LLMs well suited to real-world applications, especially where local inference and cost efficiency matter.
Both gpt-oss-120b and gpt-oss-20b are available to use now under the Apache 2.0 open-source license. OpenAI has also partnered with platforms and providers, including ONNX Runtime, Azure, AWS, and Ollama, to support the models across a range of environments.