OpenAI Releases New Open-Weight Language Models: gpt-oss-120b and gpt-oss-20b Bring Strong Reasoning to Consumer GPUs with 16 GB of Memory
In a significant step toward broader adoption across sectors and emerging markets, OpenAI has announced its latest open-weight language models: gpt-oss-120b and gpt-oss-20b. Both are designed for strong reasoning, tool use, and efficient deployment, with notable differences in size and hardware requirements.
Model Sizes and Parameter Use
- gpt-oss-120b has 117 billion total parameters and activates approximately 5.1 billion parameters per token.
- gpt-oss-20b has 21 billion total parameters and activates around 3.6 billion per token; a quick arithmetic check of the active fraction follows below.
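As a quick sanity check on those numbers (plain arithmetic, not a figure from the announcement), the fraction of parameters active per token works out to roughly 4% for the larger model and 17% for the smaller one:

```python
# Active-parameter fraction per token, computed from the published counts.
# Illustrative arithmetic only.
models = {
    "gpt-oss-120b": (117e9, 5.1e9),  # (total params, active params per token)
    "gpt-oss-20b": (21e9, 3.6e9),
}

for name, (total, active) in models.items():
    print(f"{name}: {active / total:.1%} of parameters active per token")
# gpt-oss-120b: 4.4% active per token; gpt-oss-20b: 17.1% active per token
```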
Hardware Requirements
- gpt-oss-120b runs efficiently on a single 80 GB GPU, such as the NVIDIA H100.
- gpt-oss-20b is smaller and designed for edge devices or local setups, running within 16 GB of GPU memory, which makes it well suited to consumer hardware and on-device inference; a rough memory estimate follows below.
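A back-of-envelope estimate shows why these budgets are plausible. Assuming an average of about 4.25 bits per weight for MXFP4-quantized models (an assumption; the true footprint depends on which layers are quantized, plus activation and KV-cache overhead), the weights alone fit comfortably:

```python
# Rough weight-memory estimate at an assumed ~4.25 bits per parameter.
# Real inference also needs activation and KV-cache memory, so treat
# these numbers as a floor, not a complete requirement.
BITS_PER_PARAM = 4.25  # assumed average for MXFP4-quantized weights

for name, total_params in [("gpt-oss-120b", 117e9), ("gpt-oss-20b", 21e9)]:
    gib = total_params * BITS_PER_PARAM / 8 / 2**30
    print(f"{name}: ~{gib:.0f} GiB of weights")
# gpt-oss-120b: ~58 GiB (under 80 GB); gpt-oss-20b: ~10 GiB (under 16 GB)
```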
Performance on Benchmarks
- gpt-oss-120b approaches the accuracy of OpenAI's proprietary o4-mini model on core reasoning benchmarks and surpasses o3-mini. It also outperforms several proprietary models on HealthBench, which evaluates domain-specific reasoning in health contexts.
- gpt-oss-20b delivers results close to OpenAI's o3-mini despite being roughly six times smaller.
- Both models excel at chain-of-thought (CoT) reasoning and few-shot function calling, and support adjustable reasoning effort levels (a usage sketch follows this list).
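As a sketch of what adjustable reasoning effort can look like in practice, the snippet below sends a request to a locally hosted, OpenAI-compatible endpoint. The base URL, model tag, and the "Reasoning: high" system-prompt convention are assumptions for illustration; consult your serving stack's documentation for the exact mechanism:

```python
# Minimal sketch: requesting higher reasoning effort from a locally served
# gpt-oss model. The base_url, model tag, and system-prompt convention are
# placeholders/assumptions, not a documented contract.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "Reasoning: high"},  # assumed effort knob
        {"role": "user", "content": "How many primes are there below 100?"},
    ],
)
print(response.choices[0].message.content)
```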
Optimization and Architecture
- Both models use a mixture-of-experts (MoE) architecture for parameter efficiency and controllable computation per token (a toy routing sketch appears after this list).
- They employ MXFP4 quantization to shrink the weight footprint, letting gpt-oss-120b fit on a single 80 GB GPU and gpt-oss-20b on 16 GB consumer hardware.
- They are trained with reinforcement learning and techniques informed by OpenAI's most advanced internal models, such as o3 and other frontier systems.
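To make the MoE point concrete, here is a toy top-k routing layer in PyTorch. This is not the gpt-oss architecture, just a minimal illustration of why only a fraction of the total parameters does work on any given token: a router scores the experts, and only the top-k are evaluated.

```python
# Toy mixture-of-experts layer: a router picks top-k experts per token,
# so most parameters sit idle for any single token. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(scores, dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):          # evaluate only the chosen experts
            for slot in range(self.k):
                expert = self.experts[idx[t, slot].item()]
                out[t] += weights[t, slot] * expert(x[t])
        return out

x = torch.randn(4, 64)        # 4 tokens of width 64
print(ToyMoE()(x).shape)      # torch.Size([4, 64])
```

The per-token loop is written for readability; production MoE kernels batch tokens by expert instead.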
Tool Use and Function Calling
- Both models demonstrate strong tool-use capabilities and support few-shot function-calling APIs, making them well suited to interactive agentic applications; a sketch of a tool-call request follows.
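Here is a hedged sketch of a function-calling request against an OpenAI-compatible endpoint serving a gpt-oss model. The tool schema follows the standard Chat Completions convention, while the endpoint, model tag, and get_weather tool are placeholders:

```python
# Sketch: declare a tool and let the model decide whether to call it.
# base_url, model tag, and the get_weather tool are hypothetical.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)  # populated if the model called the tool
```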
Deployment and Licensing
- Released under the Apache 2.0 license, the gpt-oss models are fully open-weight and permissively licensed for broad use.
- They are accessible through various SDKs and platforms such as Ollama, LM Studio, and vLLM, as well as a free web interface that requires a Hugging Face login.
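For local experimentation, the Ollama Python client is one of the shortest paths, as sketched below. The gpt-oss:20b tag is the one Ollama advertises for this model, but verify it against your installed catalog:

```python
# Minimal local-inference sketch via the Ollama Python client.
# Assumes the Ollama server is running and the model has been pulled,
# e.g. with: ollama pull gpt-oss:20b
import ollama

reply = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Explain MoE routing in one sentence."}],
)
print(reply["message"]["content"])
```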
Limitations
- Coding performance is weaker than that of some contemporaries such as GLM 4.5 Air, particularly on complex programming challenges and 3D rendering tasks.
- GLM 4.5 Air notably outperforms the gpt-oss models on practical coding benchmarks despite similar hardware demands.
Summary Table
| Feature | gpt-oss-120b | gpt-oss-20b |
|----------------------------|--------------------------------------|-----------------------------------------|
| Total parameters | 117 billion | 21 billion |
| Params activated per token | 5.1 billion | 3.6 billion |
| Hardware requirement | Single 80 GB GPU (e.g., NVIDIA H100) | Edge devices with 16 GB memory |
| Benchmark performance | Near OpenAI o4-mini; surpasses o3-mini | Comparable to OpenAI o3-mini |
| Key strengths | Strong reasoning, tool use, CoT reasoning | Competitive reasoning, lightweight deployment |
| Coding tasks performance | Disappointing vs. GLM 4.5 Air | Similar limitations |
| Licensing and availability | Apache 2.0 open license; multiple deployment platforms | Same |
These models mark OpenAI's first large-scale open-weight release since GPT-2, emphasizing transparent, efficient, and accessible reasoning-focused LLMs well suited to real-world applications, especially where local inference and cost efficiency matter.
Both gpt-oss-120b and gpt-oss-20b are available to use now under the Apache 2.0 open-source license. OpenAI has also partnered with platforms and providers, including ONNX Runtime, Azure, AWS, and Ollama, to support the models across a range of environments.