QwQ-32B: The Power of Scaling RL

March 2025 · Qwen Team

Introduction

QwQ is the reasoning model series of the Qwen family. Compared with conventional instruction-tuned models, QwQ models think and reason before answering, which yields significantly better performance on downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model in the series, achieving performance competitive with state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.

Model Specifications

  • Type: Causal Language Models
  • Training Stage: Pretraining & Post-training (Supervised Finetuning and Reinforcement Learning)
  • Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Number of Parameters: 32.5B
  • Number of Parameters (Non-Embedding): 31.0B
  • Number of Layers: 64
  • Number of Attention Heads (GQA): 40 for Q and 8 for KV (see the sketch after this list)
  • Context Length: Full 131,072 tokens
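
To make the attention configuration above concrete, here is a minimal sketch of grouped-query attention (GQA) with 40 query heads and 8 key/value heads: every group of 5 query heads shares one KV head, shrinking the KV cache roughly 5x versus full multi-head attention. Only the head counts come from the specifications above; the head dimension of 128, sequence length, and tensor shapes are illustrative assumptions.

    import torch

    # GQA with QwQ-32B's head counts: 40 query heads share 8 KV heads,
    # i.e. 5 query heads per KV head. head_dim=128 and all sizes below
    # are illustrative assumptions, not published specifications.
    n_q_heads, n_kv_heads, head_dim = 40, 8, 128
    group_size = n_q_heads // n_kv_heads  # 5 query heads per KV head

    batch, seq = 1, 16
    q = torch.randn(batch, n_q_heads, seq, head_dim)
    k = torch.randn(batch, n_kv_heads, seq, head_dim)
    v = torch.randn(batch, n_kv_heads, seq, head_dim)

    # Expand K/V so each group of 5 query heads reads the same KV head
    k = k.repeat_interleave(group_size, dim=1)  # (1, 40, 16, 128)
    v = v.repeat_interleave(group_size, dim=1)

    attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
    out = attn @ v  # (1, 40, 16, 128); heads are then concatenated

The practical payoff of this design is memory: only 8 KV heads need to be cached per layer during generation, which matters at the full 131,072-token context length.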

Key Features

QwQ-32B stands out from other models in the Qwen series through its enhanced reasoning capabilities. In the spirit of philosophical inquiry, it approaches problems with questioning and self-doubt, which lets it work through complex tasks methodically and analytically rather than answering in one pass.

Performance Highlights

QwQ-32B demonstrates strong analytical ability, with the following benchmark scores:

  • 65.2% on GPQA (graduate-level scientific reasoning)
  • 50.0% on AIME (competition-level mathematics)
  • 90.6% on MATH-500 (broad mathematical problem solving)
  • 50.0% on LiveCodeBench (code generation on recent programming problems)

The model excels particularly in mathematics and coding tasks, showcasing its strong reasoning capabilities in these domains.

Limitations

While QwQ-32B offers impressive capabilities, users should be aware of certain limitations:

  1. Language Mixing and Code-Switching: The model may mix languages or switch between them unexpectedly, affecting response clarity.
  2. Recursive Reasoning Loops: The model may enter circular reasoning patterns, producing lengthy responses without reaching a conclusive answer (a decoding-time guardrail is sketched after this list).
  3. Safety and Ethical Considerations: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it.
  4. Performance Variations: While the model excels in math and coding, it has room for improvement in other areas, such as common-sense reasoning and nuanced language understanding.
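
As a hedged sketch of the guardrail mentioned in item 2: bounding the token budget and penalizing verbatim repetition at decode time can keep looping responses in check. The parameter values below are illustrative assumptions, not tuned recommendations.

    from transformers import GenerationConfig

    # Illustrative guardrails against runaway chains of thought:
    # cap the token budget and discourage repeated text.
    gen_config = GenerationConfig(
        max_new_tokens=8192,      # hard cap on response length (assumed value)
        repetition_penalty=1.1,   # mild penalty on repeated tokens (assumed value)
        temperature=0.7,          # illustrative sampling settings
        top_p=0.95,
        do_sample=True,
    )
    # Pass to generation: model.generate(**inputs, generation_config=gen_config)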

Usage Guidelines

For the best experience, please review the usage guidelines before deploying QwQ models. The model is based on Qwen2.5, whose code has been integrated into the latest Hugging Face transformers library. We advise using the latest version of transformers (4.37.0 or later) to avoid compatibility issues.
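
As a starting point, the sketch below loads the model with Hugging Face transformers and generates a response. It assumes the checkpoint is published as Qwen/QwQ-32B on the Hugging Face Hub and that enough GPU memory is available for a 32.5B-parameter model; the prompt and generation budget are illustrative.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/QwQ-32B"  # assumed Hub checkpoint ID

    # device_map="auto" shards the 32.5B weights across available GPUs
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)

    # Reasoning models emit long chains of thought, so allow a generous budget
    output_ids = model.generate(**inputs, max_new_tokens=32768)
    response = tokenizer.decode(
        output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
    )
    print(response)

A GenerationConfig like the one shown in the Limitations section can be passed to model.generate to add decoding guardrails on top of this setup.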

Experience the power of QwQ-32B now at qwq32.com