Alibaba Group has introduced QwenLong-L1, a new framework that enables large language models (LLMs) to reason over extremely long inputs. This development could unlock a new wave of enterprise applications that require models to understand and draw insights from extensive documents such as detailed corporate filings, lengthy financial statements or complex legal contracts.
The challenge of long-form reasoning for AI
Recent advances in large reasoning models (LRMs), particularly through reinforcement learning (RL), have significantly improved their problem-solving capabilities. Research shows that when trained with RL fine-tuning, LRMs acquire skills similar to human “slow thinking,” developing sophisticated strategies to tackle complex tasks.
However, these improvements are mainly seen when models work with relatively short pieces of text, typically around 4,000 tokens. Scaling their reasoning to much longer contexts (e.g., 120,000 tokens) remains a major challenge. Such long-form reasoning requires a robust understanding of the entire context and the ability to perform multi-step analysis. “This limitation poses a significant barrier to practical applications requiring interaction with external knowledge, such as deep research, where LRMs must collect and process information from knowledge-intensive environments,” the developers of QwenLong-L1 write in their paper.
The researchers formalize these challenges in the concept of “long-context reasoning RL.” Unlike short-context reasoning, which often relies on knowledge already stored within the model, long-context reasoning RL requires models to retrieve and ground relevant information from lengthy inputs. Only then can they generate chains of reasoning based on this incorporated information.
Training models for this through RL is difficult and often results in inefficient learning and unstable optimization. Models struggle to converge on good solutions or lose their ability to explore diverse reasoning paths.
QwenLong-L1: A multi-stage approach
QwenLong-L1 is a reinforcement learning framework designed to help LRMs transition from proficiency with short texts to robust generalization across long contexts. The framework enhances existing short-context LRMs through a carefully structured, multi-stage process:
Warm-up supervised fine-tuning (SFT): The model first undergoes an SFT phase, where it is trained on examples of long-context reasoning. This phase establishes a solid foundation, enabling the model to ground information accurately in long inputs. It helps develop fundamental capabilities in understanding context, generating logical reasoning chains, and extracting answers.
Curriculum-guided phased RL: At this stage, the model is trained through multiple phases, with the target length of the input documents gradually increasing. This systematic, step-by-step approach helps the model adapt its reasoning strategies to progressively longer contexts, and it avoids the instability often seen when a model is abruptly trained on very long texts.
Difficulty-aware retrospective sampling: The final training stage incorporates challenging examples from the preceding training phases, ensuring that the model keeps learning from the hardest problems. This prioritizes difficult instances and encourages the model to explore more diverse and complex reasoning paths. A simplified sketch of how these stages could fit together appears after this list.

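To make the pipeline concrete, here is a minimal Python sketch of a staged training loop in the spirit described above: an SFT warm-up, RL phases over progressively longer documents, and a retained pool of low-reward (hard) examples. The function and field names (`sft_warmup`, `rl_phase`, `hard_pool`, `last_reward`) are hypothetical illustrations, not the QwenLong-L1 codebase.

```python
# Illustrative sketch only; names and stage lengths are assumptions, not the paper's code.
from dataclasses import dataclass
import random


@dataclass
class Example:
    doc_tokens: int           # length of the input document in tokens
    question: str
    answer: str
    last_reward: float = 0.0  # reward observed the last time this example was used


def sft_warmup(model, examples):
    """Stage 1: supervised fine-tuning on long-context reasoning traces."""
    # ... standard SFT loop over (document, question, reasoning, answer) tuples
    return model


def rl_phase(model, examples):
    """One RL phase: sample rollouts, score them, update the policy."""
    for ex in examples:
        # ... generate a rollout and score it with the hybrid reward (see next sketch)
        ex.last_reward = random.random()  # placeholder for the real reward signal
    return model


def train_staged(model, dataset, length_stages=(20_000, 60_000, 120_000)):
    model = sft_warmup(model, dataset)
    hard_pool = []  # hard examples carried forward from earlier phases
    for max_len in length_stages:  # curriculum: progressively longer documents
        batch = [ex for ex in dataset
                 if ex.doc_tokens <= max_len and ex not in hard_pool] + hard_pool
        model = rl_phase(model, batch)
        # Difficulty-aware retrospective sampling: keep the lowest-reward examples
        # so the next phase revisits the problems the model found hardest.
        batch.sort(key=lambda ex: ex.last_reward)
        hard_pool = batch[: max(1, len(batch) // 10)]
    return model
```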
Beyond this structured training, QwenLong-L1 also uses a distinct reward system. While training for short-context reasoning tasks often relies on strict rule-based rewards (e.g., a correct answer to a math problem), QwenLong-L1 employs a hybrid reward mechanism. It combines rule-based verification, which ensures precision by checking for strict adherence to correctness criteria, with an “LLM-as-a-judge.” The judge model compares the semantics of a generated answer with the ground truth, allowing more flexibility and better handling of the diverse ways correct answers can be expressed when dealing with long, nuanced documents.
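Below is a minimal sketch of what such a hybrid reward could look like. The helper names (`rule_based_check`, `llm_judge_score`) and the choice to combine the two signals with a simple maximum are assumptions for illustration, not the paper's exact implementation; `judge_model` is any callable that sends a prompt to a judge LLM and returns its text reply.

```python
# Illustrative sketch of a hybrid reward; names and the max-combination are assumptions.
import re


def rule_based_check(generated: str, reference: str) -> float:
    """Strict verification: normalized exact match against the reference answer."""
    norm = lambda s: re.sub(r"\s+", " ", s).strip().lower()
    return 1.0 if norm(generated) == norm(reference) else 0.0


def llm_judge_score(generated: str, reference: str, judge_model) -> float:
    """Ask a judge LLM whether the candidate answer is semantically equivalent."""
    prompt = (
        "Does the candidate answer express the same meaning as the reference?\n"
        f"Reference: {reference}\nCandidate: {generated}\nReply YES or NO."
    )
    return 1.0 if "YES" in judge_model(prompt).upper() else 0.0


def hybrid_reward(generated: str, reference: str, judge_model) -> float:
    """Combine strict rule-based verification with the judge's semantic verdict."""
    return max(rule_based_check(generated, reference),
               llm_judge_score(generated, reference, judge_model))
```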
Putting QwenLong-L1 to the test
The Alibaba team evaluated QwenLong-L1 using document question-answering (DocQA) as the primary task. This scenario is highly relevant to enterprise needs, where AI must understand dense documents to answer complex questions.
Experimental results across seven long-context DocQA benchmarks showed QwenLong-L1's capabilities. Notably, the QwenLong-L1-32B model (based on DeepSeek-R1-Distill-Qwen-32B) achieved performance comparable to Anthropic's Claude-3.7 Sonnet Thinking and outperformed models such as OpenAI's o3-mini and Qwen3-235B-A22B. The smaller QwenLong-L1-14B model also outperformed Google's Gemini 2.0 Flash Thinking and Qwen3-32B.

An important finding relevant to real-world applications is how long-context reasoning behaviors develop in models as a result of RL training. The paper notes that models trained with QwenLong-L1 become better at “grounding” (linking answers to specific parts of a document), “subgoal setting” (breaking down complex questions), “backtracking” (recognizing and correcting their own mistakes), and “verification” (double-checking their answers).
For example, while a base model might get sidetracked by irrelevant details in a financial document or get stuck in a loop of over-analyzing unrelated information, the QwenLong-L1-trained model demonstrated the ability to engage in effective self-reflection. It could successfully filter out these distractor details, backtrack from incorrect paths, and arrive at the correct answer.
Techniques like QwenLong-L1 could significantly expand the utility of AI in the enterprise. Potential applications include legal tech (analyzing thousands of pages of legal documents), finance (deep research on annual reports and financial filings for risk assessment or investment opportunities) and customer service (analyzing long customer interaction histories to provide more informed support). The researchers have released the code for the QwenLong-L1 recipe and the weights for the trained models.