Now it’s TikTok parent ByteDance’s turn at reasoning AI


By [email protected]




The reasoning-model race began with the announcement of OpenAI’s o1 model in September 2024, but it really took off with the release of DeepSeek R1 in January 2025.

Now, it seems that most major AI model providers and trainers are in a new race to deliver better, faster, and cheaper reasoning AI models: ones that may take a little longer to respond to a human user, but ideally do so with better, more comprehensive answers, which these models arrive at by performing chain-of-thought reasoning, reflecting on their own conclusions and checking them for accuracy before responding.

ByteDance, the Chinese media giant and parent of TikTok, is the latest to join the party with the announcement and publication of the technical paper behind Seed-Thinking-v1.5, an upcoming large language model (LLM) designed to advance reasoning performance in both science, technology, engineering, and math (STEM) fields and general-purpose domains.

The model is not yet available for download or use, and it is unclear what the licensing terms will be: whether it will be proprietary and closed source, open source and free for everyone to use and modify at will, or somewhere in between. But the technical paper provides some notable details worth covering now, ahead of its availability.

Like Meta’s new Llama 4 and Mistral’s Mixtral before it, Seed-Thinking-v1.5 is built using a Mixture-of-Experts (MoE) architecture.

This architecture is designed to make models more efficient, essentially combining the capabilities of multiple models into one, with each expert specializing in a different domain.

In this case, the MoE architecture means Seed-Thinking-v1.5 uses only 20 billion of its 200 billion total parameters at any one time.
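
To make the sparse-activation idea concrete, here is a minimal, illustrative sketch of top-k expert routing in PyTorch. It is not ByteDance’s implementation; the dimensions, expert count, and top-k value are arbitrary placeholders. It only shows how a router can send each token to a small subset of experts so that most parameters stay idle per token.

```python
# Illustrative top-k MoE routing layer (not ByteDance's code; all sizes
# below are hypothetical placeholders).
import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    def __init__(self, d_model: int = 512, n_experts: int = 16, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # The router scores every expert for each token.
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model). Each token is processed by only its
        # top_k experts, so most parameters stay inactive per token, the
        # same principle behind "20B active out of 200B total".
        weights, idx = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = SparseMoELayer()
print(layer(torch.randn(8, 512)).shape)  # torch.Size([8, 512])
```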

ByteDance says in the technical paper, published to GitHub, that Seed-Thinking-v1.5 prioritizes structured reasoning and deliberate response generation.

The results nearly speak for themselves, with Seed-Thinking-v1.5 outperforming DeepSeek R1 and approaching Google’s Gemini 2.5 Pro and OpenAI’s o3-mini-high on many benchmarks, including ARC-AGI, which measures progress toward artificial general intelligence, widely seen as the goal or “holy grail” of AI: a model that outperforms humans at most economically valuable work, according to OpenAI’s definition.

Positioned as a compact yet capable alternative to larger state-of-the-art models, Seed-Thinking-v1.5 achieves competitive benchmark results and introduces innovations in reinforcement learning (RL), training data curation, and AI infrastructure.

Performance benchmarks and model focus

Seed-Thinking-v1.5 shows strong performance on a suite of difficult tasks, scoring 86.7% on AIME 2024, 55.0% pass@8 on Codeforces, and 77.3% on the GPQA science benchmark. These results place it close to or matching models like OpenAI’s o3-mini-high and Google’s Gemini 2.5 Pro on specific reasoning benchmarks.
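
For readers unfamiliar with the pass@8 figure: the standard unbiased pass@k estimator comes from OpenAI’s Codex paper (Chen et al., 2021). ByteDance’s exact Codeforces protocol is not spelled out here and may differ, but the metric is conventionally computed like this:

```python
# Unbiased pass@k estimator from Chen et al. (2021). Given n samples per
# problem, of which c pass, it estimates the chance that at least one of
# k drawn samples succeeds. Shown for context; the paper's exact
# evaluation protocol may differ.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # fewer than k failures exist, so k draws must hit a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 16 generations per problem, 5 correct.
print(round(pass_at_k(n=16, c=5, k=8), 3))  # 0.987
```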

On non-reasoning tasks, the model was evaluated through human preference comparisons and achieved an 8.0% higher win rate over DeepSeek R1, suggesting that its strengths extend beyond purely logical or mathematical challenges.

To address saturation in common benchmarks like AIME, ByteDance introduces BeyondAIME, a new, harder math benchmark with curated problems designed to resist memorization and better discriminate between models. Both this evaluation set and the Codeforces set are expected to be released publicly to support future research.

Data strategy

Training data played a major role in the model’s development. For supervised fine-tuning (SFT), the team curated 400,000 samples, including 300,000 verifiable ones (STEM, logic, and coding) and 100,000 non-verifiable problems such as creative writing and role-playing.

For RL training, the data was divided into:

  • Verifiable problems: 100,000 rigorously filtered STEM questions and logic puzzles with known answers, sourced from elite competitions and expert review.
  • Non-verifiable tasks: human preference datasets focused on open-ended prompts, evaluated using pairwise reward models.

The STEM data leaned heavily on advanced mathematics, accounting for over 80% of the problem set. Additional logic data included tasks such as Sudoku and 24-point puzzles, with difficulty adjustable to match model progress.
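
As an illustration of what a “verifiable problem with a known answer” looks like in practice, here is a minimal sketch of a rule-based checker for the 24 game, where a candidate answer can be scored exactly with no learned judge. This is a hypothetical example, not ByteDance’s pipeline; difficulty could be tuned by widening the number range or requiring more operands, in the spirit of the paper’s adjustable-difficulty logic tasks.

```python
# Hypothetical rule-based reward for the 24 game: the checker knows the
# ground-truth rule, so answers are verifiable automatically.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(node):
    """Evaluate an arithmetic AST restricted to numbers and + - * /."""
    if isinstance(node, ast.Expression):
        return safe_eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](safe_eval(node.left), safe_eval(node.right))
    raise ValueError("disallowed expression")

def reward_24(answer: str, numbers: list[int]) -> float:
    """1.0 iff the expression uses exactly `numbers` and evaluates to 24."""
    try:
        tree = ast.parse(answer, mode="eval")
        used = sorted(n.value for n in ast.walk(tree)
                      if isinstance(n, ast.Constant))
        ok = used == sorted(numbers) and abs(safe_eval(tree) - 24) < 1e-6
        return 1.0 if ok else 0.0
    except (ValueError, TypeError, SyntaxError, ZeroDivisionError):
        return 0.0

print(reward_24("8 / (3 - 8 / 3)", [3, 3, 8, 8]))  # 1.0
print(reward_24("6 * 4", [3, 3, 8, 8]))            # 0.0 (wrong numbers used)
```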

Reinforcement learning approach

Reinforcement learning in Seed-Thinking-v1.5 is powered by custom actor-critic (VAPO) and policy-gradient (DAPO) frameworks, developed to address known instabilities in RL training. These techniques focus on reducing reward-signal variance and improving training stability, especially in long chain-of-thought (CoT) settings.
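
One technique described in the DAPO work, dynamic sampling, is easy to illustrate: prompt groups whose rollouts all receive the same reward carry no advantage signal, so they are filtered out and the batch is refilled with informative prompts. The sketch below is a simplified rendering of that idea; `generate` and `reward` are hypothetical stand-ins, not the paper’s code.

```python
# Simplified sketch of DAPO-style dynamic sampling: skip rollout groups
# with zero reward variance (hence zero advantage) and refill the batch.
import random

def dynamic_sample(prompts, generate, reward, group_size=8, batch_size=4):
    batch = []
    pool = iter(prompts)
    while len(batch) < batch_size:
        prompt = next(pool)  # raises StopIteration if prompts run out
        rollouts = [generate(prompt) for _ in range(group_size)]
        rewards = [reward(prompt, r) for r in rollouts]
        if len(set(rewards)) > 1:  # keep only groups with reward variance
            batch.append((prompt, rollouts, rewards))
    return batch

# Toy usage with random stand-ins for generation and scoring:
batch = dynamic_sample(
    prompts=range(1000),
    generate=lambda p: random.random(),
    reward=lambda p, r: float(r > 0.5),
)
print(len(batch))  # 4
```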

Reward models play an important role in supervising RL outputs. ByteDance introduced two key tools:

  • Seed-Verifier: a rule-based LLM that checks whether a generated answer and the reference answer are mathematically equivalent.
  • Seed-Thinking-Verifier: a judge that reasons step by step, improving judgment consistency and resisting reward hacking.

Together, this two-tier reward system enables nuanced evaluation of both straightforward and complex tasks.
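
A rough sketch of how such a two-tier reward might be composed: a cheap equivalence check first, escalating ambiguous cases to a step-by-step judge. Both judge callables here are hypothetical placeholders; the actual Seed verifiers are LLMs that have not been released.

```python
# Hypothetical composition of a two-tier reward, in the spirit of (not a
# reimplementation of) Seed-Verifier and Seed-Thinking-Verifier.
from typing import Callable

def tiered_reward(
    answer: str,
    reference: str,
    fast_verify: Callable[[str, str], str],       # returns "yes"/"no"/"unsure"
    thinking_verify: Callable[[str, str], bool],  # slower step-by-step judge
) -> float:
    verdict = fast_verify(answer, reference)
    if verdict == "yes":
        return 1.0
    if verdict == "no":
        return 0.0
    # Ambiguous case: the reasoning judge must justify its verdict step
    # by step, which makes it harder for the policy to reward-hack.
    return 1.0 if thinking_verify(answer, reference) else 0.0
```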

Infrastructure and scaling

To train efficiently at scale, ByteDance built a system on top of its HybridFlow framework, with execution handled by Ray clusters and training and inference processes co-located to reduce GPU idle time.

A notable innovation is the Streaming Rollout System (SRS), which decouples model evolution from runtime execution. It accelerates iteration speed by asynchronously managing partially completed generations across model versions. This architecture reportedly delivers up to 3× faster RL cycles.
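
Since execution runs on Ray, a toy example can convey the general idea of decoupling rollout generation from training: consume whichever rollout finishes first rather than blocking on a full synchronous batch. This is the spirit of, and far simpler than, the paper’s SRS; the worker class, weight versioning, and prompts below are invented for illustration.

```python
# Toy Ray example of overlapping rollout generation with training.
import ray

ray.init()

@ray.remote
class RolloutWorker:
    def __init__(self):
        self.version = 0

    def set_weights(self, version: int):
        self.version = version  # stand-in for loading updated model weights

    def generate(self, prompt: str):
        return f"rollout for {prompt!r}", self.version  # stand-in generation

workers = [RolloutWorker.remote() for _ in range(4)]
pending = [w.generate.remote(f"prompt-{i}") for i, w in enumerate(workers)]

for step in range(3):  # toy training loop
    # Take whichever rollout is ready instead of waiting for all of them,
    # so generation overlaps with training and GPUs idle less.
    done, pending = ray.wait(pending, num_returns=1)
    rollout, version = ray.get(done[0])
    print(f"step {step}: training on {rollout} (from policy v{version})")
    worker = workers[step % len(workers)]
    worker.set_weights.remote(step + 1)  # push fresh weights asynchronously
    pending.append(worker.generate.remote(f"prompt-{step + 4}"))
```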

Additional infrastructure techniques include:

  • Mixed precision (FP8) for memory savings
  • Expert parallelism and kernel auto-tuning for MoE efficiency
  • ByteCheckpoint for resilient and flexible checkpointing
  • Auto-tuning to optimize parallelism and memory configurations

Human evaluation and real-world impact

To assess alignment with human preferences, ByteDance conducted human testing across a range of domains including creative writing, humanities knowledge, and general conversation.

Seed-Thinking-v1.5 consistently outperformed DeepSeek R1 across sessions, reinforcing its applicability to real-world user needs.

The development team notes that reasoning models trained mainly on verifiable tasks showed strong generalization to creative domains, a result they attribute to the structure and rigor embedded in mathematical training workflows.

What it means for technical leaders, data engineers, and enterprise decision-makers

For technical leads who manage the full lifecycle of large language models, from data curation to deployment, Seed-Thinking-v1.5 presents an opportunity to rethink how reasoning capabilities are integrated into enterprise AI stacks.

Its modular training process, which includes verifiable reasoning datasets and multi-stage reinforcement learning, is especially appealing to teams looking to scale LLM development while retaining fine-grained control.

ByteDance’s moves to introduce the Seed-Verifier and Seed-Thinking-Verifier mechanisms for more trustworthy reward modeling could prove crucial when deploying models in customer-facing or regulated environments.

For teams that often operate under tight deadlines and with limited bandwidth, the model’s stability under reinforcement learning, enabled by innovations such as VAPO and dynamic sampling, could reduce iteration cycles and streamline fine-tuning for specific tasks.

From an orchestration and deployment perspective, the model’s hybrid infrastructure, including the Streaming Rollout System (SRS) and support for FP8 optimization, points to major gains in training throughput and hardware utilization.

These capabilities would be valuable for engineers responsible for scaling LLM operations across cloud and on-premises systems. The fact that Seed-Thinking-v1.5 was trained with mechanisms for adapting reward feedback based on runtime dynamics speaks directly to the challenges of managing heterogeneous data pipelines and maintaining consistency across domains.

For teams tasked with ensuring reliability, reproducibility, and continuous integration of new tools, the system-level design of Seed-Thinking-v1.5 could serve as a blueprint for building robust multimodal orchestration systems.

For data engineering professionals, the structured approach to training data, including rigorous filtering, augmentation, and expert verification, reinforces the importance of data quality as a multiplier of model performance. It could inspire more deliberate approaches to dataset development and validation pipelines.

Future outlook

Seed-Thinking-v1.5 is the result of collaboration within ByteDance’s Seed LLM Systems team, led by Yonghui Wu and with public representation by Haibin Lin, a long-time AI contributor.

The project also builds on previous efforts, such as Doubao 1.5 Pro, and incorporates shared techniques in RLHF and data curation.

Looking ahead, the team plans to continue refining reinforcement learning techniques, with a focus on training efficiency and reward modeling for non-verifiable tasks. The public release of internal benchmarks such as BeyondAIME is intended to foster broader progress in reasoning-focused AI.


