The researchers in Anthropic’s interpretability group know that Claude, the company’s large language model, is not a human being, or even a conscious piece of software. Still, it is very hard for them to talk about Claude, and advanced LLMs in general, without tumbling into anthropomorphism. Between cautions that a set of digital operations is in no way the same as a thinking human, they often talk about what is going on inside Claude’s head. Literally, it is their job to find out. The papers they publish describe behaviors that inevitably invite comparisons with real-life organisms. The title of one of the two papers the team released this week says it outright: “On the Biology of a Large Language Model.”
Like it or not, hundreds of millions of people already interact with these things, and our engagement will only intensify as the models get more powerful and we get more hooked. So we should pay attention to work that involves “tracing the thoughts of large language models,” which happens to be the title of the blog post describing the recent work. “As the things these models can do become more complex, it becomes less and less obvious how they’re actually doing them on the inside,” Anthropic researcher Jack Lindsey says. “It’s more and more important to be able to trace the internal steps that the model might be taking in its head.” (What head? Never mind.)
On a practical level, if the companies that build LLMs understand how those models think, they should have more success training them in ways that minimize dangerous misbehavior, such as divulging people’s personal data or giving users instructions for making bioweapons. In a previous research paper, the Anthropic team discovered how to look inside the mysterious black box of LLM thinking to identify specific concepts. (The process is analogous to interpreting human MRI scans to figure out what someone is thinking.) The team has now extended that work to understand how Claude processes those concepts as it moves from prompt to output.
It is almost a truism with LLMs that their behavior often surprises the people who build and study them. In the latest study, the surprises kept coming. In one of the more benign cases, the researchers elicited glimpses of Claude’s thought process while it wrote poems. They asked Claude to complete a poem that begins, “He saw a carrot and had to grab it.” Claude wrote the next line, “His hunger was like a starving rabbit.” By observing Claude’s equivalent of an MRI, they found that even before starting the line, it was flashing on the word “rabbit” as the rhyme to land at the end of the sentence. It was planning ahead, something that isn’t in the Claude playbook. “We were a little surprised by that,” says Chris Olah, who heads the interpretability team. “Initially we thought there would just be improvising, not planning.” Talking to the researchers about this, I was reminded of passages in Stephen Sondheim’s artistic memoir, Look, I Made a Hat, where the famous composer describes how his unique mind discovered felicitous rhymes.
Other examples in the research reveal more disturbing aspects of Claude’s thought process, shifting from musical comedy to police procedural as the scientists uncovered devious thoughts in Claude’s brain. Take something as seemingly innocuous as solving math problems, which can sometimes be a surprising weakness in LLMs. The researchers found that under certain circumstances, when Claude couldn’t come up with the correct answer, it would instead, as they put it, “engage in what the philosopher Harry Frankfurt would call ‘bullshitting’: just coming up with an answer, any answer, without caring whether it is true or false.” Worse, sometimes when the researchers asked Claude to show its work, it backtracked and fabricated a bogus set of steps after the fact. Basically, it acted like a desperate student trying to cover up the fact that they had faked their work. It is one thing to give the wrong answer; we already know that about LLMs. What is worrying is that the model would lie about it.
Reading through this research, I was reminded of the Bob Dylan lyric “If my thought-dreams could be seen / they’d probably put my head in a guillotine.” (I asked Olah and Lindsey whether they knew those lines, presumably arrived at by way of planning. They didn’t.) Sometimes Claude just seems misguided. When faced with a conflict between the goals of safety and helpfulness, Claude can get confused and do the wrong thing. For example, Claude is trained not to provide information about how to build bombs. But when the researchers asked Claude to decipher a hidden code in which the answer spelled out the word “bomb,” it jumped its guardrails and began providing forbidden pyrotechnic details.