
ℏεsam (hesamation)

AI & ML interests

post-training / reasoning models / RAG

Recent Activity

reacted to their post with ❤️ about 6 hours ago
posted an update about 6 hours ago

Organizations

lora concepts library · AI Zero to Hero · The Waifu Research Department · Blog-explorers · Multi🤖Transformers · Team Tonic · Cohere Labs Community · Hugging Face Discord Community

hesamation's activity

reacted to their post with ❤️ about 6 hours ago
The best researchers from DeepSeek, OpenAI, Microsoft, and ByteDance have explored RL and reasoning in LLMs.

Here are some of their key findings:

1/ RL can further improve distilled models. These models are essentially fine-tuned via SFT on data generated by larger models, and the SFT+RL combo does not disappoint.

This is verified in the DeepSeek-R1 paper.

2/ Both GRPO and PPO suffer from length bias: they encourage longer responses. This can be tackled by introducing explicit rewards based on the length of the answer, as sketched below.
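
A minimal sketch of what such an explicit length-based reward could look like; the function, the target length, and the penalty coefficient are hypothetical illustrations, not taken from any of the papers:

```python
def length_penalized_reward(is_correct: bool, num_tokens: int,
                            target_len: int = 512, alpha: float = 0.001) -> float:
    """Toy reward: binary correctness minus a penalty for overshooting a target length.

    target_len and alpha are made-up hyperparameters; real setups tune them,
    and some only penalize overlong *correct* answers so as not to discourage
    necessary reasoning on hard problems.
    """
    base = 1.0 if is_correct else -1.0
    overshoot = max(0, num_tokens - target_len)
    return base - alpha * overshoot

print(length_penalized_reward(True, 400))    # 1.0    (concise and correct)
print(length_penalized_reward(True, 2000))   # -0.488 (correct but rambling)
```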

3/ Most reasoning research focuses on code and math, but training models on logic puzzles improves their mathematical ability too.

This suggests that RL-induced reasoning generalizes beyond the specific training domain.

Previous research also shows RL can be a great generalizer.

4/ Reasoning might not be induced by RL alone; it may already be latent in base models thanks to pre-training and the CoT data they were trained on.

So while RL does wake up the reasoning beast, it may not be the only way to do so (e.g., other methods such as distillation).

5/ Back to the length bias: reasoning models tend to generate longer responses for wrong answers, and RL might be the culprit.

RL favours longer answers when the reward is negative, because spreading the penalty over more tokens dilutes it per token and lowers the loss.

This might explain the "aha" moments!
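
A back-of-the-envelope illustration of that dilution effect (a deliberately simplified mental model, not the exact GRPO/PPO loss): spread a fixed negative reward uniformly over the tokens of a response and watch the per-token penalty shrink with length.

```python
reward = -1.0  # sequence-level reward for a wrong answer

for num_tokens in (100, 1_000, 10_000):
    print(f"{num_tokens:>6} tokens -> {reward / num_tokens:+.4f} penalty per token")

# 100 tokens -> -0.0100, 10000 tokens -> -0.0001: the longer the wrong
# answer, the weaker the discouraging signal each token receives.
```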

6/ OpenAI's competitive programming paper showed an interesting finding:

o3 can learn its own test-time strategies (like writing an inefficient but correct solution to verify the answer of an optimized solution).

RL helps LLMs develop their own reasoning & verification methods.
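
That trick is easy to emulate by hand: pair a slow but obviously correct solution with the optimized one and cross-check them on random inputs. A toy sketch (the max-subarray problem is just a stand-in for whatever the model is solving):

```python
import random

def fast_max_subarray(xs):  # optimized: Kadane's algorithm, O(n)
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def slow_max_subarray(xs):  # inefficient but obviously correct, O(n^2)
    return max(sum(xs[i:j]) for i in range(len(xs))
                            for j in range(i + 1, len(xs) + 1))

# cross-check the optimized solution against the brute force
for _ in range(1000):
    xs = [random.randint(-10, 10) for _ in range(random.randint(1, 20))]
    assert fast_max_subarray(xs) == slow_max_subarray(xs)
print("fast solution verified against brute force")
```
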
The recent article by @rasbt helped me a lot in getting a broad view of the recent research on reasoning models.

He also lists more influential papers on this topic; it's a must-read if you're interested.

Check it out 👇
https://magazine.sebastianraschka.com/p/the-state-of-llm-reasoning-model-training
posted an update about 6 hours ago
replied to their post 3 days ago
reacted to their post with ❤️ 3 days ago
OpenAI just released a 34-page practical guide to building agents.

Here are 10 things it teaches us:

1➜ Agents are different from workflows: they are fully autonomous systems that perform tasks on your behalf. Many applications use LLMs inside workflows, but that alone does not make them agents.

2➜ Use them for tricky stuff: complex decision-making, dynamic rules, unstructured data.

3➜ Core recipe: each agent has three main components: a Model (the brain), Tools, and Instructions on how to behave (see the sketch after this list).

4➜ Choose the right brain: set up evals to get a performance baseline, use a smart model to see what's possible, then gradually downgrade the model for cost and speed.

5➜ Tools are key: choose well-defined and tested tools. An agent needs tools to retrieve data and context, and to take actions.

6➜ Instructions matter A LOT: be super clear telling the agent its goals, steps, and rules. Vague instructions = unpredictable agent. Be explicit.

7➜ Start simple, then scale: often a single agent with several tools is enough. Don't jump to complex multi-agent systems immediately.

8➜ If you use multi-agent systems: you can have a "manager" agent directing traffic to specialist agents, or have agents hand off tasks to each other.

9➜ Guardrails are a MUST: check user input for weird stuff, make sure the agent isn't about to do something risky, filter out private info, block harmful content. Don't let it run wild.

10➜ Build and plan for humans: start small, test, improve. Always have a plan for when the agent gets stuck or is about to do something high-risk.
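
To make the model + tools + instructions recipe concrete, here's a minimal sketch in the spirit of points 3, 5, and 9; every name in it (call_llm, the tool registry, the blocklist) is hypothetical glue code, not OpenAI's API:

```python
# Hypothetical glue code: call_llm stands in for whatever model provider you use.

INSTRUCTIONS = (
    "You are a billing support agent. To look up an invoice, reply exactly "
    "with: TOOL lookup_invoice <customer_id>. Otherwise answer directly."
)

TOOLS = {"lookup_invoice": lambda cid: f"invoice for {cid}: $42.00"}  # point 5

BLOCKLIST = ("password", "social security")  # crude input guardrail (point 9)

def call_llm(system: str, user: str) -> str:
    raise NotImplementedError("plug in your model API here")

def run_agent(user_input: str) -> str:
    if any(term in user_input.lower() for term in BLOCKLIST):
        return "Sorry, I can't help with that request."
    reply = call_llm(system=INSTRUCTIONS, user=user_input)  # the "brain"
    if reply.startswith("TOOL "):  # the model chose to use a tool
        _, tool_name, arg = reply.split(maxsplit=2)
        return TOOLS[tool_name](arg)
    return reply
```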

Download: https://t.co/fJaCkgf7ph
posted an update 3 days ago
reacted to their post with ❤️ 5 days ago
posted an update 5 days ago
reacted to their post with 🔥 9 days ago
Google published a 69-page whitepaper on Prompt Engineering and its best practices, a must-read if you are using LLMs in production:
> zero-shot, one-shot, few-shot
> system prompting
> chain-of-thought (CoT)
> ReAct
> code prompting
> best practices

LINK: https://www.kaggle.com/whitepaper-prompt-engineering
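
As a taste of two of those techniques, here's what a one-shot prompt with chain-of-thought might look like; the wording is my own illustration, not from the whitepaper:

```python
# A one-shot prompt whose example demonstrates chain-of-thought reasoning.
prompt = """Answer the question. Think step by step, then give the final answer.

Q: A shop sells pens at $3 each. How much do 4 pens cost?
Reasoning: Each pen costs $3, and 4 x 3 = 12.
A: $12

Q: A train travels at 60 km/h for 2.5 hours. How far does it go?
Reasoning:"""

# Send `prompt` to your LLM of choice; the worked example is the one-shot
# part, and ending on "Reasoning:" elicits the chain of thought before "A:".
```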
reacted to their post with ❤️ 11 days ago
posted an update 11 days ago
reacted to their post with ❤️ 15 days ago
The best researchers from Yale, Stanford, Google DeepMind, and Microsoft laid out all we know about agents in a 264-page paper (practically a book).

Here are some of their key findings:

They map agent components, such as perception, memory, and world modelling, to regions of the human brain and compare the two:

- the brain is much more energy-efficient
- agents have no genuine experience
- the brain learns continuously; agents are static

An agent is broken down into:
- Perception: the agent's input mechanism, the process by which it receives and interprets raw data from its surroundings. It can be improved with multi-modality, feedback mechanisms (e.g., human corrections), etc.
- Cognition: learning, reasoning, planning, memory. LLMs are key in this part.
- Action: the agent's output and tool use.

Agentic memory is represented as:
- Sensory memory: the short-term holding of raw inputs, not emphasized much in agents.
- Short-term memory: the LLM context window.
- Long-term memory: external storage such as RAG or knowledge graphs.

Memory in agents can be improved and researched in terms of:
- increasing the amount of stored information
- retrieving the most relevant info (a toy sketch of this follows the list)
- combining context-window memory with external memory
- deciding what to forget or update in memory
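
A toy sketch of the retrieval direction: a long-term store ranked by embedding similarity. The character-frequency embedding is a stand-in for a real learned embedding model:

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in embedding: character-frequency vector. Real agents use a learned model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class LongTermMemory:
    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def store(self, text: str):
        self.items.append((text, embed(text)))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

memory = LongTermMemory()
memory.store("user prefers metric units")
memory.store("user's cat is named Ada")
print(memory.retrieve("which units does the user like?", k=1))
```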

The agent must simulate or predict future states of the environment for planning and decision-making.

AI world models are much simpler than humans', which draw on causal reasoning (cause and effect) and physical intuition.

LLM world models are mostly implicit and embedded.
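
To ground the world-model idea, here's the simplest possible planner: it asks a predictive model where each candidate action leads and greedily picks the best. The grid world and scoring are made-up toys, far simpler than the learned, uncertain world models the paper surveys:

```python
# Toy world model: the agent predicts where each move lands and plans greedily.
GOAL = (3, 3)
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def predict_next_state(state, move):
    """The 'world model': here exact arithmetic; real models are learned and fuzzy."""
    dx, dy = MOVES[move]
    return (state[0] + dx, state[1] + dy)

def plan(state):
    """Pick the move whose predicted outcome is closest to the goal."""
    def dist(s):
        return abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1])
    return min(MOVES, key=lambda m: dist(predict_next_state(state, m)))

state = (0, 0)
while state != GOAL:
    state = predict_next_state(state, plan(state))
print("reached", state)
```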

EMOTIONS are a deep aspect of being human, helping us with social interaction, decision-making, and learning.

Agents must understand emotions to interact with us better.

But rather than encoding the felt experience of emotions, agents have only a surface-level model of them.

READ PAPER: Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems (2504.01990)
posted an update 15 days ago