Fine-Tune Agents with Reinforcement Learning
Escape from prompting hell & fine-tune without data collection

Decrease the Failure Rate
Failure rates of a coding agent backed by GPT-4o versus one backed by Qwen-32B fine-tuned with Augento RL
Reinforcement Learning Fine-Tuning
RL Fine-Tuning is a new post-training paradigm for adapting foundation models to your specific use case. It relies solely on a set of training cases (prompts) and a reward function that the model is trained to optimize.
Augento's platform lets you connect your existing LLM, define a reward function, and start RL training jobs that produce open-source fine-tuned models, which are automatically hosted and usable through its standard API.
VS Supervised Fine-Tuning
Unlike SFT, RL Fine-Tuning doesn't require curated prompt-response pairs. Instead of teaching through examples, RL learns from feedback, meaning you don't need to collect perfect responses that match your desired output.
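The difference is easiest to see in the shape of the data each approach consumes. A minimal sketch (the dataset contents and the toy reward check below are illustrative, not Augento's API):

# Supervised fine-tuning needs curated prompt-response pairs:
sft_dataset = [
    {
        "prompt": "Write a function that parses an ISO date string.",
        "response": "def parse_date(s):\n    from datetime import date\n    return date.fromisoformat(s)",
    },
    # ...many more hand-written, hand-checked examples
]

# RL fine-tuning only needs the prompts plus a reward function:
rl_prompts = ["Write a function that parses an ISO date string."]

def reward(completion: str) -> float:
    # Toy stand-in for a real evaluator: a real reward would compile the
    # completion or run tests against it (see the examples further down).
    return 1.0 if "def " in completion else 0.0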
VS Prompting
While prompting struggles with complex tasks, RL Fine-Tuning embeds behaviors deeply into the model. It's more reliable, requires fewer tokens, and can learn tasks that are hard to specify via prompts alone.
Real-World Applications
Coding Agent
We fine-tuned a coding agent that kept making syntax errors and mishandled semantic edge cases.
With a reward function that evaluated the generated code against the compiler, the agent learned to stop producing these errors (a sketch of such a reward follows the results below).
Results:
40% reduction in critical bugs
with just 20 training samples
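For illustration, a reward in this spirit can also grade semantic edge cases rather than compilation alone. A hedged sketch, in which the parse_int task and its edge cases are hypothetical and Python's ast module stands in for the project compiler:

import ast

# Hypothetical edge cases the agent used to get wrong.
EDGE_CASES = [
    ("parse_int('42')", 42),
    ("parse_int('  -7 ')", -7),
    ("parse_int('')", None),
]

def reward(completion: str) -> float:
    # 0.0 if the code doesn't parse, 0.5 if it parses,
    # 1.0 if it also handles the semantic edge cases.
    try:
        ast.parse(completion)              # syntax check, standing in for the compiler
    except SyntaxError:
        return 0.0
    namespace = {}
    try:
        exec(completion, namespace)        # load the candidate code
        for expr, expected in EDGE_CASES:
            if eval(expr, namespace) != expected:
                return 0.5                 # compiles, but misses an edge case
    except Exception:
        return 0.5
    return 1.0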
MCP Tool Specialization
For custom internal tools using the MCP protocol, agents often select the wrong tool or pass incompatible parameters.
Fine-tune with a reward function that scores tool selection and parameter matching to create specialized tool-using agents (sketched below).
Benefits:
Optimized tool selection
and improved parameter compatibility
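For example, if the agent emits its tool calls as JSON, a reward along these lines could score them. The tool names and schemas below are made up for illustration; a real setup would take them from your MCP server:

import json

# Illustrative schemas; a real setup would pull these from the MCP server.
TOOL_SCHEMAS = {
    "search_tickets": {"required": {"query"}, "allowed": {"query", "limit"}},
    "create_ticket": {"required": {"title", "body"}, "allowed": {"title", "body", "priority"}},
}

def reward(completion: str) -> float:
    # Expect the completion to be a JSON tool call: {"tool": ..., "arguments": {...}}
    try:
        call = json.loads(completion)
        schema = TOOL_SCHEMAS[call["tool"]]
    except (json.JSONDecodeError, KeyError, TypeError):
        return 0.0                         # unparseable output or unknown tool
    args = set(call.get("arguments", {}))
    if schema["required"] <= args <= schema["allowed"]:
        return 1.0                         # right tool, compatible parameters
    return 0.5                             # right tool, wrong or missing parameters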
Browser Agent Navigation
Browser agents often struggle with complex web UIs and specific sites. Fine-tuning helps them better understand UI elements and navigation patterns.
Create reward functions that score successful completion of tasks like "find the best price" or "complete this multi-step form" (example below).
Improvements:
Better identification of interactive elements
and navigation through complex single-page applications
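A reward for the "find the best price" task might look like this sketch, assuming the browser harness records the page state the agent saw; page_state and its visible_prices field are assumptions, not an existing API:

def reward(completion: str, page_state: dict) -> float:
    # Full credit when the agent's final answer quotes the cheapest price
    # that was actually visible on the page during the run.
    prices = page_state.get("visible_prices", [])
    if not prices:
        return 0.0
    best = min(prices)
    return 1.0 if f"{best:.2f}" in completion else 0.0

# Example with a mocked page state:
reward("The best price is 12.99 EUR", {"visible_prices": [14.50, 12.99, 19.00]})  # -> 1.0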
VLA Robot Control
Vision-language models controlling robotic arms or other hardware need to be tailored for specific actuator setups.
Fine-tune with reward functions based on high-level task completion to better translate natural language commands into precise actuator controls (see the sketch below).
Example:
"Move the red block behind the blue cylinder"
translated to specific hardware controls
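As a sketch, a reward for the command above could replay the predicted actuator commands in a simulator and score the resulting object poses. The state format and the 0.3 m threshold here are assumptions for illustration:

import math

def reward(final_state: dict) -> float:
    # final_state: object name -> (x, y) position after replaying the
    # predicted actuator commands in a simulator.
    red = final_state["red_block"]
    blue = final_state["blue_cylinder"]
    behind = red[1] > blue[1]                      # "behind" = larger y in this toy frame
    close_enough = math.dist(red, blue) < 0.3      # within 30 cm of the cylinder
    return 1.0 if behind and close_enough else 0.0

# Example: the red block ends up 20 cm behind the blue cylinder.
reward({"red_block": (0.0, 0.7), "blue_cylinder": (0.0, 0.5)})  # -> 1.0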
Hooks into your production system
To make the process as easy as possible, you don't have to provide explicit training cases. Instead, you just change the API key and base URL in your LLM connector and let us capture the data flow to identify the error cases.
from langchain_openai import OpenAI

llm = OpenAI(
    api_key="sk-XXX",
    base_url="https://api.augento.com/v1",
)
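Continuing the snippet above, your agent code keeps calling the model as before; the proxied traffic becomes the pool of training cases:

# Calls are routed through Augento, which records prompts and completions
# so that failing cases can later be used as RL training data.
print(llm.invoke("Refactor this function to handle empty input."))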
Define a Reward Function
Define reward functions to train your agent on your specific task. Let it learn reasoning, find edge-cases in your codebase, play chess, or interact with your MCP tools.
import compiler

def reward(completion):
    # Return 1 if the completion compiles, 0 otherwise.
    try:
        compiler.compile(completion)
        return 1
    except Exception:
        return 0
RL Training
The RL Fine-Tuning process is fully automated: you connect your reward function and the captured training cases, and we handle the rest.
Hosting
Fine-tuned models are automatically hosted on our infrastructure, so you can switch to them in one click.
curl https://api.augento.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-..." \
  -d '{
    "model": "finetuned-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
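Equivalently, from Python, switching with the same connector as above is just a model-name change ("finetuned-model" is a placeholder identifier):

from langchain_openai import OpenAI

llm = OpenAI(
    api_key="sk-...",
    base_url="https://api.augento.ai/v1",
    model="finetuned-model",   # placeholder for your hosted fine-tuned model's ID
)
print(llm.invoke("Hello!"))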
Pricing
Pay As You Go
Perfect for startups and individual developers with variable usage needs.
$20 in credits included
+ $0.50 per training step