Fine-Tune Agents with Reinforcement Learning
Escape from prompting hell & fine-tune without data collection

Decrease the Failure Rate
Failure rates of a coding agent backed by GPT-4o versus one backed by Qwen-32B fine-tuned with Augento RL
Reinforcement Learning Fine-Tuning
RL Fine-Tuning is a new post-training paradigm for adapting foundation models to your specific use case. It relies solely on a set of training cases (prompts) and a reward function that the model is trained to optimize.
Augento's platform lets you connect your existing LLM, define a reward function, and start RL training jobs that produce open-source fine-tuned models, which are automatically hosted and usable through its standard API.
VS Supervised Fine-Tuning
Unlike SFT, RL Fine-Tuning doesn't require curated prompt-response pairs. Instead of teaching through examples, RL learns from feedback, meaning you don't need to collect perfect responses that match your desired output.
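The difference is easiest to see in the shape of the data each approach consumes. A minimal sketch (the dataset contents and the toy reward check below are illustrative, not Augento's API):

# Supervised fine-tuning needs curated prompt-response pairs:
sft_dataset = [
    {
        "prompt": "Write a function that parses an ISO date string.",
        "response": "def parse_date(s):\n    from datetime import date\n    return date.fromisoformat(s)",
    },
    # ...many more hand-written, hand-checked examples
]

# RL fine-tuning only needs the prompts plus a reward function:
rl_prompts = ["Write a function that parses an ISO date string."]

def reward(completion: str) -> float:
    # Toy stand-in for a real evaluator: a real reward would compile the
    # completion or run tests against it (see the examples further down).
    return 1.0 if "def " in completion else 0.0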
VS Prompting
While prompting struggles with complex tasks, RL Fine-Tuning embeds behaviors deeply into the model. It's more reliable, requires fewer tokens, and can learn tasks that are hard to specify via prompts alone.
Real-World Applications
Coding Agent
We fine-tuned a coding agent that kept making syntax errors and mishandled semantic edge cases.
With a reward function that evaluated the generated code against the compiler, the agent learned to stop producing these errors (a sketch of such a reward follows the results below).
Results:
40% reduction in critical bugs
with just 20 training samples
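For illustration, a reward in this spirit can also grade semantic edge cases rather than compilation alone. A hedged sketch, in which the parse_int task and its edge cases are hypothetical and Python's ast module stands in for the project compiler:

import ast

# Hypothetical edge cases the agent used to get wrong.
EDGE_CASES = [
    ("parse_int('42')", 42),
    ("parse_int('  -7 ')", -7),
    ("parse_int('')", None),
]

def reward(completion: str) -> float:
    # 0.0 if the code doesn't parse, 0.5 if it parses,
    # 1.0 if it also handles the semantic edge cases.
    try:
        ast.parse(completion)              # syntax check, standing in for the compiler
    except SyntaxError:
        return 0.0
    namespace = {}
    try:
        exec(completion, namespace)        # load the candidate code
        for expr, expected in EDGE_CASES:
            if eval(expr, namespace) != expected:
                return 0.5                 # compiles, but misses an edge case
    except Exception:
        return 0.5
    return 1.0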
MCP Tool Specialization
For custom internal tools using the MCP protocol, agents often select the wrong tool or pass incompatible parameters.
Fine-tune with a reward function that scores tool selection and parameter matching to create specialized tool-using agents (sketched below).
Benefits:
Optimized tool selection
and improved parameter compatibility
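For example, if the agent emits its tool calls as JSON, a reward along these lines could score them. The tool names and schemas below are made up for illustration; a real setup would take them from your MCP server:

import json

# Illustrative schemas; a real setup would pull these from the MCP server.
TOOL_SCHEMAS = {
    "search_tickets": {"required": {"query"}, "allowed": {"query", "limit"}},
    "create_ticket": {"required": {"title", "body"}, "allowed": {"title", "body", "priority"}},
}

def reward(completion: str) -> float:
    # Expect the completion to be a JSON tool call: {"tool": ..., "arguments": {...}}
    try:
        call = json.loads(completion)
        schema = TOOL_SCHEMAS[call["tool"]]
    except (json.JSONDecodeError, KeyError, TypeError):
        return 0.0                         # unparseable output or unknown tool
    args = set(call.get("arguments", {}))
    if schema["required"] <= args <= schema["allowed"]:
        return 1.0                         # right tool, compatible parameters
    return 0.5                             # right tool, wrong or missing parameters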
Browser Agent Navigation
Browser agents often struggle with complex web UIs and specific sites. Fine-tuning helps them better understand UI elements and navigation patterns.
Create reward functions that score successful completion of tasks like "find the best price" or "complete this multi-step form" (example below).
Improvements:
Better identification of interactive elements
and navigation through complex single-page applications
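A reward for the "find the best price" task might look like this sketch, assuming the browser harness records the page state the agent saw; page_state and its visible_prices field are assumptions, not an existing API:

def reward(completion: str, page_state: dict) -> float:
    # Full credit when the agent's final answer quotes the cheapest price
    # that was actually visible on the page during the run.
    prices = page_state.get("visible_prices", [])
    if not prices:
        return 0.0
    best = min(prices)
    return 1.0 if f"{best:.2f}" in completion else 0.0

# Example with a mocked page state:
reward("The best price is 12.99 EUR", {"visible_prices": [14.50, 12.99, 19.00]})  # -> 1.0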
VLA Robot Control
Vision-language models controlling robotic arms or other hardware need to be tailored for specific actuator setups.
Fine-tune with reward functions based on high-level task completion to better translate natural language commands into precise actuator controls (see the sketch below).
Example:
"Move the red block behind the blue cylinder"
translated to specific hardware controls
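As a sketch, a reward for the command above could replay the predicted actuator commands in a simulator and score the resulting object poses. The state format and the 0.3 m threshold here are assumptions for illustration:

import math

def reward(final_state: dict) -> float:
    # final_state: object name -> (x, y) position after replaying the
    # predicted actuator commands in a simulator.
    red = final_state["red_block"]
    blue = final_state["blue_cylinder"]
    behind = red[1] > blue[1]                      # "behind" = larger y in this toy frame
    close_enough = math.dist(red, blue) < 0.3      # within 30 cm of the cylinder
    return 1.0 if behind and close_enough else 0.0

# Example: the red block ends up 20 cm behind the blue cylinder.
reward({"red_block": (0.0, 0.7), "blue_cylinder": (0.0, 0.5)})  # -> 1.0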
Hooks into your production system
To make the process as easy as possible, you don't have to provide explicit training cases. Instead, you just change the API key and base URL in your LLM connector and let us capture the data flow to identify the error cases.
from langchain_openai import OpenAI

llm = OpenAI(
    api_key="sk-XXX",
    base_url="https://api.augento.com/v1",
)
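Continuing the snippet above, your agent code keeps calling the model as before; the proxied traffic becomes the pool of training cases:

# Calls are routed through Augento, which records prompts and completions
# so that failing cases can later be used as RL training data.
print(llm.invoke("Refactor this function to handle empty input."))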
Define a Reward Function
Define reward functions to train your agent on your specific task. Let it learn reasoning, find edge-cases in your codebase, play chess, or interact with your MCP tools.
import compiler

def reward(completion):
    # Return 1 if the completion compiles, 0 otherwise.
    try:
        compiler.compile(completion)
        return 1
    except Exception:
        return 0
RL Training
The RL Fine-Tuning process is fully automated: you connect your reward function and the captured training cases, and we handle the rest.
Hosting
Fine-tuned models are automatically hosted on our infrastructure, so you can switch to them in one click.
curl https://api.augento.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-..." \
  -d '{
    "model": "finetuned-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
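Equivalently, from Python, switching with the same connector as above is just a model-name change ("finetuned-model" is a placeholder identifier):

from langchain_openai import OpenAI

llm = OpenAI(
    api_key="sk-...",
    base_url="https://api.augento.ai/v1",
    model="finetuned-model",   # placeholder for your hosted fine-tuned model's ID
)
print(llm.invoke("Hello!"))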
Pricing
Pay As You Go
Perfect for startups and individual developers with variable usage needs.
$20 in credits included
+ $0.50 per training step