This is a submission for the Hermes Agent Challenge.
My Hermes research agent was failing after about 40 turns. The cause: the conversation history had grown past the context window. The fix everyone reaches for is "just drop old messages", but if you drop a tool_use without its matching tool_result (or the other way round), Anthropic's API rejects the whole request.
I needed something smarter. That's agent-message-trim.
One call
from agent_message_trim import trim_messages
result = trim_messages(messages, max_tokens=4000)
# Send result.messages to the model — it's safe.
response = client.messages.create(
    model="claude-sonnet-4-5",
    messages=result.messages,
    # ... your other arguments go here
)
print(f"Dropped {result.dropped_count} messages to fit")
Tool pair safety
This is the part that matters. If your history looks like this:
[
    {"role": "user", "content": "search for X"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "call_001", ...}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "call_001", ...}]},
    {"role": "assistant", "content": "Here is what I found."},
]
trim_messages never drops the tool_use without also dropping its tool_result. They move as a unit. The conversation you get back is always API-valid.
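To make "they move as a unit" concrete, here is a minimal sketch of one way a history could be grouped before trimming. It is not agent-message-trim's actual code: group_tool_pairs and _block_ids are hypothetical names, and the name, input, and content fields in the sample blocks are placeholder values. It assumes the tool_result arrives in the message directly after its tool_use.

```python
from typing import Any

Message = dict[str, Any]


def _block_ids(message: Message, block_type: str, key: str) -> set[str]:
    """Collect ids from content blocks of one type (empty for plain-string content)."""
    content = message.get("content")
    if not isinstance(content, list):
        return set()
    return {b[key] for b in content if isinstance(b, dict) and b.get("type") == block_type}


def group_tool_pairs(messages: list[Message]) -> list[list[Message]]:
    """Split a history into units; a tool_use message and the tool_result
    message that answers it always land in the same unit."""
    units: list[list[Message]] = []
    i = 0
    while i < len(messages):
        unit = [messages[i]]
        pending = _block_ids(messages[i], "tool_use", "id")
        # Absorb the next message if it answers any pending tool_use id.
        if pending and i + 1 < len(messages):
            answered = _block_ids(messages[i + 1], "tool_result", "tool_use_id")
            if pending & answered:
                unit.append(messages[i + 1])
                i += 1
        units.append(unit)
        i += 1
    return units


history = [
    {"role": "user", "content": "search for X"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "call_001", "name": "search", "input": {"q": "X"}},  # placeholder fields
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "call_001", "content": "3 hits"},  # placeholder result
    ]},
    {"role": "assistant", "content": "Here is what I found."},
]
units = group_tool_pairs(history)
print([len(u) for u in units])  # [1, 2, 1]
```

A trimmer that drops whole units instead of single messages cannot orphan either half of a pair, which is the property described above.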
Keep your system prompt
result = trim_messages(messages, max_tokens=4000, keep_system=True)
# system-role messages are pinned: never dropped and never considered as drop candidates
Two strategies
# Default: drop from the front (oldest messages go first)
result = trim_messages(messages, max_tokens=4000, strategy="drop_oldest")
# Keep first + last, remove from the middle
result = trim_messages(messages, max_tokens=4000, strategy="drop_middle")
drop_middle is useful when you want to keep the original task context AND the most recent exchange, but can sacrifice the middle of a long conversation.
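The difference between the two strategies is easiest to see on a toy history. The sketch below is an illustration only, not the library's implementation: drop_oldest and drop_middle here are hypothetical standalone functions, they ignore tool pairs and system messages, and the choice to remove the central element first in drop_middle is an assumption.

```python
def estimate_tokens(text: str) -> int:
    # Rough four-characters-per-token estimate with a floor of one.
    return max(1, (len(text) + 3) // 4)


def total_tokens(messages: list[dict]) -> int:
    return sum(estimate_tokens(str(m["content"])) for m in messages)


def drop_oldest(messages: list[dict], max_tokens: int) -> list[dict]:
    """Remove messages from the front until the rest fits the budget."""
    kept = list(messages)
    while kept and total_tokens(kept) > max_tokens:
        kept.pop(0)
    return kept


def drop_middle(messages: list[dict], max_tokens: int) -> list[dict]:
    """Pin the first and last messages; remove the central one until the rest fits."""
    kept = list(messages)
    while len(kept) > 2 and total_tokens(kept) > max_tokens:
        kept.pop(len(kept) // 2)  # assumption: the central message goes first
    return kept


# Five messages of 40 characters each, so 10 estimated tokens apiece.
history = [{"role": "user", "content": f"m{i}" + "." * 38} for i in range(5)]

oldest = [m["content"][:2] for m in drop_oldest(history, max_tokens=30)]
middle = [m["content"][:2] for m in drop_middle(history, max_tokens=30)]
print(oldest)  # ['m2', 'm3', 'm4']
print(middle)  # ['m0', 'm1', 'm4']
```

With the same budget, the first strategy loses the opening task statement (m0) while the second keeps it along with the latest message.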
Custom token counter
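The built-in estimator is max(1, (len(text)+3)//4): roughly one token per four characters, rounded up, with a floor of one. If you need real tokenizer counts, plug in your own: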
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
result = trim_messages(
    messages,
    max_tokens=4000,
    count_tokens=lambda text: len(enc.encode(text)),
)
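For reference, the built-in formula quoted above can be worked through by hand. This is a small sketch that reimplements the expression as written in this post; estimate_tokens is a hypothetical name, not necessarily what the package calls it.

```python
def estimate_tokens(text: str) -> int:
    # (len + 3) // 4 is integer ceiling division by 4;
    # max(1, ...) makes even an empty string cost one token.
    return max(1, (len(text) + 3) // 4)


for sample in ["", "abcd", "abcde", "Here is what I found."]:
    print(repr(sample), len(sample), estimate_tokens(sample))
# '' 0 1
# 'abcd' 4 1
# 'abcde' 5 2
# 'Here is what I found.' 21 6
```

The estimate only depends on character count, so it will drift from a real tokenizer on code, non-English text, or long identifiers; that is the case for passing count_tokens.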
TrimResult tells you what happened
result = trim_messages(messages, max_tokens=4000)
result.messages # trimmed list
result.token_count # estimated tokens used
result.original_count # how many messages came in
result.dropped_count # how many were removed
result.ok # True if nothing was dropped
result.kept_count # len(result.messages)
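How those fields relate to each other can be sketched with a small dataclass. TrimResultSketch is a hypothetical stand-in written for this post; the real TrimResult may be defined differently, and treating ok and kept_count as derived properties is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TrimResultSketch:
    """Hypothetical stand-in for TrimResult; the real class may differ."""
    messages: list = field(default_factory=list)
    token_count: int = 0
    original_count: int = 0
    dropped_count: int = 0

    @property
    def ok(self) -> bool:
        # True only when nothing had to be removed.
        return self.dropped_count == 0

    @property
    def kept_count(self) -> int:
        return len(self.messages)


r = TrimResultSketch(
    messages=[{"role": "user", "content": "hi"}],
    token_count=1,
    original_count=3,
    dropped_count=2,
)
print(r.ok, r.kept_count, r.original_count - r.dropped_count)  # False 1 1
```

In an agent loop, checking ok (or dropped_count) after each trim is a cheap way to log when context starts being discarded.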
Just want the list?
from agent_message_trim import trim_to_fit
trimmed = trim_to_fit(messages, max_tokens=4000)
# returns the list directly
Zero dependencies
Standard library only: json, dataclasses. Nothing else.
pip install agent-message-trim





















