Guide
Tool calls and MCP calls spend tokens at every step: the schema, the LLM-generated call, and the full result stored in context all add up.
Margarita's @effect func
runs Python locally and injects only what the LLM actually needs.
When the LLM fetches data through a tool or MCP call, every part of that exchange burns tokens — the schema that defines the tool, the message the LLM generates to invoke it, and the raw result that comes back. The LLM also decides when to call, adding an extra round-trip you never needed.
Every tool definition lives in context for the entire run. Complex schemas with descriptions, parameters, and types can cost hundreds of tokens before any work is done.
The LLM generates a tool call message, the result comes back as another message — both stay in context. A single fetch can flood the window with raw data the LLM only skims.
The LLM has to decide to call the tool, generate the call, wait for the result, then generate a response. Three API calls where one would do.
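For a sense of where the schema tokens go, here is a rough sketch of a tool definition in the JSON shape most function-calling APIs expect, written as a Python dict. This is illustrative only, not Margarita's exact serialization:

# Illustrative sketch: a typical function-calling tool definition.
# Something like this is serialized into every request for the whole run.
urgent_tickets_tool = {
    "type": "function",
    "function": {
        "name": "get_urgent_tickets",
        "description": "Return open support tickets with priority=urgent.",
        "parameters": {
            "type": "object",
            "properties": {},   # even a no-argument tool carries a parameter schema
            "required": [],
        },
    },
}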
import requests

def get_urgent_tickets() -> list:
    resp = requests.get(
        "https://api.example.com/tickets",
        params={"status": "open"},
    )
    tickets = resp.json()["tickets"]
    # Filter and trim locally — the LLM never sees the rest
    return [
        {
            "id": t["id"],
            "subject": t["subject"],
            "body": t["body"],
        }
        for t in tickets
        if t["priority"] == "urgent"
    ]
With @effect tools, the LLM decides when to call the function and receives the full raw result as a message in context. With @effect func, Python runs it immediately and the filtered result lands directly in state, ready to be referenced anywhere in the template.
# support-agent.mgx — tool call approach
import get_urgent_tickets from tickets
// Registers the function as an LLM tool.
// Schema goes into context. LLM decides when
// to call. Full result lands as a message.
@effect tools get_urgent_tickets() => tickets
<<
You are a support agent. Fetch the open tickets
and draft a response for each urgent one.
>>
@effect run
// LLM generates a tool call, waits for result,
// then generates the final response.
// That's three API calls.
# support-agent.mgx — local function approach
import get_urgent_tickets from tickets
// Python runs now, in your process.
// No LLM round-trip. No tool schema.
// Result goes straight into state.
@effect func get_urgent_tickets() => tickets
<<
You are a support agent. Draft a response
for each of these urgent tickets:
${tickets}
>>
@effect run
// One API call. That's it.
The @effect func approach sends one focused prompt containing only the pre-filtered data.
Three separate messages, plus the tool schema, before the LLM writes its first word of output. All of it stays in context for any follow-up calls.
// ① Tool schema — present for the entire run
[tools]
get_urgent_tickets() ◄ ~60 tokens
Returns open tickets with priority=urgent.
No parameters.
// ② LLM decides it needs the data and generates a call
[assistant]
I'll fetch the urgent tickets now.
<tool_call>get_urgent_tickets()</tool_call> ◄ ~30 tokens
// ③ Full API response lands in context
[tool result]
{"tickets": [ ◄ ~600 tokens
{"id":1,"priority":"low","subject":"...","body":"..."},
{"id":2,"priority":"urgent","subject":"...","body":"..."},
{"id":3,"priority":"low","subject":"...","body":"..."},
{"id":4,"priority":"low","subject":"...","body":"..."},
{"id":5,"priority":"urgent","subject":"...","body":"..."},
... 45 more tickets of all priorities
]}
// ④ Finally — the actual task prompt
[user]
You are a support agent. Fetch the open tickets ◄ ~20 tokens
and draft a response for each urgent one.
One message. Only the pre-filtered urgent tickets. No schema, no call-and-response overhead, no low-priority noise the LLM has to ignore.
// ① Single focused prompt — nothing else
[user]
You are a support agent. Draft a response ◄ ~120 tokens total
for each of these urgent tickets:
[
{"id":2,"subject":"Payment failing","body":"..."},
{"id":5,"subject":"Can't log in","body":"..."}
]
// get_urgent_tickets() ran in Python before this
// call was made. No schema. No extra messages.
// 45 low-priority tickets never touched the context.
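Adding up the annotations: roughly 60 + 30 + 600 + 20 ≈ 710 tokens for the tool-call path versus about 120 for the local-function path, close to a 6x reduction before any follow-up turns compound the difference.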
Python runs in your process the moment the line executes — no API call to the model, no waiting. Latency drops and the LLM never has to decide whether or when to fetch anything.
Shape the data in Python before it ever reaches the prompt — filter rows, trim fields, format strings. The LLM sees a clean, minimal payload instead of raw API output it has to wade through.
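As one sketch of that shaping step, a hypothetical helper (your own Python, not part of Margarita) could render the trimmed tickets as compact numbered lines instead of raw JSON:

def format_tickets(tickets: list[dict]) -> str:
    # Hypothetical helper: turn trimmed ticket dicts into compact numbered lines
    # so the prompt carries readable text instead of JSON punctuation.
    return "\n".join(
        f"{i}. [{t['id']}] {t['subject']}: {t['body'][:200]}"
        for i, t in enumerate(tickets, start=1)
    )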
Call several functions before a single @effect run and the LLM receives all
the results in one focused prompt. Each function adds state, not messages — context stays
flat no matter how many calls you stack.
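A sketch of what that stacking could look like, assuming a second local function, get_oncall_engineer, exists alongside get_urgent_tickets:

# support-agent.mgx — stacking local functions (get_oncall_engineer is hypothetical)
import get_urgent_tickets from tickets
import get_oncall_engineer from schedule

// Both run locally, one after the other.
// Each adds a value to state; no extra messages, no schemas.
@effect func get_urgent_tickets() => tickets
@effect func get_oncall_engineer() => oncall

<<
You are a support agent. Today's on-call engineer is ${oncall}.
Draft a response for each of these urgent tickets:
${tickets}
>>

@effect run
// Still a single API call, however many functions ran before it.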
No tool registration, no schemas: just import and call functions.