Guide
Tool calls and MCP calls spend tokens at every step: the schema, the LLM-generated call, and the full result stored in context all add up.
Margarita's @effect func
runs Python locally and injects only what the LLM actually needs.
When the LLM fetches data through a tool or MCP call, every part of that exchange burns tokens — the schema that defines the tool, the message the LLM generates to invoke it, and the raw result that comes back. The LLM also decides when to call, adding an extra round-trip you never needed.
Every tool definition lives in context for the entire run. Complex schemas with descriptions, parameters, and types can cost hundreds of tokens before any work is done.
The LLM generates a tool call message, the result comes back as another message — both stay in context. A single fetch can flood the window with raw data the LLM only skims.
The LLM has to decide to call the tool, generate the call, wait for the result, then generate a response. Three API calls where one would do.
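For a sense of where the schema tokens go, here is a rough sketch of a tool definition in the JSON shape most function-calling APIs expect, written as a Python dict. This is illustrative only, not Margarita's exact serialization:

# Illustrative sketch: a typical function-calling tool definition.
# Something like this is serialized into every request for the whole run.
urgent_tickets_tool = {
    "type": "function",
    "function": {
        "name": "get_urgent_tickets",
        "description": "Return open support tickets with priority=urgent.",
        "parameters": {
            "type": "object",
            "properties": {},   # even a no-argument tool carries a parameter schema
            "required": [],
        },
    },
}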
import requests

def get_urgent_tickets() -> list:
    resp = requests.get(
        "https://api.example.com/tickets",
        params={"status": "open"},
    )
    tickets = resp.json()["tickets"]
    # Filter and trim locally — the LLM never sees the rest
    return [
        {
            "id": t["id"],
            "subject": t["subject"],
            "body": t["body"],
        }
        for t in tickets
        if t["priority"] == "urgent"
    ]
With @effect tools, the LLM decides when to call the function and receives the full raw result as a message in context. With @effect func, Python runs it immediately and the filtered result lands directly in state, ready to be referenced anywhere in the template.
# support-agent.mgx — tool call approach
import get_urgent_tickets from tickets
// Registers the function as an LLM tool.
// Schema goes into context. LLM decides when
// to call. Full result lands as a message.
@effect tools get_urgent_tickets() => tickets
<<
You are a support agent. Fetch the open tickets
and draft a response for each urgent one.
>>
@effect run
// LLM generates a tool call, waits for result,
// then generates the final response.
// That's three API calls.
# support-agent.mgx — local function approach
import get_urgent_tickets from tickets
// Python runs now, in your process.
// No LLM round-trip. No tool schema.
// Result goes straight into state.
@effect func get_urgent_tickets() => tickets
<<
You are a support agent. Draft a response
for each of these urgent tickets:
${tickets}
>>
@effect run
// One API call. That's it.
The @effect func approach sends one focused prompt containing only the pre-filtered data.
Three separate messages, plus the tool schema, before the LLM writes its first word of output. All of it stays in context for any follow-up calls.
// ① Tool schema — present for the entire run
[tools]
get_urgent_tickets() ◄ ~60 tokens
Returns open tickets with priority=urgent.
No parameters.
// ② LLM decides it needs the data and generates a call
[assistant]
I'll fetch the urgent tickets now.
<tool_call>get_urgent_tickets()</tool_call> ◄ ~30 tokens
// ③ Full API response lands in context
[tool result]
{"tickets": [ ◄ ~600 tokens
{"id":1,"priority":"low","subject":"...","body":"..."},
{"id":2,"priority":"urgent","subject":"...","body":"..."},
{"id":3,"priority":"low","subject":"...","body":"..."},
{"id":4,"priority":"low","subject":"...","body":"..."},
{"id":5,"priority":"urgent","subject":"...","body":"..."},
... 45 more tickets of all priorities
]}
// ④ Finally — the actual task prompt
[user]
You are a support agent. Fetch the open tickets ◄ ~20 tokens
and draft a response for each urgent one.
One message. Only the pre-filtered urgent tickets. No schema, no call-and-response overhead, no low-priority noise the LLM has to ignore.
// ① Single focused prompt — nothing else
[user]
You are a support agent. Draft a response ◄ ~120 tokens total
for each of these urgent tickets:
[
{"id":2,"subject":"Payment failing","body":"..."},
{"id":5,"subject":"Can't log in","body":"..."}
]
// get_urgent_tickets() ran in Python before this
// call was made. No schema. No extra messages.
// 45 low-priority tickets never touched the context.
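Adding up the annotations: roughly 60 + 30 + 600 + 20 ≈ 710 tokens for the tool-call path versus about 120 for the local-function path, close to a 6x reduction before any follow-up turns compound the difference.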
Python runs in your process the moment the line executes — no API call to the model, no waiting. Latency drops and the LLM never has to decide whether or when to fetch anything.
Shape the data in Python before it ever reaches the prompt — filter rows, trim fields, format strings. The LLM sees a clean, minimal payload instead of raw API output it has to wade through.
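As one sketch of that shaping step, a hypothetical helper (your own Python, not part of Margarita) could render the trimmed tickets as compact numbered lines instead of raw JSON:

def format_tickets(tickets: list[dict]) -> str:
    # Hypothetical helper: turn trimmed ticket dicts into compact numbered lines
    # so the prompt carries readable text instead of JSON punctuation.
    return "\n".join(
        f"{i}. [{t['id']}] {t['subject']}: {t['body'][:200]}"
        for i, t in enumerate(tickets, start=1)
    )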
Call several functions before a single @effect run and the LLM receives all
the results in one focused prompt. Each function adds state, not messages — context stays
flat no matter how many calls you stack.
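A sketch of what that stacking could look like, assuming a second local function, get_oncall_engineer, exists alongside get_urgent_tickets:

# support-agent.mgx — stacking local functions (get_oncall_engineer is hypothetical)
import get_urgent_tickets from tickets
import get_oncall_engineer from schedule

// Both run locally, one after the other.
// Each adds a value to state; no extra messages, no schemas.
@effect func get_urgent_tickets() => tickets
@effect func get_oncall_engineer() => oncall

<<
You are a support agent. Today's on-call engineer is ${oncall}.
Draft a response for each of these urgent tickets:
${tickets}
>>

@effect run
// Still a single API call, however many functions ran before it.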
No tool registration, no schemas: just import and call functions.