USATLAS Face-to-Face · Throughput Computing Week 2026

LLM-Assisted Analysis
at the UChicago AF

Agentic tools your analysis can use today

Giordon Stark (UC Santa Cruz / SCIPP) · on behalf of the UChicago AF team · 2026‑06‑09

Agentic intelligence for HEP infrastructure

Slides AI-assisted by Claude (Anthropic · Opus 4.8, 1M) · title illustration AI-generated.

The whole talk in one example

“Where do my TopCPToolkit outputs go?”

Your batch jobs emit big ntuples and small histograms. Where should each land on the AF?

A generic LLM

“Transfer the outputs back to submit node”, i.e. $HOME. The big ntuples blow the 100 GB /home quota almost immediately.

An LLM that knows the AF

big ntuples → /data (5 TB); small histograms → /home (100 GB). It knows the quotas and layout and mounts.

The thesis

The model didn’t get smarter. We gave it facility context. That gap, between a generic assistant and one that knows your storage, scheduler, and data, is the key.

Grounding · the vocabulary

The vocabulary, fast

Agent

The engine doing the work: a loop that uses an LLM (the “brain”) to decide each step, call a tool, read the result, and repeat.

MCP server

Model Context Protocol: a standard adapter exposing a service (Rucio, AMI, your notebook) as tools and resources — live context any agent can call.

Skill

Static context the agent loads on demand: instructions, a reusable recipe (“how we do a pyhf fit here”).

Agent harness

The app that gives the agent hands: runs its commands, reads/writes files, wires MCPs, enforces permissions. e.g. Claude Code, Codex.

Bundle skills + MCPs + agents + hooks → a plugin. Many plugins in one place to discover & install → a marketplace (which can link other marketplaces).
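The "Agent" definition above (decide a step, call a tool, read the result, repeat) can be sketched in a few lines. This is a minimal illustration only: `fake_llm`, the `disk_usage` tool, and its canned output are hypothetical stand-ins, not part of any harness or of the AF tooling.

```python
from typing import Callable

# Hypothetical tool registry: each tool is a plain function from str to str.
TOOLS: dict[str, Callable[[str], str]] = {
    # Canned, made-up result; a real tool would query the filesystem.
    "disk_usage": lambda path: f"{path}: 92 GB used of 100 GB",
}


def fake_llm(history: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM "brain": returns (action, argument).

    A real agent would send `history` to a model. This stub asks for one
    tool call and then finishes.
    """
    if not any(line.startswith("observation:") for line in history):
        return ("disk_usage", "/home/alice")
    return ("finish", "Home is nearly full; move large files elsewhere.")


def run_agent(task: str, max_steps: int = 5) -> str:
    history = [f"task: {task}"]
    for _ in range(max_steps):
        action, arg = fake_llm(history)            # 1. decide the next step
        if action == "finish":
            return arg
        result = TOOLS[action](arg)                # 2. call a tool
        history.append(f"observation: {result}")   # 3. read the result, repeat
    return "step limit reached"
```

In this picture the harness is whatever owns `TOOLS` and decides which entries the loop may call; an MCP server is one way of filling that registry.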

Related work · the framing

The ecosystem framing (Watts)

“It’s not about a smarter LLM — it’s about smarter infrastructure around it.” G. Watts, “Beyond Code Generation,” CHEP 2026

His framing: data & tools → MCP → skills → agents; grounded in primary sources; “we need a pip-install.”
This talk: one facility’s instance, running now, and the claim that facility context is the decisive layer, built to port.

Gordon Watts' Analysis Ecosystem boat diagram

G. Watts’ CHEP 2026 slides · © Gordon Watts + AI

Watts, CHEP 2026: indico.cern.ch/event/1471803/contributions/6969164

The present

What an analyzer
can use today

What you can use today · managed settings

You bring the agent; we ship the rules

You install your own harness on the login nodes; we don’t force one
Our config management (Puppet) ships system-wide managed settings: a curated allow-list of safe HEP commands + the ATLAS env
Any harness that reads them inherits the facility’s guardrails

Why it matters

The facility, not the user, decides what an agent may do here.

Open question

Ship this with the facility (Puppet) or via a centralized marketplace? We’ll come back to it.

managed-settings.d/10-uchicago-af.json

{
  "permissions": { "allow": [
    "Bash(condor_watch_q*)", "Bash(condor_q*)",
    "Bash(condor_submit*)", "Bash(condor_release*)",
    "Bash(pixi*)", "Bash(xrdcp*)", "Bash(lsetup*)",
    "Bash(asetup*)", /* …safe HEP cmds */
  ] },
  "env": {
    "ATLAS_LOCAL_ROOT_BASE": "/cvmfs/atlas…",
    "SITE_NAME": "AF_200"
  }
}

Shipped system-wide by our config management (Puppet) to login01–04 · live since 2026‑05‑19. Excerpt; allow-list trimmed.
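To make the allow-list idea concrete, here is a rough sketch of how a harness might test a command against entries like the ones in the excerpt. The glob-style matching via `fnmatch` is an assumption for illustration; the actual matching rules of any given harness may differ, and `is_allowed` is a hypothetical helper.

```python
import fnmatch
import re

# Entries copied from the excerpt above; the full allow-list is longer.
ALLOW = [
    "Bash(condor_watch_q*)", "Bash(condor_q*)",
    "Bash(condor_submit*)", "Bash(condor_release*)",
    "Bash(pixi*)", "Bash(xrdcp*)", "Bash(lsetup*)", "Bash(asetup*)",
]


def is_allowed(command: str, allow: list[str] = ALLOW) -> bool:
    """Return True if a shell command matches any Bash(<glob>) entry.

    Assumption: entries are shell-style globs wrapped in Bash(...).
    """
    for entry in allow:
        match = re.fullmatch(r"Bash\((.*)\)", entry)
        if match and fnmatch.fnmatchcase(command, match.group(1)):
            return True
    return False
```

With these entries, `condor_q -hold` passes while `condor_rm 123` does not, which is the point of a curated list: anything not named still needs an explicit decision.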

What you can use today · the key idea

The facility teaches the agent about itself

Auto-loaded into every session: /etc/claude-code/CLAUDE.md

Path	Quota	Use for
`/home/$USER`	100 GB, backed up	Code, scripts, condor files
`/data/$USER`	5 TB, not backed up	ROOT files, datasets
`/scratch`	node-local	Ephemeral, copy out before exit

# live monitor (DO NOT use: watch condor_q)
condor_watch_q
# why is a job held?
condor_q -hold
# XCache-optimized reads (SITE_NAME=AF_200)
rucio list-file-replicas <scope>:<name> --protocol root

This is the whole thesis

A generic model can’t know that /home is backed up but tiny, that /scratch vanishes, or that XCache exists. We write it down once, and the agent inherits it.

Storage layout, quotas, backup policy
Scheduler etiquette & debugging recipes
Data access: XCache, grid proxies, Rucio
pixi for new work; how to get help

The analyzer’s own CLAUDE.md stacks on top.
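The storage table is exactly the kind of rule an agent can apply mechanically once it is written down. A minimal sketch, assuming a size cutoff: the slides only say "big" versus "small", so the 1 GB threshold and the `destination` helper are hypothetical.

```python
from pathlib import PurePosixPath

GB = 10**9
# Hypothetical cutoff; the facility documentation does not define one here.
BIG_FILE_BYTES = 1 * GB


def destination(filename: str, size_bytes: int, user: str) -> PurePosixPath:
    """Route big outputs to /data (5 TB) and small ones to /home (100 GB)."""
    base = "/data" if size_bytes >= BIG_FILE_BYTES else "/home"
    return PurePosixPath(base) / user / filename
```

For example, a 40 GB ntuple for user `alice` resolves under `/data/alice/`, while a few-MB histogram file stays under `/home/alice/`.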

What you can use today · distribution

The USATLAS marketplace: installing the know-how

> /plugins marketplace add usatlas/marketplace

Three plugins (each bundles skills, subagents & hooks) at github.com/usatlas/marketplace:

analysis-facilities

Facility skills: HTCondor, JupyterLab, XCache, Rucio, ServiceX, Coffea-Casa, Triton (for UChicago, BNL, SLAC).

atlas

5 subagents + 25+ skills: Rucio/AMI/Open-Data MCPs, pyhf, cabinetry, TRExFitter, TopCPToolkit, FastFrames, Scikit-HEP.

hep-python-tools

Good-practice Python: building command-line tools, self-contained scripts, packaging, testing, and code quality.

Plugins install in one line; the tools don’t

The agent know-how adds instantly; the tools come via setupATLAS + pixi. No /cvmfs? pixi + conda-forge covers most — incl. ARM builds & supply-chain security (e.g. pixi dependency cooldowns) via HEP Packaging Coordination (cf. Feickert, CHEP 2026).

Open question

Where should AF knowledge live: shared marketplace or with the facility (Puppet)? Today: both.

Context shouldn't be re-invented per user, so I made usatlas/marketplace: a plugin registry anyone adds in one line. Three plugins: analysis-facilities (per-site skills, including UChicago), atlas (the physics, 25+ skills), hep-python-tools (Python hygiene). It's the unit of sharing across facilities. One honest caveat on the "pip-install" dream: the PLUGINS install in one line, but the actual TOOLS (ROOT, Athena, coffea, the whole zoo) don't pip-install cleanly. We rely on setupATLAS plus pixi. For those who don't know it: pixi is a fast, reproducible environment manager for conda packages (any language!) and Python packages. Crucially, without /cvmfs you can still get most of that stack via pixi and conda-forge. Big shoutout to the HEP Packaging Coordination effort, which is doing the ARM builds and supply-chain hardening, including pixi's dependency cooldowns that hold back freshly-published versions to blunt compromised-package attacks. Matthew Feickert's SciPy proceedings cover pixi well. That's what makes this tractable off-grid. And the recurring question: should facility-specific knowledge live here in the shared marketplace, or ship with the facility via Puppet? Today it's in both, a duplication that bloats context and can drift. A discussion seed.

What you can use today · data & metadata MCPs

Find datasets & metadata by asking

rucio-mcp: dataset access (find, inspect, replicas, download)

ami-mcp: metadata & cross-sections

also atlasopenmagic, glance via stare

Run it locally (your x509)

pixi exec rucio-mcp serve --read-only
pixi exec ami-mcp serve
uvx atlasopenmagic-mcp serve

rucio-mcp alone exposes 50+ tools; --read-only blocks every write.

Or the hosted server (OIDC)

# multi-site, one HTTP endpoint
rucio-mcp.af.uchicago.edu
  · atlas · cms · dune · escape

Browser OIDC login via an OAuth bridge. Not every experiment supports OIDC yet (FTS/RSE-dependent); the bridge manages the flow.

Still being figured out

OIDC UX still varies by VO: token lifetime & refresh differ; audiences differ (atlas/dune: no custom issuer; cms/escape: extra token-exchange). And on a hung call, who times out: MCP or LLM?

Observability

rucio-mcp exports Prometheus metrics: LLM usage per-site & per-tool (Grafana).

Auth reuses grid creds (x509) or Keycloak OIDC · OAuth bridge docs · running workflows/jobs is panda-mcp / condor-mcp’s job (not yet at AF).

Two clarifications on my earlier framing. First, the scope: rucio is dataset ACCESS: find, inspect, replicas, download. "Talk to the grid / run my workflow" is actually panda-mcp's job, Brian Bockelman's space. We don't have it at the AF yet but will document it once confirmed. Second, two ways to run these. Locally with your x509 proxy, pixi exec, one line, read-only by default because rucio-mcp has 50+ tools and you don't want an agent deleting replication rules. OR the hosted server at rucio-mcp.af.uchicago.edu, one HTTP endpoint serving four experiments, with a browser OIDC login. Be honest about the rough edges, because they're real open questions: even with OIDC the token UX varies by VO, lifetimes and refresh/offline-access differ, and the audiences differ, so atlas and dune use no custom issuer while cms and escape need a custom-issuer token-exchange. And we haven't settled how to handle a hung request: socket timeout on the MCP side, or have the LLM close and retry? How do we enforce that across all LLM clients? On the plus side, rucio-mcp exports Prometheus metrics so we can actually watch LLM usage per-site and per-tool. There's a public Grafana. Workflow/job execution itself is panda-mcp or condor-mcp territory, not here.
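On the hung-call question, one of the two options is a deadline enforced by the caller. The sketch below shows that pattern with `asyncio.wait_for`; it is not how rucio-mcp or any particular client behaves today, and `hung_tool_call` is a hypothetical stand-in for a request that never returns.

```python
import asyncio


async def hung_tool_call() -> str:
    """Stand-in for an MCP tool call that never comes back."""
    await asyncio.sleep(3600)
    return "replicas"


async def call_with_deadline(coro, seconds: float) -> str:
    """Caller-side timeout: cancel the call and report instead of waiting."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return "timed out; safe to retry or report"


result = asyncio.run(call_with_deadline(hung_tool_call(), 0.05))
```

The other option, a socket timeout inside the server, has the advantage of applying to every client at once, which is why the question of where the deadline lives is still open.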

What you can use today · Jupyter MCP

Drive your AF notebook from anywhere

An MCP server runs inside your JupyterLab pod. Any agent that reaches it (Claude Code, Claude Desktop, even your phone) can insert_execute_code_cell, read_cell (incl. plots), execute_code in the live kernel, use_notebook.

≠ Jupyter AI (agent inside the notebook server), which may lose out to the VS Code workflow many users prefer.

claude mcp add jupyter --transport http \
  .../user/<you>/mcp --header "Authorization: Bearer <tok>"

The convenience/security tradeoff

Servers have finite lifetimes (equitable sharing) → you re-mint URL + token each respawn. Idea: a Keycloak-OIDC reverse-proxy → one token, pick your server in-browser (but one at a time).

Jupyter MCP streamable-HTTP extension · full-size demo video: Google Drive.

My favorite demo. The Jupyter MCP runs inside your JupyterLab pod, so the chat happens in ANY agent that reaches the endpoint; I've driven a live AF notebook from my phone. Kick off a long cell from your laptop, check it from your phone on the bus. Distinguish this from Jupyter AI, which puts the agent INSIDE the notebook server. That's a different bet, and honestly it may lose out, because a lot of our users prefer the VS Code workflow over JupyterLab. The honest caveat: server pods have finite lifetimes for fair sharing, so when yours respawns you re-mint the URL and bearer token. It's a security/convenience tradeoff. One idea I'm weighing: a Keycloak-OIDC reverse-proxy so you keep a single token and just pick which server it routes to in the browser, the cost being you can only drive one server at a time.

What you can use today · OpenWebUI

A web chat that already knows the AF

Not everyone wants a terminal. OpenWebUI is a browser chat at af.uchicago.edu/chat, backed by a facility knowledge base.

Zero install, just a URL
Answers based on AF docs & capabilities
Same knowledge, friendlier UI/UX

Real exchange · user xju, Feb 2026

Q: “Are there NVIDIA Triton models available at AF?”

A: Yes. Triton in the AF k8s cluster, serving from CVMFS + an S3 repo (s3://triton-models/<user>/); upload yours, then ask admins to enable it.

“this… is not awful and it is correct” — Giordon

OpenWebUI AF assistant landing at af.uchicago.edu/chat

Even the suggested prompts are AF-specific: quota, GPUs, Varnish cache-hit at MWT2…

Assistant built by Ilija Vukotic (slides) · it also pointed xju to xAOD/coffea examples · Triton→athena: Vakho Tsulaia, CHEP 2026.

What you can use today · agents on your behalf

An agent watches the cluster for you

Daily HTCondor cluster report posted to Slack by the AI monitor agent

Every morning a scheduled agent scans HTCondor and posts a report to Slack: state summary, hold-reason breakdown, top users, jobs held >7 days, recommended actions.

Today: 448 held of 14,671 (3.1%), below the 20% alert line. Holds auto-categorized: wall-time, output-transfer, OOM, …

No HTCondor MCP, just our own skills around the condor CLI.

Two lanes on OpenClaw: Shannon = privileged, trust-earned runbooks (this bot, human-gated); Elwood = user-space, where facility rules apply (framing). A dedicated HTCondor MCP exists (Bockelman); deeper → next talk.

Some agents work even when you're not there. Every morning a scheduled agent scans HTCondor and posts this report to Slack: state summary, hold-reason breakdown, top users, jobs held over 7 days, recommended actions. Today: 448 held of ~14,700, 3.1%, healthy. Worth naming the framing: it all runs on OpenClaw, and there are two lanes. Shannon is the privileged lane: agents that run trusted runbooks, where that trust is earned. This human-gated condor bot is a Shannon runbook. Elwood is the user-space lane, the analyzer's assistant, where the facility's rules are applied. We introduced this framing at a facility R&D meeting, link in the footnote. The bot auto-categorizes holds using encoded knowledge, and uses NO HTCondor MCP, just our own skills around the condor CLI. Bockelman has a real HTCondor MCP if you want that route.
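The arithmetic behind the report, plus a toy version of hold categorization, fits in a short sketch. The held and total counts and the 20% alert line come from the slide; the keyword rules in `RULES` are hypothetical, since the bot's real taxonomy is not shown here.

```python
HELD, TOTAL = 448, 14_671      # numbers from the report on the slide
ALERT_FRACTION = 0.20          # the 20% alert line

held_fraction = HELD / TOTAL
should_alert = held_fraction >= ALERT_FRACTION

# Hypothetical keyword rules, for illustration only.
RULES = {
    "wall-time": ("wall time", "runtime"),
    "output-transfer": ("transfer output", "transfer of output"),
    "OOM": ("memory",),
}


def categorize(hold_reason: str) -> str:
    """Map a free-text hold reason onto a coarse category."""
    text = hold_reason.lower()
    for category, keywords in RULES.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"
```

The held fraction works out to about 3.1%, well under the alert line, so the report stays informational rather than paging anyone.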

What you can use today · agents on your behalf

…then drafts your fix, you approve

Drafted held-job email posted to Slack, approved by a human, then sent by the agent

When your jobs are stuck it drafts an email to you with a specific fix. Real OOM case: used the actual observed memory vs the limit → raise request_memory; write big outputs to /data.

Human-in-the-loop

Drafted → a human replies approve in Slack → sent. Never autonomous (note the stand-down banner).

Good advice because it carries AF knowledge (hold-reason taxonomy, c111/c113, /home…); same lesson as the facility CLAUDE.md.

Thinking ahead

How do we tag a batch job with its experiment so the agent gives the right context? ATLAS-only today, but for FCC / EIC / DUNE / Belle‑II the same agent can’t assume everyone’s ATLAS.
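The drafted, approved, then sent flow on this slide is essentially a two-flag gate. A minimal sketch, assuming the approval is a literal "approve" reply as described; `DraftEmail`, `approve`, and `send` are hypothetical names, and no mail or Slack API is involved.

```python
from dataclasses import dataclass


@dataclass
class DraftEmail:
    to: str
    body: str
    approved: bool = False
    sent: bool = False


def approve(draft: DraftEmail, reply: str) -> None:
    """Only a human's literal 'approve' reply opens the gate."""
    if reply.strip().lower() == "approve":
        draft.approved = True


def send(draft: DraftEmail) -> bool:
    """Refuse to send anything that has not been approved."""
    if not draft.approved:
        return False
    draft.sent = True  # a real bot would hand off to a mail service here
    return True
```

Keeping the check inside `send` rather than in the caller means a new code path cannot bypass the gate by accident.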

The architecture

Why it works

The core insight

Swap the model freely: context makes it useful here

Model: Opus · GPT-5 · local / NRP-hosted open weights → cost control

Harness: Claude Code · Codex · Copilot

Facility context & ATLAS tools: storage, scheduler, Rucio, FastFrames, coffea, pyhf

An assistant that’s actually useful here: routes ntuples to /data, not /home

The first two blocks are interchangeable: pick any model (or self-host to cut cost), any harness. Every win today came from the third: the managed CLAUDE.md, the marketplace plugins, the Rucio/AMI MCPs, the condor bot’s HTCondor.md.

Pragmatically too: model & harness evolve fast, so we can’t run a cluster and chase that. Facility context is the tractable scope.

Self-hosting option for open-weight models: NRP-hosted LLMs (nrp.ai/llms).

If you take one slide home, this one. And I'm dropping the buzzwords to be concrete. Three ingredients. The model: Opus, GPT-5, or a locally-hosted open-weight model, which matters for cost at HL-LHC scale. The harness: Claude Code, Codex, Copilot. Both of those you can swap freely; they're not where our work is. The third block is the facility context and ATLAS tools, and THAT is where every win in section one came from: the CLAUDE.md, the marketplace, the MCPs, the agent's memory file. So: models keep improving on their own, harnesses are a preference, and you can self-host to control cost. Our job, the facility's job, is the context. That's why this is a facility conversation, not just a vendor one. There's also a pragmatic reason to draw the line here: the model and harness layer moves incredibly fast, and we genuinely cannot both operate a cluster and stay on top of that churn. Narrowing our scope to facility context and tools keeps this a tractable problem. We own the part that's stable and actually ours.

The core insight · the flip side

No context? It should stay quiet.

What actually happened

Our cluster agents confidently recommended Lustre/MDS tuning for a filesystem that is actually Ceph. They had zero Ceph context. Plausible, fluent, and wrong.

“If we’re supposed to rely on the agents, they need to be accurate — otherwise they can’t be trusted.” Judith, #analysis-facility

“If the agent has no Ceph context, it shouldn’t make recommendations or ping people — we’d just chase irrelevant suggestions.” Farnaz, same thread

The rule we landed on: if it isn’t grounded in real facility context, it stays silent rather than speculates. Context isn’t just what makes the agent useful — it’s what makes it safe to trust.

The flip side of the thesis, and it's a real scrape. Early on, our cluster-monitoring agents recommended Lustre and MDS tuning for a problem on a filesystem that's actually Ceph. They had no Ceph context, so they pattern-matched to the storage they DID know: confident, fluent, and wrong. The team's reaction matters. Judith: if we're going to rely on these, they have to be accurate or they can't be trusted. Farnaz: if it has no Ceph context, it shouldn't be making recommendations or pinging people; we'll just chase irrelevant noise. Ilija's point: those agents had literally zero Ceph instructions. So the rule we adopted: no grounded context, stay quiet. This is the same thesis from the other side: context isn't only what makes the agent useful, it's what makes it safe to trust. And it's exactly why "where does the knowledge live and who keeps it true" is a real question, not bookkeeping.
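The "stay silent rather than speculate" rule can be expressed as a guard in front of any recommendation. This is a sketch of the idea only: `FACILITY_CONTEXT` and `recommend` are hypothetical, and the real agents' context handling is not described in these slides.

```python
from typing import Optional

# Hypothetical context store: topic -> facility notes the agent was given.
FACILITY_CONTEXT = {
    "htcondor": "Hold-reason taxonomy and recipes for the condor CLI.",
}


def recommend(topic: str, draft_advice: str) -> Optional[str]:
    """Return advice only when grounded facility context exists for the topic."""
    notes = FACILITY_CONTEXT.get(topic.lower())
    if notes is None:
        return None  # no context: say nothing and ping nobody
    return f"{draft_advice} (grounded in: {notes})"
```

With no "ceph" entry, a fluent Lustre-style suggestion is dropped before it reaches anyone, which is the behavior the team asked for.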

The architecture · abstraction

The agent speaks intent, not implementation

What the physicist & agent say

(meta)data_tool: “get me this dataset + its cross-section”

transform_tool: “turn DAODs into histograms”

batch_tool: “run this across the cluster”

inference_tool: “fit and get a limit”

What the facility wires underneath (swappable)

rucio-mcp · ami-mcp · Open Data

coffea · uproot · FastFrames · ServiceX · Athena

HTCondor · Dask · Slurm · PanDA · REANA · lxbatch · graphed-org

pyhf · cabinetry · TRExFitter · optimistix · SBI

The agent never names “coffea” or “condor.” It asks for an outcome; the facility decides the backend. That indirection is what makes the same agent portable across facilities.
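The indirection described above is the familiar "program to an interface" pattern. A minimal sketch, assuming a single `run` method per tool: `BatchTool`, the two backend classes, and `run_across_cluster` are hypothetical, and the commands are only built as strings, never executed.

```python
from typing import Protocol


class BatchTool(Protocol):
    def run(self, script: str) -> str: ...


class HTCondorBackend:
    def run(self, script: str) -> str:
        return f"condor_submit {script}"  # command is built, not executed


class SlurmBackend:
    def run(self, script: str) -> str:
        return f"sbatch {script}"


def run_across_cluster(batch_tool: BatchTool, script: str) -> str:
    """What the agent calls: it names the outcome, never the backend."""
    return batch_tool.run(script)
```

The agent-facing function stays identical at every site; only the object the facility passes in changes.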

The architecture · the “Elwood” vocabulary

Reasoning engine & playbook

Reasoning engine

The framework: orchestration, tool routing, execution, guardrails. Model- and experiment-agnostic.

shared · written once

Playbook

Everything facility-/experiment-specific: which MCPs & backends, the system prompt / CLAUDE.md, the knowledge corpus & examples.

swappable

Think of it as a sports team

A shared glossary keeps the team’s language consistent. (“Elwood” is our internal name, nothing public yet.)

Sports-team analogy: coach=reasoning engine, playbook=playbook, athletes=agents, equipment=services, stadium=gateway

Image: Google Gemini 3.1 Pro

Third idea, and let me fix my own vocabulary, we've moved past "codec/codebook" to clearer terms. The reasoning engine is the framework: orchestration, routing, guardrails, written once, agnostic to model and experiment. The playbook is all the context: which tools, the system prompt, the knowledge, the examples. Quick orientation, because the ontology has more pieces: think of our system, internally we call it Elwood, nothing public yet, like a sports team. The reasoning engine is the coach, the playbook is the game plan, agents are the athletes, services are equipment kits, the gateway is the stadium entrance. We keep a written glossary so the whole team uses the same words, which matters a lot when you're instructing agents to help build the infrastructure. Next slide: what swapping the playbook actually looks like.

The architecture · portability, concretely

Same engine, different playbook

Only the playbook changes per facility/experiment. The reasoning engine (the “coach”) is untouched. The headline difference is usually data_tool:

playbook/atlas @ UChicago AF

data_tool:   rucio-mcp     # grid
transform:   coffea, FastFrames
batch_tool:  htcondor
inference:   pyhf, TRExFitter

playbook/us-hfcc

data_tool:   eos + https   # browse/dl
transform:   uproot
batch_tool:  local / slurm
inference:   pyhf

The per-tool skills are facility-independent (written once, reused everywhere), so each experiment/facility only writes its thin playbook, not a whole stack. The goal: stop re-developing the same agentic tooling in parallel.

You’ve already met the ATLAS playbook in pieces: the managed CLAUDE.md, the marketplace plugins, the agent’s HTCondor.md.

Still open: how to link & reuse others’ skills without copying them in: usatlas/marketplace#60 (options exist; no best practice yet).

Concretely, here's a playbook swap. On the left, ATLAS at UChicago: data_tool delegates to rucio-mcp because we're on the grid; transforms via coffea or FastFrames; batch via HTCondor; inference via pyhf or TRExFitter. On the right, a different facility, say a US Higgs-factory effort, whose data_tool today is just "download or browse files off EOS or the web," batch is local or Slurm. Same reasoning engine, same high-level tool names. Only the playbook differs, and the biggest difference is how data_tool is wired. The key reuse point: the skills for each tool are facility-INDEPENDENT, written once, shared, so a new facility only writes a thin playbook, not the whole stack. That's the whole goal: stop every experiment and facility re-developing the same agentic tooling in parallel. And you've already met the ATLAS playbook in pieces: the CLAUDE.md, the marketplace plugins, the agent's memory file. It's just facility context as a versioned artifact. One honest caveat on "reuse": HOW you reuse someone else's skill without just copying it into your repo is genuinely unsolved. There's an open issue on the marketplace, #60, with a few options and no agreed best practice yet. That's part of the same modularity conversation.
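Treating the two playbooks as plain data makes the swap explicit. The entries below are copied from the slide; the dictionary layout, the key names `atlas@uchicago-af` and `us-hfcc`, and the helper functions are assumptions for illustration, not the project's actual format.

```python
PLAYBOOKS = {
    "atlas@uchicago-af": {
        "data_tool": ["rucio-mcp"],
        "transform": ["coffea", "FastFrames"],
        "batch_tool": ["htcondor"],
        "inference": ["pyhf", "TRExFitter"],
    },
    "us-hfcc": {
        "data_tool": ["eos + https"],
        "transform": ["uproot"],
        "batch_tool": ["local", "slurm"],
        "inference": ["pyhf"],
    },
}


def resolve(playbook: str, tool: str) -> list[str]:
    """The engine asks for a tool by intent; the playbook names the backends."""
    return PLAYBOOKS[playbook][tool]


def differing_tools(a: str, b: str) -> list[str]:
    """List the tool slots whose wiring differs between two playbooks."""
    return [t for t in PLAYBOOKS[a] if PLAYBOOKS[a][t] != PLAYBOOKS[b][t]]
```

Nothing in `resolve` knows which facility it is serving, which is the property that lets the engine be written once.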

The architecture · where it’s heading

One conversation, the whole analysis

The target: the analyzer describes the physics; the agent runs the loop: find data, generate code, submit jobs, recover from errors, make histograms, and fit, re-running as needed.

Every box here is a tool we’ve shown today. Stitching them into one supervised loop is the work ahead.

“Elwood” is our internal project name, nothing public yet.

Elwood end-to-end: chat-driven analysis across AMI/Rucio, code generation, HTCondor, monitoring, error recovery, histograms, and statistics

To close the architecture section, here's where it's heading. An analyzer types a physics request, "analyze the ttbar cross-section in the 2022 open dataset with the recommended b-tagging working point", and the agent runs the whole loop: dataset search through AMI and Rucio, code generation, job submission to HTCondor, monitoring, error recovery, histograms, statistical analysis, refining and re-running as needed. Be honest about status: every box in this picture is a tool I've shown you today, working. What's still ahead is stitching them into one supervised end-to-end loop. That's the work, and it's why the architecture, high-level tools, reasoning engine, playbook, matters. And again, Elwood is just our internal name for this; nothing public yet.

The portable future

Toward HL-LHC
analysis facilities

The portable future · MCP granularity

One MCP, or many? Three topologies

Live today: rucio-mcp.af.uchicago.edu/site/{atlas·escape·cms·dune}

One server, many sites (what we run)

you → rucio-mcp/site/atlas·escape…

Upside: one deploy; add rucio-atlas, rucio-escape separately in your harness

Caveat: per-VO auth inside one process

One server per site (alternative)

rucio-atlas · rucio-escape

Upside: clean isolation; per-site auth & scaling

Caveat: N servers to run & maintain

Single gateway (alternative)

mcp.af… → sub-MCPs

Upside: one endpoint & identity

Caveat: token sprawl / choke point; FastMCP = 1 auth provider/server

Same agent, same tools across sites; only the per-site auth/playbook differs. Granularity vs identity is a genuine open trade-off → discuss.

This is deployed, not a slide: one hosted rucio-mcp at rucio-mcp.af.uchicago.edu serves four experiments, path-routed: /site/atlas, escape, cms, dune. But HOW you carve up MCPs is a real design choice, so here are three topologies with their caveats. We run one server, many sites: a single deploy, and an analyzer adds rucio-atlas and rucio-escape as separate entries, but all the per-VO auth lives in one process. The alternative is one server per site: cleaner isolation and independent auth/scaling, but now you operate N servers. Or a single gateway fronting sub-MCPs: one endpoint and identity, but you inherit token-delegation sprawl and a choke point, and FastMCP composition only allows one auth provider per server, which is why something like PandaMCP delegates auth to the PanDA server. There's no obviously-right answer. That's the point, and it's a discussion seed. (Aside: this whole MCP-for-science pattern is also what DOE's Genesis Mission, with Anthropic, is pushing. We're a bottom-up pilot of it.)
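The "one server, many sites" topology comes down to picking a per-VO configuration from the request path. A rough sketch of that routing step, assuming the `/site/<name>` prefix shown on the slide; the regular expression, the `route` helper, and the error handling are hypothetical and do not describe the deployed server.

```python
import re

# Site names from the slide.
SITES = {"atlas", "escape", "cms", "dune"}


def route(path: str) -> str:
    """Return the site a request path belongs to, or raise if unknown."""
    match = re.fullmatch(r"/site/([a-z0-9-]+)(/.*)?", path)
    if not match or match.group(1) not in SITES:
        raise LookupError(f"no such site route: {path}")
    return match.group(1)
```

Everything after this lookup (credentials, token audience, Rucio endpoint) is per-site state held inside the one process, which is exactly the caveat listed for this topology.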

Let’s discuss · the open questions

What should we decide together?

Granularity & identity

One MCP per service, or a single gateway delegating identity? (FastMCP allows one auth provider/server; PandaMCP delegates to PanDA.)

Where do knowledge & skills live?

Facility knowledge: shared marketplace vs with the facility (Puppet). And tool skills: a TopCPToolkit skill in the marketplace, or with the framework? Today: scattered.

Who maintains the playbook?

Each facility writes its own context, but who reviews it and keeps it true over time?

What may an agent do, and how sandboxed?

Write/submit/email need a human gate. We isolate with k8s pods; no-k8s sites → bubblewrap?

Who pays for inference?

At HL-LHC scale: hosted frontier models vs self-hosted open weights on facility GPUs?

Across AFs & experiments?

A shared commons across AFs. And one facility may serve many experiments w/ heterogeneous frameworks. Keep facility vs experiment context modular: where’s the line?

My bet: the model and harness are the easy part. Context, identity, and trust are the facility’s homework.

What’s bubblewrap?

The sandboxing tech behind Flatpak: it isolates a process from the rest of the Linux system, with no access to your files, network, or hardware unless you explicitly grant it. A lightweight way to fence in an agent’s tool execution on facilities without Kubernetes.
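As a concrete picture of what "fencing in" a tool call could look like, the sketch below only builds a `bwrap` command line; it does not run anything. The specific policy (read-only root, hidden `/home`, one writable work directory, no network) is an assumption for illustration, not a facility recommendation, and `bwrap_argv` is a hypothetical helper.

```python
def bwrap_argv(workdir: str, command: list[str]) -> list[str]:
    """Build a bubblewrap invocation that exposes only `workdir` as writable."""
    return [
        "bwrap",
        "--ro-bind", "/", "/",         # whole filesystem visible, read-only
        "--tmpfs", "/home",            # hide home directories behind an empty tmpfs
        "--bind", workdir, workdir,    # re-expose one writable working directory
        "--dev", "/dev",
        "--proc", "/proc",
        "--tmpfs", "/tmp",
        "--unshare-all",               # new namespaces, including network
        "--die-with-parent",           # sandbox exits if the harness does
        "--chdir", workdir,
        *command,
    ]
```

A harness could pass the resulting list to `subprocess.run`; whether this level of isolation is sufficient for a shared login node is part of the open question above.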

This is what I actually want from the next 10 minutes. Six questions, none of which a vendor answers for us. Granularity and identity: one MCP per service or a single gateway delegating identity? There's a real constraint: FastMCP composition allows one auth provider per server, which is why PandaMCP delegates auth to PanDA. Where should facility knowledge live, the shared marketplace or shipped with the facility via Puppet? Today it's in both and that duplication bloats context. Who maintains each playbook so it stays true? What may an agent DO without a human, and how do we sandbox it? We use k8s pods for isolation; a site without k8s might use bubblewrap. Who pays for inference at scale? And should there be one shared commons across all AFs, and if so, who curates it? My bet, to provoke you: the model and the harness are the easy part. Context, identity, and trust are our homework. Let's argue.

Thank you

Generic LLM → facility-aware collaborator

Swap the model and the harness freely. The facility context is what makes it useful here, and it ports to the next AF.

A team effort at the UChicago AF

Ilija Vukotic — OpenClaw/Shannon, ES MCP

Fengping Hu — Kubernetes, Keycloak, Jupyter AI

Rob Gardner — vision, marketplace, Genesis

Judith Stephen — HTCondor expertise, runbooks

Aidan Rosberg — RP1 (Infra-as-Config), core dev/maintainer

David Jordan — hardware ops: networking & Kubernetes

Farnaz Golnaraghi — hardware ops: storage

What’s next: port these lessons (incl. agentic AI) to the Open Data Facility (ODF) and RP1.

Try it / read more

/plugins marketplace add or npx skills add usatlas/marketplace
Docs: usatlas.github.io/af-docs/ai
github.com/usatlas/marketplace · rucio-mcp · ami-mcp · stare
Shannon/Elwood/OpenClaw framing (R&D meeting)

Now, let’s discuss.

Giordon Stark · kratsg · USATLAS F2F @ HTC26 · 2026‑06‑09. In the spirit of Gordon Watts’ CHEP 2026 ecosystem framing.

To close: we turned a generic LLM into a facility-aware collaborator, and the thing that did it, context, is exactly the thing that ports to your facility next. Genuinely a team effort: Ilija on the agent infrastructure and ES MCP; Fengping on Kubernetes, Keycloak, and Jupyter AI; Rob driving the vision, the marketplace, and the Genesis tie-in; Judith on HTCondor expertise and the runbooks; Aidan on RP1 and infrastructure-as-config; David and Farnaz on hardware operations (David on networking and Kubernetes, Farnaz on storage); plus the wider AF team. Everything's installable today: add the marketplace in Claude Code, or npx skills add to cherry-pick skills; docs at af-docs/ai. And looking ahead: the plan is to translate everything we've learned building the AF, including this agentic AI work, into the Open Data Facility, the ODF, link on the slide. And now I'd really like the discussion. Back to those six questions.
