inference.sh

The agent runtime for production AI agents.

Overview

inference.sh is a production runtime platform for AI agents. Instead of building infrastructure (queues, retries, state persistence, auth), you focus on what your agent does.

Tagline: "run reliable agents in production without reinventing the platform."

Key Features

Runtime Layer:

Durable execution - Event-driven, state persists across invocations
Tool orchestration - 150+ apps as tools via one API
Observability - Real-time streaming and logs for every action
Pay-per-execution - No idle costs

Agent Primitives:

Deep-agents - Agents spawn sub-agents as tools
Human-in-the-loop - Approval workflows
Widgets - Dynamic UI generation (HTML/CSS inline)
Webhooks - Async API callbacks
Client tools - Execute on user's system
Memory - Built-in key-value store per conversation
Planning - Multi-step plans, resume after interruption
Structured output - Typed results

Infrastructure:

App library - 150+ pre-built tools
OAuth integrations - Real oauth with token refresh, encrypted storage
x402 payments - Agentic payments protocol for agents
Self-hosted option - Deploy in your VPC or on-prem
Multiple models - OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, 500+ more

Why It Exists

Common developer pain points:

"The agent framework is not the moat. The specialized tools are."
"6 hours debugging with zero error logs."
"Agent works in 2-minute quanta, long tasks need pipeline thinking."
"Users thought it crashed. If they see internal monologue, they wait."
"Spent 10 hours deploying on EC2… $13/mo per agent. Serverless: 10 cents."

Philosophy

"Trust is not a feature. It is a design constraint."

Automation should never be a black box. Systems should fail gracefully, not catastrophically.

Usage Levels

No code - Build in UI, use in UI with chat interface
Low code - Design in UI, call with code, embed anywhere
Full API - Create, manage, run agents via SDK

SDK:

pip install inferencesh
npm i @inferencesh/sdk

Pricing & Scale

Pay-per-execution model
Free tier available
Self-hosted enterprise option
x402 stats: $24M+ volume processed, 75M+ transactions

Technical Details

Built on Cloudflare's edge infrastructure
Durable execution with graph-backed decision chains
Chat history persistence across sessions
Event-driven architecture
Automatic retries and error handling

Company

Backed by EWOR
Enterprise support available
Self-hosted deployments supported
Active community on Discord

Coming Soon

Chat deployment - Agents live in Slack, Discord, Telegram, WhatsApp
Event triggers - Fire agents on messages, file uploads, reactions
Scheduled runs - Cron-style execution for background tasks