Take off with Modelplane

On this page

The open source control plane for AI models.

Modelplane extends Crossplane to manage AI model inference across a fleet of GPU clusters. Platform teams provision clusters and define hardware classes. ML teams deploy models and get back a unified, OpenAI-compatible endpoint. Modelplane handles fleet scheduling, multi-cluster routing, and infrastructure composition.

Deploy a model

 1apiVersion: modelplane.ai/v1alpha1
 2kind: ModelDeployment
 3metadata:
 4  name: qwen-demo
 5  namespace: ml-team
 6spec:
 7  replicas: 2
 8  workers:
 9    topology:
10      tensor: 1
11    template:
12      spec:
13        containers:
14        - name: engine
15          image: vllm/vllm-openai:v0.7.3
16          args:
17          - "--model=Qwen/Qwen2.5-0.5B-Instruct"

This deploys two replicas of Qwen 2.5 0.5B and produces a unified, OpenAI-compatible endpoint. The scheduler picks which clusters the replicas run on based on GPU capacity and the deployment’s topology.

How it works

Modelplane draws a clear boundary between two teams.

Platform teams create InferenceClusters describing their GPU fleet and InferenceClasses defining hardware recipes (GPU type, count). They set organizational metadata via labels on clusters: tier, region, provider.

ML teams create a ModelDeployment carrying everything needed to serve a model: the worker template, hardware topology, and replica count. Modelplane schedules each replica to a ready cluster with matching capacity, composes a ModelReplica per cluster, and creates ModelEndpoints for routing. A ModelService routes traffic across endpoints through a unified Envoy Gateway endpoint on the control plane.

Modelplane is the fleet-level control plane above the inference engine. It doesn’t compete with vLLM or Dynamo. It manages them across clusters.

Current status

Modelplane is at v0.1. It’s early and evolving fast.

	What works today
Cluster sources	GKE (provisioned), Existing (bring your own kubeconfig)
Serving engines	vLLM
Scaling	Scale ModelDeployment using `spec.replicas`
Routing	Unified OpenAI-compatible endpoint via ModelService

See issues labeled enhancement for what’s planned.

Getting started

Follow the getting started guide to deploy Modelplane on a local kind cluster and serve a model on GKE. The concepts page explains the key resources and how they relate.

The examples/ directory has annotated manifests covering the full workflow: gateway setup, cluster provisioning, and model deployments.

Development

Modelplane uses Nix for builds and the development environment. You don’t need Nix installed locally. See CONTRIBUTING.md for how to get set up, run checks, and submit changes.

Get involved

Contributions, bug reports, and feature requests are welcome.

Issues: GitHub Issues
Discussions: GitHub Discussions
Slack: #modelplane in the Crossplane workspace

License

Modelplane is under the Apache 2.0 license.