ML Engineer
Build and optimise the production machine learning systems that power enterprise agentic AI — from model serving and fine-tuning to RAG pipelines and inference at scale.
About Deliverance AI
Deliverance AI is the production AI platform company. We exist because 94% of enterprises fail to scale AI — not for lack of ambition or budget, but for lack of platform, governance, and delivery capability. We close the gap between AI investment and AI production for enterprises across regulated industries including pharma and biotech, financial services, retail, telecommunications, and logistics.
Our proprietary nine-layer platform — built around three core capabilities: Clarity (see everything), Govern (control everything), and Accelerate (ship everything) — is deployed inside customer environments with live workloads running on it from day one. We are not a consultancy that writes strategy decks. We are not a staffing firm that lends contractors. We are an engineering-led company with proprietary platform IP, a growing agent marketplace, and 15+ pre-built AI blueprints that cut months off delivery timelines.
Our engagement model is simple: Assess (4 weeks), Deploy (12–16 weeks), Operate (ongoing). Dedicated engineering pods own delivery end-to-end. Every deployment compounds the platform. Every use case ships faster than the last. Governed, observable, and delivering value from day one.
About the role
The ML Engineer is the technical execution engine within our engineering pods. You will build the production ML pipelines, model serving infrastructure, RAG implementations, and inference optimisation that turn our platform blueprints into live, governed AI systems running real enterprise workloads.
This is not a research role. This is production engineering. You will work with models from across the open-source ecosystem, deploy them via our Inference Platform layer with autoscaling, build RAG pipelines and knowledge graphs through our Data & RAG layer, and ensure everything is governed through our Agent Governance and ARMOR security framework. You will be embedded within customer engagements as part of a dedicated pod — working alongside an Engagement Lead, AI Architect, and Data Engineer to get 2–3 workloads live within 12–16 weeks.
You will also contribute to the agent marketplace — building, testing, and improving the purpose-built agents (fraud detection, compliance automation, document processing, customer service) that customers deploy from our library. Every agent you improve compounds value across the entire customer base.
What you will do
- Build and maintain production inference pipelines using model serving frameworks (vLLM, TensorRT-LLM, Triton Inference Server), optimising for throughput, latency, and cost across GPU infrastructure (see the serving sketch after this list).
- Implement and configure RAG architectures — vector databases, hybrid search, knowledge graphs — that connect enterprise knowledge bases to production AI agents while maintaining data governance (see the retrieval sketch after this list).
- Deploy and configure agents from the Deliverance AI marketplace for customer-specific use cases, adapting pre-built patterns from our 15+ blueprints.
- Build model lifecycle management tooling — version control, A/B testing, canary deployments, automated rollback, and performance monitoring — integrated with our Five Registries.
- Fine-tune models for domain-specific customer requirements using techniques including LoRA, QLoRA, and PEFT, with full governance and audit trails (see the fine-tuning sketch after this list).
- Contribute to the development and continuous improvement of agents in our marketplace — building, testing, and enhancing governed AI agents that ship across multiple customers.
- Collaborate with AI Architects on architecture decisions and with Data Engineers on data pipeline integration during the Deploy phase.
- Work within the ARMOR security framework to ensure all model deployments meet EU AI Act, NIST AI RMF, ISO 42001, and sector-specific compliance requirements.
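To give a flavour of the day-to-day work, here is a minimal sketch of serving a model with vLLM's offline inference API. The model name, prompt, and sampling settings are illustrative placeholders, not a Deliverance AI production configuration — real deployments run behind the Inference Platform layer with autoscaling, batching, and monitoring.

```python
# Minimal vLLM offline-inference sketch. Model, prompt, and sampling
# settings are illustrative placeholders only.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any HF-compatible model
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(["Summarise the key risks in this clause: ..."], params)
for output in outputs:
    print(output.outputs[0].text)
```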
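In the same spirit, a minimal dense-retrieval sketch for a RAG pipeline, using sentence-transformers and in-memory cosine similarity. The embedding model and documents are illustrative assumptions; a production system would use a vector database (Pinecone, Weaviate, Milvus, pgvector) with hybrid search and governance controls instead.

```python
# Minimal dense-retrieval sketch for RAG. Embedding model and documents
# are illustrative; production systems use a vector database and hybrid search.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

documents = [
    "Refunds are processed within 14 days of a return request.",
    "Premium accounts include priority support and a dedicated SLA.",
    "All data is encrypted at rest and in transit.",
]
doc_embeddings = encoder.encode(documents, normalize_embeddings=True)

query_embedding = encoder.encode("How long do refunds take?", normalize_embeddings=True)
scores = doc_embeddings @ query_embedding  # cosine similarity (unit-normalised vectors)
best = int(np.argmax(scores))
print(documents[best])  # retrieved context passed to the model as grounding
```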
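And a minimal LoRA fine-tuning setup with Hugging Face PEFT. The base model, rank, and target modules are illustrative defaults, not prescribed values — real engagements tune these per customer and log every run for governance and audit.

```python
# Minimal LoRA setup via Hugging Face PEFT. Base model and hyperparameters
# are illustrative defaults only.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
lora = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of base parameters
# ...train with transformers.Trainer, then model.save_pretrained("adapter/")
```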
What we are looking for
- 3+ years of experience as an ML Engineer, MLOps Engineer, or in a similar role, working with language models in production environments.
- Strong Python skills and hands-on experience with PyTorch, Hugging Face Transformers, and at least one model serving framework (vLLM, TensorRT-LLM, or Triton preferred).
- Practical experience building RAG systems in production — vector databases (Pinecone, Weaviate, Milvus, pgvector), embedding models, retrieval strategies, and knowledge graph integration.
- Understanding of GPU compute, CUDA, and how to profile and optimise inference workloads for throughput, latency, and cost.
- Experience with Kubernetes and containerised ML deployments in cloud or hybrid environments.
- Familiarity with the open-source model ecosystem and the agentic AI landscape — you stay current with a rapidly evolving field.
- Good communication skills — you will work within engineering pods and need to explain model behaviour, trade-offs, and recommendations to both technical and non-technical stakeholders.
- A pragmatic engineering mindset that prioritises production reliability, governed deployments, and customer outcomes over theoretical perfection.