Akamai Technologies has marked an important milestone in the evolution of artificial intelligence by introducing the world's first implementation of the NVIDIA AI Grid reference design.
By integrating NVIDIA's AI infrastructure into its own and leveraging intelligent workload orchestration across its network, Akamai aims to move the industry beyond siloed AI factories to a unified, distributed network for AI inference.
This is a significant step in the evolution of Akamai Inference Cloud, introduced late last year. As the first company to deploy the AI Grid network, Akamai is rolling out thousands of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, providing a platform that lets enterprises run agentic and physical AI with the responsiveness of local compute and the scale of the global internet.
"AI centers are purpose-built for training and edge model workloads, and centralized infrastructure will continue to deliver the best results in terms of tokenomics for those use cases," said Adam Karon, COO and general manager of Akamai's Cloud Technology Group. "However, real-time video, physical AI, and highly concurrent personalized experiences demand inference at the point of contact, not a round trip to a centralized cluster. Our AI Grid intelligent orchestration enables AI factories to extend inference outward and leverage the same distributed architecture that revolutionized content delivery to route AI workloads across 4,400 locations, at the right cost, at the right time."
The "Tokenomics" Architecture
At the core of AI Grid is an intelligent orchestrator that acts as a real-time broker for AI requests. Applying Akamai's expertise in application performance optimization to AI, this workload-aware control plane optimizes tokenomics, improving cost per token, time to first token, and overall performance.
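To make the routing idea concrete, the following is a minimal, hypothetical TypeScript sketch of a tokenomics-aware routing decision. The tier names, prices, and latency figures are illustrative assumptions, not details of Akamai's actual control plane.

```typescript
// Hypothetical sketch of a tokenomics-aware routing decision.
// All types and numbers are illustrative, not Akamai's control plane.

type Tier = "edge" | "core-cloud" | "gpu-cluster";

interface TierProfile {
  tier: Tier;
  costPerMTokensUsd: number; // blended $ per million output tokens
  ttftMs: number;            // expected time to first token
  maxModelParamsB: number;   // largest model the tier can serve (billions)
}

interface InferenceRequest {
  modelParamsB: number;      // size of the requested model
  latencyBudgetMs: number;   // SLA from the caller
}

const tiers: TierProfile[] = [
  { tier: "edge",        costPerMTokensUsd: 0.4, ttftMs: 30,  maxModelParamsB: 8 },
  { tier: "core-cloud",  costPerMTokensUsd: 0.9, ttftMs: 120, maxModelParamsB: 70 },
  { tier: "gpu-cluster", costPerMTokensUsd: 2.5, ttftMs: 250, maxModelParamsB: 1000 },
];

// Pick the cheapest tier that can hold the model and meet the latency SLA.
function route(req: InferenceRequest): Tier {
  const candidates = tiers
    .filter(t => t.maxModelParamsB >= req.modelParamsB && t.ttftMs <= req.latencyBudgetMs)
    .sort((a, b) => a.costPerMTokensUsd - b.costPerMTokensUsd);
  // Fall back to the most capable tier if nothing meets the SLA.
  return (candidates[0] ?? tiers[tiers.length - 1]).tier;
}

// An 8B-parameter NPC dialogue model with a 50 ms budget lands on the edge.
console.log(route({ modelParamsB: 8, latencyBudgetMs: 50 })); // "edge"
```

Filtering on the latency SLA first and then sorting by cost mirrors the framing above: premium GPU cycles are spent only when a workload genuinely needs them.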
A key differentiator for Akamai is the ability for customers to run lightweight or distributed models across its massive global footprint, a major cost and performance advantage for the long tail of AI workloads. For example:
● Cost-effectiveness at scale: Enterprises can dramatically reduce inference costs by automatically allocating workloads to the right compute tier. The orchestrator applies techniques such as semantic caching and intelligent routing to direct requests to appropriately sized resources and reserve premium GPU cycles for the workloads that need them (a minimal semantic-caching sketch follows this list). All of this is underpinned by Akamai Cloud, built on open-source infrastructure with generous egress allowances to support data-intensive AI operations.
● Real-time responsiveness: Game development studios can offer AI-powered non-player character (NPC) interactions that respond within milliseconds, keeping players immersed. Financial institutions can deliver personalized fraud detection and marketing recommendations in the time between login and first screen. Broadcasters can transcode and dub content in real time for global audiences. These results are made possible by Akamai's globally distributed edge network, which spans more than 4,400 locations with built-in caching, serverless edge computing, and high-performance connectivity, and processes requests at the user touchpoint, avoiding the round-trip delay of origin-dependent clouds.
● Production-grade AI at the core: Large language models, continuous post-training, and multimodal inference workloads require sustained, high-density compute that only dedicated infrastructure can provide. Akamai's clusters of thousands of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs deliver the concentrated power needed for the most demanding AI workloads, complementing the distributed edge with centralized scale.
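As referenced in the first bullet above, semantic caching lets an orchestrator reuse answers to semantically similar prompts instead of re-running a model. Here is a minimal sketch, assuming hypothetical embed() and callModel() functions and an illustrative 0.92 similarity threshold; none of these are Akamai APIs.

```typescript
// Illustrative semantic cache: reuse a cached answer when a new prompt's
// embedding is close enough to one already served.

interface CacheEntry { embedding: number[]; response: string; }

const cache: CacheEntry[] = [];
const SIMILARITY_THRESHOLD = 0.92; // tuned per workload; assumed here

// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Serve from cache when a semantically similar prompt was answered before;
// otherwise fall through to the (expensive) GPU tier and cache the result.
async function servePrompt(
  prompt: string,
  embed: (text: string) => Promise<number[]>,     // hypothetical embedder
  callModel: (text: string) => Promise<string>,   // hypothetical model call
): Promise<string> {
  const embedding = await embed(prompt);
  const hit = cache.find(e => cosine(e.embedding, embedding) >= SIMILARITY_THRESHOLD);
  if (hit) return hit.response;             // premium GPU cycles saved
  const response = await callModel(prompt); // route to a sized tier
  cache.push({ embedding, response });
  return response;
}
```

A production system would replace the linear scan with an approximate nearest-neighbor index, but the economics are the same: a cache hit at the edge avoids a GPU round trip entirely.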
A Continuum of Compute
Built on NVIDIA AI Enterprise and leveraging the NVIDIA Blackwell architecture and NVIDIA BlueField DPUs for hardware-accelerated networking and security, the platform lets Akamai manage complex service-level agreements (SLAs) between edge and core locations.
● The edge (more than 4,400 locations): Delivers fast response times for physical AI and autonomous agents. This tier leverages semantic caching and serverless capabilities, such as Akamai Functions (WebAssembly-based compute) and EdgeWorkers, to deliver model affinity and stable performance at the user touchpoint (see the edge-handler sketch after this list).
● Akamai Cloud IaaS and dedicated GPU clusters: The underlying public cloud infrastructure enables portability and cost savings for large-scale workloads, while pods equipped with NVIDIA RTX PRO 6000 Blackwell GPUs handle high-intensity post-training and multimodal inference.
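To illustrate how the two tiers above might interact, here is a hedged TypeScript sketch of an edge handler that serves small models locally and forwards large ones to a core GPU cluster. The handler shape, the x-model-params-b header, and the internal URL are assumptions made for this example, not the actual Akamai Functions or EdgeWorkers API.

```typescript
// Illustrative edge-tier triage: serve small models locally, forward heavy
// requests to a core GPU cluster. All names and endpoints are hypothetical.

const EDGE_MODEL_LIMIT_B = 8; // largest model (billions of params) served at this location

async function onInferenceRequest(request: Request): Promise<Response> {
  // Hypothetical header indicating the requested model's size.
  const modelParamsB = Number(request.headers.get("x-model-params-b") ?? "8");

  if (modelParamsB <= EDGE_MODEL_LIMIT_B) {
    // Small model: answer at the user touchpoint for the lowest latency.
    return runLocalModel(request);
  }

  // Large model: forward to the core cluster and return its response.
  return fetch("https://gpu-cluster.example.internal/v1/infer", {
    method: "POST",
    headers: request.headers,
    body: await request.text(),
  });
}

// Placeholder for a local (e.g. Wasm-hosted) inference runtime at the edge.
async function runLocalModel(_request: Request): Promise<Response> {
  return new Response(JSON.stringify({ tier: "edge", tokens: [] }), {
    headers: { "content-type": "application/json" },
  });
}
```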
"New AI-native applications demand predictable latency and greater cost-effectiveness on a global scale," said Chris Penrose, global vice president of Enterprise Development and Telco at NVIDIA. "By operationalizing NVIDIA AI Grid, Akamai is building the connective tissue for generative, agencyal, and physical AI, and is moving intelligence directly into data, which will usher in the next wave of real-time applications."
Powering the next generation of real-time AI
Akamai is already seeing strong early adoption of Akamai Inference Cloud in compute-intensive and latency-sensitive industries:
● Gaming: Studios are deploying sub-50-millisecond inference for AI-powered NPCs and real-time player interactions.
● Financial services: Banks rely on the network for hyper-personalized marketing and instant recommendations the moment customers log in.
● Media and video: Broadcasters use the distributed network for AI-powered real-time transcoding and dubbing.
● Retail: Retailers are adopting the network for in-store AI applications and associate productivity tools at the point of sale.
Driven by enterprise demand, the platform has also been validated by leading technology providers, including through a four-year, $200 million service contract for a cluster of thousands of GPUs in a data center purpose-built for enterprise AI infrastructure at the metropolitan edge.
Scaling AI Factories from Centralized to Distributed
The first wave of AI infrastructure was characterized by huge clusters of GPUs in a few centralized locations optimized for training. However, as inference becomes the dominant workload and companies across industries focus on building AI agents, this centralized model faces the same scalability limitations that previous generations of internet infrastructure encountered with media distribution, online gaming, financial transactions, and complex microservices applications.
Akamai has addressed each of these challenges with the same fundamental approach: distributed networks, intelligent orchestration, and systems designed to bring content and context as close to the digital touchpoint as possible. The result has been better user experiences and higher return on investment (ROI) for the companies that adopted this model. Akamai Inference Cloud applies the same proven architecture to AI factories, enabling the next wave of scale and growth by distributing dense compute from the core to the edge.
For businesses, this means being able to deploy context-aware AI agents that are both responsive and adaptive. For the industry, it offers a model for the evolution of AI factories from isolated facilities into a globally distributed utility.