Providing end-to-end infrastructure for scaling and deploying AI models
Cloudflare enables building full-stack AI applications on its network.
Cloudflare announced on the 4th that it is introducing Workers AI, a developer platform for building AI applications.
The Cloudflare platform has security, compliance, and speed built in, letting developers ship production-ready applications quickly without managing separate infrastructure.
As enterprises of all sizes look to control the cost of launching AI-powered applications, Cloudflare is positioning Workers AI as the industry’s first serverless AI offering running at this scale.
Cloudflare said the platform “will provide access to GPUs operating on a massive global network, bringing AI inference closer to the user and reducing latency in the end-user experience.”
Combined with Cloudflare’s data localization tooling, which helps control where data is inspected, Workers AI lets customers anticipate the compliance and regulatory requirements likely to emerge as governments set policy around the use of AI.
Cloudflare’s privacy-first approach to application development helps ensure that data used for inference is not used to train LLMs.
Cloudflare now offers a catalog of models to help developers get up and running quickly, covering use cases such as large language model (LLM) inference, speech-to-text, image classification, and sentiment analysis.
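Inside a Worker, models from this catalog are invoked through an AI binding. A minimal sketch of that pattern, assuming an `env.AI.run` binding and using one of the LLMs from Cloudflare’s launch catalog (the exact binding shape and model names may differ from your account’s setup):

```javascript
// Hedged sketch: call a catalog model through the Worker's AI binding.
// `env.AI` is the binding configured for the Worker; the model identifier
// below is one of the LLMs Cloudflare listed at launch.
async function answerPrompt(env, prompt) {
  // Runs inference on Cloudflare's network, close to the user.
  const result = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", {
    prompt,
  });
  return result.response;
}
```

The same `env.AI.run` call shape applies to the other catalog use cases (speech-to-text, image classification, sentiment analysis) with a different model identifier and input payload.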
Cloudflare’s new vector database, Vectorize, accelerates AI workflows, enabling developers to build full-stack AI applications entirely on Cloudflare. It supports the entire pipeline: generating embeddings with built-in models, then indexing, storing, and querying them.
Cloudflare said, “With Workers AI and Vectorize, developers no longer need to glue together multiple pieces to power their apps with AI and machine learning; they can do it all in one platform.”
Vectorize can run vector queries closer to the user, reducing latency and overall inference time. Developers can also store embeddings generated by OpenAI or Cohere, letting teams bring the embeddings they already have into Vectorize when scaling AI apps to production.
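Conceptually, a vector index like Vectorize stores embeddings and returns the entries most similar to a query vector. The toy class below illustrates that idea with cosine similarity; the names here are hypothetical and this is not Cloudflare’s actual API:

```javascript
// Illustrative sketch of what a vector index does: store embeddings and
// return the closest matches to a query vector. Hypothetical names only.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class TinyVectorIndex {
  constructor() {
    this.items = [];
  }
  insert(id, vector) {
    this.items.push({ id, vector });
  }
  query(vector, topK = 3) {
    // Rank every stored embedding by similarity to the query vector.
    return this.items
      .map(({ id, vector: v }) => ({ id, score: cosineSimilarity(vector, v) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}
```

In production the index would come from an embedding model (or embeddings imported from OpenAI or Cohere, as the article notes), and the query vector from embedding the user’s input.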
Cloudflare has also introduced AI Gateway to improve the reliability, observability, and scalability of AI applications. AI Gateway gives developers unified visibility into AI traffic, including the number of requests, number of users, cost of running the app, and request duration. Caching lets developers reuse answers to repeated questions, cutting down on expensive API calls. Rate limiting helps manage growth and cost by curbing malicious actors and traffic spikes, and gives developers control over how their applications scale.
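The two behaviors described above can be sketched as a thin wrapper in front of an expensive model call: serve repeated prompts from a cache, and reject requests over a quota. This is a simplified illustration with hypothetical names, not AI Gateway’s actual interface:

```javascript
// Hedged sketch of gateway-style caching and rate limiting in front of an
// expensive upstream model call. Hypothetical names; not Cloudflare's API.
class GatewaySketch {
  constructor(requestLimit, callModel) {
    this.requestLimit = requestLimit; // max requests allowed in this window
    this.callModel = callModel;       // the expensive upstream call
    this.cache = new Map();           // prompt -> cached answer
    this.requestCount = 0;
  }
  request(prompt) {
    // Rate limit: reject once the quota for the window is exhausted.
    if (this.requestCount >= this.requestLimit) {
      return { status: 429, body: "rate limited" };
    }
    this.requestCount++;
    // Cache: repeated prompts skip the upstream call entirely.
    if (this.cache.has(prompt)) {
      return { status: 200, body: this.cache.get(prompt), cached: true };
    }
    const answer = this.callModel(prompt);
    this.cache.set(prompt, answer);
    return { status: 200, body: answer, cached: false };
  }
}
```

A real gateway would also scope the counter to a time window and a caller identity, and record per-request metrics — the visibility features the article describes.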
“Cloudflare provides all the infrastructure developers need to build scalable AI-powered applications and to bring AI inference close to the user,” said Matthew Prince, CEO and co-founder of Cloudflare. “We’re investing to ensure every developer has easy access to powerful, affordable tools to build the future. Workers AI helps teams stand up efficient, economical, production-ready AI environments in a matter of days, rather than the weeks or months it would typically take,” he said.
“It shows a lot of potential,” he added.