
Some enterprises are best served by customizing large models to fit their needs, but many companies plan to create their own models, a project that requires access to GPUs.
Google Cloud wants to play a bigger role in the model-building journey of enterprises with its new service, Vertex AI Training. The service provides enterprises that want to train their own models with access to a managed Slurm environment, data science tooling, and chips capable of large-scale model training.
With this new service, Google Cloud hopes to draw more enterprises away from other providers and encourage the creation of more company-specific AI models.
While Google Cloud has always offered the ability to customize its Gemini models, the new service allows customers to bring their own models or customize any open-source model Google Cloud hosts.
Vertex AI Training pits Google Cloud directly against companies like CoreWeave and Lambda Labs, as well as its cloud competitors AWS and Microsoft Azure.
Jaime de Guerre, senior director of product management at Google Cloud, told VentureBeat that the company is hearing from many organizations of varying sizes that they need a way to better optimize compute in a more reliable environment.
“What we’re seeing is an increasing number of companies that are building or adapting large generative AI models to introduce product offerings built around those models or to help empower their business in some way,” De Guerre said. “This includes AI startups, technology companies, sovereign organizations building models for a particular region or culture or language, and some larger enterprises that may build it into internal processes.”
De Guerre said that while anyone can technically use the service, Google is targeting companies planning large-scale model training rather than simple fine-tuning or low-rank adaptation (LoRA). Vertex AI Training will focus on long-running training tasks involving hundreds or even thousands of chips. Pricing will depend on how much computation the enterprise requires.
“Vertex AI is not about adding more information to the training context or using RAG; it is about training a model where you can start with completely random weights,” he said.
Model optimization is increasing
Enterprises are recognizing the value of creating optimized models, beyond augmenting LLMs through retrieval-augmented generation (RAG). Custom models can learn more in-depth company information and respond with answers specific to the organization. Companies like Arcee.ai have begun offering their own models for customers to customize. Adobe recently announced a new service that allows enterprises to retrain Firefly for their specific needs. Organizations like FICO, which build small language models specific to the finance industry, often purchase GPUs at significant cost to train them.
Google Cloud said Vertex differentiates itself by providing access to a larger set of AI training chips, services to monitor and manage training, and expertise learned from training Gemini models.
Some of Vertex AI Training’s early customers include AI Singapore, a consortium of Singaporean research institutes and startups that created the 27-billion-parameter SEA-LION v4, and Salesforce’s AI research team.
Enterprises often have to choose between improving an already built LLM and creating their own model. Building an LLM from scratch is usually out of reach for smaller companies, or doesn’t make sense for some use cases. For organizations where a completely custom or from-scratch model does make sense, however, the challenge is gaining access to the GPUs needed to run the training.
Model training can be expensive
Training a model, De Guerre said, can be difficult and expensive, especially when organizations are competing with many others for GPU capacity.
Hyperscalers like AWS and Microsoft — and, yes, Google — have long argued that their huge data centers and racks of high-end chips provide the most value to enterprises. Customers not only get access to expensive GPUs; cloud providers also often offer full-stack services to help enterprises move into production.
Providers like CoreWeave gained prominence by offering on-demand access to Nvidia H100s, giving customers flexibility in compute power when building models or applications. This has also given rise to a business model in which companies with GPUs rent out server space.
De Guerre said Vertex AI Training is not just about providing access to bare-metal compute, where an enterprise rents a GPU server but still has to bring its own training software and manage downtime and failures itself.
“It’s a managed Slurm environment that will help with all task scheduling and automated recovery of failed jobs,” De Guerre said. “So if the training job slows down or stops due to a hardware failure, the training will automatically restart very quickly based on the automated checkpointing we do, continuing with very little downtime.”
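Google has not published the internals of this managed environment, but the behavior De Guerre describes (checkpointing plus automatic requeueing after a failure) maps onto standard Slurm features. A minimal sketch of how such a job might be configured, assuming a hypothetical `train.py` that periodically saves checkpoints and resumes from the latest one on startup:

```shell
#!/bin/bash
#SBATCH --job-name=pretrain
#SBATCH --nodes=64                # long-running, multi-node training job
#SBATCH --gpus-per-node=8
#SBATCH --requeue                 # allow Slurm to put the job back in the queue after a failure
#SBATCH --open-mode=append        # keep appending to the same log file across restarts
#SBATCH --signal=B:USR1@300      # send SIGUSR1 to the batch script 5 minutes before the time limit

# Hypothetical checkpoint location shared across all nodes (e.g. a parallel filesystem).
CKPT_DIR=/shared/checkpoints/$SLURM_JOB_NAME

# On the early-warning signal, ask Slurm to requeue this job; the training
# script is expected to have written a recent checkpoint it can resume from.
trap 'scontrol requeue "$SLURM_JOB_ID"' USR1

# Hypothetical training entry point: saves checkpoints periodically and
# resumes from the newest one in CKPT_DIR if present.
srun python train.py --checkpoint-dir "$CKPT_DIR" --resume-if-present
```

The point of the sketch is the division of labor: Slurm handles rescheduling (`--requeue`, `scontrol requeue`), while the training code is responsible for writing checkpoints often enough that a restart loses little work. A managed service layers monitoring and failure detection on top of this so the enterprise does not operate it by hand.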
He said it provides higher throughput and more efficient training for large-scale compute clusters.
Services like Vertex AI Training can make it easier for enterprises to build specific models or completely customize existing models. Still, just because the option exists doesn’t mean it’s right for every enterprise.

