How much does it cost to build a domain-specific foundation model and LLM app?
Your boss walks up to you while you’re making a cup of tea and drops the big question: “How much does it cost to build an LLM?” The answer: It depends. I’ll tell you why.
To build an LLM, you start with a foundation model. So when someone says they’re “building an LLM,” what they’re really doing is building a foundation model specialized in language.
Building a foundation model and LLM app isn’t one-size-fits-all. The cost depends on three critical factors:
1. Whether you’re using an existing model or training from scratch
Using an existing model
Using an existing model saves you time, money, and infrastructure. These models are freely available and have already been trained (not just labeled or filtered) by someone else on massive amounts of text, like books, websites, and social media. The training process involves learning patterns in a specific language, like English or Spanish.
The terms “filtered”, “labeled” and “trained” often get mixed up, but they refer to very different stages in the machine learning process. Here are the main differences:
- Filtered: Bad or irrelevant content is removed to improve the quality of the data. For example: removing spam, profanity, or duplicates.
- Labeled: This is when the data is tagged with information the model uses to learn specific tasks, like classifying or translating text. For example, labeling emails as “spam” or “not spam”.
- Trained: This is when the model learns how language works by reading a huge amount of text. It looks for patterns, like which words often come together or what sentences usually look like. This part needs a lot of computing power and can take weeks or months on very expensive hardware (why didn’t I buy Nvidia shares?). For example, training a big language model like GPT might require 1 trillion tokens. A token can be a word, part of a word, or even punctuation. To give you an idea: 1 trillion tokens is about 750 billion words!
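The first two stages, and the token arithmetic from the third, can be sketched in a few lines of Python. Everything here is a toy: the documents, the spam keywords, and the 0.75 words-per-token ratio are illustrative assumptions, not values from a real pipeline.

```python
# Toy sketch of the "filtered" and "labeled" stages, plus the token maths.
# All data and thresholds below are made up for illustration.

raw_docs = [
    "Learn Python in 30 days",
    "BUY CHEAP PILLS NOW",       # spam
    "Learn Python in 30 days",   # duplicate
    "How transformers work",
]

SPAM_WORDS = {"buy", "cheap", "pills"}  # assumed spam markers

def is_spam(doc: str) -> bool:
    return any(w in doc.lower().split() for w in SPAM_WORDS)

# Filtered: drop spam, then deduplicate while preserving order.
filtered = list(dict.fromkeys(d for d in raw_docs if not is_spam(d)))

# Labeled: tag each original doc so a classifier could learn from it.
labeled = [(d, "spam" if is_spam(d) else "not spam") for d in raw_docs]

# Trained-stage arithmetic: 1 trillion tokens at ~0.75 words per token.
tokens = 1_000_000_000_000
words = int(tokens * 0.75)

print(filtered)      # 2 unique, non-spam docs remain
print(labeled[1])    # ('BUY CHEAP PILLS NOW', 'spam')
print(f"{words:,}")  # 750,000,000,000
```

The point of the sketch: filtering shrinks the corpus, labeling annotates it for a specific task, and training consumes it at a scale (trillions of tokens) that dwarfs both.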
Training a model from scratch
Training a model from scratch means starting with zero knowledge. AI engineers call this “starting from random weights” (a weight is a value the model adjusts to improve its predictions). You feed the model massive amounts of text so it can learn language patterns from scratch, using machine learning.
But wait, don’t you always need to start with an open-source model that already understands English? Nope. That’s another common misconception. You don’t have to, but it’s usually the smartest choice: it’s faster, cheaper, and easier to take an already-trained model (like LLaMA) and fine-tune it for your specific domain, like finance.
So why not always fine-tune an open-source model? Well, if you’re building a model in a completely different language, like Arabic, Chinese, or Hindi, then you may need to train it from scratch. And training from scratch isn’t cheap. It takes massive computational power, usually with high-end GPUs like NVIDIA A100s. Just to give you an idea: 1,000 A100s could cost around £10 million.
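A back-of-envelope check on that hardware figure: at roughly £10,000 per GPU (an assumed price chosen to match the article’s £10 million for 1,000 A100s; real purchase and cloud prices vary widely), the hardware bill alone scales linearly with the cluster size.

```python
# Back-of-envelope GPU budget. The £10,000-per-A100 price is an
# assumption chosen to match the £10M-for-1,000-GPUs figure above;
# real purchase and cloud-rental prices vary widely.

GPU_PRICE_GBP = 10_000  # assumed price of one NVIDIA A100

def gpu_budget(num_gpus: int, price_per_gpu: int = GPU_PRICE_GBP) -> int:
    """Hardware cost alone -- ignores power, networking, and staff."""
    return num_gpus * price_per_gpu

print(f"£{gpu_budget(1_000):,}")  # £10,000,000
```

Note that this is only the purchase price of the cards; power, cooling, networking, and the engineers to run them all add on top.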
Enough with the jargon, just tell me the cost!
- £1M–£2M: You’re using an existing open-source model (like LLaMA) and fine-tuning it on your domain data. This typically includes some infra, APIs, and an MVP app.
- £2M–£6M: Here, you might be pretraining a smaller domain-specific model from scratch, or you’re heavily investing in infra, scalable APIs, and a production-ready app. Example: Bloomberg spent £2.5 million to train BloombergGPT from scratch. They didn’t use an existing model like LLaMA or GPT; instead, they built their own 50-billion-parameter foundation model in just under two months, using NVIDIA GPUs on AWS.
- £6M–£10M+: Training a true foundation model from scratch, like LLaMA-7B or GPT-3 class, starts around £6 million and can exceed £10 million.
2. Lack of a high-quality domain dataset
Everyone talks about foundation models and infra, and forgets about the data. Without the right data, you’ve got nothing, nada, nichts, niente! And that’s where the biggest hidden cost lives.
If you want to compete with GPT-4, Claude, or Gemini, you’re looking at billions (yes, billions!) just for training, not to mention infra, data pipelines, testing, evaluation, and everything else that comes with it.
If you don’t have a dataset, then you’ll need to crawl or license domain data, generate your own, clean it, label it, preprocess it, and make sure it’s compliant, especially if you’re dealing with sensitive data like health, finance, or law. This is where things get expensive. The total cost of data alone, including sourcing, cleaning, and scaling, can easily reach hundreds of millions if you’re aiming for GPT-4-level performance.
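To see why data alone can dominate the budget, here is a minimal cost model over the stages just mentioned. Every per-document rate is an assumption invented for the sketch, not a real market price; the point is only that small per-item costs multiplied by web-scale corpora land in the hundreds of millions.

```python
# Illustrative data-cost model. Every per-document rate below is an
# assumption made up for this sketch, not a real market price.

STAGES_GBP_PER_DOC = {
    "sourcing":   0.02,  # crawling or licensing
    "cleaning":   0.01,  # dedup, spam removal, preprocessing
    "labeling":   0.05,  # human or model-assisted annotation
    "compliance": 0.01,  # PII scrubbing, legal review
}

def data_cost(num_docs: int) -> float:
    """Total data budget: per-document stage rates scaled by corpus size."""
    return num_docs * sum(STAGES_GBP_PER_DOC.values())

# At web scale the data bill dominates: 5 billion documents at these
# toy rates already lands around £450 million.
print(f"£{data_cost(5_000_000_000):,.0f}")
```

Labeling is deliberately the priciest line item here: human annotation is usually the most expensive stage, and it is exactly where sensitive domains like health, finance, and law add further compliance overhead.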
Just look at how much has been raised by the top AI companies (in USD, as of 2025):
- OpenAI: $40 billion
- Databricks: $10 billion
- xAI (Elon Musk): $6 billion
- Anthropic: $3.5 billion
- Inflection AI: $1.3 billion
- CoreWeave: $1.1 billion
- Mistral AI: $645 million
- Perplexity AI: $500 million
- Cohere: $500 million
- Moonshot AI: $300 million
3. How big and scalable the product needs to be
Last but not least: what exactly are you building? Are you building an internal tool for your team, or a real product with millions of users? Because those two live on completely different planets.
If you’re building an internal tool, you can hack something together with a couple of AI/ML engineers, fine-tune an open model, deploy it behind a login, and call it a day. This shouldn’t cost more than £300K, if you know who to hire.
But if you’re building something that needs to scale, now you have to think about product infrastructure, reliability, latency, security, auth, compliance, and support. You’re not just running a model, you’re running a system. And the model might end up being the cheapest part of the whole thing.
So ask yourself: are you building a tool, a product, or a platform? Because each step up costs 10 times more, and demands you think 10 times deeper.
Want to learn more?
Are you a CPO, Head of Product, or Product/Program Manager curious about foundation models and LLMs? This introductory course from Google covers the basics: what LLMs are, where they can be applied, and how techniques like prompt tuning can help you get better results.
If you’re an engineer or technical person who enjoys looking under the hood and want to learn how Transformers and LLMs work, this video is for you. It gives you a solid conceptual understanding of language models, deep learning, and neural networks, without getting lost in math.