How to Evaluate an AI Investment in 2026
The standard VC evaluation playbook was built for SaaS. AI is structurally different. The questions that determine whether an AI company compounds in value or commoditizes within 24 months are not about churn rate, sales cycle, or NPS. They are about data ownership, model dependency, inference unit economics, and whether the product actually improves the longer a customer uses it. Most investors are not asking those questions systematically. This post is a framework for the ones who want to.
Between 2023 and 2026, more than $300 billion was deployed into AI-adjacent companies globally. A meaningful share of that capital went into companies whose primary competitive advantage was being early to wrap a foundation model API in a vertical product and call it an AI company. Some of those bets will work out. The ones where the underlying model provider raises prices, releases a competing feature, or gets displaced by an open-weight alternative will not.
The evaluation problem is not that AI is hard to understand. It is that the surface signals of an AI company look identical whether the business has genuine structural advantages or not. Both types have impressive demos. Both have enthusiastic pilots. Both can cite accuracy benchmarks. The differences only surface when you know where to press.
Why the Standard Framework Fails
Traditional venture evaluation frameworks focus on the five factors most predictive of SaaS success: market size, team, product-market fit, go-to-market efficiency, and defensibility. These remain relevant for AI investments, but they require significant modification, and two entirely new dimensions must be added: data architecture and model economics.
A SaaS company builds product value through software features. Once built, those features cost almost nothing to deliver at scale. Gross margins of 70 to 85 percent are typical because the marginal cost of an additional user is near zero. An AI company that calls a commercial API to deliver its core value adds a variable compute cost to every transaction. That cost does not stay constant. It scales with usage. And the pricing of that compute is controlled by a third party with their own incentives.
The defining question for an AI investment is not “does the model work” but “what happens to this business when the model is commoditized.”
This matters because foundation model capability has been commoditizing steadily since 2023. DeepSeek-V3 was reportedly trained for approximately $5.6 million in compute for its final training run, and it matched closed frontier models on most enterprise benchmarks. Open-weight models now exist for code generation, document analysis, image understanding, and structured reasoning at quality levels that would have required $100 million in compute two years ago. Any AI company whose moat was “we use the best model” has seen that moat evaporate.
The companies that will generate venture-scale returns are the ones whose value compounds through mechanisms that cannot be replicated by switching to a better model. Those mechanisms are primarily about data.
The Eight Dimensions That Separate Real AI from AI Washing
After evaluating more than 200 enterprise AI deployments and advising on AI strategy for organizations ranging from early-stage companies to Fortune 500 boards, I have found the following eight dimensions to be the most reliable predictors of durable AI investment value. Each dimension maps to a question the model cannot answer on its own.
The most defensible AI companies sit on proprietary data streams that improve with scale. Exclusive partnerships with healthcare systems, financial institutions, or industrial operators. Transaction histories with millions of data points per customer. Sensor streams from physical operations. Data that took years to accumulate through relationships, regulatory clearances, or hardware deployment. The test is not whether the data is good. The test is whether a competitor with equal funding could replicate it in 18 months. If the answer is yes, it is not a moat. It is a head start.
There are four categories of AI company by model relationship: those that train proprietary models from scratch on domain data (rare, capital-intensive, full technical ownership); those that fine-tune open-weight models on proprietary data (strong technical moat, no vendor lock-in); those that use open-weight models with retrieval-augmented generation and prompt engineering (moderate differentiation, replicable in weeks by a determined competitor); and those that are entirely dependent on commercial API calls (zero technical moat). Assess which category your target company occupies, and model what happens to its gross margin if its API provider triples its prices. That scenario is not hypothetical. It has happened.
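The price-shock scenario described above can be worked through with simple arithmetic. The sketch below is illustrative only: every figure is a hypothetical assumption, and `gross_margin` is a helper introduced here for the example, not part of any company's actual model.

```python
def gross_margin(revenue: float, api_cost: float, other_cogs: float) -> float:
    """Return gross margin as a fraction of revenue."""
    return (revenue - api_cost - other_cogs) / revenue


# Hypothetical monthly figures for an API-dependent company (assumptions).
revenue = 100_000.0     # monthly revenue
api_cost = 15_000.0     # spend on commercial model API calls
other_cogs = 10_000.0   # hosting, support, and other cost of goods sold

baseline = gross_margin(revenue, api_cost, other_cogs)
shocked = gross_margin(revenue, api_cost * 3, other_cogs)

print(f"baseline gross margin: {baseline:.0%}")
print(f"after 3x API price:    {shocked:.0%}")
```

Under these assumed numbers, gross margin falls from 75 percent to 45 percent. The useful part of the exercise is not the specific result but seeing how sensitive the margin is to the share of cost of goods sold that a single provider controls.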
Many AI companies have gross margins that look acceptable at current scale but compress as revenue grows. The mechanism is straightforward: inference costs scale with usage, and usage grows faster than the pricing power that comes from customer lock-in. The best AI businesses have inference cost curves that improve with scale via batching, caching, and dynamic model routing. They also have pricing structures tied to outcomes rather than seats, which means that as the product delivers more value, the company captures more revenue without proportional cost growth. Require a detailed breakdown of cost of goods sold, broken out between inference costs and human costs, before committing capital.
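To make the cost-curve argument concrete, here is a minimal sketch of how caching and model routing change the blended inference cost per request. The per-request prices, the cache hit rate, and the routing share are all hypothetical assumptions, and `cost_per_request` is a name introduced for this example. Batching is left out to keep the sketch short.

```python
def cost_per_request(
    cache_hit_rate: float,
    small_model_share: float,
    small_model_cost: float,
    large_model_cost: float,
) -> float:
    """Blended inference cost per request.

    Cache hits are assumed to cost nothing; cache misses are routed to a
    small or a large model according to small_model_share.
    """
    miss_rate = 1.0 - cache_hit_rate
    routed_cost = (
        small_model_share * small_model_cost
        + (1.0 - small_model_share) * large_model_cost
    )
    return miss_rate * routed_cost


# Hypothetical per-request costs in dollars (assumptions).
SMALL, LARGE = 0.002, 0.020

# No caching or routing: every request goes to the large model.
naive = cost_per_request(0.0, 0.0, SMALL, LARGE)

# A 30% cache hit rate, with 60% of misses routed to the small model.
optimized = cost_per_request(0.3, 0.6, SMALL, LARGE)

print(f"naive:     ${naive:.4f} per request")
print(f"optimized: ${optimized:.4f} per request")
```

With these assumed inputs the optimized cost is about a third of the naive cost. In diligence, the equivalent exercise is asking the company for its real cache hit rate and routing split, then checking whether those figures improve as usage grows.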
AI development requires a specific combination of capabilities that rarely coexists naturally on a founding team. Research depth matters because intuitions about model failure modes, training dynamics, and evaluation methodology are not learnable from documentation. They come from building and breaking models across thousands of experiments. But research pedigree alone is insufficient. Enterprise AI deployment is operationally complex in ways that consistently surprise technical founders. The security review, the change management process, the SLA negotiation, the integration with legacy systems that were never designed for AI inputs. Teams that have only one of these two capabilities tend to build impressive technology that does not reach production, or reach production with technology that does not hold up. The target is a team that has published original research and also has a named enterprise contract with a defined SLA and a live integration.
The best enterprise AI products accumulate customer-specific value over time. Each customer’s usage generates labeled data, fine-tunes the model on domain-specific vocabulary, and creates integrations that are expensive to replicate with a competing solution. After 18 months of use, a company’s instance of the AI product is meaningfully different from the default product, and that difference belongs to the vendor as much as it does to the customer. This is the AI equivalent of a network effect at the single-customer level. When evaluating a company, ask not what it would take for a new customer to adopt the product, but what it would take for an 18-month customer to replace it.
Enterprise AI pilots are common. Production deployments are not. The difference matters enormously. A pilot runs in a controlled environment with a sympathetic internal champion, limited integration scope, and an implicit understanding that the vendor will absorb most of the operational friction. A production deployment has SLAs, security clearance, live integration with at least two enterprise systems, and at least one incident response event on record. Companies that have navigated from pilot to production have proven something that companies with a portfolio of perpetual pilots have not. Require production deployment evidence with documented outcomes before treating customer traction as validated.
AI-native problems are those that were not tractable before current-generation models existed. Synthesizing 50,000 clinical trial documents for a specific patient. Monitoring 3,200 regulatory feeds in real time and mapping proposed rule changes to specific business processes. Predicting equipment failure from multi-sensor IoT streams two to four weeks in advance. These required AI. Many markets being targeted by AI startups were already served by software, and AI is being added as a feature enhancement. That is a reasonable product improvement strategy. It is not a venture investment thesis. When incumbents add the same AI feature, and they will, the competitive question reduces to distribution and price. Those are not favorable dynamics for an early-stage company.
AI liability is not theoretical. The EU AI Act created real compliance requirements for high-risk AI systems across healthcare, finance, hiring, and critical infrastructure. HIPAA applies to any AI system handling protected health information, regardless of whether the vendor thinks of itself as a healthcare company. Financial regulators in the US and Europe have published specific expectations for AI used in credit, fraud, and investment decisions. Companies operating in these sectors without a documented compliance posture, a SOC 2 Type II report, and tested incident response procedures are carrying unquantified liability that will surface during enterprise security reviews, regulatory inspections, or following the first material output error. That liability transfers partially to investors at the moment of funding.
What Average Returns Actually Look Like Across the Spectrum
Venture return modeling for AI investments in 2026 is complicated by the speed at which the environment is changing. A company that looked differentiated in 2024 based on model performance may look commoditized by 2026, because the model performance it was built around is now available for free in an open-weight release. Return modeling has to account for this. Here is a realistic return framework by investment quality tier, based on the eight-dimension evaluation above.
| Verdict | Score | Realistic Return Range | Why |
|---|---|---|---|
| Tier 1 Opportunity | 88–100 | 15–40× | Proprietary data flywheel, model independence, and enterprise lock-in that compounds. Rare. When you find one, prioritize moving quickly. |
| Strong Opportunity | 75–87 | 8–18× | Real structural advantages with identifiable but manageable gaps. Solid institutional return thesis at reasonable entry valuations. |
| Investable with Conviction | 60–74 | 3–8× | Genuine potential but meaningful execution risk. Returns depend heavily on the team closing specific identified gaps. Appropriate for smaller checks or bridge rounds. |
| Cautious Consideration | 45–59 | 1–3× | More noise than signal at this tier. Returns are possible but asymmetry is poor. The gaps are likely to cost more to close than they appear today. |
| Pass / Revisit | 25–44 | 0–1× | Structural risks identified. Return profile is effectively lottery-ticket economics. Pass and revisit if the company resolves the red flags within 90 days with documented evidence. |
| Hard Pass | 0–24 | Capital at risk | Multiple foundational investment criteria unmet. The identified risks are not stage-appropriate gaps. They are business model problems. Do not deploy capital. |
These return ranges are not guaranteed outcomes. They reflect the realistic distribution of returns for companies that score in each tier, based on the structural characteristics those scores represent. A Tier 1 company can still fail if the market does not develop as expected, if the team breaks down, or if a geopolitical event disrupts the sector. A Hard Pass company can still generate returns if the founding team pivots to something structurally sounder. What the framework predicts is the probability-weighted distribution of outcomes, not the individual result.
A company that looks like a Tier 1 on slides but scores a 31 on a structured evaluation is not a Tier 1 with a good story. It is a 31 with a good story.
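The score bands in the table above can be encoded as a simple lookup, which helps keep verdicts consistent across a pipeline of deals. This is a minimal sketch: the thresholds and verdict names are copied from the table, while `TIERS` and `verdict` are names introduced here for illustration.

```python
# (minimum score, verdict) pairs copied from the table above, highest first.
TIERS = [
    (88, "Tier 1 Opportunity"),
    (75, "Strong Opportunity"),
    (60, "Investable with Conviction"),
    (45, "Cautious Consideration"),
    (25, "Pass / Revisit"),
    (0, "Hard Pass"),
]


def verdict(score: int) -> str:
    """Map a composite score from 0 to 100 to its tier verdict."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for minimum, name in TIERS:
        if score >= minimum:
            return name
    raise AssertionError("unreachable: TIERS covers every valid score")


print(verdict(31))  # Pass / Revisit
print(verdict(92))  # Tier 1 Opportunity
```

The lookup deliberately takes only the score as input. How good the pitch was is not a parameter.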
The Questions Most Investors Are Not Asking
Based on conversations with dozens of VCs and angels over the past 18 months, the questions that consistently go unasked in AI investment diligence are the ones most predictive of which companies will compound and which will commoditize. The following questions should be on every diligence checklist for an AI investment.
If your primary foundation model provider raised API prices by 3× tomorrow, what happens to your gross margin? This question surfaces dependency risk immediately. A well-prepared founder has a multi-model routing strategy and a specific gross margin impact figure. An unprepared one will tell you it is unlikely to happen.
Show me your data rights clause in your standard enterprise contract. The presence or absence of this clause determines whether customer usage generates a proprietary training asset for the vendor or is simply consumed. The absence of this clause is a strategic oversight that will cost the company significantly at Series B and beyond.
What is your inference cost per customer per month at current scale, and what is it projected to be at 10× current customer count? Companies with improving inference economics at scale have built batching, caching, and routing infrastructure. Companies that cannot answer this question have not.
Walk me through a customer who tried to leave and what happened. This question surfaces switching costs directly. A company that has never faced a churn attempt has no empirical data on its own defensibility. A company that has retained customers after competitive displacement attempts has proven something important.
What is the most significant output error your system has generated in production, and what was the customer impact? This is the hallucination liability question. Companies that have thought carefully about output quality management will have specific incident data and documented responses. Companies that have not will be evasive.
Evaluating the Founding Team in the AI Era
Team evaluation in AI requires assessing two distinct capabilities that rarely coexist in the same person and must therefore be evaluated across the founding team as a whole. The first is research depth: does someone on this team have enough hands-on model experience to make correct architectural decisions about training data, evaluation methodology, and the failure modes of different model architectures? The second is enterprise commercial depth: has someone on this team navigated a complex enterprise sales cycle, managed security reviews, handled SLA escalations, and maintained a customer relationship through a serious product incident?
Research depth without commercial depth produces companies that build impressive technology and then spend 18 months learning that enterprises will not deploy AI systems that have not passed their security team, their legal team, and their IT procurement process. Commercial depth without research depth produces companies that sell aggressively and then fail to deliver on the technical promises made in the sales cycle. The founding teams that generate Tier 1 returns typically have at least one person who is genuinely credentialed on each axis, and a culture of honest communication between them about where the product is versus where the pitch deck says it is.
One specific signal that separates strong teams from weak ones: how they discuss past failures. AI development involves regular discoveries that force architectural pivots, model replacements, and performance re-evaluations. Teams that present only successes in diligence conversations are either very early in their development or are not being honest with you. Teams that can describe a specific technical failure, what it cost in customer trust or engineering time, and how the architecture changed as a result are demonstrating the kind of organizational learning that compounds in complex technical environments.
A Free Tool Built for This Framework
The eight dimensions above can be assessed systematically with the right question set. Working through them in an unstructured conversation with a founding team is time-consuming and inconsistent. The quality of the answers depends too heavily on how the questions are framed and in what order.
To make this framework usable in practice, I built a free interactive scorecard that turns the eight-dimension evaluation into 30 specific scored questions. Each question maps directly to one of the dimensions above. The scoring is calibrated against the return distribution in the table above. When you complete the evaluation for a specific company, the tool generates a composite score, a tier verdict, a list of specific red flags surfaced by the answers, and a set of personalized next steps for that company’s specific risk profile.
The scorecard captures the company name, founder names, investment stage, and proposed check size at the start, so the printed report is a complete due diligence record for that specific investment opportunity. The report is designed to be shared with investment committee members or co-investors who were not present for the initial evaluation session.
AI Investment Scorecard
30 research-backed questions across 8 dimensions. Built on frameworks from Sequoia Capital AI Thesis 2024, a16z AI Stack research, McKinsey AI Value Survey 2024, and enterprise deployment data from 50+ Fortune 500 AI projects. Generate a printed due diligence report in 12 minutes. Available free because every serious investor should be asking these questions before wiring capital into an AI company.
What to Do When You Find a Tier 1 Company
Genuine Tier 1 AI investments are approximately 3 percent of deal flow evaluated on these criteria. That is not a small number in absolute terms given the volume of AI investment activity, but it is small enough that most investors who are not screening systematically will miss them or fail to recognize them when they see them.
When you find a company that scores in the Tier 1 range, the right response is urgency. The founders of a company with a genuine data flywheel, model independence, improving unit economics, strong enterprise evidence, and a credible team will have multiple institutional term sheets within weeks of a fundraise announcement. The investors who move with conviction and speed, rather than waiting for the final diligence checklist to be complete, are the ones who get allocation.
The trap to avoid is using the structured evaluation as a reason to delay. The framework is designed to give you confidence to act, not additional reasons to hesitate. If a company scores in the Tier 1 range (88 or above) and the red flags are in dimensions that have clear remediation paths, that is sufficient to move to a term sheet while completing the standard legal and financial diligence in parallel. Waiting for a company to also resolve every amber signal before committing is how investors miss the companies that matter most.
The Honest Version of AI Investment in 2026
The AI investment market in 2026 is not what it appeared to be in 2023. The narrative of every company being transformed by AI, which drove much of the early-stage excitement, is giving way to a more nuanced reality. Most enterprise AI deployments are generating real but uneven value. The companies capturing the most of that value are the ones with the best data infrastructure, the most thoughtful model strategies, and the operational depth to deploy at enterprise scale without generating incident reports. Those are the companies worth investing in. And they are identifiable, if you know what to look for.
The framework in this post is not a guarantee. No evaluation framework is. What it is, is a set of questions that surfaces the structural characteristics most predictive of durable AI investment returns, applied consistently across every company in your deal flow. That consistency is where the edge is. Not in having better access, or a stronger network, or a more sophisticated model of the AI market. In asking the right questions, in the right order, with enough domain knowledge to recognize what a good answer actually looks like.
References
- Sequoia Capital. AI in 2024: A Practitioner’s Guide to Enterprise Deployment. Sequoia Capital Research, 2024. sequoiacap.com
- Andreessen Horowitz. The AI Stack: A Framework for Evaluating AI Infrastructure Investments. a16z, 2024. a16z.com
- McKinsey Global Institute. The State of AI in 2024: Enterprise Value and Investment Patterns. McKinsey & Company, 2024. mckinsey.com
- Besiroglu, T., Erdil, E., Barnett, A., et al. Chinchilla Scaling: A Replication Attempt. arXiv:2404.10102, 2024. arxiv.org
- Chen, L., Zaharia, M., & Zou, J. FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance. arXiv:2305.05176, 2023. arxiv.org
- Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., & Chen, W. What Makes Good In-Context Examples for GPT-3? arXiv:2101.06804, 2021. Referenced in enterprise AI governance literature for evaluating AI system reliability. arxiv.org
- NIST. Artificial Intelligence Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology, 2023. nist.gov
- European Parliament. Regulation (EU) 2024/1689 on Artificial Intelligence (EU AI Act). Official Journal of the European Union, 2024. eur-lex.europa.eu
- Nvidia Corporation. Enterprise AI Infrastructure and Total Cost of Ownership Report. Nvidia, 2024. nvidia.com
- Kaplan, J., McCandlish, S., Henighan, T., et al. Scaling Laws for Neural Language Models. arXiv:2001.08361, 2020. Referenced in enterprise AI procurement guidance for model selection and quality benchmarking. arxiv.org