A transparent rubric that you can audit, argue with, or use to score tools yourself. Every review on this site applies the same framework — same categories, same weights, same benchmarks — regardless of who makes the tool or what they've offered us.
Every tool on Dappiehub gets a score in each of five universal categories. The category scores are weighted and combined to produce the headline 0–10 score you see on tool cards; a sketch of how that combination works follows the list below. The five categories were chosen because they're the ones that consistently determine whether a piece of business software is worth paying for.
Time to first useful output. How quickly a non-technical owner can install the tool, complete a real task with it, and get a result they'd actually use.
Whether the tool delivers on its core promise meaningfully better than free alternatives — or just adds polish to something a generic tool already does.
What you get per pound spent at each pricing tier — and whether the cheapest plan is actually usable for real work, or a marketing tease.
How well the tool connects to the rest of a small business stack — Word, Google Drive, common CRMs, automation tools, payment platforms.
Uptime, support quality, data security posture, and whether the company looks likely to still be here in two years. New AI startups die fast.
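For the curious, here is roughly what that weighted combination looks like. The category keys and weights below are illustrative placeholders, not our published rubric; the real weights differ and are shown per tool.

```python
# A minimal sketch of the headline-score calculation. The category keys and
# weights are illustrative placeholders, not Dappiehub's published rubric.
WEIGHTS = {
    "time_to_first_output": 0.25,
    "feature_depth": 0.25,
    "value_for_money": 0.20,
    "integrations": 0.15,
    "reliability": 0.15,
}

def headline_score(category_scores: dict[str, float]) -> float:
    """Combine five 0-10 category scores into the weighted 0-10 headline score."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return round(sum(WEIGHTS[cat] * category_scores[cat] for cat in WEIGHTS), 1)

print(headline_score({
    "time_to_first_output": 9.0,
    "feature_depth": 9.0,
    "value_for_money": 8.0,
    "integrations": 8.0,
    "reliability": 8.0,
}))  # 8.5
```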
The five universal categories give us a like-for-like score across every tool. But comparing an AI writing tool against an AI image generator on "Feature Depth" means little if you don't know what features matter for that category. So every tool also gets scored against a category-specific rubric.
The 14 families currently covered are:
Within each family, the rubric weights are tuned to that category. A coding tool's "feature depth" score should reward IDE integration and autonomy; an image tool's should reward prompt fidelity and style range. The full per-family rubric appears on each individual tool page under "Category Scores".
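As a sketch of what that tuning means in practice, the same category is scored against different criteria per family. The criteria and weights here are examples only; the real rubric is the one published on each tool page.

```python
# Example per-family sub-rubrics for the "feature depth" category. Criteria
# and weights are illustrative, not the published rubric.
FEATURE_DEPTH_RUBRICS = {
    "coding":    {"ide_integration": 0.40, "autonomy": 0.35, "language_coverage": 0.25},
    "image_gen": {"prompt_fidelity": 0.40, "style_range": 0.35, "editing_tools": 0.25},
}

def feature_depth_score(family: str, criterion_scores: dict[str, float]) -> float:
    """Score "feature depth" against the family's own criteria, not a generic list."""
    rubric = FEATURE_DEPTH_RUBRICS[family]
    return round(sum(weight * criterion_scores[c] for c, weight in rubric.items()), 1)

# e.g. feature_depth_score("coding", {"ide_integration": 9.0, "autonomy": 8.0,
#                                     "language_coverage": 8.0})
```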
Numerical scores in a vacuum mean little. A "9.1" only matters if you know what an 8.0 looks like. To prevent score inflation, every tool we review is calibrated against two reference benchmarks that don't move:
ChatGPT. The world's most widely-used AI tool, scored as it stood at our most recent review cycle. Any tool claiming to be a better generalist needs to beat it on something specific to earn a higher score.
Claude. The reference for output quality and reasoning depth, particularly on long-form and document work. A tool scoring above Claude on quality categories needs to demonstrably outperform it.
This calibration is why our headline scores cluster between 7.5 and 9.5 rather than spanning the full 0–10 range. Tools below 7.0 generally aren't included in the catalogue at all — there's no point reviewing software we wouldn't recommend to anyone. The tools we cover are tools we think have a legitimate case for being chosen by someone.
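Expressed as a rule rather than prose, the calibration check and the catalogue floor look something like this. This is a sketch of the logic, not our actual tooling, and the function names are ours for illustration.

```python
# Sketch of the two calibration rules described above: a tool may only
# out-score a benchmark if it beats that benchmark somewhere specific, and
# tools under 7.0 don't enter the catalogue at all.
CATALOGUE_FLOOR = 7.0

def outranking_is_justified(tool: dict[str, float], benchmark: dict[str, float],
                            tool_headline: float, benchmark_headline: float) -> bool:
    """A tool claiming a higher headline score than a benchmark (ChatGPT or
    Claude) must beat that benchmark in at least one category."""
    if tool_headline <= benchmark_headline:
        return True  # no claim of superiority, nothing to justify
    return any(tool[cat] > benchmark[cat] for cat in benchmark)

def belongs_in_catalogue(tool_headline: float) -> bool:
    """Tools scoring below 7.0 generally aren't reviewed at all."""
    return tool_headline >= CATALOGUE_FLOOR
```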
A new review is roughly a two-week process. The steps are the same regardless of whether the tool is new on the market or a long-established player:
We use the tool on actual work — not vendor demos — for at least a week. Writing real emails, drafting real proposals, processing real documents, automating real workflows. Marketing screenshots are not evidence.
We try the free tier (if there is one) and at least the next paid tier up. Tools whose free tier is unusable get marked down even if their paid tier is excellent — most readers will never get past the free tier to find out.
We test the tool against a representative small-business stack — Microsoft 365, Google Workspace, Slack or similar, a common CRM, and Zapier or Make. Tools that don't play well with the existing world get marked down.
Direct head-to-head against ChatGPT and Claude on the same tasks where relevant. If a generalist tool can do the same job for free, the specialist tool needs a real reason to exist.
Scores are assigned to all five universal categories and the relevant family rubric. The verdict, pros, cons and "best for" recommendations are written. The review is published with a clear date, and the tool enters the rolling re-review cycle.
AI tools change month to month — pricing changes, models get upgraded, features get added or removed. A review written six months ago can be materially wrong today. To keep the catalogue honest, every tool is on a rolling re-review schedule:
Each tool page shows its Last Updated date prominently. If that date is more than 180 days old when you read this, that's a bug: please tell us.
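The 180-day promise is simple enough to state as code. This is a sketch of the check with hypothetical names, not the code that runs the site.

```python
# Sketch of the staleness check behind the 180-day re-review promise.
from datetime import date

MAX_AGE_DAYS = 180

def review_is_stale(last_updated: date, today: date | None = None) -> bool:
    """True once a review's Last Updated date has aged past the re-review window."""
    today = today or date.today()
    return (today - last_updated).days > MAX_AGE_DAYS

# review_is_stale(date(2024, 1, 15), today=date(2024, 9, 1))  -> True (230 days)
```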
Dappiehub does not currently take affiliate commission on tool sign-ups. The "Try X →" links on tool pages are direct links to the vendor, with no tracking or commission attached.
If this changes — and most independent review sites at our scale eventually do take affiliate links — the policy will be:
Sponsored content, where it exists, will be clearly labelled as "Sponsored" and held to the same factual accuracy standards as editorial content. Sponsored placement does not move scores.
We will get things wrong. Pricing changes we miss. New features we underweight. Tools we score too generously, or too harshly. When this happens, the response is straightforward:
Spotted an error? Email corrections@dappiehub.com with the page URL and the issue. We respond within five working days, correct verifiable factual errors promptly, and note material corrections at the bottom of the affected page along with the date of correction.
Scoring disagreements aren't factual errors — they're editorial judgement, and we'll defend our scores even when readers (or vendors) disagree with them. But if we've got a feature wrong, missed a pricing tier, or misrepresented what a tool does, that's a correction and we'll make it.
Every tool review on Dappiehub uses the framework above. Browse the catalogue or take the finder for a personalised shortlist.