Methodology

How The Clinical Index Works

Full transparency into our evidence verification process, scoring methodology, and quality assurance pipeline.

Our Philosophy

We believe in radical transparency

The supplement industry is plagued by opaque proprietary blends and unsubstantiated marketing claims. We take the opposite approach: our scoring formula, weights, and methodology are entirely public. If you can reproduce our inputs, you can reproduce our outputs.

PMID-Verified Citations Only

Every study referenced in a dossier from The Clinical Index (TCI) is indexed in PubMed and identified by its PMID (PubMed identifier). We never generate synthetic citations. Each PMID is validated against the NCBI database before inclusion.

Deterministic Scoring

All numeric scores are computed by deterministic Python code with fixed, published weights — no LLM ever performs the arithmetic. LLM agents produce qualitative narrative and structured extractions only. Given the same extracted evidence, the scoring engine returns the same score every time.

Full-Text Evidence Extraction

We go beyond abstracts. Our pipeline retrieves full-text articles from PubMed Central and Unpaywall, extracting specific passages that support or contradict each ingredient claim.

The Pipeline

8 LLM Agents + 4 Deterministic Checks, One Sequential Pipeline

Every product analysis passes through eight purpose-built LLM agents and four deterministic (non-LLM) checks in strict sequence — never a parallel swarm. Each stage has a single responsibility and produces structured output for the next. The four deterministic nodes — Classifier, Verifier, Dusting, and the Scoring Engine — run in pure Python with zero LLM calls.
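As a rough illustration of the strict ordering described above, the stage list can be written as plain data and executed one stage at a time. The `Stage` type, the `run_pipeline` function, and the handler callables are hypothetical names for this sketch, not TCI's actual code. Only the stage names, the LLM/deterministic split, and the sequential design come from this page, and the execution order is assumed to follow the order in which the stages are listed here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Stage:
    name: str
    uses_llm: bool  # False means a deterministic, pure-Python node


# Assumed order: the order in which the stages appear on this page.
PIPELINE: List[Stage] = [
    Stage("IntakeAgent", True),
    Stage("IngredientClassifier", False),
    Stage("EvidenceResearchAgent", True),
    Stage("VerifierAgent", False),
    Stage("FormulationAnalysisAgent", True),
    Stage("DustingDetector", False),
    Stage("ClaimsValidationAgent", True),
    Stage("SafetyAssessmentAgent", True),
    Stage("COAVerificationAgent", True),
    Stage("ScoringEngine", False),
    Stage("SynthesisAgent", True),
    Stage("QAReviewAgent", True),
]

Handler = Callable[[dict], dict]


def run_pipeline(state: dict, handlers: Dict[str, Handler]) -> dict:
    """Run every stage in order; each stage consumes the previous stage's output."""
    for stage in PIPELINE:
        state = handlers[stage.name](state)
    return state
```

Keeping the stage list as data makes the 8 + 4 split easy to check and leaves no room for stages to run out of order.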

IntakeAgent

Opus 4.7 · Vision

Runs on Claude Opus 4.7 with vision to parse supplement facts panels and label images directly. Extracts every ingredient, its dose, unit, and form (e.g., magnesium glycinate vs. magnesium oxide). Normalizes naming conventions across brands.

IngredientClassifier

Deterministic

Categorizes each ingredient by functional role: PRIMARY (active therapeutic), SUPPORTING (cofactors, bioavailability enhancers), or OTHER (excipients, fillers). This stage runs on rule-based logic with zero LLM calls, so its output is fully reproducible.

EvidenceResearchAgent

Haiku 4.5

Searches PubMed/NCBI and ClinicalTrials.gov for clinical studies per ingredient. Retrieves full-text articles via a PMC/Unpaywall cascade, prioritizing randomized controlled trials and systematic reviews. Extracts dosage data, population details, and outcome measures, and grades the body of evidence under the GRADE certainty framework.
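A retrieval cascade of this kind can be sketched as an ordered list of sources that is tried until one returns text. The `fetch_full_text` function and the fetcher callables are hypothetical; real fetchers would call the PMC and Unpaywall services, which this sketch deliberately leaves out so that it runs offline.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes an article identifier and returns full text, or None if unavailable.
Fetcher = Callable[[str], Optional[str]]


def fetch_full_text(
    article_id: str,
    sources: Sequence[Tuple[str, Fetcher]],
) -> Tuple[Optional[str], Optional[str]]:
    """Try each source in order and return (source_name, text) for the first hit.

    Returns (None, None) when no source has the full text, in which case a
    caller would have to fall back to the abstract.
    """
    for name, fetch in sources:
        text = fetch(article_id)
        if text:
            return name, text
    return None, None
```

Passing the sources as an ordered sequence keeps the priority explicit: whichever source is listed first wins when both have the article.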

VerifierAgent

Deterministic

Validates every citation the EvidenceResearchAgent surfaced against the live NCBI database before anything downstream consumes it. Confirms each PMID exists and fuzzy-matches extracted claims against source text. Runs as deterministic logic with zero LLM calls — no hallucinated study survives this gate.
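The existence gate can be pictured as a filter that drops any PMID that is malformed or unknown to the database. This is a minimal sketch, not the VerifierAgent itself: `filter_verified` and `is_wellformed_pmid` are hypothetical names, the 1-to-8-digit pattern is an assumption about how PMIDs are currently issued, and the `exists` callable stands in for the live NCBI lookup so the example runs without network access.

```python
import re
from typing import Callable, Iterable, List

# Assumption: a PMID is a positive integer of up to 8 digits with no leading zero.
PMID_PATTERN = re.compile(r"[1-9]\d{0,7}")


def is_wellformed_pmid(pmid: str) -> bool:
    """Cheap syntactic check performed before any database lookup."""
    return PMID_PATTERN.fullmatch(pmid) is not None


def filter_verified(pmids: Iterable[str], exists: Callable[[str], bool]) -> List[str]:
    """Keep only PMIDs that are well-formed and confirmed by `exists`.

    `exists` is a stand-in for the NCBI existence check; it is injected so the
    gate itself stays deterministic and testable.
    """
    return [p for p in pmids if is_wellformed_pmid(p) and exists(p)]
```

Anything the filter drops never reaches the downstream stages, which is the property the gate is meant to guarantee.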

FormulationAnalysisAgent

Haiku 4.5

Evaluates whether each ingredient is dosed at clinically meaningful levels by comparing label doses against therapeutic ranges established in peer-reviewed literature, anchored to MCID / Cohen's d effect-size thresholds. Flags under-dosed and over-dosed ingredients.

DustingDetector

Deterministic

Identifies "fairy-dusted" ingredients included at sub-clinical doses, typically far below the minimum effective dose found in clinical literature. Deterministic rule-based logic with zero LLM calls — flags ingredients present on the label for marketing value rather than therapeutic benefit.
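A rule of this shape might look like the following sketch. The `DoseCheck` type, the `is_dusted` function, and especially the 0.5 threshold are hypothetical; the page does not publish the cutoff the DustingDetector uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoseCheck:
    ingredient: str
    label_dose_mg: float
    min_effective_dose_mg: float  # lowest dose shown effective in the cited trials


def is_dusted(check: DoseCheck, threshold: float = 0.5) -> bool:
    """Flag an ingredient whose label dose falls below a fraction of the
    minimum effective dose. The 0.5 default is an illustrative assumption."""
    if check.min_effective_dose_mg <= 0:
        raise ValueError("minimum effective dose must be positive")
    return check.label_dose_mg < threshold * check.min_effective_dose_mg
```

Because the rule is a single comparison on extracted numbers, the same inputs always produce the same flag.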

ClaimsValidationAgent

Haiku 4.5

Cross-references marketing claims made on the product label against the verified evidence base, then runs the FTC compliance pass, which scans label claims, product names, and marketing language against the FTC Health Products Compliance Guidance (2022) and FDA Qualified Health Claim (A/B/C/D) standards. Each claim is rated Supported, Partially Supported, or Unsupported, and disease claims and unsubstantiated "clinically proven" claims are flagged.

SafetyAssessmentAgent

Haiku 4.5

Checks each ingredient against Tolerable Upper Intake Levels (ULs) established by the National Academies. Identifies known drug interactions, contraindications, and population-specific warnings (pregnancy, pediatrics, elderly).
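The UL comparison itself reduces to a lookup and a threshold test, sketched below. The `ul_violations` function and its dictionary inputs are hypothetical; the upper limits are passed in by the caller rather than hard-coded, and interaction and contraindication checks are outside the scope of this example.

```python
from typing import Dict, List


def ul_violations(
    daily_doses_mg: Dict[str, float],
    upper_limits_mg: Dict[str, float],
) -> List[str]:
    """Return, in alphabetical order, the ingredients whose daily dose exceeds
    the supplied Tolerable Upper Intake Level.

    Ingredients with no entry in `upper_limits_mg` are skipped, since not every
    nutrient has an established UL.
    """
    return sorted(
        name
        for name, dose in daily_doses_mg.items()
        if name in upper_limits_mg and dose > upper_limits_mg[name]
    )
```

A dose exactly at the limit is not reported here; whether the boundary counts as a violation is a design choice this sketch assumes rather than one stated on this page.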

COAVerificationAgent

Haiku 4.5

When a Certificate of Analysis (COA) is provided, validates that tested ingredient quantities match label claims within acceptable variance. Also reviews the COA's heavy-metal, microbial-contamination, and identity-test results.
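The label-versus-lab comparison can be expressed as a relative deviation checked against a tolerance. In this sketch, `label_deviation`, `within_variance`, and the 10% default tolerance are hypothetical; the page does not state what variance is considered acceptable.

```python
def label_deviation(label_mg: float, tested_mg: float) -> float:
    """Signed deviation of the tested quantity from the label claim, as a
    fraction of the label claim (negative means the product tested low)."""
    if label_mg <= 0:
        raise ValueError("label quantity must be positive")
    return (tested_mg - label_mg) / label_mg


def within_variance(label_mg: float, tested_mg: float, tolerance: float = 0.10) -> bool:
    """True when the tested quantity is within `tolerance` of the label claim.
    The 0.10 default is an illustrative assumption."""
    return abs(label_deviation(label_mg, tested_mg)) <= tolerance
```

Reporting the signed deviation alongside the pass/fail result lets a dossier distinguish under-filled products from over-filled ones.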

Scoring Engine

Deterministic

Computes 100% of the numeric scoring in deterministic Python with fixed published weights — no LLM ever touches the math. Combines the per-ingredient Scientific Credibility Scores into the product composite and assigns the badge tier.

SynthesisAgent

Sonnet 4.6

The premium narrative agent, running on Claude Sonnet 4.6. Synthesizes outputs from all preceding agents into a coherent executive summary. Produces the final dossier text while maintaining scientific accuracy — it writes the story, never the score.

QAReviewAgent

Haiku 4.5

Final quality assurance pass. Validates that all citations are real, scores are consistent with evidence, and the narrative accurately reflects the data. Catches edge cases and ensures fairness across product categories.

Scoring

Scientific Credibility Score (SCS)

Every product receives a composite score from 0 to 100, computed entirely by deterministic Python code. LLMs contribute narrative analysis and structured extractions but never influence the numeric score. Each ingredient first earns a per-ingredient Scientific Credibility Score — Evidence 40% / Dose 40% / Bioavailability 20% — which then rolls up into the product composite below.
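The per-ingredient weighting can be sketched as a weighted sum. The 40/40/20 weights come from the paragraph above; the assumption that each sub-score sits on a 0 to 100 scale and that the combination is a plain linear sum is ours, as are the names `INGREDIENT_WEIGHTS` and `ingredient_scs`.

```python
from typing import Dict

# Weights as published above: Evidence 40% / Dose 40% / Bioavailability 20%.
INGREDIENT_WEIGHTS: Dict[str, float] = {
    "evidence": 0.40,
    "dose": 0.40,
    "bioavailability": 0.20,
}


def ingredient_scs(subscores: Dict[str, float]) -> float:
    """Weighted sum of the three sub-scores (each assumed to be in 0..100)."""
    for name in INGREDIENT_WEIGHTS:
        if name not in subscores:
            raise KeyError(f"missing sub-score: {name}")
        if not 0 <= subscores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(weight * subscores[name] for name, weight in INGREDIENT_WEIGHTS.items())
```

Under these assumptions, sub-scores of 80 (evidence), 60 (dose), and 50 (bioavailability) combine as 32 + 24 + 10, for a per-ingredient score of about 66.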

Grounded in Established Frameworks

The scoring substrate is not invented in-house. Every dimension maps to a peer-reviewed or regulatory methodology, so a reviewer can trace each judgment back to its source standard.

GRADE

Certainty-of-evidence grading across the body of studies for each ingredient.

RoB 2

Cochrane Risk of Bias 2 — risk assessment for randomized controlled trials.

ROBINS-I

Risk Of Bias In Non-randomized Studies of Interventions, for observational evidence.

FTC HPCG 2022

FTC Health Products Compliance Guidance (2022) — the claims-substantiation standard.

FDA QHC (A/B/C/D)

FDA Qualified Health Claim evidence tiers (A/B/C/D), which rank the strength of scientific support behind a health claim.

NNHPD · EFSA

Health Canada NNHPD monographs and EFSA dossiers for international dose and safety anchors.

MCID / Cohen's d

Minimal Clinically Important Difference and effect-size thresholds for dose adequacy.

Without Certificate of Analysis

The standard scoring model applied to all products analyzed without third-party lab testing data.

35%

Evidence Quality

Strength, relevance, and volume of PubMed studies supporting each ingredient.

30%

Dose Adequacy

Whether ingredients are dosed within clinically effective ranges from published trials.

20%

Bioavailability

Quality of ingredient forms (e.g., chelated minerals vs. oxides, methylated B vitamins).

15%

Safety

Absence of Tolerable Upper Intake Level (UL) violations, contraindications, and interaction risks.

With Certificate of Analysis

When a brand provides third-party lab test results, the scoring model adds a Label Accuracy dimension and rebalances weights.

30%

Evidence Quality

Strength, relevance, and volume of PubMed studies.

25%

Dose Adequacy

Clinical dose alignment from peer-reviewed literature.

20%

Label Accuracy

How closely tested quantities match label claims (COA verification).

15%

Safety

UL compliance, drug interactions, contraindications.

10%

Bioavailability

Ingredient form quality and absorption characteristics.
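The two weight sets can be applied with a single function that switches on whether a COA was supplied. The weights below are the ones published in this section; the function name `composite_score`, the dimension keys, and the assumption that each dimension arrives as a 0 to 100 score are ours. How per-ingredient scores are aggregated into each dimension is not specified here, so the sketch takes the dimension scores as given.

```python
from typing import Dict

WEIGHTS_WITHOUT_COA: Dict[str, float] = {
    "evidence_quality": 0.35,
    "dose_adequacy": 0.30,
    "bioavailability": 0.20,
    "safety": 0.15,
}

WEIGHTS_WITH_COA: Dict[str, float] = {
    "evidence_quality": 0.30,
    "dose_adequacy": 0.25,
    "label_accuracy": 0.20,
    "safety": 0.15,
    "bioavailability": 0.10,
}


def composite_score(dimensions: Dict[str, float], has_coa: bool) -> float:
    """Weighted sum of dimension scores (each assumed 0..100) under the
    weight set that matches the COA status."""
    weights = WEIGHTS_WITH_COA if has_coa else WEIGHTS_WITHOUT_COA
    missing = [name for name in weights if name not in dimensions]
    if missing:
        raise KeyError(f"missing dimension scores: {missing}")
    return sum(weight * dimensions[name] for name, weight in weights.items())
```

Both weight sets sum to 100%, so a product scoring the same on every dimension receives that same number as its composite under either model.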

Grade Scale

Grade   Score range
A+      95 - 100
A       90 - 94
A-      85 - 89
B+      80 - 84
B       75 - 79
B-      70 - 74
C+      65 - 69
C       60 - 64
C-      55 - 59
D       40 - 54
F       < 40
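Mapping a composite score onto this scale is a lookup against the lower bound of each band. In the sketch below, `GRADE_BANDS` and `letter_grade` are hypothetical names, the single-letter labels (A, B, C, D, F) are assumed to follow the usual letter-grade convention, and fractional scores are assumed to fall into the band whose lower bound they meet.

```python
from typing import List, Tuple

# (lower bound, grade) pairs, highest band first.
GRADE_BANDS: List[Tuple[int, str]] = [
    (95, "A+"), (90, "A"), (85, "A-"),
    (80, "B+"), (75, "B"), (70, "B-"),
    (65, "C+"), (60, "C"), (55, "C-"),
    (40, "D"),
]


def letter_grade(score: float) -> str:
    """Return the grade for a 0..100 composite score; anything below 40 is F."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, grade in GRADE_BANDS:
        if score >= lower_bound:
            return grade
    return "F"
```

Comparing against lower bounds only avoids gaps between bands, so a score such as 94.5 still resolves to a single grade.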

Verification Badge

Four Verification Tiers

A product earns the verification tier that matches the depth of evidence it clears — not a single all-or-nothing pass/fail. The deterministic Scoring Engine assigns the tier from gated thresholds on the published scoring dimensions; the tier is earned, never bought.

Verified

The highest tier. The product clears evidence, dose adequacy, safety, and label accuracy — a Certificate of Analysis confirms tested quantities match the label. Full-spectrum verification.

Label Verified

Tested ingredient quantities are confirmed against label claims via a Certificate of Analysis. What the label says is what the lab found.

Formula Reviewed

The formulation has been assessed for dose adequacy and ingredient form quality against published therapeutic ranges, with the dosing rationale documented.

Evidence Reviewed

The ingredient evidence base has been retrieved, PMID-verified, and graded under the GRADE / RoB 2 frameworks, with citations a reader can audit.

Evidence Standards

How We Handle Evidence

Our evidence pipeline is designed to maximize accuracy and minimize the risk of AI-generated hallucinations.

Indexed Studies Only

We source studies from PubMed/NCBI and ClinicalTrials.gov — the gold standards for biomedical literature and trial registration. Pre-prints, blog posts, manufacturer-funded white papers, and non-indexed journals are excluded from the evidence base.

Full-Text Extraction

When available through PubMed Central or Unpaywall, we retrieve and analyze the complete article text rather than relying solely on abstracts. This provides access to methodology details, dosage protocols, and result nuances that abstracts omit.

Hallucination Verification

Every citation surfaced by our AI agents is verified against the NCBI database. PMIDs are checked for existence, and extracted claims are fuzzy-matched against source text with a similarity percentage. Citations that cannot be verified are automatically removed.
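One way to produce a similarity percentage is Python's standard-library `difflib.SequenceMatcher`, used here as a stand-in because the page does not name the matching algorithm. The functions `best_match_percent` and `is_supported`, the sentence-level comparison, and the 80% cutoff are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not count."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def best_match_percent(claim: str, source_text: str) -> float:
    """Highest similarity (0..100) between the claim and any sentence of the source."""
    claim_norm = _normalize(claim)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", source_text) if s.strip()]
    if not claim_norm or not sentences:
        return 0.0
    best = max(
        SequenceMatcher(None, claim_norm, _normalize(sentence)).ratio()
        for sentence in sentences
    )
    return round(100 * best, 1)


def is_supported(claim: str, source_text: str, min_percent: float = 80.0) -> bool:
    """True when the claim matches some source sentence at or above the cutoff.
    The 80.0 default is an illustrative assumption."""
    return best_match_percent(claim, source_text) >= min_percent
```

A claim copied verbatim from the source scores 100, while a claim with no counterpart in the text scores low and would be removed under the rule described above.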

Study Quality Weighting

Not all studies carry equal weight. Randomized controlled trials (RCTs) and systematic reviews receive the highest weight, followed by observational studies. In vitro and animal studies are noted but weighted significantly lower for human health claims.

Our Commitment

Transparency Is Non-Negotiable

Weights Shown in Every Dossier

Every dossier displays the exact scoring weights used. You can see precisely how each dimension contributed to the final score.

Deterministic Math

The scoring engine is pure Python — the same extracted evidence always yields the same score. The upstream LLM extraction can vary slightly between runs, so a re-audit may shift a few composite points, but the math itself never gambles.

No Pay-to-Play

You cannot buy a higher score. The scoring algorithm treats every product identically regardless of which brand submitted it.

Open Methodology

This page is our commitment. Every weight, every gate, every rule that affects your score is documented publicly.
