Evidence Methodology

How we score the evidence

Every compound in the Archive carries an Aphrodite Evidence Score. Here is exactly how it is built — so you can judge the score, and judge us.

Most peptide information online is written to sell something or to excite. We do the opposite: we read the published research and ask how strong it actually is. The Evidence Score is our attempt to answer, in one honest number, a question that usually takes hours of reading to resolve — how much should you trust the claims made about this compound? It is a judgment of evidence quality, not an endorsement, and never a recommendation to use anything.

The six categories

Each compound is assessed on six dimensions. Five are rated 0–10; the sixth, the hype-to-evidence gap, is reported as Low, Moderate, or High. The overall score is a weighted composite in which the categories that speak most directly to trustworthiness (human evidence and study quality) carry the most weight.

Human evidence

Highest weight

How much of what we know comes from studies in people, versus animals or cell cultures. A compound can have hundreds of rodent papers and almost no human data — this category makes that visible.

We ask: Are there controlled human trials? How many participants? Do the human findings match the animal claims?

Study quality

Highest weight

Not all evidence is equal. A randomized, double-blind, placebo-controlled trial tells us far more than an open-label observation or a single case report.

We ask: Were studies randomized and controlled? Blinded? Adequately powered? Pre-registered?

Independent replication

High weight

A finding from one lab is a hypothesis; a finding reproduced by independent groups is closer to knowledge. This category penalizes results that rest on a single team or a single funder.

We ask: Has the key result been reproduced by others? Or does it trace back to one source?

Safety characterization

High weight

How well the risks are understood. A high score here does not mean "safe" — it means the safety picture has actually been studied. A compound with no long-term data scores low even if no harm has yet appeared.

We ask: Are adverse events systematically reported? Are there long-term and cardiovascular data? What remains unknown?

Regulatory maturity

Moderate weight

Where the compound sits in the formal review process — from purely preclinical, through the phases of human trials, to approval by a regulator for a specific use.

We ask: What phase is it in? Is it approved anywhere, for what, and by whom?

Hype-to-evidence gap

Modifier

The distance between what the internet claims and what the research supports. A large gap is a warning, not a verdict on the compound — it flags where marketing has outrun the science.

We ask: How far do popular claims exceed the published evidence? The gap is reported as Low, Moderate, or High rather than as a 0–10 score.
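To make the idea of a weighted composite concrete, here is a minimal sketch in Python. The numeric weights are assumptions chosen only to respect the stated tiers (highest, high, moderate); the actual weights are not published here. The names `WEIGHTS` and `composite_score` are hypothetical. Because this page does not say how the hype-to-evidence modifier affects the number, the sketch leaves it out and treats it as a separate label.

```python
# Hypothetical integer weights. Only the relative tiers come from the text
# (highest > high > moderate); the numbers themselves are assumptions.
WEIGHTS = {
    "human_evidence": 5,           # highest weight
    "study_quality": 5,            # highest weight
    "independent_replication": 4,  # high weight
    "safety_characterization": 4,  # high weight
    "regulatory_maturity": 2,      # moderate weight
}


def composite_score(ratings):
    """Return the weighted average of the five 0-10 ratings, to one decimal."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= ratings[name] <= 10:
            raise ValueError(f"{name} must be between 0 and 10")
    weighted_sum = sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)
    return round(weighted_sum / sum(WEIGHTS.values()), 1)


# Illustrative ratings for an imaginary compound, not real data.
example = {
    "human_evidence": 4,
    "study_quality": 6,
    "independent_replication": 3,
    "safety_characterization": 4,
    "regulatory_maturity": 2,
}
print(composite_score(example))  # 4.1
```

With these assumed weights, a compound with many rodent papers but little human data cannot reach a high composite on regulatory or replication strength alone, which matches the intent described above.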

What the overall score means

The composite lands each compound in one of five bands. The band, not the exact decimal, is what matters.

8.5–10

Well established. Strong, replicated human evidence and a well-characterized profile — typically approved or late-stage compounds.

7.0–8.4

Solid but incomplete. Meaningful human evidence exists, but gaps remain — often in long-term safety or replication.

5.0–6.9

Emerging. Early human data or strong preclinical signals, not yet confirmed at scale.

3.0–4.9

Preliminary. Mostly animal or laboratory evidence; human claims are largely extrapolation.

0–2.9

Insufficient. Little published evidence of any kind, or claims that the literature does not support.
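The band lookup can be sketched as a short Python function. The thresholds and labels are taken from the list above; rounding the composite to one decimal before the lookup is an assumption, made because the bands are stated to one decimal place. The names `BANDS` and `band_for` are hypothetical.

```python
# Lower bound of each band, highest first, with the label used on this page.
BANDS = [
    (8.5, "Well established"),
    (7.0, "Solid but incomplete"),
    (5.0, "Emerging"),
    (3.0, "Preliminary"),
    (0.0, "Insufficient"),
]


def band_for(score):
    """Map a 0-10 composite score to its band label."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    # Assumption: scores are compared at one-decimal precision, so that
    # no value falls between bands such as 8.4 and 8.5.
    rounded = round(score, 1)
    for lower_bound, label in BANDS:
        if rounded >= lower_bound:
            return label


print(band_for(8.4))  # Solid but incomplete
print(band_for(2.9))  # Insufficient
```

Two compounds at 7.1 and 8.3 land in the same band, which reflects the point that the band matters more than the exact decimal.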

Why a high score is not a green light

A strong Evidence Score reflects the quality of the research — not that a compound is safe for you, appropriate for you, or legal to use. Many compounds we score are investigational and not approved for human use. The score tells you how much to trust the claims; it does not tell you what to do.

How each score is produced

Search. We compile the published literature from PubMed, ClinicalTrials.gov, and preprint servers — aiming to capture every relevant indexed paper, not a convenient sample.

Read and summarize. Each notable study is read at the source and summarized in our own words. We never paraphrase so closely that the summary reproduces the original.

Rate each category. Each of the six dimensions is assessed against the criteria above, with the reasoning recorded so it can be checked.

Human verification. A person checks every figure and claim against the primary source before anything is published. AI may assist the drafting; it does not have the final word.

Date and revisit. Every score carries a "last updated" date and is revised as new evidence appears. Scores can and do change.

Limitations we want you to know

Where this score falls short

It is a judgment, not a measurement. Reasonable experts could score some categories differently.
It compresses a complex evidence base into a single number — useful for orientation, not a substitute for reading the studies.
Evidence moves. A score reflects the literature on its update date and may lag new findings.
A high score never implies safety or suitability for any individual, and never constitutes medical advice.
We have no financial stake in any compound scoring high or low, and we do not sell compounds — but no scoring system is perfectly free of judgment.

Our commitment to trust

Every dossier shows its last-reviewed date and its full evidence record. We do not let any commercial interest influence a score, and we keep this educational platform strictly separate from the sale of any compound. When we get something wrong, we correct it and note the change. That transparency is the point — a score you cannot inspect is a score you should not trust.