Rankly AI
Scoring methodology

The RQ Score

Every model on Rankly AI is assigned a Rankly Quotient — a proprietary composite index that measures true AI capability across seven independent dimensions. RQ runs from 0 to 1000. Real-world scores range from roughly 400 to 980.

47 raw variables

The RQ is computed from 47 distinct signals, not a single benchmark or a single vote, so no individual data point can distort the final score.

7 independent blocs

Variables are grouped into seven evaluation dimensions, each scored separately on the 0–1000 scale before being combined into the final RQ.

Updated every 12 hours

Scores are recomputed automatically twice a day. A model update shipped at midnight is reflected in the score by morning.

The 7 evaluation blocs

Each bloc is scored independently on a 0–1000 scale. The seven bloc scores are then combined using a weighted formula to produce the final RQ. Weights are not published — they are calibrated quarterly based on what our data shows actually predicts user satisfaction.
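
As a rough illustration of the combination step, the sketch below computes a weighted average of seven bloc scores. The weights shown are placeholders invented for this example (the real weights are not published), and the names `EXAMPLE_WEIGHTS` and `combine_blocs` are hypothetical. It assumes a plain weighted average, which may differ from the production formula.

```python
# Illustrative only: the real bloc weights are not published.
# These placeholder weights are assumptions chosen to sum to 1.
EXAMPLE_WEIGHTS = {
    "intelligence": 0.25,
    "performance": 0.15,
    "quality": 0.20,
    "utility": 0.15,
    "accessibility": 0.10,
    "trust": 0.10,
    "momentum": 0.05,
}


def combine_blocs(bloc_scores, weights=EXAMPLE_WEIGHTS):
    """Return a weighted average of bloc scores on the 0-1000 scale."""
    if set(bloc_scores) != set(weights):
        raise ValueError("bloc_scores and weights must cover the same blocs")
    for name, score in bloc_scores.items():
        if not 0 <= score <= 1000:
            raise ValueError(f"{name} score {score} is outside 0-1000")
    total_weight = sum(weights.values())
    weighted_sum = sum(bloc_scores[name] * weights[name] for name in weights)
    # Dividing by the total weight keeps the result on the 0-1000 scale
    # even if the weights do not sum exactly to 1.
    return weighted_sum / total_weight


example = {
    "intelligence": 900, "performance": 800, "quality": 850, "utility": 700,
    "accessibility": 600, "trust": 750, "momentum": 500,
}
rq = combine_blocs(example)
```

With these placeholder weights the example comes out near 780. Because every bloc is on the same 0–1000 scale, a weighted average never leaves that range.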

B1
Intelligence

Measures raw cognitive capability: logical reasoning, mathematical problem-solving, language comprehension across multiple languages, and performance on standardized coding benchmarks. This is the hardest bloc to game — it is built on reproducible tests, not self-reported metrics.

B2
Performance

Covers the operational reality of using a model day-to-day: response latency (time to first token), throughput on longer tasks, effective context window, and measured uptime over rolling 30-day windows. Tests run from multiple geographic locations to avoid regional bias.
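
One way to picture the rolling 30-day uptime measurement is as the share of successful probes inside a trailing window. The sketch below is a minimal example under that assumption; the probe format and the function name `rolling_uptime` are hypothetical and are not taken from the Rankly AI pipeline.

```python
from datetime import datetime, timedelta, timezone


def rolling_uptime(probes, now, window_days=30):
    """Share of successful probes in the trailing window, or None if empty.

    `probes` is an iterable of (timestamp, succeeded) pairs. Measuring uptime
    as the fraction of successful probes is an assumption of this sketch.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [ok for ts, ok in probes if cutoff < ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
probes = [
    (now - timedelta(days=45), False),  # outside the window, ignored
    (now - timedelta(days=10), True),
    (now - timedelta(days=5), False),
    (now - timedelta(days=1), True),
    (now - timedelta(hours=1), True),
]
uptime = rolling_uptime(probes, now)  # 3 of the 4 probes in the window succeeded
```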

B3
Quality

Evaluates output that is hard to quantify numerically: naturalness of conversation, creative range, factual accuracy on curated question sets, and the distinct personality that makes a model enjoyable to use. This bloc also incorporates weighted community satisfaction signals.

B4
Utility

Scores real-world usefulness across five domains — writing, coding, research, image generation, and video — using standardized task suites. Utility reflects how broadly a model can be applied, not just how well it performs in its strongest area.

B5
Accessibility

The best model in the world has no value if it is behind an expensive paywall. This bloc rewards models with a functional free tier, competitive pricing relative to quality delivered, and broad geographic availability without restrictions.

B6
Trust

Assesses the governance and transparency layer: open-source status, published safety research, privacy policy quality, data handling practices, and the reputational track record of the company behind the model. Trust decays quickly after public controversies.

B7
Momentum

A leading indicator of future performance: how frequently the model is updated, how fast its developer community is growing, and media and research attention over the past 90 days. Momentum rewards models that are actively improving, not just coasting on past success.

Scores go down, too

The RQ is not a marketing index. A model that released a breakthrough version eighteen months ago and has shipped nothing since will see its score decline — gradually at first, then sharply if competitors accelerate. Outages, price increases, privacy incidents, and lapses in output quality all produce measurable drops.

This is intentional. The most informative moments on a model's history chart are often the drops. They show when a competitor surpassed it, when something went wrong, or when community sentiment shifted. A leaderboard where everyone always improves tells you nothing.

The community signal

1. Vote by category

Votes are cast per use case — not globally. A vote for Midjourney in "Images" carries different weight than a vote in "Coding". This prevents high-volume communities from distorting scores in areas where a model underperforms.

2. Engagement-weighted

A vote from a user who has tested and rated across dozens of models carries more weight than an account created yesterday. This makes the community signal harder to manipulate and more representative of actual usage patterns.

3. Blended, not dominant

Community votes contribute to two of the seven blocs — Quality and Utility. They are one input among many, not the primary driver of the RQ. A model cannot buy its way up the leaderboard through vote campaigns.
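
The three properties above can be sketched as a per-category weighted mean. Everything in this example is an assumption made for illustration: the logarithmic `voter_weight` function, the 0–1 rating scale, and the tuple layout are hypothetical, and the actual weighting scheme is not published.

```python
import math
from collections import defaultdict


def voter_weight(models_rated):
    """Hypothetical weight that grows slowly with the voter's rating history."""
    # A brand-new account (0 ratings) gets weight 1; the logarithm dampens
    # the advantage of very active accounts.
    return 1.0 + math.log1p(models_rated)


def category_scores(votes):
    """Weighted mean rating per (model, category).

    `votes` is an iterable of (model, category, rating, models_rated) tuples,
    with rating on a 0-1 scale. Returns {(model, category): score}.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for model, category, rating, models_rated in votes:
        w = voter_weight(models_rated)
        weighted_sum[(model, category)] += w * rating
        weight_total[(model, category)] += w
    return {key: weighted_sum[key] / weight_total[key] for key in weighted_sum}


votes = [
    ("model_a", "images", 1.0, 40),  # experienced voter, positive
    ("model_a", "images", 0.0, 0),   # new account, negative
    ("model_a", "coding", 0.0, 40),  # same model, different category
]
scores = category_scores(votes)
```

In this toy data the experienced voter outweighs the new account in "images", while the "coding" score is computed separately and is unaffected by the image votes.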

Update schedule

Every 12 hours: Full RQ recomputation for all models. Leaderboard refreshed automatically.
Real-time: Community votes recorded immediately. Reflected in the next scheduled recomputation.
On model event: Major releases, outages, and price changes trigger an early recomputation and a chart annotation.
Quarterly: Bloc weights reviewed and recalibrated against user satisfaction data. Version bump issued.
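
To show how a vote recorded in real time maps onto the next scheduled recomputation, here is a small sketch. It assumes recomputations run at 00:00 and 12:00 UTC. The schedule only states the 12-hour interval, so those clock times and the function name `next_recompute` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RECOMPUTE_INTERVAL = timedelta(hours=12)


def next_recompute(now):
    """Return the first 12-hour boundary (00:00 or 12:00 UTC) strictly after `now`.

    `now` must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Integer number of full 12-hour slots already elapsed today.
    completed_slots = (now - day_start) // RECOMPUTE_INTERVAL
    return day_start + (completed_slots + 1) * RECOMPUTE_INTERVAL


vote_time = datetime(2025, 1, 1, 3, 15, tzinfo=timezone.utc)
reflected_at = next_recompute(vote_time)  # 2025-01-01 12:00 UTC
```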

Frequently asked questions

Can AI companies pay to improve their RQ score?

No. Paid placements are clearly labeled as "Sponsored" and do not affect the algorithmic score. The RQ computation has no commercial inputs.

Why don't you publish the exact bloc weights?

Published weights become targets. If we announced that Momentum accounts for X%, providers would optimize press releases and community activity around that number rather than improving their models. The weights are audited internally and recalibrated quarterly.

Why does a model with more votes sometimes have a lower RQ?

Vote count and vote quality are different things. A model with a large but divided community can have high vote volume and a mediocre community score. Community votes also feed only two of the seven blocs, so a model can lose ground in blocs such as Intelligence or Trust even if its community votes are positive.

How do you handle brand-new models?

New models enter the leaderboard with incomplete data. Their RQ is flagged as provisional for the first four to six weeks while data accumulates, and we label these models clearly as "New" during this period.

What is rq_version?

rq_version is the field that records which formula version produced a score. Every score snapshot in our database carries it. When we ship a major weight revision, we retroactively recompute all historical scores under the new formula so charts stay comparable across time.
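
A minimal sketch of how versioned snapshots and a retroactive recomputation might fit together is shown below. Apart from `rq_version`, every name here (`ScoreSnapshot`, `recompute_history`, the other fields) is hypothetical, and the example uses two blocs with made-up weights purely to keep it short.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScoreSnapshot:
    model: str
    taken_at: str      # ISO 8601 timestamp of the snapshot
    bloc_scores: dict  # bloc name -> score on the 0-1000 scale
    rq: float
    rq_version: int    # formula version that produced `rq`


def recompute_history(snapshots, weights, new_version):
    """Re-score every snapshot with `weights` and stamp it with `new_version`."""
    total = sum(weights.values())
    updated = []
    for snap in snapshots:
        rq = sum(snap.bloc_scores[b] * w for b, w in weights.items()) / total
        # replace() returns a new snapshot; the original is left untouched.
        updated.append(replace(snap, rq=rq, rq_version=new_version))
    return updated


history = [
    ScoreSnapshot("model_a", "2025-01-01T00:00:00Z",
                  {"quality": 800, "trust": 600}, 700.0, 1),
]
# Placeholder weights for the toy two-bloc example.
revised = recompute_history(history, {"quality": 3, "trust": 1}, new_version=2)
```

Because the stored bloc scores are kept alongside the final RQ, old snapshots can be re-scored under a new formula without re-running the underlying tests.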