Transparency

How Rankings Work

Every score, badge, and ranking on CancerLogix is derived from publicly available federal research data. No editorial opinions, patient satisfaction surveys, or commercial relationships influence any number you see here.

Data Sources

CancerLogix integrates three federal databases, all publicly accessible with no paywall or licensing restrictions.

| Source | What we collect | Scale |
| --- | --- | --- |
| NIH RePORTER | Cancer-focused grants, award amounts, abstracts, fiscal years | ~52,400 grants (FY2020+) |
| ClinicalTrials.gov | Active and historical trials, conditions, phases, status | ~95,200 trials |
| PubMed / iCite | Peer-reviewed publications, MeSH annotations, Relative Citation Ratio (RCR) | ~256,800 publications |

NIH grants are filtered to cancer-relevant records only. Each grant abstract is matched against our cancer taxonomy vocabulary — roughly 202,000 non-cancer grants were excluded before any scoring takes place.
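
For illustration, here is a minimal Python sketch of that filtering step. The endpoint and request shape follow the public NIH RePORTER v2 API documentation as we understand it; the CANCER_TERMS vocabulary and is_cancer_relevant() helper are simplified stand-ins for the real taxonomy match, not the production code.

```python
import requests

# Pull one page of FY2023 grants from the NIH RePORTER v2 API and keep only
# those whose abstract mentions a vocabulary term. The vocabulary below is a
# tiny illustrative subset, not the real ~10,000-name taxonomy.
CANCER_TERMS = {"breast cancer", "lung cancer", "leukemia", "melanoma"}

def is_cancer_relevant(abstract: str) -> bool:
    """True if any vocabulary term appears in the abstract (case-insensitive)."""
    text = (abstract or "").lower()
    return any(term in text for term in CANCER_TERMS)

resp = requests.post(
    "https://api.reporter.nih.gov/v2/projects/search",
    json={
        "criteria": {"fiscal_years": [2023]},
        "include_fields": ["ProjectNum", "AbstractText", "AwardAmount"],
        "offset": 0,
        "limit": 50,
    },
    timeout=30,
)
resp.raise_for_status()
grants = resp.json()["results"]
cancer_grants = [g for g in grants if is_cancer_relevant(g.get("abstract_text"))]
print(f"kept {len(cancer_grants)} of {len(grants)} grants")
```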

Cancer Taxonomy

Before we can measure specialization, we need a canonical list of cancer types. We use the NCI Thesaurus (NCIt) — the federal standard for cancer nomenclature used by NCI, FDA, and ClinicalTrials.gov itself. This eliminates any subjective decisions about what counts as a distinct cancer type.

| Level | Count | Examples |
| --- | --- | --- |
| Domains | 13 | e.g. Hematologic, Thoracic, Pediatric |
| General | 2,556 | e.g. Breast Cancer, Lung Cancer |
| Specific | 7,723 | e.g. HER2+ Breast Cancer, NSCLC |

The hierarchy was assembled programmatically: we fetched all NCIt concepts via the NCI EVS API, then traversed parent-child relationships to build the three-level tree. Overly generic root nodes — such as "Neoplasm," "Malignant Disease," and "Cell Growth Disorder" — are excluded as stop codes, so every cancer type in our system refers to a recognizable, specific disease category rather than a catch-all umbrella.
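
A minimal sketch of the stop-code exclusion during that traversal. The toy CHILDREN graph and build_tree() helper are illustrative; in production the parent-child edges come from the NCI EVS API and stop nodes are matched by NCIt code rather than by name.

```python
from collections import deque

# Toy parent -> children graph; real edges come from the NCI EVS API.
CHILDREN = {
    "Neoplasm": ["Hematologic Neoplasm", "Malignant Disease"],
    "Malignant Disease": ["Hematologic Neoplasm"],
    "Hematologic Neoplasm": ["Acute Myeloid Leukemia"],
    "Acute Myeloid Leukemia": ["Relapsed Adult Acute Myeloid Leukemia"],
}

# Overly generic umbrella concepts excluded from the taxonomy.
STOP_NODES = {"Neoplasm", "Malignant Disease", "Cell Growth Disorder"}

def build_tree(roots: list[str]) -> dict[str, int]:
    """Breadth-first walk recording each kept concept's level (0 = Domain,
    1 = General, 2 = Specific). Stop nodes are skipped but still traversed,
    so depth only advances when a node is actually kept."""
    levels: dict[str, int] = {}
    queue = deque((r, 0) for r in roots)
    while queue:
        node, depth = queue.popleft()
        keep = node not in STOP_NODES
        if keep:
            if node in levels:      # already reached via another parent
                continue
            levels[node] = depth
        next_depth = depth + 1 if keep else depth
        for child in CHILDREN.get(node, []):
            queue.append((child, next_depth))
    return levels

print(build_tree(["Neoplasm"]))
# {'Hematologic Neoplasm': 0, 'Acute Myeloid Leukemia': 1,
#  'Relapsed Adult Acute Myeloid Leukemia': 2}
```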

The taxonomy is treated as stable. NCI releases periodic Thesaurus updates; our taxonomy would be refreshed in tandem with any major NCI reclassification.

Connecting Data to Cancer Types

Each of the three data streams is linked to specific cancer types through a different mechanism — chosen based on how that source records cancer information.

Publications — MeSH to NCIt mapping

PubMed annotates papers with MeSH (Medical Subject Headings), a separate controlled vocabulary maintained by the National Library of Medicine. We query the NCI EVS API to map each cancer-relevant MeSH term to its corresponding NCIt code, then cross-reference that code against our taxonomy. Where no code match is found, we fall back to a normalized name match. Result: approximately 59% of publications are linked to one or more cancer types.
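
A sketch of that two-step linking. The mapping dictionaries are illustrative stand-ins for EVS output and the real taxonomy; link_mesh_term() and the dummy MeSH ID "D000000" exist only for this example.

```python
def normalize(name: str) -> str:
    """Lowercase, strip hyphens, collapse whitespace."""
    return " ".join(name.lower().replace("-", " ").split())

# Step-1 lookup table (MeSH descriptor -> NCIt code), as returned by EVS.
MESH_TO_NCIT = {"D001943": "C4872"}  # Breast Neoplasms -> Breast Carcinoma (illustrative pair)

# Our taxonomy, keyed by NCIt code, plus a name index for the fallback.
TAXONOMY_BY_CODE = {"C4872": "Breast Carcinoma"}
TAXONOMY_BY_NAME = {normalize(v): k for k, v in TAXONOMY_BY_CODE.items()}

def link_mesh_term(mesh_id: str, mesh_name: str) -> str | None:
    """Return the taxonomy NCIt code for a MeSH annotation, or None."""
    ncit = MESH_TO_NCIT.get(mesh_id)                    # step 1: code match
    if ncit in TAXONOMY_BY_CODE:
        return ncit
    return TAXONOMY_BY_NAME.get(normalize(mesh_name))   # step 2: name fallback

print(link_mesh_term("D001943", "Breast Neoplasms"))  # 'C4872' via code match
print(link_mesh_term("D000000", "Breast-Carcinoma"))  # 'C4872' via name fallback
```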

Grants — abstract keyword extraction

NIH RePORTER provides the full project abstract for most grants. We scan each abstract for occurrences of cancer type names from our taxonomy and store them as tags. This is a deliberate string-match approach rather than an AI model — reproducible, auditable, and not subject to model drift. Coverage: approximately 68% of cancer grants are tagged to at least one specific cancer type.
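
A minimal sketch of that tagger. The three-name TAXONOMY_NAMES list stands in for the full vocabulary, and word-boundary matching is our assumption about how partial-word hits are avoided.

```python
import re

TAXONOMY_NAMES = ["breast cancer", "melanoma", "non-small cell lung cancer"]

def tag_abstract(abstract: str) -> set[str]:
    """Return the taxonomy names found verbatim in the abstract."""
    text = abstract.lower()
    tags = set()
    for name in TAXONOMY_NAMES:
        # \b keeps "melanoma" from matching inside words like "melanomagenesis"
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            tags.add(name)
    return tags

print(tag_abstract("We study BRCA1 variants in early-onset breast cancer."))
# {'breast cancer'}
```

Because this is plain string comparison, the same abstract always yields the same tags, which is what makes the pipeline reproducible and auditable.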

Clinical trials — condition fields

ClinicalTrials.gov requires trial registrants to specify the conditions being studied using standardized terminology. These condition fields are matched against our cancer taxonomy at import time. Because conditions are registered at enrollment — not extracted post-hoc from free text — trial tagging is the most precise of the three streams.
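
A sketch of the import-time lookup, assuming conditions are normalized and then matched exactly against taxonomy names; the function and variable names are illustrative.

```python
TAXONOMY = {"acute myeloid leukemia", "breast cancer"}

def normalize(condition: str) -> str:
    return " ".join(condition.strip().lower().split())

def match_conditions(conditions: list[str]) -> set[str]:
    """Exact lookup after normalization; unmatched conditions are dropped."""
    return {normalize(c) for c in conditions if normalize(c) in TAXONOMY}

print(match_conditions(["Acute Myeloid Leukemia", "Healthy Volunteers"]))
# {'acute myeloid leukemia'}
```

Note that this is an exact match: terminology variants not present in the taxonomy are dropped, a limitation discussed under "What This Doesn't Measure" below.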

Center Excellence Score (0–100)

The headline score shown on every center profile and directory card. It measures a center's overall research capacity relative to the other 108 centers in the dataset.

Formula

Excellence Score =
  40% × NIH cancer funding (percentile among all 109 centers)
+ 35% × Active clinical trials (percentile among all 109 centers)
+ 25% × Publication impact (mean RCR, percentile among all 109 centers)

Percentile ranks, not raw totals. Each input is converted to a percentile rank (0–1.0) before weighting. This means a $100M focused specialist center and a $1B research university both receive scores that reflect their standing relative to peers — a large absolute number alone doesn't guarantee a high score.
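
A worked sketch of the percentile conversion and the 40/35/25 blend, using three invented centers. percentile_rank() here counts the share of peers at or below a value, which is one common convention; production tie-handling may differ.

```python
def percentile_rank(value: float, peers: list[float]) -> float:
    """Fraction of peer values <= value, in [0, 1]."""
    return sum(p <= value for p in peers) / len(peers)

# Invented inputs: raw funding ($), active trials, and mean RCR per center.
centers = {
    "Focused Specialist": {"funding": 100e6, "trials": 40,  "mean_rcr": 2.1},
    "Large University":   {"funding": 1e9,   "trials": 300, "mean_rcr": 1.4},
    "Regional Center":    {"funding": 30e6,  "trials": 15,  "mean_rcr": 0.9},
}

WEIGHTS = {"funding": 0.40, "trials": 0.35, "mean_rcr": 0.25}

def excellence_score(name: str) -> float:
    score = 0.0
    for metric, weight in WEIGHTS.items():
        peers = [c[metric] for c in centers.values()]
        score += weight * percentile_rank(centers[name][metric], peers)
    return round(100 * score, 1)

for name in centers:
    print(name, excellence_score(name))
# Focused Specialist 75.0, Large University 91.7, Regional Center 33.3
```

The specialist's $100M is a tenth of the university's funding, yet its score is 75 rather than a tenth of the university's, because only standing relative to peers enters the blend.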

What is RCR? The Relative Citation Ratio, computed by NIH's iCite, normalizes a paper's citation count by the typical citation rate for papers in the same field and year. An RCR of 1.0 is field-average; 2.0 means twice as cited as the field norm. We use the mean RCR across all of a center's publications as the publication impact signal.

Weight rationale. Funding (40%) is the largest weight because sustained NIH grant dollars reflect institutional research capacity over multiple years. Trials (35%) reflect active patient-facing work happening right now. Publication impact (25%) captures the scientific quality of prior output, but is weighted lower because high-RCR papers are sometimes produced by small groups within large institutions.

Cancer-Type Specialization Score (0–100)

Each center receives a separate score for every cancer type in which it has documented research activity. These scores power the ranked “Top Centers” tables on every cancer type page.

Formula

Cancer-Type Score =
  45% × Active trials for this cancer type (percentile within cancer type)
+ 35% × NIH grant funding for this cancer type (percentile within cancer type)
+ 20% × Publication impact for this cancer type (mean RCR, percentile within cancer type)

Within-group ranking is the key design decision. Percentile ranks are computed among all centers that have any activity in that cancer type — not across all 109 centers globally. A center leading in pediatric sarcoma research is ranked against the other centers working on pediatric sarcoma, not against MD Anderson's full portfolio. This allows a focused specialist institution to legitimately top the list for its area of expertise.
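
A sketch of that within-group restriction; the trial counts are invented. The only difference from the global calculation is the peer set: centers with no activity in the cancer type are simply absent from it.

```python
def percentile_rank(value: float, peers: list[float]) -> float:
    return sum(p <= value for p in peers) / len(peers)

# Active trials for one cancer type; inactive centers don't appear at all.
pediatric_sarcoma_trials = {
    "Focused Children's Center": 22,
    "Large University": 9,
    "Regional Center": 4,
}

def trial_percentile(center: str, activity: dict[str, int]) -> float:
    peers = list(activity.values())   # peers = active centers only
    return percentile_rank(activity[center], peers)

print(trial_percentile("Focused Children's Center", pediatric_sarcoma_trials))
# 1.0 -- tops this cancer type regardless of its modest global totals
```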

Why trials are weighted highest here (45%). ClinicalTrials.gov condition fields are the most precise cancer-type signal we have — registered at enrollment, not extracted after the fact. Grant and publication tagging depends on keyword extraction, which has lower coverage. The weighting reflects that precision difference.

Research Tier Badges

Research tier badges are a simplified read of the Cancer-Type Specialization Score. Thresholds were set empirically by inspecting the score distribution across all ~121,900 center-specialization pairs.

| Badge | Score threshold | Meaning |
| --- | --- | --- |
| High Volume | ≥ 70 | Among the most research-active centers for this cancer type |
| Active | ≥ 35 | Meaningful, documented research presence |
| Contributing | > 0 (with activity) | Trials, grants, or publications exist, but at lower volume |
| No badge | 0 | No documented research activity in our dataset |
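
A sketch of the badge assignment the table implies; the function shape is our assumption, but the thresholds are the ones listed above.

```python
def badge(score: float, has_activity: bool) -> str | None:
    """Map a Cancer-Type Specialization Score to a research tier badge."""
    if not has_activity:
        return None               # no badge: nothing indexed for this cancer type
    if score >= 70:
        return "High Volume"
    if score >= 35:
        return "Active"
    return "Contributing" if score > 0 else None   # "> 0, with activity"

print(badge(82.5, True))   # High Volume
print(badge(12.0, True))   # Contributing
print(badge(0.0, False))   # None
```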

A center without a badge for a given cancer type may still treat patients with that condition — it simply has no indexed research output (grants, trials, or publications) linked to it in the federal databases we use.

Patient Experience Data

For cancer centers where CMS data is available, CancerLogix displays patient satisfaction scores from the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey, administered by the Centers for Medicare & Medicaid Services (CMS). HCAHPS is the first national, standardized, publicly reported survey of patients' perspectives on hospital care.

What HCAHPS Measures

HCAHPS surveys patients 48 hours to 6 weeks after a hospital discharge. It asks about observable experiences during the stay — not opinions on medical outcomes. CancerLogix displays four measures:

| Measure | What It Asks |
| --- | --- |
| Would Recommend | “Would you recommend this hospital to family and friends?” (% “Definitely yes”) |
| Overall Rating 9–10 | “What number would you use to rate this hospital during your stay?” (% giving 9 or 10 out of 10) |
| Doctor Communication | How often doctors communicated well (star rating, 1–5) |
| Nurse Communication | How often nurses communicated well (star rating, 1–5) |

CancerLogix also stores (but does not currently display) communication-about-medicines and discharge-information star ratings. These may surface in a future update.

What HCAHPS Does Not Measure

HCAHPS does not measure cancer-specific outcomes, treatment efficacy, survival rates, complication rates, or the quality of cancer diagnosis and staging. A high HCAHPS score means patients reported positive care experiences — it is one signal among many, not a comprehensive indicator of cancer treatment quality.

Two-Track Data Approach

CMS maintains two separate HCAHPS reporting programs. CancerLogix uses both.

PPS-Exempt Cancer Hospitals (PCH)

Eleven freestanding cancer hospitals — including MSK, Dana-Farber, and MD Anderson — are exempt from standard Medicare prospective payment and report HCAHPS under a dedicated CMS program. For these centers, the HCAHPS data refers specifically to the cancer hospital.

Embedded Cancer Centers

Most NCI-designated cancer centers are housed within a larger hospital or academic medical center. For these centers, HCAHPS scores are reported under the parent hospital's CMS Certification Number and reflect the entire hospital's patient experience — not the cancer program specifically. This is clearly disclosed on each affected center's profile.

Approximately 15 centers in the CancerLogix directory — including basic-laboratory research institutes, Canadian centers, and a military facility — have no CMS hospital counterpart and will not show patient experience data.

CMS Overall Star Rating

The CMS Overall Hospital Star Rating (1–5 stars) is a composite of up to five measure groups: mortality, safety of care, readmission, patient experience, and timely/effective care. It is published in the CMS Hospital General Information dataset and reflects the full hospital, not the cancer program specifically. Star ratings are updated by CMS approximately quarterly.

What This Doesn't Measure

CancerLogix measures research activity — not clinical quality or patient experience.

  • Patient outcomes, survival rates, or treatment success
  • Clinical quality standards, accreditation, or care delivery (those are JCAHO, Magnet, and NCI designation territory)
  • Insurance acceptance or wait times
  • Patient satisfaction, except partially: CancerLogix displays CMS HCAHPS survey scores for ~94 centers where CMS data is available (see Patient Experience Data above), but ~15 centers (basic-laboratory institutes, Canadian centers, and a military facility) have no CMS hospital match and will not show this data.
  • Multi-site cooperative group trial inflation. Clinical trial counts reflect every trial where a center appears as a participating location — including large NCI cooperative group studies (e.g. Children's Oncology Group, SWOG, ECOG-ACRIN) where 50–100+ institutions co-enroll patients. Smaller academic medical centers that participate broadly in cooperative networks can appear to have high trial counts relative to their independent research footprint. The Excellence Score partially corrects for this because funding and publication impact are independent signals, but the raw trial count displayed on center profiles includes cooperative group participation.
  • Taxonomy term fragmentation at the specific level. ClinicalTrials.gov trial condition fields and the NCI Thesaurus sometimes use different terms for the same clinical concept — for example, ClinicalTrials.gov favors “Recurrent Adult Acute Myeloid Leukemia” while NCI uses “Relapsed Adult Acute Myeloid Leukemia.” Because our cancer-type filter requires an exact tag match, highly specific taxonomy nodes may show fewer trials than actually exist for that concept. Browsing at the general level (e.g. “Acute Myeloid Leukemia”) always returns the broadest, most complete trial set.
  • Research in progress not yet indexed — PubMed and NIH RePORTER typically lag 1–2 years behind the current date
  • International centers (currently 109 US + 2 Canadian centers; broader coverage planned)

A high Excellence Score means a center is a major federally funded research institution. It is not a recommendation for where to seek care, and it should not be used as a substitute for consultation with a physician.

Data Freshness

| Dataset | Refresh cadence | Notes |
| --- | --- | --- |
| NIH grants | Weekly (Sunday 2 am UTC) | Via NIH RePORTER API; cancer-only filter applied on each run |
| Clinical trials | Weekly | Via ClinicalTrials.gov API |
| Publications | Weekly | Via PubMed / iCite APIs; RCR scores refreshed each run |
| Cancer taxonomy (NCIt) | Static | Refreshed in tandem with major NCI Thesaurus releases |
| Excellence scores | Recomputed after each data refresh | Scores are derived, not stored independently of source data |
| HCAHPS survey measures | Annually (November) | CMS releases a new 12-month survey window each October/November; recommend %, rating %, and communication stars are refreshed then |
| CMS Overall Star Rating | Quarterly (Feb, May, Aug, Nov) | CMS updates the composite star rating ~quarterly; CancerLogix refreshes the star column on the same cadence |
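
For readers curious how these cadences could be wired, here is a hypothetical scheduler map in standard cron syntax. Only the Sunday 2 am UTC slot for NIH grants is stated above; the other weekly slots are placeholders, and the job names are illustrative.

```python
# Hypothetical cron wiring for the refresh cadences in the table above.
REFRESH_SCHEDULE = {
    "nih_grants":      "0 2 * * 0",  # Sundays 02:00 UTC (stated above)
    "clinical_trials": "0 3 * * 0",  # weekly; exact slot assumed
    "publications":    "0 4 * * 0",  # weekly; exact slot assumed
    # taxonomy: manual, on major NCIt releases
    # scores: recomputed at the end of each successful refresh run
}
```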