Transparency

How Rankings Work

Every score, badge, and ranking on CancerLogix is derived from publicly available federal research data. No editorial opinions, patient satisfaction surveys, or commercial relationships influence any number you see here.

Data Sources

CancerLogix integrates three federal databases, all publicly accessible with no paywall or licensing restrictions.

| Source | What we collect | Scale |
| --- | --- | --- |
| NIH RePORTER | Cancer-focused grants, award amounts, abstracts, fiscal years | ~52,400 grants (FY2020+) |
| ClinicalTrials.gov | Active and historical trials, conditions, phases, status | ~95,200 trials |
| PubMed / iCite | Peer-reviewed publications, MeSH annotations, Relative Citation Ratio (RCR) | ~256,800 publications |

NIH grants are filtered to cancer-relevant records only. Each grant abstract is matched against our cancer taxonomy vocabulary — roughly 202,000 non-cancer grants were excluded before any scoring takes place.
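
For illustration, here is a minimal Python sketch of that filtering step. The endpoint and request shape follow the public NIH RePORTER v2 API documentation as we understand it; the CANCER_TERMS vocabulary and is_cancer_relevant() helper are simplified stand-ins for the real taxonomy match, not the production code.

```python
import requests

# Pull one page of FY2023 grants from the NIH RePORTER v2 API and keep only
# those whose abstract mentions a vocabulary term. The vocabulary below is a
# tiny illustrative subset, not the real ~10,000-name taxonomy.
CANCER_TERMS = {"breast cancer", "lung cancer", "leukemia", "melanoma"}

def is_cancer_relevant(abstract: str) -> bool:
    """True if any vocabulary term appears in the abstract (case-insensitive)."""
    text = (abstract or "").lower()
    return any(term in text for term in CANCER_TERMS)

resp = requests.post(
    "https://api.reporter.nih.gov/v2/projects/search",
    json={
        "criteria": {"fiscal_years": [2023]},
        "include_fields": ["ProjectNum", "AbstractText", "AwardAmount"],
        "offset": 0,
        "limit": 50,
    },
    timeout=30,
)
resp.raise_for_status()
grants = resp.json()["results"]
cancer_grants = [g for g in grants if is_cancer_relevant(g.get("abstract_text"))]
print(f"kept {len(cancer_grants)} of {len(grants)} grants")
```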

Cancer Taxonomy

Before we can measure specialization, we need a canonical list of cancer types. We use the NCI Thesaurus (NCIt) — the federal standard for cancer nomenclature used by NCI, FDA, and ClinicalTrials.gov itself. This eliminates any subjective decisions about what counts as a distinct cancer type.

| Level | Count | Examples |
| --- | --- | --- |
| Domains | 13 | e.g. Hematologic, Thoracic, Pediatric |
| General | 2,556 | e.g. Breast Cancer, Lung Cancer |
| Specific | 7,723 | e.g. HER2+ Breast Cancer, NSCLC |

The hierarchy was assembled programmatically: we fetched all NCIt concepts via the NCI EVS API, then traversed parent-child relationships to build the three-level tree. Overly generic root nodes — such as "Neoplasm," "Malignant Disease," and "Cell Growth Disorder" — are excluded as stop codes, so every cancer type in our system refers to a recognizable, specific disease category rather than a catch-all umbrella.
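
A minimal sketch of the stop-code exclusion during that traversal. The toy CHILDREN graph and build_tree() helper are illustrative; in production the parent-child edges come from the NCI EVS API and stop nodes are matched by NCIt code rather than by name.

```python
from collections import deque

# Toy parent -> children graph; real edges come from the NCI EVS API.
CHILDREN = {
    "Neoplasm": ["Hematologic Neoplasm", "Malignant Disease"],
    "Malignant Disease": ["Hematologic Neoplasm"],
    "Hematologic Neoplasm": ["Acute Myeloid Leukemia"],
    "Acute Myeloid Leukemia": ["Relapsed Adult Acute Myeloid Leukemia"],
}

# Overly generic umbrella concepts excluded from the taxonomy.
STOP_NODES = {"Neoplasm", "Malignant Disease", "Cell Growth Disorder"}

def build_tree(roots: list[str]) -> dict[str, int]:
    """Breadth-first walk recording each kept concept's level (0 = Domain,
    1 = General, 2 = Specific). Stop nodes are skipped but still traversed,
    so depth only advances when a node is actually kept."""
    levels: dict[str, int] = {}
    queue = deque((r, 0) for r in roots)
    while queue:
        node, depth = queue.popleft()
        keep = node not in STOP_NODES
        if keep:
            if node in levels:      # already reached via another parent
                continue
            levels[node] = depth
        next_depth = depth + 1 if keep else depth
        for child in CHILDREN.get(node, []):
            queue.append((child, next_depth))
    return levels

print(build_tree(["Neoplasm"]))
# {'Hematologic Neoplasm': 0, 'Acute Myeloid Leukemia': 1,
#  'Relapsed Adult Acute Myeloid Leukemia': 2}
```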

The taxonomy is treated as stable. NCI releases periodic Thesaurus updates; our taxonomy would be refreshed in tandem with any major NCI reclassification.

Connecting Data to Cancer Types

Each of the three data streams is linked to specific cancer types through a different mechanism — chosen based on how that source records cancer information.

Publications — MeSH to NCIt mapping

PubMed annotates papers with MeSH (Medical Subject Headings), a separate controlled vocabulary maintained by the National Library of Medicine. We query the NCI EVS API to map each cancer-relevant MeSH term to its corresponding NCIt code, then cross-reference that code against our taxonomy. Where no code match is found, we fall back to a normalized name match. Result: approximately 59% of publications are linked to one or more cancer types.
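
A sketch of that two-step linking. The mapping dictionaries are illustrative stand-ins for EVS output and the real taxonomy; link_mesh_term() and the dummy MeSH ID "D000000" exist only for this example.

```python
def normalize(name: str) -> str:
    """Lowercase, strip hyphens, collapse whitespace."""
    return " ".join(name.lower().replace("-", " ").split())

# Step-1 lookup table (MeSH descriptor -> NCIt code), as returned by EVS.
MESH_TO_NCIT = {"D001943": "C4872"}  # Breast Neoplasms -> Breast Carcinoma (illustrative pair)

# Our taxonomy, keyed by NCIt code, plus a name index for the fallback.
TAXONOMY_BY_CODE = {"C4872": "Breast Carcinoma"}
TAXONOMY_BY_NAME = {normalize(v): k for k, v in TAXONOMY_BY_CODE.items()}

def link_mesh_term(mesh_id: str, mesh_name: str) -> str | None:
    """Return the taxonomy NCIt code for a MeSH annotation, or None."""
    ncit = MESH_TO_NCIT.get(mesh_id)                    # step 1: code match
    if ncit in TAXONOMY_BY_CODE:
        return ncit
    return TAXONOMY_BY_NAME.get(normalize(mesh_name))   # step 2: name fallback

print(link_mesh_term("D001943", "Breast Neoplasms"))  # 'C4872' via code match
print(link_mesh_term("D000000", "Breast-Carcinoma"))  # 'C4872' via name fallback
```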

Grants — abstract keyword extraction

NIH RePORTER provides the full project abstract for most grants. We scan each abstract for occurrences of cancer type names from our taxonomy and store them as tags. This is a deliberate string-match approach rather than an AI model — reproducible, auditable, and not subject to model drift. Coverage: approximately 68% of cancer grants are tagged to at least one specific cancer type.
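
A minimal sketch of that tagger. The three-name TAXONOMY_NAMES list stands in for the full vocabulary, and word-boundary matching is our assumption about how partial-word hits are avoided.

```python
import re

TAXONOMY_NAMES = ["breast cancer", "melanoma", "non-small cell lung cancer"]

def tag_abstract(abstract: str) -> set[str]:
    """Return the taxonomy names found verbatim in the abstract."""
    text = abstract.lower()
    tags = set()
    for name in TAXONOMY_NAMES:
        # \b keeps "melanoma" from matching inside words like "melanomagenesis"
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            tags.add(name)
    return tags

print(tag_abstract("We study BRCA1 variants in early-onset breast cancer."))
# {'breast cancer'}
```

Because this is plain string comparison, the same abstract always yields the same tags, which is what makes the pipeline reproducible and auditable.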

Clinical trials — condition fields

ClinicalTrials.gov requires trial registrants to specify the conditions being studied using standardized terminology. These condition fields are matched against our cancer taxonomy at import time. Because conditions are registered at enrollment — not extracted post-hoc from free text — trial tagging is the most precise of the three streams.
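
A sketch of the import-time lookup, assuming conditions are normalized and then matched exactly against taxonomy names; the function and variable names are illustrative.

```python
TAXONOMY = {"acute myeloid leukemia", "breast cancer"}

def normalize(condition: str) -> str:
    return " ".join(condition.strip().lower().split())

def match_conditions(conditions: list[str]) -> set[str]:
    """Exact lookup after normalization; unmatched conditions are dropped."""
    return {normalize(c) for c in conditions if normalize(c) in TAXONOMY}

print(match_conditions(["Acute Myeloid Leukemia", "Healthy Volunteers"]))
# {'acute myeloid leukemia'}
```

Note that this is an exact match: terminology variants not present in the taxonomy are dropped, a limitation discussed under "What This Doesn't Measure" below.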

Center Excellence Score (0–100)

The headline score shown on every center profile and directory card. It measures a center's overall research capacity relative to the other 108 centers in the dataset.

Formula

Excellence Score =
  40% × NIH cancer funding (percentile among all 109 centers)
+ 35% × Active clinical trials (percentile among all 109 centers)
+ 25% × Publication impact (mean RCR, percentile among all 109 centers)

Percentile ranks, not raw totals. Each input is converted to a percentile rank (0–1.0) before weighting. This means a $100M focused specialist center and a $1B research university both receive scores that reflect their standing relative to peers — a large absolute number alone doesn't guarantee a high score.
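
A worked sketch of the percentile conversion and the 40/35/25 blend, using three invented centers. percentile_rank() here counts the share of peers at or below a value, which is one common convention; production tie-handling may differ.

```python
def percentile_rank(value: float, peers: list[float]) -> float:
    """Fraction of peer values <= value, in [0, 1]."""
    return sum(p <= value for p in peers) / len(peers)

# Invented inputs: raw funding ($), active trials, and mean RCR per center.
centers = {
    "Focused Specialist": {"funding": 100e6, "trials": 40,  "mean_rcr": 2.1},
    "Large University":   {"funding": 1e9,   "trials": 300, "mean_rcr": 1.4},
    "Regional Center":    {"funding": 30e6,  "trials": 15,  "mean_rcr": 0.9},
}

WEIGHTS = {"funding": 0.40, "trials": 0.35, "mean_rcr": 0.25}

def excellence_score(name: str) -> float:
    score = 0.0
    for metric, weight in WEIGHTS.items():
        peers = [c[metric] for c in centers.values()]
        score += weight * percentile_rank(centers[name][metric], peers)
    return round(100 * score, 1)

for name in centers:
    print(name, excellence_score(name))
# Focused Specialist 75.0, Large University 91.7, Regional Center 33.3
```

The specialist's $100M is a tenth of the university's funding, yet its score is 75 rather than a tenth of the university's, because only standing relative to peers enters the blend.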

What is RCR? The Relative Citation Ratio, computed by NIH's iCite, normalizes a paper's citation count by the typical citation rate for papers in the same field and year. An RCR of 1.0 is field-average; 2.0 means twice as cited as the field norm. We use the mean RCR across all of a center's publications as the publication impact signal.

Weight rationale. Funding (40%) is the largest weight because sustained NIH grant dollars reflect institutional research capacity over multiple years. Trials (35%) reflect active patient-facing work happening right now. Publication impact (25%) captures the scientific quality of prior output, but is weighted lower because high-RCR papers are sometimes produced by small groups within large institutions.

Cancer-Type Specialization Score (0–100)

Each center receives a separate score for every cancer type in which it has documented research activity. These scores power the ranked “Top Centers” tables on every cancer type page.

Formula

Cancer-Type Score =
  45% × Active trials for this cancer type (percentile within cancer type)
+ 35% × NIH grant funding for this cancer type (percentile within cancer type)
+ 20% × Publication impact for this cancer type (mean RCR, percentile within cancer type)

Within-group ranking is the key design decision. Percentile ranks are computed among all centers that have any activity in that cancer type — not across all 109 centers globally. A center leading in pediatric sarcoma research is ranked against the other centers working on pediatric sarcoma, not against MD Anderson's full portfolio. This allows a focused specialist institution to legitimately top the list for its area of expertise.
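
A sketch of that within-group restriction; the trial counts are invented. The only difference from the global calculation is the peer set: centers with no activity in the cancer type are simply absent from it.

```python
def percentile_rank(value: float, peers: list[float]) -> float:
    return sum(p <= value for p in peers) / len(peers)

# Active trials for one cancer type; inactive centers don't appear at all.
pediatric_sarcoma_trials = {
    "Focused Children's Center": 22,
    "Large University": 9,
    "Regional Center": 4,
}

def trial_percentile(center: str, activity: dict[str, int]) -> float:
    peers = list(activity.values())   # peers = active centers only
    return percentile_rank(activity[center], peers)

print(trial_percentile("Focused Children's Center", pediatric_sarcoma_trials))
# 1.0 -- tops this cancer type regardless of its modest global totals
```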

Why trials are weighted highest here (45%). ClinicalTrials.gov condition fields are the most precise cancer-type signal we have — registered at enrollment, not extracted after the fact. Grant and publication tagging depends on keyword extraction, which has lower coverage. The weighting reflects that precision difference.

Research Tier Badges

Research tier badges are a simplified read of the Cancer-Type Specialization Score. Thresholds were set empirically by inspecting the score distribution across all ~121,900 center-specialization pairs.

| Badge | Score threshold | Meaning |
| --- | --- | --- |
| High Volume | ≥ 70 | Among the most research-active centers for this cancer type |
| Active | ≥ 35 | Meaningful, documented research presence |
| Contributing | > 0 (with activity) | Trials, grants, or publications exist, but at lower volume |
| No badge | 0 | No documented research activity in our dataset |
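
A sketch of the badge assignment the table implies; the function shape is our assumption, but the thresholds are the ones listed above.

```python
def badge(score: float, has_activity: bool) -> str | None:
    """Map a Cancer-Type Specialization Score to a research tier badge."""
    if not has_activity:
        return None               # no badge: nothing indexed for this cancer type
    if score >= 70:
        return "High Volume"
    if score >= 35:
        return "Active"
    return "Contributing" if score > 0 else None   # "> 0, with activity"

print(badge(82.5, True))   # High Volume
print(badge(12.0, True))   # Contributing
print(badge(0.0, False))   # None
```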

A center without a badge for a given cancer type may still treat patients with that condition — it simply has no indexed research output (grants, trials, or publications) linked to it in the federal databases we use.

Patient Experience Data

For cancer centers where CMS data is available, CancerLogix displays patient satisfaction scores from the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey, administered by the Centers for Medicare & Medicaid Services (CMS). HCAHPS is the first national, standardized, publicly reported survey of patients' perspectives on hospital care.

What HCAHPS Measures

HCAHPS surveys patients 48 hours to 6 weeks after a hospital discharge. It asks about observable experiences during the stay — not opinions on medical outcomes. CancerLogix displays four measures:

| Measure | What It Asks |
| --- | --- |
| Would Recommend | “Would you recommend this hospital to family and friends?” (% “Definitely yes”) |
| Overall Rating 9–10 | “What number would you use to rate this hospital during your stay?” (% giving 9 or 10 out of 10) |
| Doctor Communication | How often doctors communicated well (star rating, 1–5) |
| Nurse Communication | How often nurses communicated well (star rating, 1–5) |

CancerLogix also stores (but does not currently display) communication-about-medicines and discharge-information star ratings. These may surface in a future update.

What HCAHPS Does Not Measure

HCAHPS does not measure cancer-specific outcomes, treatment efficacy, survival rates, complication rates, or the quality of cancer diagnosis and staging. A high HCAHPS score means patients reported positive care experiences — it is one signal among many, not a comprehensive indicator of cancer treatment quality.

Two-Track Data Approach

CMS maintains two separate HCAHPS reporting programs. CancerLogix uses both.

PPS-Exempt Cancer Hospitals (PCH)

Eleven freestanding cancer hospitals — including MSK, Dana-Farber, and MD Anderson — are exempt from standard Medicare prospective payment and report HCAHPS under a dedicated CMS program. For these centers, the HCAHPS data refers specifically to the cancer hospital.

Embedded Cancer Centers

Most NCI-designated cancer centers are housed within a larger hospital or academic medical center. For these centers, HCAHPS scores are reported under the parent hospital's CMS Certification Number and reflect the entire hospital's patient experience — not the cancer program specifically. This is clearly disclosed on each affected center's profile.

Approximately 15 centers in the CancerLogix directory — including basic-laboratory research institutes, Canadian centers, and a military facility — have no CMS hospital counterpart and will not show patient experience data.

CMS Overall Star Rating

The CMS Overall Hospital Star Rating (1–5 stars) is a composite of up to five measure groups: mortality, safety of care, readmission, patient experience, and timely/effective care. It is published in the CMS Hospital General Information dataset and reflects the full hospital, not the cancer program specifically. Star ratings are updated by CMS approximately quarterly.

What This Doesn't Measure

CancerLogix measures research activity — not clinical quality or patient experience.

  • Patient outcomes, survival rates, or treatment success
  • Clinical quality standards, accreditation, or care delivery (those are JCAHO, Magnet, and NCI designation territory)
  • Insurance acceptance or wait times
  • Patient satisfaction, except partially: CancerLogix displays CMS HCAHPS survey scores for ~94 centers where CMS data is available (see Patient Experience Data above), but ~15 centers (basic-laboratory institutes, Canadian centers, and a military facility) have no CMS hospital match and will not show this data.
  • Multi-site cooperative group trial inflation. Clinical trial counts reflect every trial where a center appears as a participating location — including large NCI cooperative group studies (e.g. Children's Oncology Group, SWOG, ECOG-ACRIN) where 50–100+ institutions co-enroll patients. Smaller academic medical centers that participate broadly in cooperative networks can appear to have high trial counts relative to their independent research footprint. The Excellence Score partially corrects for this because funding and publication impact are independent signals, but the raw trial count displayed on center profiles includes cooperative group participation.
  • Taxonomy term fragmentation at the specific level. ClinicalTrials.gov trial condition fields and the NCI Thesaurus sometimes use different terms for the same clinical concept — for example, ClinicalTrials.gov favors “Recurrent Adult Acute Myeloid Leukemia” while NCI uses “Relapsed Adult Acute Myeloid Leukemia.” Because our cancer-type filter requires an exact tag match, highly specific taxonomy nodes may show fewer trials than actually exist for that concept. Browsing at the general level (e.g. “Acute Myeloid Leukemia”) always returns the broadest, most complete trial set.
  • Research in progress not yet indexed — PubMed and NIH RePORTER typically lag 1–2 years behind the current date
  • International centers (currently 109 US + 2 Canadian centers; broader coverage planned)

A high Excellence Score means a center is a major federally funded research institution. It is not a recommendation for where to seek care, and it should not be used as a substitute for consultation with a physician.

Data Freshness

| Dataset | Refresh cadence | Notes |
| --- | --- | --- |
| NIH grants | Weekly (Sunday 2 am UTC) | Via NIH RePORTER API; cancer-only filter applied on each run |
| Clinical trials | Weekly | Via ClinicalTrials.gov API |
| Publications | Weekly | Via PubMed / iCite APIs; RCR scores refreshed each run |
| Cancer taxonomy (NCIt) | Static | Refreshed in tandem with major NCI Thesaurus releases |
| Excellence scores | Recomputed after each data refresh | Scores are derived, not stored independently of source data |
| HCAHPS survey measures | Annually (November) | CMS releases a new 12-month survey window each October/November; recommend %, rating %, and communication stars are refreshed then |
| CMS Overall Star Rating | Quarterly (Feb, May, Aug, Nov) | CMS updates the composite star rating ~quarterly; CancerLogix refreshes the star column on the same cadence |
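
For readers curious how these cadences could be wired, here is a hypothetical scheduler map in standard cron syntax. Only the Sunday 2 am UTC slot for NIH grants is stated above; the other weekly slots are placeholders, and the job names are illustrative.

```python
# Hypothetical cron wiring for the refresh cadences in the table above.
REFRESH_SCHEDULE = {
    "nih_grants":      "0 2 * * 0",  # Sundays 02:00 UTC (stated above)
    "clinical_trials": "0 3 * * 0",  # weekly; exact slot assumed
    "publications":    "0 4 * * 0",  # weekly; exact slot assumed
    # taxonomy: manual, on major NCIt releases
    # scores: recomputed at the end of each successful refresh run
}
```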