This interactive map shows how similar lung tumors look to artificial intelligence. Each dot represents a real lung cancer patient from the National Lung Screening Trial (NLST), a large U.S. clinical study of lung cancer screening with low-dose CT scans. Lines connect patients whose tumors look most alike — or least alike — according to AI models trained on medical images.
Researchers took CT scans of 289 lung cancer patients, each with a single identified tumor. The tumor in each scan was located using bounding boxes from the NLST-Sybil analysis. Each tumor region was then fed into 9 different AI models (called foundation models). Each model converts the tumor image into a mathematical fingerprint — a list of hundreds of numbers called an embedding. These fingerprints capture the visual features of the tumor as the AI "sees" it. We then compared all fingerprints to find which tumors look most — and least — similar to each other.
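The fingerprinting step can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: `encode_tumor` is a hypothetical stand-in for any one of the 9 foundation models (each real model has its own API and embedding size), and the bounding-box values are made up.

```python
import numpy as np

def encode_tumor(region: np.ndarray) -> np.ndarray:
    """Stand-in for a foundation model's image encoder (hypothetical).
    A real model returns a learned embedding; here we just project the
    flattened pixels to a fixed-length vector for illustration."""
    rng = np.random.default_rng(42)
    projection = rng.normal(size=(region.size, 256))
    return region.flatten() @ projection

# Crop the tumor from a CT slice using its bounding box (x, y, w, h).
ct_slice = np.zeros((512, 512))        # placeholder for one CT slice
x, y, w, h = 200, 180, 32, 32          # hypothetical bounding box
region = ct_slice[y:y+h, x:x+w]
embedding = encode_tumor(region)       # one "fingerprint" per tumor
```

Each patient ends up with one such vector per model; all pairwise comparisons then happen in this embedding space.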
For each patient, the AI compares that patient's tumor fingerprint against those of the other 288 patients to find the 5 closest (most similar) and 5 farthest (least similar) matches.
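This top-5 / bottom-5 ranking can be sketched with NumPy. Random vectors stand in for the real fingerprints, and 512 is an assumed embedding size; the nearest-neighbor logic is the same either way.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, dim = 289, 512          # 289 NLST patients, assumed embedding size
fingerprints = rng.normal(size=(n_patients, dim))

# Normalize rows so cosine distance reduces to 1 minus a dot product.
unit = fingerprints / np.linalg.norm(fingerprints, axis=1, keepdims=True)
cos_dist = 1.0 - unit @ unit.T      # (289, 289) pairwise cosine distances

patient = 0
order = np.argsort(cos_dist[patient])
order = order[order != patient]     # drop the self-match (distance ~0)
closest_5 = order[:5]               # most similar tumors
farthest_5 = order[-5:]             # least similar tumors
```

Sorting each row once gives both ends of the ranking, so the closest and farthest matches come from the same pass.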
The similarity is measured using cosine distance, which compares the "direction" of two fingerprint vectors. Think of each fingerprint as an arrow pointing in some direction. If two arrows point nearly the same way, the angle between them is small and the tumors are very similar (cosine distance close to 0). If they point in very different directions, the tumors are very different (cosine distance close to 1). The similarity % shown on edges is (1 − cosine distance) × 100, so 95% means nearly identical fingerprints, while 50% means quite different ones. Cosine distance measures the pattern of features rather than their overall strength, so two tumors can be "similar" even if one has a stronger signal, as long as the relative feature patterns match.
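The scale-invariance point above is easy to check directly: scaling a fingerprint by any positive factor leaves its cosine distance to other vectors unchanged, because only the angle matters. The three example vectors below are made up for illustration.

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 minus the cosine of the angle between a and b."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = 10 * a                       # same pattern, 10x stronger "signal"
c = np.array([3.0, -2.0, 1.0])   # a genuinely different pattern

d_ab = cosine_distance(a, b)     # ~0: same direction, scale is ignored
d_ac = cosine_distance(a, c)     # large: directions disagree
similarity_pct = (1 - d_ac) * 100  # the percentage shown on edges
```

Here `a` and `b` differ only in magnitude, so their cosine distance is essentially zero, while `a` and `c` have different feature patterns and land far apart.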
All data is publicly available from NCI Imaging Data Commons (IDC). Tumor annotations are from the NLST-Sybil collection. Source code is on GitHub.