This interactive map shows how similar lung tumors look to artificial intelligence. Each dot represents a real lung cancer patient from the National Lung Screening Trial (NLST), a large U.S. clinical study of lung cancer screening with low-dose CT scans. Lines connect patients whose tumors look most alike — or least alike — according to AI models trained on medical images.
Researchers took CT scans of 289 lung cancer patients, each with a single identified tumor. The tumor in each scan was located using bounding boxes from the NLST-Sybil analysis. Each tumor region was then fed into 9 different AI models (called foundation models). Each model converts the tumor image into a mathematical fingerprint — a list of hundreds of numbers called an embedding. These fingerprints capture the visual features of the tumor as the AI "sees" it. We then compared all fingerprints to find which tumors look most — and least — similar to each other.
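The fingerprinting step can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: `encode_tumor` is a hypothetical stand-in for any one of the 9 foundation models (each real model has its own API and embedding size), and the bounding-box values are made up.

```python
import numpy as np

def encode_tumor(region: np.ndarray) -> np.ndarray:
    """Stand-in for a foundation model's image encoder (hypothetical).
    A real model returns a learned embedding; here we just project the
    flattened pixels to a fixed-length vector for illustration."""
    rng = np.random.default_rng(42)
    projection = rng.normal(size=(region.size, 256))
    return region.flatten() @ projection

# Crop the tumor from a CT slice using its bounding box (x, y, w, h).
ct_slice = np.zeros((512, 512))        # placeholder for one CT slice
x, y, w, h = 200, 180, 32, 32          # hypothetical bounding box
region = ct_slice[y:y+h, x:x+w]
embedding = encode_tumor(region)       # one "fingerprint" per tumor
```

Each patient ends up with one such vector per model; all pairwise comparisons then happen in this embedding space.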
For each patient, the AI compares that patient's tumor fingerprint against those of the other 288 patients to find the 5 closest (most similar) and 5 farthest (least similar) matches.
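This top-5 / bottom-5 ranking can be sketched with NumPy. Random vectors stand in for the real fingerprints, and 512 is an assumed embedding size; the nearest-neighbor logic is the same either way.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, dim = 289, 512          # 289 NLST patients, assumed embedding size
fingerprints = rng.normal(size=(n_patients, dim))

# Normalize rows so cosine distance reduces to 1 minus a dot product.
unit = fingerprints / np.linalg.norm(fingerprints, axis=1, keepdims=True)
cos_dist = 1.0 - unit @ unit.T      # (289, 289) pairwise cosine distances

patient = 0
order = np.argsort(cos_dist[patient])
order = order[order != patient]     # drop the self-match (distance ~0)
closest_5 = order[:5]               # most similar tumors
farthest_5 = order[-5:]             # least similar tumors
```

Sorting each row once gives both ends of the ranking, so the closest and farthest matches come from the same pass.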
The similarity is measured using cosine distance, which compares the "direction" of two fingerprint vectors. Think of each fingerprint as an arrow pointing in some direction. If two arrows point nearly the same way, the angle between them is small and the tumors are very similar (cosine distance close to 0). If they point in very different directions, the tumors are very different (cosine distance close to 1). The similarity % shown on edges is (1 − cosine distance) × 100, so 95% means nearly identical fingerprints, while 50% means quite different ones. Cosine distance measures the pattern of features rather than their overall strength, so two tumors can be "similar" even if one has a stronger signal, as long as the relative feature patterns match.
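The scale-invariance point above is easy to check directly: scaling a fingerprint by any positive factor leaves its cosine distance to other vectors unchanged, because only the angle matters. The three example vectors below are made up for illustration.

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 minus the cosine of the angle between a and b."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = 10 * a                       # same pattern, 10x stronger "signal"
c = np.array([3.0, -2.0, 1.0])   # a genuinely different pattern

d_ab = cosine_distance(a, b)     # ~0: same direction, scale is ignored
d_ac = cosine_distance(a, c)     # large: directions disagree
similarity_pct = (1 - d_ac) * 100  # the percentage shown on edges
```

Here `a` and `b` differ only in magnitude, so their cosine distance is essentially zero, while `a` and `c` have different feature patterns and land far apart.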
All data is publicly available from NCI Imaging Data Commons (IDC). Tumor annotations are from the NLST-Sybil collection. Source code is on GitHub.