What Iris can't do — honest limitations of AI health investigation

We built Iris because AI is genuinely useful for health investigation. We also know exactly where it breaks down. Both things are true, and pretending otherwise would put you at risk.

This article is the honest version.

Iris cannot diagnose you

Pattern recognition is not diagnosis. Iris can tell you that your fatigue correlates with poor sleep and high stress, and that it worsens in the two days before your period. That's useful information. But it's not a diagnosis — it's a set of observations that need clinical interpretation.

Diagnosis requires things AI doesn't have access to: physical examination, laboratory testing, imaging, and clinical judgment developed over years of training. A pattern that looks like thyroid dysfunction in your tracking data could be thyroid dysfunction, or it could be six other things that produce similar patterns.

The right way to use Iris's findings: as evidence to bring to your provider. "I tracked my symptoms for three weeks and here's what correlates" is dramatically more useful in a 15-minute appointment than "I'm just tired all the time." But the provider makes the diagnosis, not the AI.
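
To make "correlates" concrete, here's a minimal sketch of the kind of pairwise check described above, run on invented daily logs. The column names and numbers are made up for illustration; Iris's actual analysis is more involved than this.

```python
# Illustrative only: hypothetical daily self-ratings, not real Iris internals.
import pandas as pd

log = pd.DataFrame({
    "fatigue":     [7, 6, 3, 8, 2, 7, 4],                # 0-10 self-rating
    "sleep_hours": [5.0, 5.5, 8.0, 4.5, 8.5, 5.0, 7.0],
    "stress":      [8, 7, 3, 9, 2, 8, 4],                # 0-10 self-rating
})

# Pairwise correlations with fatigue: a pattern, not a diagnosis.
print(log.corr()["fatigue"])
# A strong negative correlation with sleep_hours says the two move together
# in this sample -- it says nothing about *why* they move together.
```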

Iris makes confident-sounding mistakes

Large language models don't know what they don't know. Research in Nature Medicine found that AI systems generate confident, plausible-sounding health information that is factually wrong — and that users often can't tell the difference.

Iris has safeguards. The Supervisor reviews every response for overconfidence, hallucinated claims, and safety issues. But no safeguard catches everything. If Iris says "research shows that magnesium reduces migraine frequency by 40%," it might be right — or it might have fabricated that specific statistic while the general direction (magnesium has some evidence for migraines) is correct. The claim sounds authoritative either way.

What to do: treat Iris's factual claims as starting points for investigation, not as settled truth. If a specific claim matters for a health decision, verify it — ask Iris for the source, check whether the reference exists, or raise it with your provider. The Supervisor catches many errors, but your critical thinking is the final layer.
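
For the shape of the idea, here is a purely hypothetical sketch of one supervisor-style check: flagging precise statistics that arrive without any citation. Iris's actual Supervisor is not public; this regex heuristic is invented for illustration, and it also shows why no such filter is complete.

```python
# Hypothetical heuristic, not Iris's real Supervisor logic.
import re

STAT = re.compile(r"\b\d+(\.\d+)?\s*%")        # matches "40%", "12.5 %"
CITED = re.compile(r"\(.*(19|20)\d{2}.*\)")    # crude "(Author, 2023)" test

def flag_unsourced_stat(sentence: str) -> bool:
    """Flag sentences that state a precise percentage with no citation."""
    return bool(STAT.search(sentence)) and not CITED.search(sentence)

print(flag_unsourced_stat(
    "Research shows magnesium reduces migraine frequency by 40%."))  # True
# Note what this misses: a fabricated statistic paired with a fabricated
# citation sails right through -- your critical thinking is the final layer.
```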

Iris might miss context it didn't load

Iris stores your health information as persistent notes, but it doesn't load everything into every conversation. It browses your notes, reads summaries, and selectively loads what seems relevant. Sometimes it misses.

You might be discussing headaches, and Iris doesn't factor in the medication you mentioned three conversations ago — because it loaded your headache notes but not your full medication list. This isn't a bug. It's a tradeoff between comprehensive context and the model's limited working memory.

The fix: when context matters, direct Iris to it. "Check my medication notes" or "Load my sleep data from this month" tells it exactly where to look. You can always ask "What notes did you load?" to see what it's working with. If an answer feels off, a missing note is often the reason.
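
As a rough mental model, selective loading might look something like the sketch below. The note titles, the relevance scoring, and the cutoff are all assumptions for illustration; the guide only says Iris browses summaries and loads what seems relevant.

```python
# Toy model of selective context loading, assuming keyword-overlap scoring.
def relevance(query: str, summary: str) -> int:
    """Count how many query words appear in a note's summary."""
    return len(set(query.lower().split()) & set(summary.lower().split()))

notes = {
    "headache log":    "daily headaches with severity and suspected triggers",
    "medication list": "current prescriptions and supplement doses",
    "sleep data":      "nightly sleep duration and quality ratings",
    "stress journal":  "stress levels and notes on worse days at work",
}

query = "why are my headaches worse this week"
ranked = sorted(notes, key=lambda n: relevance(query, notes[n]), reverse=True)
loaded = ranked[:2]   # limited working memory: only the top notes get loaded

print(loaded)  # ['headache log', 'stress journal'] -- the medication list
               # missed the cut, which is why "check my medication notes" helps
```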

Iris doesn't replace your doctor

This one seems obvious but needs saying clearly, because the more useful AI becomes, the more tempting it is to skip the appointment.

Iris can help you track symptoms, find patterns, prepare for appointments, and make sense of what your provider told you. It cannot examine you, order tests, prescribe medication, or provide the clinical judgment that comes from years of medical training.

Some things specifically require a provider: new or worsening symptoms that could indicate something serious (the red flags sections in our condition guides cover these), medication decisions, interpretation of lab results in clinical context, and any situation where you're unsure whether something is dangerous.

AI makes you a better-prepared patient. It doesn't make you your own doctor.

Iris can be wrong about patterns

The Data Analyst finds correlations in your tracked data. Correlation is not causation — a phrase so overused it's lost its force, but it matters here.

If your headaches correlate with barometric pressure changes, that might be a real trigger mechanism (evidence supports this for migraines). Or it might be that barometric pressure drops coincide with weekdays when you're stressed and sleep-deprived, and the real driver is the stress-sleep combination. The correlation is real; the causal interpretation is uncertain.
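
A toy simulation makes the confounding problem visible. In the sketch below, stress is the only real driver of headaches, but because pressure drops happen to co-occur with stressful days, pressure still looks predictive. All of the probabilities are invented.

```python
# Invented numbers: stress causes headaches; pressure merely co-occurs.
import random

random.seed(0)
days = 500
stress   = [random.random() < 0.4 for _ in range(days)]              # real driver
pressure = [s or random.random() < 0.3 for s in stress]              # co-occurs with stress
headache = [(s and random.random() < 0.7) or random.random() < 0.05
            for s in stress]                                         # caused by stress only

def rate(cond):
    hits = [h for c, h in zip(cond, headache) if c]
    return sum(hits) / len(hits)

print(f"headache rate on pressure-drop days: {rate(pressure):.0%}")
print(f"headache rate on other days:         {rate([not p for p in pressure]):.0%}")
# Pressure looks strongly predictive even though it never causes anything here.
```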

Iris tries to account for confounding variables, but it's working with whatever data you've tracked. If you haven't tracked stress, it can't factor stress into the analysis. If you tracked inconsistently, the patterns it finds may reflect your logging habits more than your biology.

The right approach: treat data-driven findings as hypotheses to test, not conclusions. "My headaches might be related to sleep quality" becomes an experiment: prioritize sleep for two weeks, track whether headaches improve. That converts a correlation into evidence.
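
Here is a sketch of what converting a correlation into evidence can look like, using hypothetical two-week headache logs, one before and one during a deliberate sleep change. The numbers are invented; the comparison is the point.

```python
# Hypothetical logs: 1 = headache day, 0 = no headache, 14 days each.
baseline     = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1]  # usual sleep habits
intervention = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]  # sleep prioritized

before = sum(baseline) / len(baseline)
after  = sum(intervention) / len(intervention)

print(f"headache days: {before:.0%} -> {after:.0%}")
# A drop after a deliberate change is evidence for causation in a way the
# original correlation never was. (One run is still weak evidence; repeating
# the experiment strengthens it.)
```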

Iris works poorly without data

This is the most practical limitation. Iris's analytical power depends entirely on the data you give it. Vague, sporadic logging produces vague, unreliable analysis. Two weeks of consistent daily tracking, even with brief entries, gives the system enough to find real patterns. Less than that, and you're getting generic suggestions dressed up as personalized analysis.

If you're not ready to track consistently, Iris still works as a knowledgeable conversation partner for health questions. But the investigation features — pattern detection, correlation analysis, personal health formulas — need structured data to function. There's no shortcut for this.
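
One way to think about the threshold is a readiness check like the sketch below, where pattern analysis only unlocks once the log covers two consistent weeks. The function and the exact rule are assumptions for illustration, not Iris's actual gating logic.

```python
# Hypothetical readiness check based on the "two weeks of consistent
# daily tracking" guidance above.
from datetime import date, timedelta

def ready_for_analysis(entry_dates: set[date], days: int = 14) -> bool:
    """True if every one of the last `days` days has at least one entry."""
    today = date.today()
    return all(today - timedelta(days=i) in entry_dates for i in range(days))

# Sporadic logging (every other day) fails the check:
sparse = {date.today() - timedelta(days=i) for i in range(0, 14, 2)}
print(ready_for_analysis(sparse))   # False -> expect generic output

# Consistent daily logging passes it:
daily = {date.today() - timedelta(days=i) for i in range(14)}
print(ready_for_analysis(daily))    # True -> enough signal for real patterns
```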

When AI is the wrong tool entirely

Some situations call for a human, not an algorithm. Mental health crises. Acute emergencies. Situations where empathy and human judgment matter more than pattern analysis. Iris has safety protocols for detecting these situations and directing you to appropriate resources, but recognizing the boundary yourself is better than relying on the system to catch it.

If you're experiencing thoughts of self-harm, severe depression, or any situation that feels like an emergency — talk to a person. A crisis hotline, a mental health professional, an emergency department. AI doesn't belong in that moment.

References

  1. Large language models in medicine — Nature Medicine, 2023. AI reliability and confident error generation in medical contexts.
  2. Quality of AI chatbot responses to medical questions — JAMA Internal Medicine, 2023. AI-generated medical information accuracy.
  3. Correlation, causation, and confusion — BMJ, 2016. Distinguishing correlation from causation in observational health data.
  4. AI ensemble methods for reliability — PNAS, 2021. Independent model cross-checking and its limitations.