Doctors already use AI scribes to record patient consultations and create notes automatically. But new research shows these systems work much better when they can see what’s happening, not just hear it.
A study from Flinders University tested AI scribes paired with Ray-Ban Meta smart glasses. The vision-enabled system achieved 98% accuracy in documenting patient consultations, compared with 81% for audio-only versions. The research was published in npj Digital Medicine.
The difference was particularly stark when recording medication details. The AI with video captured crucial information like drug strength and form 97% of the time. Audio-only systems managed just 28%.
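To put those gaps in perspective, here is a rough back-of-the-envelope sketch using only the four percentages reported above. It assumes that whatever a system did not capture correctly (100 minus the reported figure) can be treated as its error rate, which is a simplification rather than the study's own metric; the function name `error_reduction` is made up for illustration.

```python
def error_reduction(baseline_acc, new_acc):
    """Relative drop in error rate, given two accuracy figures in percent.

    Assumption: error rate = 100 - accuracy. This is an illustrative
    simplification, not a measure taken from the study.
    """
    baseline_err = 100 - baseline_acc
    new_err = 100 - new_acc
    return (baseline_err - new_err) / baseline_err


# (audio-only %, vision-enabled %) as reported in the article
results = {
    "overall documentation": (81, 98),
    "medication strength and form": (28, 97),
}

for name, (audio, vision) in results.items():
    gap = vision - audio  # absolute gap in percentage points
    reduction = error_reduction(audio, vision)
    print(f"{name}: +{gap} percentage points, {reduction:.0%} fewer errors")
```

Under that assumption, the 17-point overall gap corresponds to roughly nine in ten errors being eliminated, and the 69-point gap on medication details to an even larger share.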
How does it work?
The researchers used Google’s Gemini AI model combined with Ray-Ban Meta smart glasses to create their vision-enabled scribe system. Here’s the process:
- Clinical pharmacists wore the smart glasses during mock patient consultations
- The glasses recorded both video and audio of the interactions
- The AI analyzed visual elements like medicine containers, prescriptions, and patient body language
- It combined this visual data with spoken information to create comprehensive notes
The study involved 10 pharmacists conducting 110 simulated medication-history interviews with more than 100 different types of medicines, including tablets, capsules, injections, and creams.
Why does it matter?
Medical documentation takes up a large share of clinicians’ time. AI scribes already help reduce this burden, but accuracy problems mean doctors still need to spend time checking and correcting the notes.
“A lot of clinically important information is visual,” says research author Bradley Menz. “Important visual cues during consultations include patients’ medicine containers, prescriptions and devices, as well as their body language.”
Better accuracy means less time spent editing AI-generated notes, which could free up more time for actual patient care. The visual component also captures critical safety information, such as drug strength and form, that audio-only systems often miss.
The researchers stress this is an “augmented tool, not a replacement for clinical judgment.” Doctors still need to review and approve all AI-generated documentation.
The context
AI scribes have gained popularity in healthcare because they reduce administrative work that keeps doctors away from patients. But current systems have significant limitations when important information is communicated visually.
This study suggests the next generation of AI scribes will need visual capabilities to be truly effective. Associate Professor Ashley Hopkins, the senior author, believes this could “open the door to wider clinical uses” for AI scribes.
However, the researchers acknowledge several challenges before vision-enabled AI scribes can be widely adopted:
- Privacy concerns about video recording in medical settings
- Patient consent for visual documentation
- Data security for video files
- Integration with existing hospital workflows
- Need for careful governance and oversight
The study used simulated consultations, so real-world testing will be needed before these systems can be deployed in actual clinical settings. But the accuracy improvements suggest vision-enabled AI scribes could become the new standard for medical documentation.
