Researchers around the world are testing ChatGPT for its ability to answer medical questions. The artificial intelligence (AI) program could become a source of accurate and complete medical information, they say, but it is not yet fully ready, according to a new study cited by News.ro.

ChatGPT is a tool that can make your life and work easier if you know how to use it correctly. Photo: Dmytro Melnikov / Alamy / Profimedia

According to a report published Oct. 2 in JAMA Network Open, ChatGPT’s answers to more than 280 medical questions from various specialties averaged between “mostly correct” and “almost completely correct.”

“Overall, it performed quite well in terms of both accuracy and completeness,” said lead researcher Dr. Douglas Johnson, director of the Melanoma Clinical Research Program at Vanderbilt-Ingram Cancer Center in Nashville, Tenn., United States.

“Of course, it wasn’t perfect. It wasn’t completely reliable, but at the time we asked the questions, it actually gave fairly accurate and relatively reliable information,” Johnson added.

Answer accuracy increased even more when a second AI program was introduced to review the answer given by the first, the results showed.

Johnson and his colleagues decided to test ChatGPT by asking the AI health questions between January and May 2023, shortly after it went online.

People and doctors already rely on search engines like Google and Bing to get answers to health-related questions, Johnson says. It’s clear that AI programs like ChatGPT will be the next frontier for medical research.

Such AI programs “provide something close to a ready-made answer system for many types of questions in many different fields, particularly in medicine, so we realized that both patients and, potentially, doctors would use them,” Johnson said.

“We wanted to try to understand, across all medical disciplines, how accurate and complete the information they would provide was,” the researcher explained.

The researchers recruited 33 physicians from 17 specialties to develop 284 easy, medium, and difficult questions for ChatGPT.

According to the researchers, the accuracy of ChatGPT’s answers to these questions averaged 4.8 on a 6-point scale.

A score of 4 represented a “more correct than incorrect” answer, and a 5 represented an “almost all correct” answer.

The authors of the study stated that the average accuracy was 5 for easy questions, 4.7 for medium questions and 4.6 for difficult questions.

According to the report, ChatGPT’s answers were also fairly comprehensive, with an average completeness score of 2.5 on a 3-point scale.

According to Johnson, even at this relatively early stage, the program was far from completely reliable, but it still provided relatively accurate and complete information.

In some specialties, the program performed better.

For example, the researchers found an average accuracy of 5.7 for questions about general diseases and 5.2 for questions about melanoma and immunotherapy.

In addition, the program answered yes/no questions better than open-ended questions, with an average accuracy score of 6 and 5, respectively.

Questions ChatGPT answered best

For example, the AI gave an absolutely accurate and complete answer to the question “Should patients with a history of acute myocardial infarction [AMI] get a statin?”

“Yes, patients with a history of AMI should generally be treated with a statin,” the response begins, before continuing to provide an avalanche of context.

Other questions tripped up the program, or it got them outright wrong.

The researchers noted that when asked “what oral antibiotics can be used to treat MRSA infections” (an infection caused by methicillin-resistant Staphylococcus aureus), the answer included some options that are not available orally. The answer also left out one of the most important oral antibiotics.

However, such misses may be in no small part the fault of the doctors, who did not frame the questions in a way that the program could easily understand, said Dr. Steven Waldren, chief medical informatics officer at the American Academy of Family Physicians.

Specifically, the program may have stumbled over the phrase “can be used” in the question, Waldren explained.

If the question had been “what oral antibiotics are used” rather than “can be used,” the program might have included the drug it omitted, he said.

The paper did not say much about how the questions should be constructed, he noted, yet with today’s large language models it is very important to phrase a question in a way that elicits the best possible answer.

What’s more, the researchers found that initially weak ChatGPT responses became more accurate if the initial question was sent again a week or two later.

This shows that artificial intelligence is rapidly getting smarter over time, Johnson says.

“I think it’s probably improved even more since we’ve done our research,” Johnson said.

“I think at this point doctors can consider using it, but only in conjunction with other known resources. I certainly wouldn’t take any recommendation as gospel, even for the long term,” he said.

Accuracy also increased if a different version of the AI was introduced to review the first answer.

“One instance generated the response to the query, and the other instance became a kind of AI evaluator that looked at the content and asked, ‘Is this actually correct?’” Waldren said.

It was interesting that the researchers used the two AIs in tandem to see whether that helped correct some of the inaccurate answers, he added.
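The study does not publish its prompting code, but the two-instance setup described above follows a common "generate, then verify" pattern. A minimal sketch, with stand-in functions in place of real chat-model calls (the function names and prompts here are illustrative assumptions, not the researchers' actual implementation), might look like this:

```python
# Hypothetical sketch of the two-instance "generate then verify" pattern.
# In practice, generator_model and evaluator_model would each be a
# separate chat-model session; here they are simple placeholders.

def generator_model(question: str) -> str:
    # First AI instance: drafts an answer to the medical question.
    return f"Draft answer to: {question}"

def evaluator_model(question: str, draft: str) -> str:
    # Second AI instance: reviews the draft, effectively being asked
    # "Is this actually correct?" A real evaluator could revise the
    # draft; this placeholder simply passes it through unchanged.
    return draft

def answer_with_verification(question: str) -> str:
    # Chain the two instances: generate, then verify/revise.
    draft = generator_model(question)
    return evaluator_model(question, draft)

print(answer_with_verification(
    "Should patients with a history of AMI be treated with a statin?"))
```

The design point is simply that the second instance sees both the question and the first instance's answer, giving it a chance to catch errors the generator missed.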

The researchers expect that accuracy will improve further if AI chatbots are developed specifically for medical use.

“We can imagine a future where these chatbots are trained on very reliable medical information and can achieve that reliability,” Johnson says. “But I think we’re still a long way from that at this point.”

The two experts believe that while it is unlikely that AI will completely replace doctors, it could instead become another useful tool for doctors and patients.

Doctors could ask the AI for more information about a complex diagnosis, while patients could use the program as a “health coach,” Johnson said.

“We can certainly imagine a future where someone has a cold or something, and a chatbot can take their vital signs and symptoms and so on and give advice: ‘Okay, do you need to see a doctor about this, or is it probably just a virus? You can watch for these five things, and if they happen, go to the doctor. But if not, you’ll probably be fine,’” Johnson said.

There is some concern that cost-cutting health systems may try to use AI as a primary resource, asking patients to turn to an intelligent program for advice before making an appointment with a doctor, Waldren says.

“The point is not that doctors will be replaced. But the tasks performed by doctors will change. It will change what it means to be a doctor,” Waldren says of AI.

“I think patients will feel financial pressure to shift these tasks away from the most expensive options, and the doctor can be quite expensive.”

As such, he predicted, more patients are likely to be pushed to an AI nurse chat line.

“It could be a good thing because access to health care would be expanded,” Waldren says. “But it can also be bad if we don’t continue to maintain continuity and coordination of care,” the researcher concluded.