
Flawed Data Labels Interfere with Accuracy of Machine Learning Benchmarks

  • Writer: Paradise Techsoft Solutions Pvt. Ltd
  • Jan 6, 2022
  • 2 min read

Updated: Oct 20

The rapid rise of artificial intelligence (AI) could change healthcare for the better, leading to faster diagnoses and allowing providers to spend more time communicating directly with patients. There are also risks associated with AI in healthcare that must be addressed. One example is label errors in the datasets used to train and benchmark machine learning (ML) models, such as an image of a frog labeled as a cat, or an apple labeled as a T-shirt. This is troubling because AI and ML errors can negatively affect many patients by attributing inaccurate data to them.

A study from the Massachusetts Institute of Technology (MIT) found label errors in ten of the most-cited open-source datasets used in ML research. The ten datasets comprised six image datasets (MNIST, CIFAR-10, CIFAR-100, Caltech-256, ImageNet, and QuickDraw), three text datasets (20news, IMDB, and Amazon Reviews), and one audio dataset (AudioSet). The researchers estimated an average error rate of 3.4% across the ten datasets tested.

Machine learning diagnostic models must learn from accurate training datasets containing a diverse set of diseases and outcomes. Faults in the labeling of ML datasets could lead to flawed AI diagnostic models. Healthcare organizations must be able to articulate the complexity of the question that the ML model is meant to solve. The team responsible for labeling the ML datasets needs to understand the clinical documentation required for each label, which enables organizations to arrive at a time- and cost-effective approach. Doing so allows organizations to evaluate and label multiple diseases concurrently while reducing reliance on physicians.

Key factors in determining the complexity of diseases in ML include annotation requirements, imaging modality, and presentation of symptoms. Keeping these requirements in mind is vital to creating accurately labeled medical images.
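As a rough illustration of how suspect labels might be surfaced for human review, the sketch below flags examples where a trained classifier confidently disagrees with the label it was given. This is a simple confidence-threshold heuristic, not the method used in the MIT study. The function names, the 0.9 threshold, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence


def flag_suspect_labels(
    given_labels: Sequence[int],
    pred_probs: Sequence[Sequence[float]],
    min_confidence: float = 0.9,  # assumed cutoff; tune per dataset
) -> List[int]:
    """Return indices where the model confidently predicts a different class.

    given_labels[i] is the class index assigned by annotators.
    pred_probs[i][c] is the model's predicted probability of class c.
    """
    suspects = []
    for index, (label, probs) in enumerate(zip(given_labels, pred_probs)):
        # Class the model considers most likely for this example.
        predicted = max(range(len(probs)), key=lambda c: probs[c])
        if predicted != label and probs[predicted] >= min_confidence:
            suspects.append(index)
    return suspects


def suspect_rate(num_suspects: int, num_examples: int) -> float:
    """Fraction of examples flagged for review (not a confirmed error rate)."""
    if num_examples == 0:
        raise ValueError("num_examples must be positive")
    return num_suspects / num_examples


# Toy data (hypothetical): two classes, four labeled examples.
given_labels = [0, 1, 1, 0]
pred_probs = [
    [0.92, 0.08],  # agrees with label 0
    [0.15, 0.85],  # agrees with label 1
    [0.97, 0.03],  # confidently predicts 0, but the label is 1
    [0.55, 0.45],  # agrees with label 0
]

suspects = flag_suspect_labels(given_labels, pred_probs)
rate = suspect_rate(len(suspects), len(given_labels))
print(suspects, rate)  # prints: [2] 0.25
```

Flagged examples are only candidates for review. In a clinical setting, a qualified annotator would still need to confirm whether the label or the model is wrong before any correction is made.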
Understanding that these data limitations exist, MCC, utilizing RemitOne™, captures data at the point of care and ensures it is accurate and complete, paving the way for quality data to train the ML models that power effective AI. RemitOne™ allows complete and accurate documentation and coding to be handled automatically, with built-in compliance, in our point-of-care AI platform. If you're concerned about the consistency and accuracy of the datasets fed into your AI, contact our team at info@mccremitone.com to find out how you can use RemitOne™ to train your AI with confidence. To see more about MCC and RemitOne™, watch our documentary segment that aired on CNBC: https://www.mccremitone.net/r1video/MCC_03.mp4.

 
 
 


MCC provides Health Information Management and computer-assisted clinical documentation improvement services, paired with innovative technology that leverages Artificial Intelligence (AI), Machine Learning (ML), and Robotic Process Automation (RPA) to code claims and to analyze and interpret clinical documentation. Through our proprietary technology and service model, we provide multiple solutions, from ambient speech interpretation to complete revenue cycle management. Our goal at MCC is to provide a computer-free data entry environment that increases patient engagement, improves coding accuracy, reduces provider burnout, and maximizes claim revenue while improving turnaround time.


Address

6500 River Place Blvd Bldg 4
Ste 350, Austin, TX 78730

© 2025 Med Claims Compliance Corporation. All rights reserved.
