Machine learning, for all the great things it can do for us, has a pretty big flaw: it’s data-hungry. To train a new algorithm, you need to feed it a huge pile of meticulously labeled information so that it can learn to sort similar inputs on its own. If you have access to a giant dataset, that works great. But without one, the algorithm is way less accurate, and may not even be useful.
That’s why researchers at Nvidia (a tech company that makes computer chips and video cards) and several hospitals teamed up to create an AI specifically designed to spit out realistic brain scans of nonexistent patients, complete with a range of scary tumors. Those ersatz scans, they say, could be used to train future AIs when researchers don’t have enough real data to train their algorithms. The researchers detailed their work in a paper uploaded to arXiv.
DATA BEGETS DATA
To generate the tumors, the Nvidia team started with a generative adversarial network (GAN), a type of machine learning algorithm in which one part generates something and another critiques it (in the hopes of improving the first). The researchers fed it two datasets of brain scans: one of Alzheimer’s disease patients, to teach it what non-cancerous brains looked like, and one of patients with brain tumors.
Using those inputs, one part of the GAN generated 3D magnetic resonance images (MRIs) of brains with cancerous tumors. Then the other part tried to guess whether the image came from a real person or was AI generated. If the first part of the GAN couldn’t fool the second part into thinking the image was authentic, it would adjust its algorithm and try again.
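The paper itself works with 3D MRI volumes and deep networks, but the adversarial back-and-forth described above can be sketched in miniature. Below is a hedged toy example, not the researchers’ method: a one-dimensional GAN in plain NumPy where the "real data" is just a made-up distribution of scan intensities, the generator is a linear function, and the discriminator is a logistic classifier. The tug-of-war is the same: the discriminator learns to score real samples high and fakes low, and the generator adjusts itself to fool it.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Stand-in "real scans": 1-D intensity values from N(4, 1.25).
# (Purely illustrative data, not MRI voxels.)
def real_batch(n):
    return rng.normal(4.0, 1.25, size=n)

# Generator G(z) = a*z + b; discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0        # generator parameters
w, c = 0.0, 0.0        # discriminator parameters
lr, batch = 0.05, 64

for step in range(3000):
    # --- Discriminator update: raise D(real), lower D(fake) ---
    xr = real_batch(batch)
    z = rng.normal(size=batch)
    xf = a * z + b
    dr, df = sigmoid(w * xr + c), sigmoid(w * xf + c)
    # Gradients of -[log D(real) + log(1 - D(fake))], by hand
    gw = np.mean(-(1 - dr) * xr + df * xf)
    gc = np.mean(-(1 - dr) + df)
    w -= lr * gw
    c -= lr * gc

    # --- Generator update: make D believe the fakes are real ---
    z = rng.normal(size=batch)
    xf = a * z + b
    df = sigmoid(w * xf + c)
    # Gradient of -log D(fake) w.r.t. xf, chained into a and b
    gx = -(1 - df) * w
    a -= lr * np.mean(gx * z)
    b -= lr * np.mean(gx)

# After training, generated samples should have drifted toward the
# real distribution's mean of 4.
fake = a * rng.normal(size=1000) + b
print(f"fake mean = {fake.mean():.2f}  (real mean = 4.00)")
```

If the discriminator ever stops being fooled, its gradients push the generator's parameters until the fakes drift back toward the real distribution; that feedback loop is the "adjust its algorithm and try again" step described above, just scaled down from 3D brain volumes to single numbers.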
When they were finished training the AI, the researchers tested how well algorithms trained on its synthetic scans performed against ones trained on real data alone. They found that AIs trained on a combination of AI-generated and authentic scans could pick out where the tumor was in an image with 80 percent accuracy, while AIs trained on authentic data alone achieved just 66 percent.
Yes, the Nvidia team’s algorithm is capable of generating datasets with which to train other AIs. But there’s a chance that data isn’t the best.
First: the researchers trained their system using just two datasets. That means that anything the GAN generates won’t go beyond whatever’s included in them. The GAN could also be particularly adept at generating certain types of tumors and not others, which could hurt the AI’s ability to diagnose a range of real-world patients.
Still, those concerns haven’t dampened enthusiasm for the AI, according to lead author Hu Chang. He told VentureBeat that radiologists are eager to use it to generate more images of rare diseases.
READ MORE: Nvidia Researchers Develop AI System That Generates Synthetic Scans of Brain Cancer [VentureBeat]
More on AI bias: Stop Using Discriminatory AI, Human Rights Groups Say