IBM Addresses AI Bias with Massive Image Archive

IBM revealed that it will soon make available to the global research community a dataset of 1 million images to improve facial analysis system training, plus a dataset of 36,000 facial images that algorithm designers can use to evaluate bias in their own facial analysis systems.

Bias is a provocative term that’s being peppered into more and more conversations—especially when it comes to discussions (or arguments) about politics and the way media companies cover news events.

But bias isn’t only about a person’s predetermined views affecting his or her opinion on a particular topic. It’s also an important factor in how accurate the results of an artificial intelligence query turn out to be.

IBM has built its reputation on a commitment to bringing new technologies into the world responsibly. Users must trust new technologies, or else they cannot have a positive impact. IBM’s business has been guided for decades by a set of trust and transparency principles that include the company’s belief that enterprises using AI have a responsibility to address the issue of bias head on.

Preventing Bias from Getting into AI Is Imperative

With this in mind, and as the adoption of AI increases, the issue of preventing bias from entering into AI systems is rising to the forefront. No technology--no matter how accurate--can or should replace human judgment, intuition and expertise. The power of advanced innovation lies in technology’s ability to augment, not replace, human decision-making, IBM contends.

Thus it is critical that any organization using AI -- including visual recognition or video analysis capabilities -- train the teams working with it to understand bias, including implicit and unconscious bias, monitor for it, and know how to address it.

Because of that belief, and because an AI system is only as good as the data upon which it is trained, IBM revealed June 27 that it will soon make available to the global research community:

  • A dataset of 1 million images to improve facial analysis system training. This archive will be five times larger than the largest face image dataset available today, and it is specifically designed to reduce sample selection bias.
  • A dataset of 36,000 facial images, equally distributed across various attributes, that algorithm designers can use to evaluate bias in their own facial analysis systems. The first step in addressing bias is knowing that it exists, and that is what this dataset is meant to enable.
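As a rough illustration of what evaluating bias against a balanced test set can look like, the sketch below computes a classifier's error rate separately for each attribute group and reports the gap between the best and worst group. The group names, labels and sample records are invented for this example. They do not come from IBM's dataset, and the code does not represent IBM's evaluation method.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Return {group: error rate} for (group, true_label, predicted_label) records."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if truth != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def largest_gap(rates):
    """Difference between the highest and lowest per-group error rates."""
    return max(rates.values()) - min(rates.values())


# Hypothetical results: each group contributes the same number of test images,
# mirroring the idea of an evaluation set that is equally distributed.
sample = [
    ("group_a", "smiling", "smiling"),
    ("group_a", "neutral", "neutral"),
    ("group_a", "smiling", "smiling"),
    ("group_a", "neutral", "smiling"),   # one error in group_a
    ("group_b", "smiling", "neutral"),   # two errors in group_b
    ("group_b", "neutral", "smiling"),
    ("group_b", "smiling", "smiling"),
    ("group_b", "neutral", "neutral"),
]

rates = error_rate_by_group(sample)
print(rates)
print("gap:", largest_gap(rates))
```

Because every group has the same number of test images, a gap between groups reflects how the system performs rather than how the test set happens to be composed.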

The facial attribute and identity training dataset is annotated with both attributes and identity. It uses geo-tags from Flickr images to balance data from multiple countries, and active learning tools to reduce sample selection bias, the company said.
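One simple way to picture balancing by geo-tag is to cap the number of images drawn from each country, so that no single country dominates the training set. The sketch below does only that. The function name, image IDs and country codes are hypothetical, and it leaves out the active learning step. IBM has not described its pipeline at this level of detail.

```python
import random
from collections import defaultdict


def balance_by_country(images, per_country, seed=0):
    """Sample at most `per_country` image IDs from each country.

    `images` is an iterable of (image_id, country_code) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    by_country = defaultdict(list)
    for image_id, country in images:
        by_country[country].append(image_id)

    balanced = {}
    for country, ids in by_country.items():
        k = min(per_country, len(ids))  # a country may have fewer images than the cap
        balanced[country] = rng.sample(ids, k)
    return balanced


# Hypothetical, heavily skewed collection: 8 images from one country,
# 3 from another and 2 from a third.
images = (
    [(f"img{i}", "US") for i in range(8)]
    + [(f"img{i}", "IN") for i in range(8, 11)]
    + [(f"img{i}", "BR") for i in range(11, 13)]
)

balanced = balance_by_country(images, per_country=2)
for country, ids in balanced.items():
    print(country, ids)
```

Capping each country at the same count trades dataset size for balance, and a real pipeline would likely weigh that trade-off differently.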

Currently, the largest facial attribute dataset available contains 200,000 images. Additionally, datasets available today include only attributes (hair color, facial hair, etc.) or only identity (identifying that five images are of the same person), but not both. The new dataset combines the two, so that attributes can be matched to an individual.

Earlier this year, IBM substantially increased the accuracy of its Watson Visual Recognition service for facial analysis, demonstrating a nearly 10-fold decrease in error rate.

IBM Research, in collaboration with the University of Maryland, will hold a technical workshop Sept. 14 on identifying and reducing bias in facial analysis. The workshop is being held in conjunction with the European Conference on Computer Vision 2018. The results of a competition using the IBM facial image dataset will be announced at the workshop.

Chris J. Preimesberger

Chris J. Preimesberger is Editor of Features & Analysis at eWEEK, responsible in large part for the publication's coverage areas. In his 13 years and more than 4,000 articles at eWEEK, he...