Google researchers have developed an experimental machine-learning model that they claim can rate photos and images based on their aesthetic qualities rather than purely technical ones.
The proposed model, dubbed Neural Image Assessment (NIMA), is trained to predict which images humans are likely to rate as attractive or aesthetically pleasing. Unlike existing aesthetic-prediction approaches, which simply categorize images as high or low quality, NIMA rates images with a high degree of correlation to human perception, Google researchers Hossein Talebi and Peyman Milanfar said in a Dec. 19 blog post.
In fact, when rating photos that had each been rated by an average of 200 people, NIMA closely matched the human raters' average scores, the researchers said.
The model could be put to use in a variety of labor-intensive tasks that require subjective judgment. Potential applications include intelligent photo editing, as well as tools that optimize visual quality or minimize perceived errors in images, the researchers noted.
NIMA builds on recent work around so-called deep convolutional neural networks (CNNs), a machine-learning approach used in applications such as image classification and recognition. Unlike models that assess the technical quality of images based on factors such as blurring, pixel-level imperfections and compression, aesthetic assessment focuses on characteristics associated with beauty and emotion in photos.
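To make the idea concrete, the general recipe can be sketched in a few lines of PyTorch. The sketch below is illustrative only: it assumes a MobileNetV2 backbone (one of several baseline networks the researchers' paper mentions) and torchvision's layer layout, not Google's actual code.

```python
import torch
import torch.nn as nn
from torchvision import models

# Illustrative sketch: start from a CNN pretrained for image
# classification and swap its final layer for a 10-way head, one
# output per possible score on a 1-to-10 aesthetic rating scale.
backbone = models.mobilenet_v2(pretrained=True)
backbone.classifier[1] = nn.Linear(backbone.last_channel, 10)

def predict_score_distribution(model, image_batch):
    """Return per-image probabilities over the scores 1 through 10."""
    model.eval()
    with torch.no_grad():
        logits = model(image_batch)
    return torch.softmax(logits, dim=1)
```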
The CNN approach uses data previously labeled and rated by human scorers to train machine-learning models to identify the attributes people are likely to find aesthetically pleasing. Some models use what are known as reference, or ideal, images to train on specific quality metrics; when no reference images are available, statistical models are used instead to predict an image's quality.
Instead of classifying an image as simply high quality or low quality, Google’s “NIMA model produces a distribution of ratings for any given image—on a scale of 1 to 10, NIMA assigns likelihoods to each of the possible scores,” Talebi and Milanfar said. In other words, the model looks at an image and predicts how likely human scorers are to give it a 1, a 5, a 10, or any other score on the 10-point scale. The resulting mean score is then used to rate photos aesthetically.
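That last step is simple arithmetic. Here is a minimal NumPy sketch of turning a predicted distribution into a single rating; the probabilities are invented for illustration, and the spread of the distribution can also be summarized as a standard deviation.

```python
import numpy as np

# The ten possible ratings and a hypothetical predicted distribution
# over them (these probabilities are invented for illustration).
scores = np.arange(1, 11)
probs = np.array([0.01, 0.02, 0.05, 0.10, 0.15,
                  0.20, 0.22, 0.15, 0.07, 0.03])

# The mean of the distribution is the image's aesthetic rating; the
# standard deviation indicates how much raters would likely disagree.
mean_score = float(np.sum(scores * probs))
std_dev = float(np.sqrt(np.sum(probs * (scores - mean_score) ** 2)))

print(f"predicted rating: {mean_score:.2f} (spread {std_dev:.2f})")
```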
“This is more directly in line with how training data is typically captured, and it turns out to be a better predictor of human preferences when measured against other approaches,” said the researchers, who published a technical paper describing the work in September.
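The paper spells out what that means in practice: each training image carries a full histogram of human scores rather than a single label, and the network is trained with an earth mover's distance (EMD) style loss, which penalizes predicted probability mass more the farther it falls from where the raters put it. A minimal NumPy sketch of that idea follows; details such as normalization may differ from the paper's exact formulation.

```python
import numpy as np

def emd_loss(p_true, p_pred, r=2):
    """Earth mover's distance between two distributions over the
    ordered scores 1..10, computed from their cumulative sums.
    Mass predicted far from the human ratings costs more than mass
    that is only one score off; r=2 gives the squared variant."""
    cdf_true = np.cumsum(p_true)
    cdf_pred = np.cumsum(p_pred)
    return np.mean(np.abs(cdf_true - cdf_pred) ** r) ** (1.0 / r)
```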
In tests of its capabilities, quality scores predicted by NIMA came close to human ratings when the model was used to rank photos. “In a direct sense, the NIMA network (and others like it) can act as reasonable, though imperfect, proxies for human taste in photos and possibly videos,” the Google researchers said.
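Ranking, in that setting, amounts to sorting images by their mean predicted scores, as in this illustrative snippet (the file names and distributions are invented):

```python
import numpy as np

scores = np.arange(1, 11)

# Hypothetical per-image score distributions from a NIMA-style model.
predictions = {
    "sunset.jpg": np.array([0.00, 0.00, 0.02, 0.05, 0.10,
                            0.18, 0.25, 0.22, 0.12, 0.06]),
    "blurry.jpg": np.array([0.10, 0.20, 0.25, 0.20, 0.12,
                            0.07, 0.04, 0.01, 0.01, 0.00]),
}

# Sort by mean predicted score, best first.
ranked = sorted(predictions,
                key=lambda name: np.sum(scores * predictions[name]),
                reverse=True)
print(ranked)  # ['sunset.jpg', 'blurry.jpg']
```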