Information Addition: The moment when machine learning stopped imitating reality and started reversing it / Eran Hadas

Translated by: Mor Ilan

Art has always dealt in imagination, often creating worlds of alternative and fabricated reality. In the age of AI (artificial intelligence), it is the machines – or, more precisely, software programs – that create imaginary worlds of alternate reality. Thus, art can serve as a bridge between physical and artificial reality. Conversely, art is an arena where one can examine technology and its influence on our culture from a unique standpoint. For the most part, technology is measured by economic values (profits generated) or functional measures (speed, amounts of information, etc.). Art can discuss the ways in which technology impacts our reality, its ethical and aesthetic qualities, and its non-functional uses or applications other than those for which it was designed. In this text, I will discuss what this perspective implies for machine learning, the form of AI that has entered and to some degree taken over our lives in recent years.

1. Problems of machine learning

Machine learning is a form of learning based on generalizations derived from examples. This is the element most easily understandable to people. But people are often incredulous when they hear that this type of learning is not founded on rules relating to the problem at hand. Examples include models that provide optimal travel routes yet contain no traffic laws, and character-level text generators that have no inkling whether the assembly of strokes created between spaces actually constitutes words. Obviously, there are rules governing machine learning systems (called models), but these are purely mathematical rules that represent examples merely as numerical entities the model can process. As this is the case, are we subject to the laws of reality, or to mathematical laws that may impose a different reality altogether?

Generalization, the underlying principle of machine learning, is known to be problematic when it relates to human beings. Generalizing the qualities of a group of people based on the conduct of a handful of individuals is considered socially taboo, as this infers the traits of some from the traits of others, an act of pre-judgement or prejudice. In contrast, when a machine learning model needs to identify objects in a picture, we expect it to generalize information from images it has processed to those it has never encountered. For example, once a model has been trained to identify thousands of tables, it is expected to identify tables never entered into its databanks. We know that airports are equipped with machine learning systems. Are they designed to recognize how terrorists behave based on previous examples, predicting for each individual whether they are themselves a terrorist?

As mathematical rules provide only a framework of computation completely oblivious to the content it represents, the essence of machine learning lies in data. In many instances, the more data contained in the system, the better it functions. Access to larger data sets is often more available to corporations and governments than to private individuals, but even with open-source data, various levels of processing power are required in order to train a model. There is a hierarchy: CPUs (central processing units) in personal computers, GPUs (graphics processing units) at companies that deal with machine learning, and TPUs (tensor processing units), which Google uses in vast systems such as its automatic translation model. Thus, Google is capable of performing something that would be very difficult for a smaller company, and impossible for any private individual.

Even in cases when the playing field is level, and everyone has the same computation abilities, problems still exist. There are several examples of applications designed in Silicon Valley for the people employed there that generalize well when applied to white men, but less successfully when applied to other populations. There are even some unpleasant examples, such as webcams designed to track and identify human faces that were capable of capturing images only of white people^[1], or systems incapable of tagging black people as humans^[2].

Other problems crop up even when no population segment is missing or only partially represented. The clearest example of this is Word2Vec, based on the GoogleNews dataset containing hundreds of thousands of articles. This algorithm, mapping words into a mathematical space, is famous for its ability to discover analogies. Thus, a question such as: “France is to Paris as Italy is to…?” will generate the answer: “Rome”. However, it was quickly discovered that a question such as: “Man is to doctor as woman is to…?” generated the answer: “nurse”^[3], with many additional prejudices seen in this ostensibly objective system. Such results have raised some measure of awareness in the software design world of the fact that machine learning systems must not merely represent information, but also correct biases.
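The analogy mechanism described above can be sketched with simple vector arithmetic: the answer to “a is to b as c is to…?” is the word whose vector lies closest to b - a + c. The sketch below is only an illustration under stated assumptions. The vectors are hand-made toy values, not real Word2Vec output, and the names VECTORS, cosine, and analogy are hypothetical. The gender skew in “doctor” and “nurse” is planted deliberately, to show how a bias present in the data resurfaces in the answers.

```python
import math

# Hand-made toy vectors (an assumption, NOT real Word2Vec output). Axes, in order:
# country, capital, France-related, Italy-related, medical profession, gender.
VECTORS = {
    "france": (1.0, 0.0, 1.0, 0.0, 0.0, 0.0),
    "paris":  (0.0, 1.0, 1.0, 0.0, 0.0, 0.0),
    "italy":  (1.0, 0.0, 0.0, 1.0, 0.0, 0.0),
    "rome":   (0.0, 1.0, 0.0, 1.0, 0.0, 0.0),
    "man":    (0.0, 0.0, 0.0, 0.0, 0.0, 1.0),
    "woman":  (0.0, 0.0, 0.0, 0.0, 0.0, -1.0),
    # The bias is planted here on purpose: in this toy "corpus", "doctor"
    # leans toward the male end of the gender axis and "nurse" toward the female end.
    "doctor": (0.0, 0.0, 0.0, 0.0, 1.0, 0.6),
    "nurse":  (0.0, 0.0, 0.0, 0.0, 1.0, -0.6),
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' with the word nearest to b - a + c."""
    query = tuple(
        vb - va + vc
        for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])
    )
    # The three question words themselves are excluded from the candidates.
    candidates = [word for word in VECTORS if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(query, VECTORS[word]))


print(analogy("france", "paris", "italy"))  # geography analogy
print(analogy("man", "doctor", "woman"))    # the planted bias resurfaces
```

Nothing in the arithmetic distinguishes the geographic analogy from the prejudiced one; both fall out of the same subtraction and addition over whatever the vectors happen to encode.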

In the 2015 Print Screen Festival, Eyal Gruss and I presented a work titled “Word2Dream”^[4], where users enter a text and the computer program extrapolates from that text to offer a range of associative words based on the software algorithm. One interesting result is evident in the fact that although no bias seems to emerge initially in the data, the associative chain still leads to bias. Thus, even when specific bias points are identified and corrected, the big picture may remain unchanged. For instance, when entering Martin Luther King’s “I have a dream” speech into the model, the computer produces associations that lead to Zionism, feminism, and Nazism. Examples such as this motivated us in our project to demonstrate the artificial and general nature of our associative ability as humans, and that the data we inundate the internet with perpetuates our prejudices in artificial intelligence.

2. Machine learning and information addition

The range of problems machine learning must grapple with is steadily expanding. In the first years of the current decade, particularly with the success of machine learning methods used to identify objects in photos, models were developed to imitate human abilities. In fact, this was the time in which the “deep learning” brand, representative of models based on neural networks, became synonymous with artificial intelligence. Until then, problems of human perception had been considered impossible for computers to solve, the purview of humans alone. This is why more emphasis was placed on computer vision, speech recognition, machine translation, chatbots, and other human abilities, with researchers trying to understand whether technological advancements and machine learning algorithms provide better results than those previously achieved. In many cases, the answer was positive, such as with computer vision and speech recognition, where machine learning does indeed produce results similar to those of humans. In others, certain algorithms performed better than the older ones and yet still failed to meet comparable human standards, such as with automatic translation. Thus, many models of machine learning have yet to demonstrate good results.

In recent years machine learning has encountered a new set of problems, where the objective is to receive input with a certain amount of data and create output of far greater data quantity. One example of this is a coloring model that receives black-and-white images and is designed to saturate them with color, and another is a program that enhances photographs taken with a regular camera to create high-resolution images that seem as if taken by a DSLR (digital single-lens reflex) camera. An innovative study has shown how it is possible to enhance an image photographed in very poor lighting and reconstruct it to appear as it would in better lighting conditions^[5].

The proliferation of such models, termed here “information addition” models, stems from developments in algorithms and machine learning technologies. The most popular models are those focused on style transfer of images, meaning those that enable using an existing image to create a new one in a completely different style. Artist and programmer Gene Kogan often uses this technique^[6]. One breakthrough in deep learning is a model called GAN (Generative Adversarial Network) in which (conceptually, at least) two machine learning entities work in tandem, each improving through the feedback provided by the other.

During the 2018 Print Screen Festival, Eyal Gruss and I exhibited a project titled “The Electronic Curator”, a GAN^[7]-based work that presents a tug of war between artist and curator. The painter attempts to create a portrait of vegetables from user images, while the curator approves or rejects each attempt, and thus the two learn together. Our intent was to create a project with the understanding that artificial intelligence needs discourse, feedback, and competition to attain objectives, just as people do. Something about this tandem interaction, where one requires the other and yet also links achievement with competition, felt to us like a quality (often imperceptible) of Western culture worthy of attention.

There are those who claim^[8] that the motivation for information addition projects is actually economic – not technological. Manufacturers will favor producing cameras made with relatively cheap hardware and equipping them with expensive software that reduces noise and improves quality, rather than a product requiring expensive hardware, as the cost of hardware is built into each and every device, while software development need be done only once, with no additional expense for each additional device.

The difference in these new models lies in the fact that they create a reality that does not necessarily exist. When you transform a color image into one that is black and white, it is clear to all that this representation is not real (as reality itself is not black and white), but it is indexical. That is to say, we understand that the shift from color to black and white is the product of a single-valued process, one that can be easily evaluated to ascertain whether the outcome is truly representational of the original color image. By contrast, when one goes the other way – transforming a black-and-white image to color – the indexical value is lost in the process. If the original image depicts a grey-toned truck, we have no way of knowing whether the representation should be that of a truck colored green, blue, or red. As this information is missing, its prediction is pure invention. This is not due to any mistake in the model or the data, but something else entirely: this is the consolidation of a new, fabricated reality, one that may be an attempt to emulate a familiar reality but is not that reality itself. Even should the prediction of color produce the “correct” hue, this result is not a reconstruction of reality, but the establishment of one that is different, yet similar.
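The asymmetry between the two directions can be made concrete with a few lines of arithmetic. The sketch below assumes one common way of computing grey levels, a weighted sum of the red, green, and blue channels using the BT.601 luma weights; real cameras and colorization models may use other conversions. The function to_gray and the three truck colors are hypothetical, chosen only for illustration.

```python
def to_gray(rgb):
    """Map an (R, G, B) pixel, each channel 0-255, to one grey level 0-255.

    Assumption: BT.601 luma weights (0.299, 0.587, 0.114), computed in
    integer arithmetic with rounding. Other weightings exist.
    """
    r, g, b = rgb
    return (299 * r + 587 * g + 114 * b + 500) // 1000


# Three hypothetical trucks painted in clearly different colors.
TRUCKS = {
    "red":   (84, 0, 0),
    "green": (0, 43, 0),
    "blue":  (0, 0, 219),
}

# Color -> grey is single-valued: each truck gets exactly one grey level.
grays = {name: to_gray(rgb) for name, rgb in TRUCKS.items()}
print(grays)

# Grey -> color is not: every truck above is a valid answer for the same grey.
possible_colors = [name for name, level in grays.items() if level == grays["red"]]
print(possible_colors)
```

Under this conversion all three trucks collapse to the same grey level, so a model asked to color that grey has to choose among them without any evidence in the pixel itself.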

3. “Reversed” reality

Machine learning requires examples. The most basic and popular method used today is that of “supervised learning”, whereby input examples are fed into the program along with their corresponding expected outputs. For instance, for identifying objects in a picture, the model will be fed the picture along with the objects it contains. There are other options for learning. In training “The Electronic Curator”, for example, there is no link between individual inputs and outputs, only separate arrays of inputs and outputs. There is also the possibility of entering only input sets through unsupervised learning models, but the majority of models today are still supervised.

In order to create a supervised model, one needs to gather information, and often even generate it. In object recognition models, one may use images from the Internet with relative ease. However, providing the recognized objects within an image entails resolving the problem that motivated the development of the model to begin with. In such cases, object tagging must be accomplished manually, or focused algorithms must be employed to solve the problem for a reduced and limited set. In any case, effort must be expended to transform input into output, as only when each input is linked to an equivalent output is the model ready for training.

In information addition models the process is similar, yet also distinctly different. To train a model that transforms a black-and-white image to color, collecting black-and-white images takes relatively little effort, but saturating them with color is a difficult, manual endeavor. Manual coloring requires research and thought and is time consuming. However, gathering many color images and digitally converting them into black-and-white images takes little time. So, the two elements of information needed by the model may be obtained, only this time the input (the color image) is entered as the predicted output, while the output (the black-and-white image) is entered as the input. This maneuver is not merely a fabrication of reality, but its reversal: input becomes output and output becomes input, cause becomes effect and effect becomes cause, and the “before” image is made into the “after” image, while the “after” is transformed into the “before”.
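The reversal described above can be sketched in a few lines. The example assumes images are represented as lists of rows of (R, G, B) tuples and that grey levels are computed with BT.601 luma weights; both are simplifying assumptions, and the names to_gray and make_training_pairs are hypothetical rather than part of any particular colorization system.

```python
def to_gray(rgb):
    """Grey level (0-255) of one (R, G, B) pixel, assuming BT.601 luma weights."""
    r, g, b = rgb
    return (299 * r + 587 * g + 114 * b + 500) // 1000


def make_training_pairs(color_images):
    """Build (model_input, expected_output) pairs for a colorization model.

    The color photograph is what existed first, and the grey image is
    computed from it. For training, the roles are swapped: the derived
    grey image becomes the input and the original becomes the target.
    """
    pairs = []
    for color_image in color_images:
        gray_image = [[to_gray(pixel) for pixel in row] for row in color_image]
        pairs.append((gray_image, color_image))
    return pairs


# A hypothetical one-row "photograph" with a red pixel and a blue pixel.
photo = [[(84, 0, 0), (0, 0, 219)]]
pairs = make_training_pairs([photo])
model_input, expected_output = pairs[0]
print(model_input)      # what the model will see
print(expected_output)  # what the model will be asked to produce
```

The cheap direction of the computation generates the data, and the expensive direction is the one the model is trained to perform.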

An amusing example of information addition outcomes, or of information completion, can be seen in the project by artist Clement Valla titled “Postcards from Google Earth”^[9], where he uses satellite photographs that provide different angles of adjacent areas. A model was developed to create continuous images, one that fills in the gaps and connects the various image parts. This results in soaring bridges, hovering cars, and other oddities. Another example using Google mapping is Sandy Island, which actually appears in Google and is utterly real in terms of the algorithm but does not exist in reality^[10].

Beyond the quirkiness of such examples, the greater problem in models that employ the “reverse” reality method is that they establish a believable reality. The navigation apps we use regularly may present a false reality, and we may attempt to cross nonexistent bridges or sail to fictional islands. Moreover, in a world striding ever faster in the design of visual aids (such as Google Glass), where we will likely soon wear eyewear based on machine learning to “enhance” images, we come closer to a state whereby we no longer rely on our own senses. And this is not due to a mistaken algorithm, but because we opt for information addition models. We knowingly choose to disengage from our reality to embrace an alternate one, which – at least in terms of cause and effect – is a reverse reality.

^[1] https://www.theatlantic.com/technology/archive/2016/04/the-underlying-bias-of-facial-recognition-systems/476991/

^[2] https://www.theverge.com/2018/1/12/16882408/google-racist-gorillas-photo-recognition-algorithm-ai

^[3] https://www.technologyreview.com/s/602025/how-vector-space-mathematics-reveals-the-hidden-sexism-in-language/

^[4] http://eranhadas.com/word2dream/

^[5] https://arxiv.org/pdf/1805.01934.pdf

^[6] http://genekogan.com/works/style-transfer/

^[7] https://www.youtube.com/watch?v=4sZsx4FpMxg

^[8] https://www.e-flux.com/journal/60/61045/proxy-politics-signal-and-noise/

^[9] http://www.postcards-from-google-earth.com/

^[10] https://www.livescience.com/28822-sandy-island-undiscovered.html