Remember that time we trained a computer to generate English football club names? Or the time I trained a predictive text algorithm on the Tottenham Hotspur writing of Barney Ronay? That was pretty great, wasn’t it?
WELL GUESS WHAT.
Neural networks and deep learning have come a long way in the past couple of years, and huge strides are being made in the field as scientists and enthusiasts continue to develop computer programs that “think.” I’ve found them fascinating for a while now. Back when we developed the Recurrently Generated Football League, the technology was mostly restricted to text-based output, which was funny enough for our purposes. Now it has expanded to the realm of graphics, to the point that neural networks are generating photo-realistic portraits of fictional humans, based on models trained on hundreds of thousands of photos of real people. These portraits are of people who do not exist, and they are almost indistinguishable from reality.
Well, that’s just great, I say. But if I fed a computer program the images of a bunch of footballers, could it create fictional Spurs players from another dimension?
I decided to find out.
Prompted, as ever, by the remarkable Janelle Shane and her blog aiweirdness.com, I used a new visual neural network called StyleGAN2 and tried to compile it on my nine-year-old MacBook Pro. It, uh, didn’t work. But what DID work was a separate program called Runway, which allows schlubs like me to tap into StyleGAN2 without having to compile anything finicky or use my own processing power. (You can try it yourself — there’s a free trial and subscription options.)
StyleGAN2 works best when you have a large number of photos to start with. I fired up a new Runway project and fed it all of the player headshots from Tottenham’s website, from the first team all the way down to first-year scholars — everything I could find. The advantage of using this collection is that the photos are all pretty homogeneous — the players are all wearing the same kit, in the same pose, cropped in generally the same way. Computers find it easier to spot patterns when things are structurally the same.
That resulted in about 60 photos or so — a paltry number compared to the minimum 500 photos you’re supposed to use for best results. To make up the difference, I just cloned the photos a few times to get a decent-sized dataset. The repeated photos might have some weird effects on the end product, but who cares — this is SCIENCE.
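For the curious, the cloning step is trivial. Here’s a minimal sketch of the idea in Python (the filenames are made up, and this just cycles paths in a list rather than copying actual image files):

```python
import itertools

def pad_dataset(photo_paths, target=500):
    """Repeat a small set of photo paths until the list hits the
    target count. With only ~60 headshots against a suggested
    minimum of 500, cloning is the crude workaround described above."""
    cycled = itertools.cycle(photo_paths)
    return [next(cycled) for _ in range(target)]

# Hypothetical filenames standing in for the real Tottenham headshots.
headshots = [f"player_{i:02d}.jpg" for i in range(60)]
dataset = pad_dataset(headshots, target=500)
print(len(dataset))  # 500 -- each original appears 8 or 9 times
```

Each photo just gets repeated until the count looks respectable, which is exactly the kind of statistical sin that produces weird effects later.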
A few button clicks and we’re on our way! Training the model took about four hours to complete with the standard 3000 iterations, but you get to sample it as it progresses. And along the way, that’s when things started to get weird. Computers don’t get it right the first time; things generally get better the longer you let the model run, and you can watch it happen as the training progresses.
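Under the hood, the workflow amounts to something like this loop. This is a generic sketch, not Runway’s actual API, and every function name here is invented for illustration:

```python
def train_with_sampling(step, sample, iterations=3000, sample_every=250):
    """Run a training step repeatedly, grabbing a sample image every
    so often so you can watch the model improve (or catch fire)."""
    snapshots = []
    for i in range(1, iterations + 1):
        step()  # one training iteration on the headshot dataset
        if i % sample_every == 0:
            snapshots.append((i, sample()))  # periodic progress check
    return snapshots

# Dummy stand-ins for the real model step and image sampler.
progress = train_with_sampling(step=lambda: None,
                               sample=lambda: "fake_spurs_player.png")
print(len(progress))  # 12 snapshots over 3000 iterations
```

Those periodic snapshots are what let you peek at the nightmare fuel below long before the 3000th iteration.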
Here’s a sample image from about 25% through the training. Make sure your kids don’t see this.
Oh. Oh god, what is that?! Where are their faces?! Dear Lord in Heaven, WHY?
You can see, however, how the model is progressing. The computer has already picked up the similarities in the jerseys, recreating readable AIA logos and blobs that could be Tottenham crests. It has picked a median skin tone and even recognizes that photos of human beings have mouths, though it currently thinks they are overly large, with giant, slavering teeth. These are Tottenham players that haunt your dreams, and not in the nice “Lucas scored a hat trick in Amsterdam” way.
The good news is that things got better over time. A few hundred more iterations and we get this:
Progress! Those are definitely Tottenham home jerseys, and the players have both recognizable faces and hairstyles. There’s also some variation in both height and skin tone, even if the differences are minor.
I didn’t expect photo-realism from a small dataset, and I didn’t get it. But at the end of the 3000th iteration, I ended up with results that were, frankly, better than I anticipated.
So, some interesting points here — the computer model clearly doesn’t know what to do with eyes or mouths. In fact, the eyes are just... gone, either digitally “closed” or replaced with white holes into the vacuous, empty souls of these fake players. The mouths are also more or less the same across each photo, which makes sense because none of the players are smiling in the original photos. Interestingly, the inclusion of the academy and underage teams has noticeably skewed the generated players toward looking young, probably because the facial features have been homogenized — all of the exported results look more or less like they could play for the U18s.
So the computer has a difficult time with faces, but there are definite differences in skin tone and hairstyle, and the kits are pretty sharp! There are clear variations in kit shape based on the body type of the generated player, but they’re all rendered in pretty fantastic detail. And that’s remarkable since a not-insignificant percentage of the source material included the goalkeepers in their teal keeper kits — none of the results looked anything like a keeper.
Here’s a sampling of the results, in video form.
All right, that’s... something. But could it get better? I decided to continue training the model to see if results improved with more time. And they did... in a way. About an hour into the expanded model training, I hit a high-water mark. The photos are obviously not perfect — the computer is still abjectly hostile to the idea of eyes as a concept — but they’re as sharp as could be managed with the small dataset, and whoa, they’re actually recognizable as human. They’re also eerily familiar-looking. They almost make you want to shout FFS Mou, give ’em a chance in the first team!
I let the training continue for another couple of hours, but unfortunately that’s when things started to go downhill. The thing about machine learning is that model-trained results eventually hit a zenith, and then things start to fall apart: small errors are picked up by the model and amplified, and you can end up with some seriously wonky visuals.
So I expected that at some point I’d start to get diminishing returns. I didn’t, however, expect the images to literally catch on fire.
So what happened? Hell (pun intended) if I know. The model learns from repetition and tries to interpret new data based on the rules as it understands them. That means small errors in the results can cascade as the simulation goes on. The faces turn green and mesh together. The kits lose the I in AIA. And certain players spontaneously combust.
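If it helps, you can think of it like compound interest on a mistake. This toy calculation is only an analogy — the gain value is invented, and real GAN training dynamics are far messier — but it shows how a tiny flaw, reinforced a little each iteration, eventually swallows everything:

```python
def amplify(error, gain=1.05, steps=100):
    """Multiply a small error by a feedback gain each iteration."""
    for _ in range(steps):
        error *= gain
    return error

tiny_flaw = 0.01              # a barely-visible artifact in one image
runaway = amplify(tiny_flaw)  # after 100 iterations of feedback
print(runaway)  # roughly 1.3: the flaw now dominates the image
```

A 5% nudge per step doesn’t sound like much, but a hundred of them in a row turns a barely-visible blemish into a player on fire.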
I don’t know about you, but that feels like a pretty apt metaphor for Tottenham Hotspur’s season, doesn’t it?
I stopped the simulation after this — there wasn’t much point to continuing. The images weren’t going to get any better and would eventually dissolve into something more abstract, and honestly it probably couldn’t get much funnier than Spurs’ players literally going up in flames.
So what have we learned? Well, for starters, if we had detailed, consistent headshots of every single Tottenham player who has ever played, and infinite computing power, we could probably train a computer to generate photo-realistic fake Spurs players. That’d be pretty cool! But we don’t, so we get this instead. And as I sit here, typing on a laptop while stuck at home under a coronavirus-induced stay-at-home order, I’m okay with these results. They may not be perfect, but they made me laugh out loud and grin like a maniac, and that’s something Spurs themselves haven’t made me do in months.