VERO: Ryan, let’s go back in time for a minute. The year is 2021, OpenAI has just open-sourced CLIP, and then you go and publish this totally groundbreaking open-source Colab notebook which allows people to…actually use it! You fueled this whole emergent ecosystem of independent, open-source innovation in an entirely new domain. Can you speak a little about your experience?
RYAN: When CLIP came out, I wanted to know what it was “focusing” on, so I started probing neurons using a method similar to DeepDream’s, where you optimize an image to be “exciting” to specific parts of a neural network. Eventually, I realized I could use the same approach to optimize the match between the outputs of CLIP’s image and text encoders, which lets you generate an image from essentially any text. Back then, most models could only generate within a narrow domain (just faces, say, or just certain ImageNet classes), so this was pretty radical, considering that all we’d seen at that point were a few demo images from the original DALL-E!
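For the curious, here’s a minimal sketch of that optimization, assuming PyTorch and OpenAI’s open-source clip package; the actual notebooks layered image parameterizations, augmentations, and regularizers on top, so treat this as the bare idea rather than a faithful reproduction:

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Encode the prompt once; only the image gets optimized.
tokens = clip.tokenize(["a horse with four eyes"]).to(device)
with torch.no_grad():
    text_feat = model.encode_text(tokens).float()
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# CLIP's expected input normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

# Treat the pixels themselves as the parameters to optimize.
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
opt = torch.optim.Adam([image], lr=0.05)

for step in range(300):
    opt.zero_grad()
    img_feat = model.encode_image((image.clamp(0, 1) - mean) / std).float()
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    loss = -(img_feat * text_feat).sum()  # maximize cosine similarity
    loss.backward()
    opt.step()
```

Optimizing raw pixels like this tends to yield noisy, adversarial-looking textures; the notebooks got recognizable images by optimizing through a generator or a smoother image parameterization, plus lots of random crops and jitter.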
I open-sourced that notebook and several others down the line, and it was a really special time for me. I loved getting to see how people were using it and creating their own tools in what really felt like an explosive few years. The community back then was so happy to share (it’s different now in some areas, at least where things have become an industry), and there was so much excitement about the potential. I’m glad that open spirit still exists in a lot of strains and places. And I really do think there was some phenomenal art being made: a lot of it by people who had been doing ML for years, but also a fair amount by people with a background in the humanities or writing who could carry their expertise into a whole new modality-crossing medium.
VERO: Any fun anecdotes to share?
RYAN: One random anecdote that I always come back to was when I wanted to see what the notebooks would do with an impossible or unlikely image (I was prompted by Janelle Shane asking GPT-2 to identify how many eyes a horse has — and GPT-2 had no idea, saying everything from one to ten eyes), so I typed in “a horse with four eyes” expecting some kind of monstrosity. Instead the model produced an image of a horse wearing glasses, which I thought was delightful.
But it really drove home for me that these models (as Ted Underwood likes to say) don’t just model text or images in a vacuum; they model culture. So I think pretty often about what these models know and what opinions they advance, in ways that can be charming or insidious.
VERO: Let’s talk a bit about your exploration last summer at Stochastic. How would you describe what you were/are working on? What motivates this project for you?
RYAN: Last summer at Stochastic I looked at a few projects, but my favorite right now is one focused on personalized preference learning for image generation. I’m actually planning to share a little blog post summing up that thread soon! The general idea, in its current form, is to synthesize work in generative ML and recommender systems into a system that can take in users’ interactions with media at scale and generate new media for specific users based on those interactions. This is similar in some ways to what Joel Simon talks about: trying to avoid a one-size-fits-all approach to model aesthetics in favor of fitting niches of people with shared interests and stylistic senses.
I imagine some of these types of systems can and will look a lot like TikTok (though they could exist for images, text, music, etc.), but instead of allowing just for algorithmic distribution, they’d allow for algorithmic generation as well. Which all, frankly, looks fairly bleak and dystopian! Maybe it’s a bit fatalistic, but I think getting ahead of ideas like this before they’re deployed (if that does happen) and providing some openness is probably preferable to the alternative: them rolling out anyway as corporate black boxes.
I’ve done some explorations in my own practice where I’ve focused on being in the loop of a system that takes in interactions (usually a yes/no or a 1-to-10 score) and produces images based on them; those images are then rated and fed back in, over and over, and it’s a kind of weird experience. I feel like I’ve genuinely made some images that are specifically dazzling to me, and I’m still digesting whether the process is artistic and fulfilling or just wireheading.
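To make that loop concrete, here’s a hypothetical sketch of the rate-generate-refit cycle; everything in it is an illustrative assumption, with generate_candidates standing in for whatever image generator you use, embed for a feature extractor (think CLIP image embeddings), and a simple ridge regression as the preference model:

```python
# Hypothetical human-in-the-loop preference cycle; not actual project code.
import numpy as np
from sklearn.linear_model import Ridge

def generate_candidates(n, rng):
    """Placeholder: stands in for any image model's sampler."""
    return [rng.random((8, 8)) for _ in range(n)]

def embed(images):
    """Placeholder: stands in for a feature extractor such as a CLIP encoder."""
    return np.stack([img.ravel() for img in images]).astype(np.float32)

rng = np.random.default_rng(0)
rated_feats, ratings = [], []
preference_model = Ridge(alpha=1.0)

for round_num in range(5):
    candidates = generate_candidates(32, rng)
    feats = embed(candidates)
    if ratings:
        scores = preference_model.predict(feats)  # predicted 1-to-10 rating
    else:
        scores = rng.random(len(candidates))  # no data yet, so pick at random
    # Show the top-scoring candidates and collect fresh ratings from the user.
    for i in np.argsort(scores)[::-1][:4]:
        rating = float(input(f"Round {round_num}, image {i}: rate 1-10: "))
        rated_feats.append(feats[i])
        ratings.append(rating)
    # Refit on everything rated so far, then generate and rank again.
    preference_model.fit(np.stack(rated_feats), np.asarray(ratings))
```

As written, the loop only exploits: without some exploration term (random candidates mixed in, or an uncertainty bonus on the scores), it collapses onto whatever you’ve already liked, which is exactly the narrowing problem that comes up next.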
VERO: You’ve mentioned previously that this kind of personalized preference learning for image generation really needs to be done thoughtfully: otherwise you end up with an incredibly narrow system in which you can essentially only create what you’ve already liked (you have a fun metaphor for this: “the artistic equivalent of drinking sugar water”), rather than one that actually empowers our deeper sense of exploration. Do you have any specific instincts or insights about what doing this work “thoughtfully” might entail?
RYAN: I think doing it thoughtfully mostly comes down to the right incentive structures! If a company does it, they will probably try to maximize engagement, and it’ll be a time-suck at best. But if people do it for themselves, there’s a good chance it could be really interesting.
VERO: What has been the value of open source in the evolution of this stuff, and what role might the open source community fill in the future?
RYAN: I think one way to approach it is to consider what the area would look like without open source. In my opinion we’d have pretty much all of the same or similar concerns over economics, social impacts, etc. (all of which I think should be taken seriously), and we’d also be paying $22.99 per month for them.
I also think we’d have much less performant tech with worse biases (academic labs, for example, have done so much important work here that just isn’t possible behind an API). People really underestimate how important accessibility is. So I think open source has played a huge role in shaping what this technology is and what it means, and I hope we’ll continue to value that going into the future!
VERO: On a personal level, what has open source meant to you?
RYAN: I think what’s really beautiful to me is seeing people work on something because they find it intrinsically interesting or engaging, without any guarantee of personal gain. I feel really lucky that I was in a place where I had the time, energy, and space to work on something I wasn’t sure would ever pay off. Getting to do that in an in-person community setting like Stochastic is just a joy.