nahbaste

new media artist

rtnv

May 2023

“rtnv” (short for Art Nouveau) is a custom diffusion model, fine-tuned from Stable Diffusion 1.5. In this article, I explain the process and the applications that I think will change the creative industries in the coming years.

Dataset & Training

With a predefined visual aesthetic in mind, the first step of the process was generating images to train the model on. To do this, I used Midjourney, providing style references from five artists: Alphonse Mucha, Rebecca Guay, James Gilleard, Jean Giraud, and Josan Gonzalez.


It helps that the chosen references share not only common visual elements, but also thematic and conceptual ones. Later on, we will see how this affects image generation with the model.


To develop this visual aesthetic, I started with some visual references and quickly moved into writing. I wrote a set of stories that helped me define the boundaries of this world: how things behave, what kinds of events happen there, and why. All of this is incredibly helpful for figuring out what the world looks like.


After significant effort, I curated a dataset comprising 50 images. Presented below are 10 of these images, serving as illustrative examples.

[Ten sample images from the training dataset]

Testing the model

In order to train the model using Deforum, it is useful to have detailed captions describing each image in the dataset. This is done so that the model picks up the visual style without linking it to incidental objects present in the images. While this process can be automated using tools like CLIP, I find that personally reviewing and writing the captions yields better results.
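If you do want to automate that first pass, a rough sketch with an off-the-shelf captioning model (BLIP, via the Hugging Face transformers library) could look like the following. The folder path, file naming, and caption length are illustrative assumptions rather than my exact setup, and every draft still needs a manual rewrite.

```python
# Sketch: generate draft captions for a folder of training images with BLIP.
# The dataset path and the one-.txt-per-image convention are illustrative assumptions.
from pathlib import Path

from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

dataset_dir = Path("dataset/rtnv")  # hypothetical folder of training images

for image_path in sorted(dataset_dir.glob("*.png")):
    image = Image.open(image_path).convert("RGB")
    inputs = processor(image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    caption = processor.decode(output_ids[0], skip_special_tokens=True)
    # Write a draft caption next to each image; the drafts are then edited by hand
    # so they describe the content of the image rather than its style.
    image_path.with_suffix(".txt").write_text(caption + "\n")
```

The point of hand-editing is to make sure each caption describes what is in the image, not how it looks, so the style ends up bound to the model itself rather than to specific objects.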


Following the training, here is a selection of images generated by the model.

[Selection of images generated by the rtnv model]

Implementing ControlNet

As you can see, the images closest in theme to the dataset also come closest to the desired style. You might be wondering why I would train a model to replicate images from Midjourney when I can generate consistent outputs just by using Midjourney.


The answer is that, now that I have trained my model, I can use ControlNet to apply it to existing images. This has enormous potential for cinema and animation pipelines. For example, let's pick a frame from the storyboard of “The Grand Budapest Hotel”:

[Storyboard frame from “The Grand Budapest Hotel”]

Now we can apply our style to this frame. The results are displayed below: the first images were generated with the rtnv model, while the subsequent ones were produced with other custom models.
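Before the results, here is a rough sketch of what this step could look like in code, pairing the fine-tuned checkpoint with a Canny-edge ControlNet in the diffusers library. The checkpoint path, control type, prompt, and parameters are illustrative assumptions rather than my exact pipeline.

```python
# Sketch: restyle an existing storyboard frame with a fine-tuned checkpoint + ControlNet.
# The local checkpoint path, input frame, and prompt are illustrative assumptions.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Canny-edge ControlNet trained for Stable Diffusion 1.5
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
# The fine-tuned rtnv checkpoint (hypothetical local path, in diffusers format)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "./rtnv-model", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Extract edges from the storyboard frame to use as the structural guide
frame = np.array(Image.open("storyboard_frame.png").convert("RGB"))
edges = cv2.Canny(frame, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

# Generate a restyled version of the frame, guided by its edges
result = pipe(
    prompt="hotel lobby interior, rtnv style",
    image=control_image,
    num_inference_steps=30,
).images[0]
result.save("frame_rtnv.png")
```

Conditioning on the frame's edges keeps the original composition intact, while the fine-tuned checkpoint supplies the style.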

[Stylized versions of the storyboard frame: rtnv model outputs, followed by outputs from other custom models]

Beyond the obvious use cases of creating style references and moodboards, this approach has many potential applications that go past exploring visual styles and theme variations. Models can be trained to perform cleanup on animation files, for example, or to colour scenes while maintaining consistency for each character or background.


I really think these tools will revolutionize the industry, especially as new models are developed and we achieve higher levels of frame-to-frame consistency.

©2024 Nahuel Basterretche