Guided text-to-image generation is the task of generating images from textual descriptions while constraining the output with a set of references. Typically, the reference is a set of RGB images depicting the desired characteristics of the generated image. Applications include sketch-to-image [1], style transfer [2], and character consistency [3]. However, it is relatively difficult to maintain comparable conformity to a reference when that reference is provided as text. In this work, I explore guided generation of images from textual descriptions with in-context reference representation. To demonstrate the effectiveness of the approach, I present a series of images generated using DALL·E 3 [4] and showcase its ability to maintain character consistency and preserve scene genre across multiple generations.
The following catalogue primarily presents a key-frame storyboard in cinematic style for a short story of one particular genre -- horror. Additionally, a curated set of images generated in a different style (cartoon) and in alternate genres (Rom-Com, Indo-Western) is showcased. Last but not least, in keeping with the theme of the content, a few blooper outputs are also included. The album is divided into multiple sections, each containing a set of images generated from a single prompt. Each prompt is designed to contain (a) a thematic description: the style of the image and the desired genre, (b) a scene description: background, foreground, etc., and (c) detailed character descriptions: facial features, attire, pose, etc.
In this work, two protagonists and a key supporting character are used. The protagonists are two university students -- a male and a female. The male character, Bob, is described as a tall, lean, fair-skinned individual with a sharp jawline who wears a yellow t-shirt and denims. He also carries a blue bag as an accessory. The female character, Alice, is described in the prompt as having dual-toned hair (blue and red) and a fair complexion. She wears a gray hoodie.
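The three-part prompt structure described above (theme, scene, characters) can be illustrated with a minimal Python sketch. It only assembles the prompt text; it does not call any image model. The names `Character` and `build_prompt`, the line layout of the prompt, and the exact wording of the character text are assumptions made for illustration, since the prompts actually used are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    """A character whose description is repeated verbatim in every prompt."""
    name: str
    description: str


# Wording paraphrased from the character descriptions in the text;
# the phrasing of the original prompts is an assumption.
BOB = Character(
    "Bob",
    "a tall, lean, fair-skinned male university student with a sharp jawline, "
    "wearing a yellow t-shirt and denims and carrying a blue bag",
)
ALICE = Character(
    "Alice",
    "a fair-complexioned female university student with dual-toned "
    "(blue and red) hair, wearing a gray hoodie",
)


def build_prompt(style: str, genre: str, scene: str,
                 characters: list[Character]) -> str:
    """Assemble (a) thematic, (b) scene, and (c) character descriptions."""
    theme = f"Style: {style}. Genre: {genre}."
    cast = " ".join(f"{c.name} is {c.description}." for c in characters)
    return "\n".join([theme, f"Scene: {scene}", f"Characters: {cast}"])


if __name__ == "__main__":
    # Hypothetical scene text for one key frame.
    print(build_prompt(
        style="cinematic",
        genre="horror",
        scene="Bob searches his locker using his phone as a flashlight.",
        characters=[BOB, ALICE],
    ))
```

Keeping the character block identical across prompts and varying only the scene line is one plausible way to hold the textual reference fixed between generations; whether this matches the prompting procedure used for the album is not confirmed by the text.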
"The duo returns to the university building as Bob has forgotten to collect an essential item; Alice is annoyed and decides to wait outside the building."
"Bob looks for the item in his locker using his phone as a flashlight, when he hears a noise coming from the classroom."
"Bob is shocked to see himself and Alice taking the test along with others in the darkness of the classroom."
"After listening to Bob's ridiculous story, Alice decides to investigate as she does not believe a word."
"Bob faces stark horror as the light he assumed to be coming from the watchman's torch blurs his vision; he feels the presence of a ghostly woman."
"He looks out for some sanity in Alice but his petrified as he now faces an Alice (or her doppelganger?) whose dark-covered face bears an unsettling grin."