Meet Auraflow: A Truly Open Source AI Image Generator Aiming to Beat Stable Diffusion 3 - Decrypt
07/22/2024 19:55
We pitted Fal AI's new Auraflow against Stability AI’s SD3 in a head-to-head test. The results might surprise you.
There's a new contender for the title of king of open-source AI image generators: Auraflow. Released last week by the generative media company Fal AI, Auraflow is gaining traction thanks to its standard Apache 2.0 license, a breath of fresh air compared with the restrictive license under which Stability AI released Stable Diffusion 3 (SD3).
Advocates argue that open-source projects can greatly accelerate development in competitive industries, since they free developers from licensing fees and other legal constraints. Without those fees, communities frequently form around competent open-source projects, and developers can tweak, modify, train, and even profit from their work.
"We are excited to present you [with] the first release of our Auraflow model series, the largest yet completely open-sourced flow-based generation model capable of text-to-image generation," Fal AI said in a blog post. The San Francisco-based company, co-founded in 2021 by Burkay Gur and Gorkem Yurtseven, engineers who worked at Coinbase and Amazon, respectively, warned that open-source AI is in jeopardy. “Some even boldly announced that open-source AI is dead,” they said. “Not so fast!”
Auraflow was trained over more than four weeks of intensive compute time, including pretraining on images at several resolutions (256x256, 512x512, and 1024x1024) and aspect ratios (square, landscape, portrait, etc.). The result? A GenEval score of 0.64, rising to 0.703 with a prompt-enhancement pipeline similar to the one used by DALL-E 3.
In other words, the model delivered high-quality results on synthetic benchmarks. Still, as good as it is, Auraflow is effectively a beta: Fal AI labels it version 0.1 rather than a stable release.
The model is a VRAM eater, though. Running its fp16 (half-precision) version requires a beefy GPU with around 12 GB of VRAM; for reference, Stable Diffusion 3 runs fine on just 6 GB. However, the company says a more manageable model is in the works. “Smaller models or MoE’s might be more efficient for consumer GPU cards, which have a limited amount of compute power, so follow closely for a mini version of [this] model that is still as powerful yet much much faster to run,” Fal AI said.
Auraflow is available for download on Hugging Face and can be run in ComfyUI via a custom node, which is also available through the ComfyUI Manager.
Auraflow is a formidable alternative to SD3, but is it good enough to beat it? We compared the two base models and tested their performance across various art styles and prompts. As we share our observations, you can judge which one is more likely to win the hearts of AI artists around the world.
Art styles and creativity
Prompt: "A detailed painting of a sunset over a tranquil lake, the sky filled with hues of orange, pink, and purple, a wooden pier extending into the water, a person sitting at the end of the pier with a fishing rod, surrounded by tall grasses and wildflowers, the overall style is impressionistic with bold brushstrokes and vibrant colors."
Auraflow:
- Strengths: Captures the impressionistic style well with bold brushstrokes and vibrant colors. The hues of the sky are well-represented, creating a serene atmosphere.
- Weaknesses: The person and the surrounding nature could be rendered more precisely. The wooden pier and the person fishing lack clear definition, and the fishing rod is held in an unnatural position.
SD3 Medium:
- Strengths: Shows high attention to detail, especially in the portrayal of the person and the pier. The overall scene is more structured, with clear elements and refined outlines.
- Weaknesses: The impressionistic style is less pronounced, with the brushstrokes appearing smoother and more photorealistic than intended.
Winner: It's a tie. Auraflow follows the impressionistic style more closely, but SD3 is more detailed and structured.
Realism
Prompt: “A high-resolution photograph of a bustling city street at night, neon signs illuminating the scene, people walking along the sidewalks, cars driving by, a street vendor selling hot dogs, reflections of lights on wet pavement, the overall style is hyper-realistic with attention to detail and lighting, a neon sign says ‘Decrypt.’”
Auraflow:
- Strengths: Captures the vibrant nightlife with neon signs and reflections on wet pavement. The scene is bustling with activity, and the lighting effects are well done.
- Weaknesses: Some details, like the street vendor and pedestrians, are not sharp and look cartoonish, undermining the hyper-realistic quality. The neon signs lack clarity. The model has some grasp of text, but not enough to be reliable. (It renders “Decrypt” next to the hot dog sign, but the word is barely legible.)
SD3 Medium:
- Strengths: Provides a high level of detail and clarity, especially in the depiction of people and objects. The hyper-realistic style is well achieved with precise lighting and reflections. The neon signs are clear and the text is readable.
- Weaknesses: The scene might appear too sterile, lacking the natural chaos of a bustling city street. There is no street vendor, just the hot dog stand.
Winner: SD3 Medium offers a more detailed and hyper-realistic image, making it the better model for this prompt.
Illustration
Prompt: “Hand-drawn illustration of a giant spider chasing a woman in the jungle, extremely scary, anguish, dark and creepy scenery, horror, hints of analog photography influence, sketch.”
Auraflow:
- Strengths: Successfully creates a dark and creepy atmosphere. The hand-drawn style with sketch elements is evident.
- Weaknesses: The spider and the woman lack detail, making the scene less frightening and intense.
SD3 Medium:
- Strengths: Offers a highly detailed and scary portrayal of the spider and the woman. The anguish and horror elements are more pronounced.
- Weaknesses: The analog photography influence is less clear, and the sketch style might be overshadowed by the high level of detail. Some of the spider's limbs look unnatural.
Winner: SD3 Medium provides a more frightening and detailed illustration, making it the better model for this prompt.
Prompt adherence
Prompt: “A surreal digital artwork of a floating island in the sky, the island covered in lush greenery and waterfalls cascading into the clouds below, a small castle at the center of the island, bridges made of light connecting to other floating islands, the sky is filled with colorful hot air balloons and mythical creatures, the overall style is fantastical with dreamy elements and glowing effects.”
Auraflow:
- Strengths: Captures the fantastical and dreamy elements well, with glowing effects and vibrant colors. The floating island and waterfalls are depicted beautifully. The bridges are made of light, and mythical creatures appear in the scene.
- Weaknesses: Some elements, like the bridges of light and the mythical creatures, lack detail and clarity.
SD3 Medium:
- Strengths: Provides a highly detailed and intricate scene with a more cartoonish look.
- Weaknesses: Prompt adherence was weaker in this generation: it didn’t create bridges made of light, the bridges don’t connect to other islands, and there are no mythical creatures.
Winner: Auraflow captured all the elements in the prompt, making it the better model for this prompt.
Spatial awareness
Prompt: “A dog standing on top of a TV showing the word ‘Decrypt’ on the screen. On the left there is a a woman in a business suit holding a coin, on the right there is a robot standing on top of a first aid box. The overall scenery is surreal.”
Auraflow:
- Strengths: Creates a surreal and imaginative scene. The composition and spatial arrangement are interesting.
- Weaknesses: The dog, robot, and woman are less refined, weakening the overall impact. The first aid cross bled onto a second box and onto the robot itself. Text generation was poor.
SD3 Medium:
- Strengths: Provides a highly detailed and clear depiction of all elements. The surreal atmosphere is well-maintained with precise spatial arrangement. The overall scene was less realistic.
- Weaknesses: The scene might appear less imaginative and more literal.
Winner: Tie. Both SD3 Medium and Auraflow included all the elements of the prompt and showed a good grasp of spatial relationships.
Anime and manga
Prompt: “A female ninja fighting against a strong samurai in ancient Japan, anime, manga, highly detailed, colorful, dynamic.”
Auraflow:
- Strengths: Captures the dynamic and colorful elements of anime and manga well. The action scene is vibrant and engaging. Its style was extremely detailed, more like a cover illustration.
- Weaknesses: Prompt adherence was lacking: it generated only the female ninja and ignored the samurai opponent.
SD3 Medium:
- Strengths: Went for a plain two-dimensional manga style, making the scene lively and dynamic.
- Weaknesses: The colors are less vibrant, reducing the overall dynamism. It failed to capture the scenery of ancient Japan.
Winner: SD3 Medium provides a more detailed and dynamic depiction, making it the better model for this prompt. Both lacked key elements in terms of prompt adherence.
Conclusion
Auraflow excels in capturing impressionistic, fantastical, and whimsical styles, while SD3 Medium is better at providing detailed, hyper-realistic, and dynamic scenes.
Both models' weaknesses could be addressed through fine-tuning, and this is where law beats tech. Auraflow's Apache 2.0 open-source license makes it attractive for fine-tuners, allowing free use, reproduction, and distribution under the license terms, unlike SD3's more restrictive license. That may make Auraflow the easier model to build on. But until the community actually produces fine-tunes, this remains a strategic advantage that hasn't yet been realized.
However, Auraflow requires a lot of VRAM to run, with some reports indicating up to 35 GB, far more than the 6 GB SD3 needs. For reference, a 24 GB RTX 4090 costs up to $1,700 on Amazon, whereas a 6 GB RTX 3050 capable of running SD3 can be found for less than $200. This is a tangible advantage that SD3 holds over Auraflow right now.
Considering this, SD3 Medium is currently the better model in this comparison, serving a broader user base due to its lower hardware requirements and comparable results in terms of quality.
Nonetheless, Auraflow shows great promise. If a pruned (smaller) or quantized (less precise) version is developed in the future that reduces its hardware demands, Auraflow could become a strong contender and potentially challenge Stability's long-standing dominance with its Stable Diffusion models.
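Why would a quantized version help? As a rough rule, the memory needed just to hold a model's weights is its parameter count multiplied by the bytes stored per parameter, so lowering precision shrinks that footprint proportionally. The sketch below is a back-of-the-envelope estimate, not a measurement: the 6-billion-parameter figure is a hypothetical round number chosen for illustration, not an official Auraflow specification, and real VRAM use is higher because activations, text encoders, the VAE, and framework overhead are not counted.

```python
# Back-of-the-envelope, weights-only memory estimate.
# Assumption: real VRAM use is higher, since activations, text encoders,
# the VAE, and framework overhead are NOT included here.

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "fp32": 4.0,  # full precision
    "fp16": 2.0,  # half precision
    "int8": 1.0,  # 8-bit quantization
    "int4": 0.5,  # 4-bit quantization
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Hypothetical parameter count for illustration only; not an official figure.
    params = 6e9
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gb(params, precision):.1f} GB")
```

Each halving of bytes per parameter halves the weight memory, which is why a quantized build could bring a large model within reach of cheaper consumer GPUs, often at some cost in output quality.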