Abstract

Graphics design is important for various applications, including movie production and game design. To create a high-quality scene, designers usually need to spend hours in software like Blender, in which they might need to interleave and repeat operations, such as connecting material nodes, hundreds of times. Moreover, slightly different design goals may require completely different sequences, making automation difficult. In this paper, we propose a system that leverages Vision-Language Models (VLMs), like GPT-4V, to intelligently search the design action space to arrive at an answer that can satisfy a user's intent. Specifically, we design a vision-based edit generator and state evaluator to work together to find the correct sequence of actions to achieve the goal. Inspired by the role of visual imagination in the human design process, we supplement the visual reasoning capabilities of VLMs with imagined reference images from image-generation models, providing visual grounding of abstract language descriptions. We provide empirical evidence suggesting our system can produce simple but tedious Blender editing sequences for tasks such as editing procedural materials from text and/or reference images, as well as adjusting lighting configurations for product renderings in complex scenes.


Editing 3D Graphics as Visual Program Refinement

To perform edits within the Blender 3D design environment, BlenderAlchemy iteratively refines a program that defines a sequence of edits within Blender. This is done using our visual program refinement procedure, which is composed of an edit generator G and a state evaluator V that respectively generate and select among different edit hypotheses at each iteration. Both the edit generator and the state evaluator are guided by an input user intention, specified using a combination of language and reference images, either provided directly or hallucinated by a text-to-image generator within the Visual Imagination module. At each step, the system is allowed to revert to the edit hypothesis from a previous iteration.
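
Conceptually, the procedure can be summarized by the following Python sketch. The helpers render, generate_edits, and select_best are hypothetical placeholders standing in for Blender rendering, the VLM-based edit generator G, and the state evaluator V; they are not part of a released API, and the iteration counts are illustrative defaults.

    def refine_program(program, intent, render, generate_edits, select_best,
                       num_iters=4, num_hypotheses=8):
        """Iteratively refine an edit program to better match the user intent."""
        best_program = program
        best_render = render(best_program)
        for _ in range(num_iters):
            # Edit generator G: propose edited versions of the current best
            # program, conditioned on its render and the user intent.
            hypotheses = generate_edits(best_program, best_render, intent,
                                        n=num_hypotheses)
            # Keep the current best as a candidate so the system can revert
            # to an earlier iteration's edit if no new hypothesis improves it.
            candidates = [best_program] + hypotheses
            renders = [render(p) for p in candidates]
            # State evaluator V: select the candidate whose render best
            # satisfies the intent (text and/or reference image).
            best_idx = select_best(renders, intent)
            best_program = candidates[best_idx]
            best_render = renders[best_idx]
        return best_program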

As a concrete example of the material editing task, consider transforming a wooden procedural material into marbled granite. The following is an illustrative sample of a sequence of edit generation and state selection steps.

Materials

Using this system, we can edit procedural materials from language descriptions. We show a few samples below, each edited from the wooden material shown on the left.

Below, we show materials synthesized by BlenderAlchemy applied to a diverse set of scenes based on assets created by 3D artists. BlenderAlchemy is capable of producing usable materials guided by language descriptions, and can also generate variations of the same kind of material.

[Figure: Old metal (original), Ice slats, Surface of the sun]

The code edits synthesized by BlenderAlchemy represent changes to the procedural material graph of the input material, ranging from changes to continuous values to changes in node connectivity and node types. Take the following example of editing a procedural wood material (top) into marbled granite (bottom), using the language description shown below:

[Figure: node graphs of the procedural wood material (top) and the edited marbled granite material (bottom)]
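
To illustrate what such an edit might look like, below is a minimal Blender Python sketch covering the three kinds of changes. The material name ("WoodMaterial") and node names ("Noise Texture", "ColorRamp") are assumptions made for this example, not names taken from our actual scenes.

    import bpy

    # Assumed material and node names, for illustration only.
    mat = bpy.data.materials["WoodMaterial"]
    nodes = mat.node_tree.nodes
    links = mat.node_tree.links

    # (1) Continuous-value change: retune an existing noise texture.
    noise = nodes["Noise Texture"]
    noise.inputs["Scale"].default_value = 4.0
    noise.inputs["Detail"].default_value = 12.0

    # (2) Node-type change: introduce a Voronoi texture for a granite-like pattern.
    voronoi = nodes.new(type="ShaderNodeTexVoronoi")
    voronoi.inputs["Scale"].default_value = 6.0

    # (3) Connectivity change: reroute the new texture into the color ramp
    # that is assumed to drive the material's base color.
    ramp = nodes["ColorRamp"]
    links.new(voronoi.outputs["Distance"], ramp.inputs["Fac"])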

Lighting

Since BlenderAlchemy works by editing programs, it can also change lighting configurations within scenes, because the parameters of each light source can be represented programmatically. Using the same method as for material editing, we can iteratively synthesize lighting setups that match a given language description by cycling between automatically generating candidate lighting configurations and selecting among them.

[Figure: lighting editing overview]
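
For instance, a candidate lighting edit could be a Blender Python snippet along the lines of the sketch below; the object name "KeyLight" and the specific parameter values are hypothetical, chosen only to illustrate the kind of code that gets generated and evaluated.

    import bpy

    # Retune an existing key light toward a warm, low-intensity studio look.
    # "KeyLight" is an assumed object name for this illustration.
    key = bpy.data.objects["KeyLight"]
    key.data.energy = 250.0                 # emission strength in watts
    key.data.color = (1.0, 0.85, 0.7)       # warm tint
    key.rotation_euler = (0.9, 0.0, 0.6)

    # Add a cool rim light behind the subject.
    rim_data = bpy.data.lights.new(name="RimLight", type="AREA")
    rim_data.energy = 80.0
    rim_data.color = (0.7, 0.8, 1.0)
    rim = bpy.data.objects.new("RimLight", rim_data)
    bpy.context.collection.objects.link(rim)
    rim.location = (-2.0, 3.0, 2.5)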

Employing BlenderAlchemy to alternate between optimizing lighting and materials allows a user to tweak both in the input scene to satisfy their desired intention, as sketched below.

[Figure: overview of alternating material and lighting edits]
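
One way to realize this alternation, reusing the refine_program sketch from above, is shown below. The subspace-restricted edit generators (generate_material_edits, generate_lighting_edits) and the round count are again hypothetical placeholders rather than part of a released API.

    def alternate_refinement(program, intent, render, select_best,
                             generate_material_edits, generate_lighting_edits,
                             num_rounds=2):
        """Alternate material and lighting refinement on the same edit program."""
        for _ in range(num_rounds):
            # Refine the material-editing portion of the program first.
            program = refine_program(program, intent, render,
                                     generate_material_edits, select_best)
            # Then refine the lighting-editing portion.
            program = refine_program(program, intent, render,
                                     generate_lighting_edits, select_best)
        return program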

Citation

If you found the paper or code useful, please cite: