BannerAgency: Advertising Banner Design with Multimodal LLM Agents

Sony Group Corporation

Abstract

Advertising banners are critical for capturing user attention and improving the effectiveness of advertising campaigns. Creating aesthetically pleasing banner designs that also convey the campaign message is challenging because of the large search space spanned by multiple design elements. In addition, advertisers need multiple sizes for different displays and multiple versions to target different audience segments. Since design is an intrinsically iterative and subjective process, flexible editability is also essential in practice.

Although current models have served as assistants to human designers in various design tasks, they typically handle only parts of the creative design process or produce pixel-based outputs that limit editability.

This paper introduces a training-free framework for fully automated banner ad design creation, enabling frontier multimodal large language models (MLLMs) to streamline the production of effective banners with minimal manual effort across diverse marketing contexts. We present BannerAgency, an MLLM agent system that collaborates with advertisers to understand their brand identity and banner objectives, generates matching background images, creates blueprints for foreground design elements, and renders the final creatives as editable components in Figma or SVG formats rather than static pixels.

To facilitate evaluation and future research, we introduce BannerRequest400, a benchmark featuring 100 unique logos paired with 400 diverse banner requests. Through quantitative and qualitative evaluations, we demonstrate the framework's effectiveness, emphasizing the quality of the generated banner designs, their adaptability to various banner requests, and their strong editability enabled by this component-based approach.

Method

Given a logo and a request, BannerAgency begins with the Strategist analyzing the banner objectives. The Background Designer then creates a matching background, the Foreground Designer produces blueprints for the foreground elements, and the Developer renders the final design as editable components. With access to external knowledge, tool calling, and a shared memory, the agents make context-aware, harmonized decisions, and BannerAgency can produce multiple banner sizes when requested.
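The four-stage flow above can be pictured as a sequential pipeline over a shared memory. The following is a minimal, hypothetical sketch, not the authors' implementation. The class and function names (`SharedMemory`, `strategist`, `run_pipeline`, and so on) and the blueprint fields are assumptions, and each agent is a stub at the point where the real system would call an MLLM, an image generator, or other tools.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SharedMemory:
    """Hypothetical shared memory that every agent can read from and write to."""
    entries: dict[str, Any] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def write(self, agent: str, key: str, value: Any) -> None:
        self.entries[key] = value
        self.log.append(f"{agent} -> {key}")


def strategist(memory: SharedMemory, request: str, logo: str) -> None:
    # Stub for an MLLM call that analyzes brand identity and banner objectives.
    memory.write("Strategist", "objectives", {"request": request, "logo": logo})


def background_designer(memory: SharedMemory) -> None:
    # Stub for prompting an image generator with the Strategist's objectives.
    objectives = memory.entries["objectives"]
    memory.write("BackgroundDesigner", "background", f"bg_for:{objectives['request']}")


def foreground_designer(memory: SharedMemory) -> None:
    # Stub blueprint (assumed format): one dict per foreground element.
    memory.write("ForegroundDesigner", "blueprint", [
        {"type": "logo", "x": 20, "y": 20, "w": 120, "h": 40},
        {"type": "text", "x": 20, "y": 200, "content": "Summer Sale"},
    ])


def developer(memory: SharedMemory) -> None:
    # Stub for rendering the blueprint as editable Figma/SVG components.
    design = {"background": memory.entries["background"],
              "components": memory.entries["blueprint"]}
    memory.write("Developer", "design", design)


def run_pipeline(request: str, logo: str) -> SharedMemory:
    memory = SharedMemory()
    strategist(memory, request, logo)
    # Later agents read what earlier agents wrote to the shared memory.
    for agent in (background_designer, foreground_designer, developer):
        agent(memory)
    return memory
```

For example, `run_pipeline("Summer sale banner", "logo.png").log` records the agents in the order they ran. A shared dictionary keeps the sketch minimal. A fuller version might also let later agents write feedback that earlier ones re-read.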

Editability

BannerAgency supports direct editing in the Figma ecosystem. Because the final designs are generated as editable components in Figma or SVG format rather than static pixels, users can adjust the text, colors, or layout to their preferences. This component-based approach makes the generated banners more flexible and usable, letting advertisers tailor each design to their specific needs.
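To illustrate why component-based output is easier to edit than pixels, the sketch below renders a blueprint as SVG. Each element becomes its own addressable node, so a single text node can later be changed without touching the rest of the design. It is a hypothetical example, not the system's renderer. The blueprint fields (`type`, `id`, `x`, `y`, `content`, `color`, and so on) and the helpers `render_svg` and `set_text` are assumptions. Only Python's standard `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG tags without an "ns0:" prefix


def _tag(name: str) -> str:
    # ElementTree represents namespaced tags as "{uri}name".
    return f"{{{SVG_NS}}}{name}"


def render_svg(width: int, height: int, background_href: str, elements: list[dict]) -> str:
    """Render a (hypothetical) blueprint as SVG with one addressable node per element."""
    root = ET.Element(_tag("svg"), {"width": str(width), "height": str(height),
                                    "viewBox": f"0 0 {width} {height}"})
    # The generated background stays a single image layer beneath the components.
    ET.SubElement(root, _tag("image"), {"id": "background", "href": background_href,
                                        "x": "0", "y": "0",
                                        "width": str(width), "height": str(height)})
    for el in elements:
        if el["type"] == "text":
            node = ET.SubElement(root, _tag("text"), {
                "id": el["id"], "x": str(el["x"]), "y": str(el["y"]),
                "font-family": el.get("font", "sans-serif"),
                "font-size": str(el.get("size", 24)),
                "fill": el.get("color", "#000000"),
            })
            node.text = el["content"]  # real text, not rasterized pixels
        elif el["type"] == "rect":
            ET.SubElement(root, _tag("rect"), {
                "id": el["id"], "x": str(el["x"]), "y": str(el["y"]),
                "width": str(el["w"]), "height": str(el["h"]),
                "fill": el.get("color", "#cccccc"),
            })
    return ET.tostring(root, encoding="unicode")


def set_text(svg_markup: str, element_id: str, new_text: str) -> str:
    """Edit one text component; every other component is left untouched."""
    root = ET.fromstring(svg_markup)
    for node in root.iter(_tag("text")):
        if node.get("id") == element_id:
            node.text = new_text
            return ET.tostring(root, encoding="unicode")
    raise KeyError(element_id)
```

Usage: `set_text(render_svg(300, 250, "bg.png", blueprint), "headline", "Winter Sale")` swaps the headline and leaves the background image and other components as they were. With a flat raster, the same change would require regenerating or inpainting pixels.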

Different Aesthetic Styles

BannerAgency can generate banners in a variety of aesthetic styles to suit different marketing contexts and target audiences. Leveraging multimodal large language models, it adapts its styling to create banners that resonate with the advertiser's brand identity and message. This cross-genre aesthetic adaptation lets BannerAgency produce diverse, visually appealing designs that communicate the advertising message effectively to the target audience.

Different Languages

BannerAgency can generate banners in multiple languages to reach audiences across different regions and markets. Using the multilingual capabilities of multimodal large language models, it adapts the banner text to the target language while preserving the overall design and layout. This cross-cultural linguistic adaptation lets BannerAgency create banners that communicate the advertising message effectively to audiences from different linguistic backgrounds.
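One way to picture "adapt the text while maintaining the layout" is to swap only the text content of a component blueprint and copy every geometric property unchanged. The sketch below is hypothetical. The blueprint format and the function name `localize_blueprint` are assumptions, and the `translations` dictionary stands in for the MLLM translation step that BannerAgency would actually perform.

```python
import copy
from typing import Any


def localize_blueprint(blueprint: list[dict[str, Any]],
                       translations: dict[str, str]) -> list[dict[str, Any]]:
    """Return a copy of a (hypothetical) blueprint with text swapped and layout kept."""
    localized = copy.deepcopy(blueprint)  # never mutate the source-language design
    for element in localized:
        # Only text content changes; position, size, and styling keys are copied as-is.
        if element.get("type") == "text" and element["content"] in translations:
            element["content"] = translations[element["content"]]
    return localized
```

Because geometry is copied verbatim, a translation that is much longer than the original can overflow its text box. A practical version would re-check the text fit after translation.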

BibTeX

@article{wang2025banneragency,
  author    = {Wang, Heng and Shimose, Yotaro and Takamatsu, Shingo},
  title     = {BannerAgency: Advertising Banner Design with Multimodal LLM Agents},
  journal   = {arXiv preprint arXiv:2503.11060},
  year      = {2025},
}