Project DFKI Augmented Vision Image Generation Diffusion Models

Human Image Generation through Multimodal Diffusion Models (Project)

Muhammad Usama, Muhammad Saif Ullah Khan

Abstract

Pose-guided human image generation remains a challenging yet significant task in computer vision. To address limitations in current Human-Object Interaction (HOI) datasets, we introduce an HOI dataset containing 29K images with structured object annotations, detailed captions, and diverse interactions. Leveraging this dataset, we propose HOIGEN, a diffusion-based multimodal model that generates realistic human-object interaction images conditioned on textual descriptions and object appearance. Extensive benchmarking demonstrates that HOIGEN synthesizes structurally coherent, style-controllable, and photorealistic images, significantly advancing pose-conditioned image generation.

Topic

Tasks

Related Literature