Human Image Generation

Project, DFKI, Augmented Vision, Image Generation, Diffusion Models

Human Image Generation through Multimodal Diffusion Models (Project)

Abstract

Pose-guided human image generation remains a challenging yet significant task in computer vision. To tackle limitations in current Human-Object Interaction (HOI) datasets, we introduce an HOI dataset containing 29K images with structured object annotations, detailed captions, and diverse interactions. Leveraging this dataset, we propose HOIGEN, a diffusion-based multimodal model capable of generating realistic human-object interaction images conditioned on textual descriptions and object appearance. Extensive benchmarking demonstrates that HOIGEN effectively synthesizes structurally coherent, style-controllable, and photorealistic images, significantly advancing pose-conditioned image generation.
