Tianao (Owen) Zeng

2026

OmniPaint Reproducibility Study for Object-Oriented Diffusion Editing

An ICCV 2025 reproducibility study of OmniPaint that validates its removal metrics on the public benchmark and uses public substitute data to show that its insertion ranking is benchmark-sensitive.

  • generative models
  • diffusion
  • reproducibility
  • image editing

Overview

Reproduced the main numerical claims of OmniPaint, a FLUX-based diffusion framework for object removal and insertion that introduces CycleFlow training and a reference-free removal metric called CFD.

Problem

Reported results in object-oriented diffusion editing papers depend heavily on benchmark construction, mask quality, inference settings, and private evaluation data. This study separates claims that are independently reproducible from claims that cannot be directly verified with public assets.

What I Built

  • A calibrated removal reproduction on the released 300-sample OmniPaint benchmark, matching the paper's evaluation resolution and metric suite.
  • A public insertion substitute benchmark built from MS-COCO val2017 backgrounds, real COCO instance masks, self-annotated placement masks, and DreamBooth reference subjects (mask extraction sketched after this list).
  • A comparison against Paint-by-Example, AnyDoor, FreeCompose, and ObjectStitch using identity, perceptual, and no-reference quality metrics (metric computation sketched after this list).
  • Ablations over inference steps and mask-quality variants to understand when reported advantages appear most clearly.
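
A minimal sketch of how the substitute benchmark's backgrounds and instance masks can be pulled from MS-COCO val2017 with pycocotools; the annotation path and the minimum-area filter are illustrative assumptions, not the exact settings used in the study.

```python
# Sketch: collect COCO val2017 backgrounds and instance masks for the
# substitute insertion benchmark. Paths and the area filter are assumptions.
import numpy as np
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")  # assumed local path

samples = []
for img_id in coco.getImgIds():
    info = coco.loadImgs(img_id)[0]
    ann_ids = coco.getAnnIds(imgIds=img_id, iscrowd=False)
    for ann in coco.loadAnns(ann_ids):
        # keep reasonably large, non-crowd instances as insertion targets
        if ann["area"] < 0.02 * info["width"] * info["height"]:
            continue
        samples.append({
            "image": f"val2017/{info['file_name']}",
            "mask": coco.annToMask(ann).astype(np.uint8) * 255,  # binary H x W mask
            "category": coco.loadCats(ann["category_id"])[0]["name"],
        })

print(f"collected {len(samples)} background/mask pairs")
```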
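
And a sketch of the identity and perceptual metrics used to compare insertion results; the CLIP checkpoint, LPIPS backbone, and 512-pixel evaluation resolution are assumptions rather than the study's exact configuration.

```python
# Sketch: CLIP-image identity similarity and LPIPS distance between an
# inserted result and a reference. Model choices and resolution are assumptions.
import numpy as np
import torch
import lpips
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
lpips_fn = lpips.LPIPS(net="alex").to(device).eval()

def _to_lpips_tensor(img: Image.Image) -> torch.Tensor:
    # LPIPS expects RGB tensors in [-1, 1]; resize to a common resolution first
    arr = np.array(img.convert("RGB").resize((512, 512)))
    t = torch.from_numpy(arr).permute(2, 0, 1).float() / 255.0
    return (t * 2 - 1).unsqueeze(0).to(device)

@torch.no_grad()
def identity_and_perceptual(reference: Image.Image, result: Image.Image):
    # cosine similarity of CLIP image embeddings (CLIP-I style identity score)
    inputs = proc(images=[reference, result], return_tensors="pt").to(device)
    feats = clip.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    clip_i = float((feats[0] * feats[1]).sum())
    # LPIPS perceptual distance (lower means closer)
    dist = float(lpips_fn(_to_lpips_tensor(reference), _to_lpips_tensor(result)))
    return clip_i, dist
```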

Technical Stack

  • Python
  • PyTorch
  • FLUX.1-dev
  • LoRA
  • MS-COCO
  • DreamBooth
  • CLIP
  • DINOv2
  • LPIPS
  • FID
  • MUSIQ
  • MANIQA

Results / Outcomes

  • Reproduced 6 of 7 reported removal metrics to within 15 percent of the published values on the public benchmark, supporting the paper's core removal claim.
  • Identified LPIPS as the major outlier, likely due to a ground-truth resolution mismatch and LPIPS sensitivity to resampling artifacts (probed in the sketch after this list).
  • Found that the original 565-sample insertion benchmark is not publicly released, so the insertion ranking cannot be directly verified from public data.
  • On the public substitute insertion benchmark, OmniPaint did not always rank first, suggesting the insertion result is benchmark-dependent rather than universally reproducible.
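
To make the resolution-mismatch hypothesis for the LPIPS outlier concrete, a small probe like the one below can show how much the score moves depending on which side of the pair gets resampled; the file names, the bicubic filter, and the alex backbone are illustrative assumptions.

```python
# Sketch: probe LPIPS sensitivity to resampling, one plausible cause of the
# outlier. File paths, filter choice, and backbone are assumptions.
import numpy as np
import torch
import lpips
from PIL import Image

lpips_fn = lpips.LPIPS(net="alex").eval()

def to_tensor(img: Image.Image) -> torch.Tensor:
    arr = np.array(img.convert("RGB"))
    t = torch.from_numpy(arr).permute(2, 0, 1).float() / 255.0
    return (t * 2 - 1).unsqueeze(0)  # LPIPS expects inputs in [-1, 1]

gt = Image.open("gt.png")      # ground truth at its released resolution
pred = Image.open("pred.png")  # model output at inference resolution

with torch.no_grad():
    # (a) resample the ground truth to the prediction's resolution
    at_pred_res = lpips_fn(to_tensor(gt.resize(pred.size, Image.Resampling.BICUBIC)),
                           to_tensor(pred))
    # (b) resample the prediction to the ground truth's resolution instead
    at_gt_res = lpips_fn(to_tensor(pred.resize(gt.size, Image.Resampling.BICUBIC)),
                         to_tensor(gt))

# a noticeable gap between the two suggests the reported LPIPS depends on
# which image is resampled and at what resolution the metric is evaluated
print(float(at_pred_res), float(at_gt_res))
```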