Abstract:
In ResDiff, we introduce a method for image super-resolution that harnesses the strengths of two complementary techniques: the convolutional network ESPCN and the probabilistic diffusion model DDPM. Unlike conventional approaches that directly process low-resolution (LR) images, ResDiff operates in two phases. First, ESPCN generates an initial high-resolution (HR) prediction, concentrating on the reconstruction of low-frequency content. Then, DDPM refines this initial output by generating the missing high-frequency details as a residual. This combined strategy not only improves the overall quality of the reconstructed image but also yields a more faithful rendering of fine details. The refinement is carried out by a conditional U-Net, guided by the upsampled LR image, the injected noise, and the diffusion timestep, each encoded through a dedicated embedding. In addition, ResDiff applies a guided optimization strategy to the ESPCN, based on a hybrid loss function (MSE + FFT + DWT). This guidance injects analytical frequency-domain information into training, combining pixel-level supervision (MSE), global frequency awareness (FFT), and multi-scale structural cues (DWT). As a result, ResDiff produces visually faithful, high-quality images while keeping computational complexity under control.
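To make the hybrid guidance loss concrete, the sketch below shows one plausible PyTorch formulation of the MSE + FFT + DWT combination used to guide the ESPCN. It is an illustrative assumption, not the authors' code: the function names, the single-level Haar transform, and the weights w_fft and w_dwt are hypothetical choices.

```python
# Hypothetical sketch of the hybrid guidance loss (MSE + FFT + DWT), assuming PyTorch.
# Names and weights are illustrative, not taken from the ResDiff implementation.
import torch
import torch.nn.functional as F


def haar_dwt(x: torch.Tensor) -> torch.Tensor:
    """One-level Haar DWT: returns the LL/LH/HL/HH sub-bands stacked on the channel dim."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a - b + c - d) / 2
    hl = (a + b - c - d) / 2
    hh = (a - b - c + d) / 2
    return torch.cat([ll, lh, hl, hh], dim=1)


def hybrid_loss(sr: torch.Tensor, hr: torch.Tensor,
                w_fft: float = 0.1, w_dwt: float = 0.1) -> torch.Tensor:
    """Pixel-level (MSE) + global frequency (FFT) + multi-scale structure (DWT) terms."""
    loss_pix = F.mse_loss(sr, hr)
    loss_fft = F.l1_loss(torch.fft.fft2(sr).abs(), torch.fft.fft2(hr).abs())
    loss_dwt = F.l1_loss(haar_dwt(sr), haar_dwt(hr))
    return loss_pix + w_fft * loss_fft + w_dwt * loss_dwt
```

In this reading, the FFT term compares global frequency magnitudes while the DWT term compares localized, multi-scale sub-bands, complementing the purely pixel-wise MSE supervision.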