Local-Global Fusion Network for Video Super-Resolution
Video super-resolution aims to restore high-resolution (HR) videos from low-resolution (LR) ones. Previous methods commonly use optical flow for frame alignment and design their frameworks from a spatial and temporal perspective. However, optical flow estimation is prone to errors, which degrade restoration quality. In addition, how to effectively fuse the features of multiple video frames remains a challenging problem. In this paper, we propose a Local-Global Fusion Network (LGFN) to address these issues from a novel viewpoint. As an alternative to optical flow, deformable convolutions (DCs) with decreased multi-dilation convolution units (DMDCUs) are applied for efficient implicit alignment. Moreover, a two-branch structure, consisting of a Local Fusion Module (LFM) and a Global Fusion Module (GFM), is proposed to combine information from two different aspects. Specifically, the LFM focuses on the relationship between adjacent frames and maintains temporal consistency, while the GFM exploits all related features globally through a video shuffle strategy. Experimental results on several datasets demonstrate that LGFN not only achieves performance comparable to state-of-the-art methods but also reliably restores a wide variety of video frames. The results of LGFN on benchmark datasets are available at https://github.com/BIOINSu/LGFN and the source code will be released as soon as the paper is accepted.
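
The implicit alignment mentioned in the abstract relies on deformable convolution, which samples the input at kernel positions shifted by learned fractional offsets, using bilinear interpolation. The following is a minimal pure-Python sketch of that sampling step for a single 3x3 kernel at one output position. It is not the authors' implementation: the function names `bilinear_sample` and `deformable_conv_at` are hypothetical, the offsets are passed in by hand rather than predicted by DMDCUs, and zero padding outside the feature map is an assumption.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2D feature map at fractional (y, x).

    Positions outside the map contribute zero (assumed zero padding).
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    # Blend the four integer neighbours, weighted by proximity.
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
                value += weight * feature[yy][xx]
    return value


def deformable_conv_at(feature, weights, offsets, cy, cx, dilation=1):
    """Response of a 3x3 deformable convolution at position (cy, cx).

    weights: 9 kernel weights in row-major order.
    offsets: 9 (dy, dx) pairs, one per kernel tap. In a real network
             these would be predicted from the features (hypothetical here).
    """
    out = 0.0
    k = 0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            dy, dx = offsets[k]
            # Regular grid position plus the per-tap offset.
            out += weights[k] * bilinear_sample(
                feature, cy + i * dilation + dy, cx + j * dilation + dx
            )
            k += 1
    return out
```

With all offsets set to `(0.0, 0.0)` this reduces to an ordinary (dilated) 3x3 convolution, which is a convenient sanity check. Non-zero offsets let each tap read from a displaced location in a neighbouring frame's features, which is how alignment can be done without an explicit optical flow field.
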
Results from the Paper
Ranked #4 on Video Super-Resolution on MSU Video Upscalers: Quality Enhancement (VMAF metric)
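Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + x265 | BSQ-rate over ERQA | 13.213 | #57
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + x265 | BSQ-rate over VMAF | 1.341 | #27
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + x265 | BSQ-rate over PSNR | 6.646 | #38
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + x265 | BSQ-rate over MS-SSIM | 1.533 | #31
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + x265 | BSQ-rate over LPIPS | 11.399 | #61
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + x264 | BSQ-rate over ERQA | 1.704 | #13
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + x264 | BSQ-rate over VMAF | 0.744 | #11
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + x264 | BSQ-rate over PSNR | 1.151 | #9
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + x264 | BSQ-rate over MS-SSIM | 0.77 | #12
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + x264 | BSQ-rate over LPIPS | 1.324 | #17
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + vvenc | BSQ-rate over ERQA | 18.342 | #76
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + vvenc | BSQ-rate over Subjective Score | 2.944 | #43
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + vvenc | BSQ-rate over VMAF | 1.626 | #38
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + vvenc | BSQ-rate over PSNR | 5.768 | #24
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + vvenc | BSQ-rate over MS-SSIM | 0.889 | #18
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + vvenc | BSQ-rate over LPIPS | 11.759 | #65
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + aomenc | BSQ-rate over ERQA | 14.631 | #65
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + aomenc | BSQ-rate over VMAF | 1.99 | #44
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + aomenc | BSQ-rate over PSNR | 9.79 | #52
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + aomenc | BSQ-rate over MS-SSIM | 4.321 | #52
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + aomenc | BSQ-rate over LPIPS | 5.536 | #47
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + uavs3e | BSQ-rate over ERQA | 9.279 | #42
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + uavs3e | BSQ-rate over VMAF | 1.625 | #37
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + uavs3e | BSQ-rate over PSNR | 5.503 | #18
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + uavs3e | BSQ-rate over MS-SSIM | 2.427 | #42
Video Super-Resolution | MSU Super-Resolution for Video Compression | LGFN + uavs3e | BSQ-rate over LPIPS | 4.504 | #38
Video Super-Resolution | MSU Video Super Resolution Benchmark: Detail Restoration | LGFN | Subjective score | 6.505 | #6
Video Super-Resolution | MSU Video Super Resolution Benchmark: Detail Restoration | LGFN | ERQAv1.0 | 0.74 | #5
Video Super-Resolution | MSU Video Super Resolution Benchmark: Detail Restoration | LGFN | QRCRv1.0 | 0.629 | #3
Video Super-Resolution | MSU Video Super Resolution Benchmark: Detail Restoration | LGFN | SSIM | 0.898 | #4
Video Super-Resolution | MSU Video Super Resolution Benchmark: Detail Restoration | LGFN | PSNR | 31.291 | #4
Video Super-Resolution | MSU Video Super Resolution Benchmark: Detail Restoration | LGFN | FPS | 0.667 | #19
Video Super-Resolution | MSU Video Super Resolution Benchmark: Detail Restoration | LGFN | 1 - LPIPS | 0.903 | #10
Video Super-Resolution | MSU Video Upscalers: Quality Enhancement | LGFN | PSNR | 27.42 | #33
Video Super-Resolution | MSU Video Upscalers: Quality Enhancement | LGFN | SSIM | 0.939 | #47
Video Super-Resolution | MSU Video Upscalers: Quality Enhancement | LGFN | VMAF | 57.79 | #4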
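
For readers unfamiliar with the PSNR entries in the table, the sketch below shows the standard definition, PSNR = 10 log10(MAX^2 / MSE), applied to a pair of frames given as nested lists of pixel values. It assumes single-channel 8-bit frames (peak value 255); the exact evaluation protocol used by the benchmarks (color space, cropping, averaging over frames) is not stated here, so this illustrates the metric only and is not meant to reproduce the leaderboard numbers. The function name `psnr` is chosen for illustration.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    reference, restored: 2D lists (rows of pixel intensities).
    max_value: peak pixel value; 255 assumes 8-bit frames.
    """
    ref = [p for row in reference for p in row]
    out = [p for row in restored for p in row]
    if len(ref) != len(out):
        raise ValueError("frames must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the restored frame is closer to the reference in a per-pixel sense. That is why the table also lists perceptual metrics such as LPIPS and VMAF, which do not always rank methods the same way.
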