Cross-sensor remote-sensing SR • Semantic-guided flow matching

Semantic-Guided Cross-Sensor Super Resolution of Remote Sensing Images: A Gated Dual Conditioning Flow Matching Model

RareFlow is a gated dual-conditioning flow-matching model for translating 10 m Sentinel-2 imagery into 2 m Maxar-like imagery while reducing unsupported hallucination.
Forouzan Fallah · Wenwen Li · Chia-Yu Hsu · Hyunho Lee · Anna Liljedahl · Yezhou Yang
Problem

Cross-sensor SR must be useful, not just sharp

Remote-sensing SR can create images that look realistic but do not match the true landscape. This is risky for rare geomorphic features such as retrogressive thaw slumps, where a plausible texture can still be scientifically wrong.

RareFlow treats the task as target-domain reconstruction: the output should preserve the LR scene layout, add plausible fine detail, and match the target sensor style.

10 m Sentinel-2 input
2 m Maxar-like output
Rare RTS features
Low-resolution input, high-resolution ground truth, and a sharp but wrong super-resolved output.
Motivation: an example failure case for a state-of-the-art SR model. The super-resolved image in (c) looks sharper and more plausible than the low-resolution input in (a), yet it fails to capture the true morphology of the landscape features visible in the high-resolution reference in (b).
Abstract

What RareFlow does

High spatial resolution satellite imagery is critical for monitoring fine-scale Earth surface processes, but is often limited by cost and revisit time. This work studies cross-sensor super-resolution (SR) to reduce this gap by translating 10 m Sentinel-2 imagery into 2 m Maxar-like imagery in a data-scarce, domain-shifted setting, with a focus on rare geomorphic features such as retrogressive thaw slumps (RTS). We propose RareFlow, a semantic-guided generative AI framework for cross-sensor super-resolution based on a flow-matching formulation, designed to produce visually plausible and physically reliable high-resolution images.

RareFlow uses dual conditioning to guide the generation process: (1) a gated ControlNet that preserves scene geometry from low-resolution (LR) input, and (2) text-based semantic guidance that injects contextual information when the target phenomenon is rare. To ensure high-fidelity outputs, we introduce a multifaceted loss function that anchors the output to the high-resolution (HR) ground truth by jointly enforcing frequency alignment, perceptual similarity, and color consistency. RareFlow's performance is systematically evaluated on a newly curated benchmark of multi-sensor satellite imagery for rare Earth feature detection, and its generalizability is demonstrated on two public remote sensing benchmarks, SEN2NAIP and BreizhSR. Human evaluation with domain experts is also conducted to further verify RareFlow's effectiveness in generating high-fidelity super-resolved images for scientific analysis.

Interactive comparison: Sentinel-2 LR input versus the RareFlow super-resolved output.
Interactive comparison: HR reference downsampled to LR scale versus the RareFlow super-resolved output.
Contributions

Key contributions

Gated dual conditioning

RareFlow balances semantic guidance from text with observation guidance from the LR image through learned alpha gates.

Consistency-guided objective

The loss combines flow-matching regression with FFT frequency alignment, CIELAB color consistency, and LPIPS perceptual similarity.

Rare-feature benchmark

The paper builds a multi-sensor RTS benchmark using Sentinel-2 LR inputs and Maxar HR targets under real sensor and time mismatch.

Generalization tests

RareFlow is also evaluated on BreizhSR and SEN2NAIP to test transfer beyond the Arctic RTS setting.
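
As a hedged sketch, the consistency-guided objective listed above can be written as a weighted sum. The weights $\lambda$ and the straight-line (rectified-flow) form of the flow-matching term are assumptions made here for illustration; this page names the four terms but does not state how they are weighted or parameterized. Here $x_0$ is the clean HR latent, $\epsilon$ is Gaussian noise, and $c$ stands for the conditioning (LR latent and text prompt).

```latex
\begin{aligned}
x_t &= (1 - t)\,x_0 + t\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),\ t \in [0, 1] \\
\mathcal{L}_{\mathrm{FM}} &= \mathbb{E}_{x_0,\epsilon,t}\,\bigl\| v_\theta(x_t, t, c) - (\epsilon - x_0) \bigr\|_2^2 \\
\mathcal{L} &= \mathcal{L}_{\mathrm{FM}}
  + \lambda_{\mathrm{fft}}\,\mathcal{L}_{\mathrm{FFT}}
  + \lambda_{\mathrm{lab}}\,\mathcal{L}_{\mathrm{Lab}}
  + \lambda_{\mathrm{lpips}}\,\mathcal{L}_{\mathrm{LPIPS}}
\end{aligned}
```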

Method

Gated dual-conditioning flow matching

RareFlow keeps the VAE and SD3 MM-DiT backbone frozen. A trainable ControlNet consumes the aligned LR latent and emits residual features for selected backbone blocks. Learned alpha gates scale these residuals before injection, so the model can decide how strongly to trust LR structure at each block.
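
The gating step can be illustrated with a minimal NumPy sketch. It assumes one scalar gate per block and plain additive injection; the function name `inject_gated_residual`, the toy shapes, and the gate values are hypothetical, and the actual gate parameterization (scalar, per-channel, or squashed through a nonlinearity) is not specified on this page.

```python
import numpy as np

def inject_gated_residual(backbone_feat, control_residual, alpha):
    """Add a ControlNet residual to frozen backbone features, scaled by a gate."""
    return backbone_feat + alpha * control_residual

# Toy features for three backbone blocks, each of shape (tokens, channels).
rng = np.random.default_rng(0)
backbone_feats = [rng.normal(size=(4, 8)) for _ in range(3)]
control_residuals = [rng.normal(size=(4, 8)) for _ in range(3)]

# Hypothetical learned gate values, one per block (assumption: scalar gates).
alphas = np.array([0.0, 0.5, 1.0])

gated = [
    inject_gated_residual(h, r, a)
    for h, r, a in zip(backbone_feats, control_residuals, alphas)
]
```

With a gate of 0 the block sees only the frozen backbone features, and with a gate of 1 it receives the full LR-derived residual, which is the sense in which the gate sets how strongly each block trusts the LR structure.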

During training, the HR target is used to create the clean latent and compute consistency losses. During inference, the HR branch is removed; RareFlow only uses the LR image and a text prompt to sample the SR output.
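
Inference in a flow-matching model amounts to integrating a learned velocity field from noise to a sample. The sketch below uses a fixed-step Euler solver and a toy velocity function in place of the network; the solver choice, step count, and the names `euler_sample` and `toy_velocity` are assumptions for illustration, not details taken from the paper. In the real model the velocity would come from the frozen backbone conditioned on the LR latent and the text prompt.

```python
def euler_sample(velocity_fn, x_noise, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=1 (noise) down to t=0 (sample)."""
    x = list(x_noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        v = velocity_fn(x, t)
        # Step backward in time along the predicted velocity.
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x

# Toy stand-in for the network: the exact velocity of a straight path
# from `target` (t=0) to `noise` (t=1), which is constant along the path.
target = [0.2, -0.5, 1.0]
noise = [1.5, 0.3, -0.7]

def toy_velocity(x, t):
    return [n - c for n, c in zip(noise, target)]

sample = euler_sample(toy_velocity, noise, num_steps=10)
```

Because the toy velocity is exact and constant, the Euler steps land on `target` up to floating-point error; a learned velocity only approximates this, so the number of steps affects quality.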

Overview of the RareFlow framework with input data, gated ControlNet, frozen SD3 backbone, and consistency-guided objective.
Framework overview: LR latents and semantic guidance steer a frozen SD3 backbone through a trainable ControlNet and alpha-gate mechanism.
Technical architecture of RareFlow showing dual conditioning, SD3 MM-DiT blocks, and ControlNet MM-DiT blocks.
Technical architecture: learned gates scale ControlNet residuals before they are added to frozen backbone features.
Comparison of LR and HR images and their FFT spectra.
Frequency motivation: HR targets contain stronger mid- and high-frequency content, motivating the FFT alignment loss.
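
One plausible form of a frequency alignment term compares the 2D FFT amplitude spectra of the prediction and the HR target. The sketch below is an assumption-laden illustration: the function name `fft_amplitude_l1`, the use of amplitude only, and the L1 distance are choices made here, not the paper's stated formulation.

```python
import numpy as np

def fft_amplitude_l1(pred, target):
    """Mean absolute difference between 2D FFT amplitude spectra of (H, W) images."""
    amp_pred = np.abs(np.fft.fft2(pred))
    amp_target = np.abs(np.fft.fft2(target))
    return float(np.mean(np.abs(amp_pred - amp_target)))

# Toy "HR" image and a blurred copy that has lost high-frequency content.
rng = np.random.default_rng(0)
hr = rng.random((16, 16))
shifted_rows = np.roll(hr, 1, axis=0)
blurred = (hr + shifted_rows + np.roll(hr, 1, axis=1)
           + np.roll(shifted_rows, 1, axis=1)) / 4.0

loss_identical = fft_amplitude_l1(hr, hr)
loss_blurred = fft_amplitude_l1(blurred, hr)
```

The blurred copy is penalized because its mid- and high-frequency amplitudes are attenuated relative to the target, which mirrors the LR versus HR spectral gap described in the caption.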
Main results

RareFlow improves realism while keeping structure

116.16
FID, best; 37.94% lower than the best baseline.
3.86
SAM, best; stronger spectral consistency.
0.59
SSIM, best on paired LR-HR data.
0.36 / 0.30
LPIPS / DISTS, best perceptual similarity.
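
The 37.94% figure can be checked from the reported FID values: RareFlow at 116.16 and the lowest-FID baseline, AdcSR, at 187.18. The variable names below are illustrative.

```python
rareflow_fid = 116.16
best_baseline_fid = 187.18  # AdcSR, the lowest FID among the listed baselines

# Relative reduction with respect to the best baseline.
reduction = (best_baseline_fid - rareflow_fid) / best_baseline_fid
print(f"{reduction:.2%}")  # prints 37.94%
```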
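Model    | PSNR ↑ | SSIM ↑ | SAM ↓ | LPIPS ↓ | DISTS ↓ | FID ↓  | NIQE ↓ | MANIQA ↑
ZoomLDM  | 17.23  | 0.26   | 12.96 | 0.60    | 0.59    | 352.11 | 18.10  | 0.19
SeeSR    | 18.78  | 0.50   | 12.26 | 0.46    | 0.38    | 302.36 | 10.78  | 0.36
AdcSR    | 18.59  | 0.58   | 12.31 | 0.40    | 0.37    | 187.18 | 8.38   | 0.28
MISR-S2  | 18.39  | 0.50   | 12.72 | 0.54    | 0.43    | 254.70 | 13.55  | 0.33
SAMSR    | 18.36  | 0.54   | 12.80 | 0.48    | 0.39    | 189.01 | 11.84  | 0.32
OpenSR   | 17.29  | 0.51   | 12.59 | 0.41    | 0.36    | 225.62 | 9.80   | 0.25
RareFlow | 18.76  | 0.59   | 3.86  | 0.36    | 0.30    | 116.16 | 5.36   | 0.31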
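
For readers unfamiliar with the SAM column, the Spectral Angle Mapper measures the angle between the per-pixel spectral vectors of two images, so lower values mean closer spectral agreement. The sketch below is a common formulation averaged over pixels and reported in degrees; the unit and the exact averaging used in the paper are assumptions here, and the function name is illustrative.

```python
import numpy as np

def spectral_angle_mapper(pred, target, eps=1e-12):
    """Mean spectral angle in degrees between two (H, W, C) images."""
    dot = np.sum(pred * target, axis=-1)
    norms = np.linalg.norm(pred, axis=-1) * np.linalg.norm(target, axis=-1)
    # Clip guards against values slightly outside [-1, 1] from rounding.
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return float(np.degrees(np.mean(np.arccos(cos))))

# A brightness-scaled copy keeps every spectral direction, so the angle is ~0.
rng = np.random.default_rng(0)
img = rng.random((4, 4, 3)) + 0.1
same_direction = spectral_angle_mapper(2.0 * img, img)

# Pure "band 0" versus pure "band 1" pixels are orthogonal: 90 degrees.
band0 = np.zeros((2, 2, 3)); band0[..., 0] = 1.0
band1 = np.zeros((2, 2, 3)); band1[..., 1] = 1.0
orthogonal = spectral_angle_mapper(band0, band1)
```

SAM ignores overall brightness and responds only to changes in the ratio between bands, which is why it is read as a spectral consistency measure.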
Qualitative comparison across LR, HR, baselines, and RareFlow.
Qualitative comparison: RareFlow better matches Maxar-like detail and style than the listed baselines.
Ablation

The full model combines semantic guidance, consistency-guided training, and alpha-gated structural conditioning. This combination gives the strongest overall result: it achieves the best SSIM, FID, and NIQE, and is second-best or tied for second-best on SAM, LPIPS, DISTS, and MANIQA. These results suggest that the components are complementary: the pre-trained ControlNet supports structural consistency, captions improve target-domain realism, the consistency-guided objective stabilizes caption-guided generation, and the alpha gate helps balance conditioning strength.

Visual comparison of RareFlow ablation variants.
Data

RTS benchmark and data challenges

The benchmark pairs Sentinel-2 Level-1C imagery with Maxar imagery to learn a 10 m to 2 m cross-sensor mapping for retrogressive thaw slump regions across Arctic sites.

The setting is hard because LR-HR pairs can be spatially shifted, temporally mismatched, cloud-affected, very small, and limited in number. The paper reports roughly 800 training images.

Spatial mismatch
Temporal mismatch
Cloud occlusion
Limited data
Examples of spatial detail mismatch, temporal misalignment, and cloud occlusion.
Examples of data challenges. From left to right, the columns illustrate: 1) spatial detail misalignment; 2) temporal misalignment (snow is present in HR but not LR); and 3) cloud occlusion in the HR image.
Human evaluation

Domain experts preferred RareFlow over baselines


Experts scored outputs from 1 to 10 with attention to RTS boundaries. RareFlow reached a mean score of 4.6, above AdcSR at 3.2, SeeSR at 2.5, and the LR input at 2.3. The HR reference remained the upper bound at 6.3.

Reported agreement was substantial: Krippendorff's alpha = 0.736, Kendall's W = 0.841, and Spearman's rho = 0.673.
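
To make the agreement statistics more concrete, the sketch below computes Kendall's W, which ranges from 0 (no agreement among raters) to 1 (identical rankings). It uses the standard formula without tie correction. The scores are made-up illustrative values, not data from the study, and the function name is hypothetical.

```python
def kendalls_w(ratings):
    """Kendall's W for m raters x n items, assuming no tied scores within a rater."""
    m, n = len(ratings), len(ratings[0])
    rank_sums = [0] * n
    for scores in ratings:
        # Rank items 1..n by ascending score for this rater.
        order = sorted(range(n), key=lambda j: scores[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical scores from three raters for five outputs (not study data).
ratings = [
    [2, 3, 4, 6, 8],
    [3, 2, 4, 5, 7],
    [2, 3, 5, 4, 9],
]
w = kendalls_w(ratings)

# Identical rankings from every rater give perfect concordance.
w_perfect = kendalls_w([[1, 2, 3, 4, 5]] * 3)
```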

Expert evaluation bar chart showing scores for SeeSR, LR, AdcSR, RareFlow, and HR.
Expert scores show RareFlow narrowing the gap between LR input and HR reference.
Generalization

Tests beyond the RTS benchmark

RareFlow was trained and tested on two public cross-sensor SR benchmarks to evaluate transfer beyond Arctic thaw slump imagery.

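Dataset  | Method   | PSNR ↑ | SSIM ↑ | FSIM ↑ | LPIPS ↓ | DISTS ↓ | FID ↓  | NIQE ↓ | MANIQA ↑
BreizhSR | RareFlow | 11.30  | 0.18   | 0.53   | 0.52    | 0.31    | 245.30 | 7.95   | 0.32
BreizhSR | MISR-S2  | 11.28  | 0.23   | 0.54   | 0.63    | 0.36    | 254.10 | 9.36   | 0.22
SEN2NAIP | RareFlow | 14.44  | 0.29   | 0.57   | 0.59    | 0.32    | 214.50 | 6.23   | 0.19
SEN2NAIP | OpenSR   | 12.50  | 0.24   | 0.61   | 0.61    | 0.36    | 230.95 | 6.76   | 0.20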
Citation

Cite

@article{fallah2025rareflow,
  title={RareFlow: Physics-Aware Flow-Matching for Cross-Sensor Super-Resolution of Rare-Earth Features},
  author={Fallah, Forouzan and Li, Wenwen and Hsu, Chia-Yu and Lee, Hyunho and Yang, Yezhou},
  journal={arXiv preprint arXiv:2510.23816},
  year={2025}
}