U.S. Department of Energy
Office of Scientific and Technical Information

SHF: Symmetrical Hierarchical Forest with Pretrained Vision Transformer Encoder for High-Resolution Medical Segmentation

Conference · OSTI ID: 3009458
This paper presents a novel approach to addressing the long-sequence problem in high-resolution medical images for Vision Transformers (ViTs). Using smaller patches as tokens can enhance ViT performance, but quadratically increases computation and memory requirements. Therefore, the common practice for applying ViTs to high-resolution images is either to: (a) employ complex sub-quadratic attention schemes or (b) use large to medium-sized patches and rely on additional mechanisms within the model to capture the spatial hierarchy of details. We propose Symmetrical Hierarchical Forest (SHF), a lightweight approach that adaptively patches the input image to increase token information density and encode hierarchical spatial structures into the input embedding. We then apply a reverse depatching scheme to the output embeddings of the transformer encoder, eliminating the need for convolution-based decoders. Unlike previous methods that modify attention mechanisms or use a complex hierarchy of interacting models, SHF can be retrofitted to any ViT model to allow it to learn the hierarchical structure of details in high-resolution images without requiring architectural changes. Experimental results demonstrate significant gains in computational efficiency and performance: on the PAIP WSI dataset, we achieved a 3× to 32× speedup or a 2.95% to 7.03% increase in accuracy (measured by Dice score) at a 64K² resolution with the same computational budget, compared to state-of-the-art production models. On the 3D medical datasets BTCV and KiTS, training was 6× faster, with accuracy gains of 6.93% and 5.9%, respectively, compared to models without SHF.
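The token-count argument in the abstract can be illustrated with a small sketch. The code below is not the SHF method; it is a generic variance-driven quadtree, written under the assumption that "adaptive patching" means subdividing only the regions that contain detail. The function names (`quadtree_patches`, `uniform_token_count`, `depatch`), the variance criterion, and the threshold are all hypothetical choices made for illustration. It shows how variable-size patches can cover an image with far fewer tokens than a uniform fine grid, and how per-patch outputs can be scattered back onto the pixel grid without a convolutional decoder.

```python
import numpy as np


def quadtree_patches(image, min_size=4, var_threshold=1e-3):
    """Split a square, power-of-two 2D image into variable-size patches.

    A block is subdivided into four quadrants while its pixel variance
    exceeds var_threshold and it is larger than min_size. This criterion
    is an illustrative assumption, not the one used by SHF.
    Returns a list of (row, col, size) leaf patches.
    """
    h, w = image.shape
    assert h == w and (h & (h - 1)) == 0, "expects a square power-of-two image"
    leaves = []
    stack = [(0, 0, h)]
    while stack:
        y, x, size = stack.pop()
        block = image[y:y + size, x:x + size]
        if size > min_size and block.var() > var_threshold:
            half = size // 2
            stack.extend([
                (y, x, half), (y, x + half, half),
                (y + half, x, half), (y + half, x + half, half),
            ])
        else:
            leaves.append((y, x, size))
    return leaves


def uniform_token_count(side, patch):
    """Number of tokens when a side x side image is cut into fixed patches."""
    return (side // patch) ** 2


def depatch(leaves, values, side):
    """Scatter one value per leaf patch back onto a side x side grid."""
    out = np.zeros((side, side), dtype=float)
    for (y, x, size), v in zip(leaves, values):
        out[y:y + size, x:x + size] = v
    return out


# Toy image: flat background with one small detailed (checkerboard) region.
image = np.zeros((64, 64))
image[:8, :8] = np.indices((8, 8)).sum(axis=0) % 2

leaves = quadtree_patches(image, min_size=4)
n_adaptive = len(leaves)                  # 13 leaves for this toy image
n_uniform = uniform_token_count(64, 4)    # 256 tokens at a fixed 4x4 patch

# Self-attention cost grows with the square of the token count.
attention_cost_ratio = (n_uniform ** 2) / (n_adaptive ** 2)

# One value per leaf (here simply the patch mean) mapped back to pixels.
leaf_means = [image[y:y + s, x:x + s].mean() for (y, x, s) in leaves]
reconstruction = depatch(leaves, leaf_means, 64)
```

In this toy case the fine patches are spent only on the detailed corner, so the sequence shrinks from 256 tokens to 13. Halving the uniform patch size instead would quadruple the token count (1024 tokens at 2×2 patches), which is the quadratic growth the abstract refers to.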
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-00OR22725
OSTI ID:
3009458
Resource Type:
Conference paper/presentation
Conference Information:
39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025) - San Diego, California, United States of America - 12/2/2025-12/7/2025
Country of Publication:
United States
Language:
English
