Spaceborne inspection systems face a fundamental constraint: perception models are deployed before launch, and updating weights or expanding label sets once a satellite is in orbit is operationally impractical. A new research paper by Nicholas A. Welsh, Lennon J. Shikhman, and colleagues investigates whether prompt-driven vision-language models (VLMs) can overcome this limitation by letting operators specify new semantic capabilities through natural-language prompts, without any modification to onboard model weights.
According to the paper, published on arXiv, the researchers evaluated zero-shot instance segmentation of spacecraft components under a strictly frozen, single-pass inference protocol on a test set of 129 images of previously unseen satellites. Using SAM3 with fixed global thresholds and no post-processing, the system achieved 0.385 mAP@0.5 and 0.267 mAP@0.5:0.95.
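For readers unfamiliar with the two metrics, the following is a minimal sketch of what the IoU thresholds mean. It assumes the paper follows the common COCO-style convention, in which mAP@0.5 counts a predicted mask as correct when its intersection over union (IoU) with the ground truth is at least 0.5, and mAP@0.5:0.95 averages over ten thresholds from 0.50 to 0.95. The function names and the toy masks are illustrative and do not come from the paper.

```python
def mask_iou(pred, truth):
    """IoU of two equally sized binary masks given as lists of rows of 0/1."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 0.0


# COCO-style IoU thresholds (assumed convention): 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou):
    """Return the IoU thresholds at which a prediction counts as a match."""
    return [t for t in IOU_THRESHOLDS if iou >= t]


# Toy 3x4 masks (hypothetical data): 4 overlapping pixels, 8 in the union.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0]]
truth = [[0, 1, 1, 1],
         [0, 1, 1, 1],
         [0, 0, 0, 0]]

iou = mask_iou(pred, truth)
print(iou, thresholds_passed(iou))
```

In this toy case the prediction counts as a match only at the loosest threshold, which illustrates why the stricter mAP@0.5:0.95 figure is lower than mAP@0.5.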
Scale-Dependent Performance
Performance was strongly dependent on the physical size of the target component. Large structural elements localized reliably, while small appendages proved difficult:
| Component | Average Precision @0.5 |
|---|---|
| Spacecraft bodies | 0.639 |
| Solar arrays | 0.598 |
| Antennas | 0.221 |
| Thrusters | 0.081 |
The table, derived from the paper's reported results, shows that zero-shot segmentation of fine-scale components remains challenging under orbital domain shift.
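As a quick consistency check, the per-class values in the table can be averaged. Assuming these four components make up the full label set (the article does not say so explicitly), their unweighted mean matches the 0.385 mAP@0.5 reported above. The variable names are illustrative.

```python
# Per-class AP@0.5 values copied from the table above.
per_class_ap50 = {
    "spacecraft body": 0.639,
    "solar array": 0.598,
    "antenna": 0.221,
    "thruster": 0.081,
}

# Unweighted mean over classes (assumes these four are the complete label set).
mean_ap50 = sum(per_class_ap50.values()) / len(per_class_ap50)
print(f"mAP@0.5 = {mean_ap50:.3f}")
```

The two small-component classes pull the mean well below the scores for bodies and solar arrays, so the headline figure understates performance on dominant structures and overstates it on appendages.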
Prompt Engineering Impact
The formulation of the natural-language prompt significantly affects accuracy. The researchers found that structured prompts incorporating spatial and geometric descriptors yielded up to 82% improvement over short category-name prompts. This finding suggests that careful prompt design is critical for maximizing the utility of post-launch capability expansion.
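To make the contrast concrete, here is a minimal sketch of how a short category-name prompt might be extended with geometric and spatial descriptors. The helper `build_structured_prompt` and the descriptor strings are hypothetical illustrations; they are not the prompts used in the paper, and the article does not describe the paper's prompt templates.

```python
def build_structured_prompt(category, shape=None, location=None):
    """Compose a prompt from a category name plus optional descriptors.

    Hypothetical helper: `shape` is a geometric descriptor placed before the
    category, `location` is a spatial descriptor appended after it.
    """
    prompt = f"{shape} {category}" if shape else category
    if location:
        prompt = f"{prompt} {location}"
    return prompt


# Short category-name prompt (baseline style).
short_prompt = build_structured_prompt("solar array")

# Structured prompt with illustrative, made-up descriptors.
structured_prompt = build_structured_prompt(
    "solar array",
    shape="flat rectangular",
    location="extending from the spacecraft body",
)

print(short_prompt)
print(structured_prompt)
```

Because the prompt is plain text, a variant like this could in principle be changed from the ground without touching the frozen model weights, which is the mechanism the paper relies on.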
Hardware Feasibility
The model operates within the memory and compute envelope of contemporary embedded GPUs, according to the paper. This indicates that prompt-driven grounding can provide a practical mechanism for post-launch semantic extension of dominant spacecraft structures without exceeding the limited computational resources available on orbit.
Limitations and Outlook
While the approach shows promise for large components, the performance on small appendages like antennas (0.221 AP@0.5) and thrusters (0.081 AP@0.5) highlights the limitations of zero-shot localization for fine-scale components. The authors conclude that prompt-driven grounding enables post-launch semantic expansion for dominant structures but note the need for further work to address challenges in detecting smaller elements under orbital conditions.
For enterprise technology decision-makers, this research suggests that prompt-driven VLMs can expand the capabilities of deployed AI systems without costly retraining or model updates, at least for large, visually dominant targets. That is a valuable capability in any environment where post-deployment access is limited, whether in space, remote infrastructure, or hard-to-reach industrial assets.