Abstract: We exploit the potential of the large-scale Contrastive Language-Image Pretraining (CLIP) model to enhance scene text detection and spotting tasks, transforming it into a robust backbone, ...
Abstract: Large image-language models (LLM) have made significant progress in zero-shot anomaly detection (ZSAD); however, the semantic gap between images and text ...