Why rely on expensive, labor-intensive annotations when AI can learn crop detection from just a few photos? This paper turns Grounding-DINO into a fast, prompt-free few-shot learner for agriculture.
The paper introduces a lightweight, few-shot adaptation of the Grounding-DINO open-set object detection model, tailored specifically to agricultural applications. The method removes the text encoder (BERT) entirely and replaces hand-crafted text prompts with randomly initialized, trainable embeddings, enabling accurate detection from minimal annotated data.
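The core idea can be sketched in a few lines: instead of encoding a text prompt with BERT, keep a small table of learnable vectors, one per target category, and feed them to the detector where the text features would normally go. The snippet below is a minimal, hypothetical illustration of this substitution (class names, dimensions, and the module interface are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn

class LearnablePromptEmbeddings(nn.Module):
    """Hypothetical sketch: replace BERT text features with a small table of
    randomly initialized, trainable embeddings, one per target category
    (e.g., 'wheat head', 'aphid', 'apple'). Only this table is trained on
    the few annotated images; the detector backbone is reused as-is."""

    def __init__(self, num_classes: int, embed_dim: int = 256):
        super().__init__()
        # Randomly initialized and optimized during few-shot fine-tuning.
        self.class_embeds = nn.Embedding(num_classes, embed_dim)

    def forward(self, batch_size: int) -> torch.Tensor:
        # Produce prompt-free "text" features of shape
        # (batch, num_classes, embed_dim), the shape the detector's
        # cross-modality fusion would otherwise receive from BERT.
        return self.class_embeds.weight.unsqueeze(0).expand(batch_size, -1, -1)

# Usage: three crop categories, a batch of two images.
prompts = LearnablePromptEmbeddings(num_classes=3)
feats = prompts(batch_size=2)
print(feats.shape)  # torch.Size([2, 3, 256])
```

Because the gradient only has to shape a few hundred embedding parameters rather than fine-tune a language model, this substitution is what makes adaptation feasible from a handful of images.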
High-performing agricultural AI typically demands large, diverse annotated datasets that are expensive and time-consuming to collect. This method rapidly adapts a powerful foundation model to diverse agricultural tasks using only a few annotated images, reducing costs and accelerating model deployment in farming and phenotyping scenarios.
This work presents a scalable and cost-effective way to deploy deep learning in agriculture, even with limited data. It demonstrates how foundation models can be tailored to real-world domains like plant counting, insect detection, fruit recognition, and remote sensing, making AI more accessible and valuable for sustainable and efficient farming practices.