Dr. Adriana Kovashka just received a new award from the National Science Foundation (through the Robust Intelligence program), entitled Domain-robust object detection through shape and context.
Computer vision has made great advancements in object recognition and detection, but performance drops significantly when the data used at training and deployment time are very different. This is problematic because in many situations, it may be infeasible to retrain the models on a large example set in the domain of interest. For example, artificial intelligence (AI) tools may be developed in one country or region, using that region?s training data, and exported to regions with limited resources to collect new data and retrain models. Unfortunately, the visual environment in the user region may be different from the developer region: some vehicles in India look different from common vehicles in the US; houses often feature bricks on the US East Coast but less frequently on the West Coast; environmental factors (e.g., foliage and smog) may cause models to behave differently. Being robust to domain shifts is important for interpretability and trust when computer vision systems are employed in practice. This project leverages the observation that while the pixels of captured objects change when these objects are shown in different domains (e.g., photographs vs paintings), the overall shape of the objects remains the same. Further, the set of objects that co-occur with the object of interest is also relatively consistent across domains.
This project develops new visual representations that capture two global cues: shape and context. While numerous domain adaptations and generalization techniques exist, they have overlooked global cues that can potentially be more robust to domain shifts, based on preliminary experiments. The first proposed representation adapts the medial axis transform (MAT) into a hierarchical, learnable, convolutional representation. MAT computes the “skeleton” of an object, and representation is developed using a dense feature map to ensure there is enough information for the convolutional network to capture, as well as to build robustness to small shifts. Second, context is represented through graphs containing functionally or semantically related objects, and ambient cues (such as co-occurring text or speech) to improve the model’s ability to recognize objects in novel modalities. Techniques for making weakly-supervised techniques more robust to domain shifts are explored, as a way of capturing non-semantic context. Next, these global representations are combined with standard appearance-based ones and are adapted to novel domains or made domain-invariant through domain generalization techniques. The domain robustness of the resulting representations is tested in a variety of domain shift scenarios, including photorealistic and artistic datasets, different capture conditions, and controllable shift scenarios (e.g., blurring and masking), for both object recognition and detection. Code, any artificially created situations (data), clear protocols for how to train models for existing techniques, and detailed benchmarking results (quantitative and qualitative) will be released to ensure reproducibility.
Learn more about Dr. Kovashka’s award https://nsf.gov/awardsearch/showAward?AWD_ID=2006885