
Multi-Agent Visual-Language Reasoning for Comprehensive Highway Scene Understanding

This multi-agent reasoning tool for comprehensive highway scene understanding is built on a mixture-of-experts strategy. It integrates several critical perception tasks: weather classification, pavement wetness estimation, and traffic congestion detection. By orchestrating specialized vision-language models (VLMs) through task-specific chain-of-thought (CoT) prompting, the tool supports robust multi-task reasoning, improves performance on every evaluated task, and maintains consistently high accuracy across diverse scenarios in experiments.

In practical deployment, the tool can be integrated with the extensive network of existing traffic cameras. In rural areas, where traditional sensor coverage is limited and cellular connectivity may be sparse, it supports strategic monitoring of high-risk locations such as sharp curves, flood-prone lowlands, and icy bridges. By continuously analyzing scene conditions at these targeted sites, the tool enhances situational awareness and delivers timely alerts, even in disconnected environments.
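The orchestration idea described above can be illustrated with a minimal sketch. It is not the code from the linked repository: the names `TaskAgent`, `analyze_scene`, and `stub_vlm`, the prompt wording, and the output labels are all hypothetical, and a stub stands in for a real vision-language model so the example runs without any model weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a VLM is modeled as any callable (image_path, prompt) -> text.
VLM = Callable[[str, str], str]


@dataclass
class TaskAgent:
    """One expert: a perception task, its CoT prompt, and the model that serves it."""
    name: str
    cot_prompt: str
    model: VLM

    def run(self, image_path: str) -> str:
        # Send the image and the task-specific prompt to this agent's model.
        return self.model(image_path, self.cot_prompt).strip()


def analyze_scene(image_path: str, agents: List[TaskAgent]) -> Dict[str, str]:
    """Run every task agent on the same camera frame and collect the answers."""
    return {agent.name: agent.run(image_path) for agent in agents}


def stub_vlm(answer: str) -> VLM:
    """Hypothetical stand-in for a real VLM; always returns a fixed answer."""
    def _model(image_path: str, prompt: str) -> str:
        return answer
    return _model


# Illustrative agents for the three tasks named in the text (labels are assumed).
AGENTS = [
    TaskAgent("weather", "Describe the sky and visibility, then name the weather.", stub_vlm("snowy")),
    TaskAgent("pavement_wetness", "Describe the road surface, then rate its wetness.", stub_vlm("wet")),
    TaskAgent("congestion", "Count visible vehicles, then judge congestion.", stub_vlm("free flow")),
]

if __name__ == "__main__":
    print(analyze_scene("camera_frame.jpg", AGENTS))
```

In a real deployment each stub would be replaced by a call to an actual VLM, and the agents could share one model or use different ones per task.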

Link to the tool: https://github.com/SMIL-AI/SMART-VIEW

Related Publication: Y. Yang, N. Xu, and J. Yang, “Multi-Agent Visual-Language Reasoning for Comprehensive Highway Scene Understanding,” submitted to the 2026 TRB Annual Meeting (Paper Number: TRBAM-26-02697).

Framework

Example of CoT prompt generation for pavement wetness level assessment under snowy conditions
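As a rough illustration of how a CoT prompt for pavement wetness might be conditioned on the detected weather, consider the sketch below. The function `build_wetness_prompt`, the `WEATHER_HINTS` table, the reasoning steps, and the answer labels are all assumptions made for this example; the prompts actually used by the tool may differ.

```python
# Assumed weather-specific guidance; the wording is illustrative only.
WEATHER_HINTS = {
    "snowy": "Snow may hide lane markings; separate snow cover from slush and meltwater.",
    "rainy": "Look for spray behind vehicles and reflections on the road surface.",
    "clear": "Dark patches may be shadows rather than moisture; compare with nearby lanes.",
}

# Assumed answer labels for the wetness level.
WETNESS_LABELS = ["dry", "damp", "wet", "snow-covered"]


def build_wetness_prompt(weather: str) -> str:
    """Assemble a step-by-step prompt for wetness assessment given the weather."""
    hint = WEATHER_HINTS.get(weather, "No weather-specific guidance is available.")
    steps = [
        "Locate the drivable road surface in the image.",
        hint,
        "Describe the surface texture, color, and any reflections.",
        "Choose exactly one label from: " + ", ".join(WETNESS_LABELS) + ".",
    ]
    # Number the steps so the model is nudged to reason in order.
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        "Task: assess the pavement wetness level.\n"
        f"Detected weather: {weather}.\n"
        "Reason through the following steps before answering:\n"
        f"{numbered}"
    )


if __name__ == "__main__":
    print(build_wetness_prompt("snowy"))
```

Feeding the weather agent's output into the wetness prompt is one simple way to let the experts inform each other; other designs are possible.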
