
On March 5, at the Main Forum (also known as the Intelligent Transportation Industry Leaders Forum) of the 15th (2026) Intelligent Transportation Market Annual Conference, Wu Kewei, Chairman & CEO of Sinoits, delivered a keynote speech titled "Multimodal AI Reshaping a New Era of Transportation Digital Intelligence," offering an in-depth analysis of the transformative value of multimodal AI in the digital transformation of transportation.
Wu presented Sinoits’ product system built on multimodal AI capabilities, which unlocks all-weather, all-factor, and all-situation transportation perception and explores a new path for end-edge-cloud, full-link empowerment of transportation digital intelligence. Drawing on Sinoits’ practical cases in transportation perception, smart parking, and expressway operation, he dissected the technical core, scenario applications, and on-the-ground implementation.
Over the past year, the AI field has seen numerous major breakthroughs, with multimodal capabilities advancing significantly. The Spring Festival Gala brought Doubao to a mass audience, making it accessible to people of all ages; the embodied humanoid robots also featured at the Gala have developed rapidly; Seedance has recently surged in popularity in video generation; and AI agents have demonstrated their potential to the public through viral applications like "crayfish".
While "One day in AI, a year in the human world" may be an exaggeration, for the transportation industry, "One day in AI, a year of busy work in transportation" is quite fitting. AI has moved from being exclusive to scientists into public life, realizing true technological democratization: not only can everyone use tools like Doubao, but various AI tools also enable all enterprises to participate, fundamentally changing how humans access information.
I believe two keywords will be unavoidable in the AI field in 2026: multimodal and agents. The popularity of OpenClaw offers an important insight: it acts as a digital assistant that can access files and execute programs on a computer, while an intelligent transportation system is itself a complex system with numerous operator seats. In the future, all transportation industry application software should reserve interfaces through which agents can directly perform operations and retrieve relevant content.
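To make the "reserve interfaces for agents" idea concrete, here is a minimal sketch of how a traffic platform might expose an operation as an agent-callable tool. All names here (`ToolRegistry`, `query_incidents`, the schema shape) are illustrative assumptions, not an actual Sinoits or OpenClaw API.

```python
# Hypothetical sketch: exposing a transportation-software operation as an
# agent-callable "tool". Names and schema format are illustrative only.
import json
from typing import Callable, Dict


class ToolRegistry:
    """Minimal registry mapping tool names to callables plus a JSON schema."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable] = {}
        self._schemas: Dict[str, dict] = {}

    def register(self, name: str, fn: Callable, schema: dict) -> None:
        self._tools[name] = fn
        self._schemas[name] = schema

    def describe(self) -> str:
        # An agent reads this to discover which operations it may invoke.
        return json.dumps(self._schemas, indent=2, ensure_ascii=False)

    def invoke(self, name: str, **kwargs):
        return self._tools[name](**kwargs)


def query_incidents(road_section: str, incident_type: str) -> list:
    # Stand-in for a real query against the traffic platform's database.
    sample = [{"section": "K12+300", "type": "debris", "status": "open"}]
    return [r for r in sample if r["type"] == incident_type]


registry = ToolRegistry()
registry.register(
    "query_incidents",
    query_incidents,
    {"description": "List open incidents on a road section",
     "parameters": {"road_section": "string", "incident_type": "string"}},
)

# An agent would first call describe(), then invoke the chosen tool.
print(registry.invoke("query_incidents",
                      road_section="G45", incident_type="debris"))
```

The key design point is that the software publishes a machine-readable description of each operation, so an agent can discover and call it without a human driving the UI.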
Multimodal AI will bring tremendous changes to the industry. In 2024, Sevn Traffic Network published an article asking "When will transportation large models step out of the chat box?" Today, the improvement of multimodal processing capabilities has allowed AI to truly break free from the chat box. The industry commonly refers to it as "multimodal large models", but I prefer to call it "multimodal AI", as "modal" has multiple meanings:
In terms of parameter specifications, AI models were previously divided into large and small ones. Today, spatial intelligence large models and 3D/4D Gaussian technologies are increasingly applied in transportation, and world models have become standard in the autonomous driving industry.
I often reflect on the connection between AI and the transportation industry and find similarities between domestic large model enterprises and expressway owner units: both require tens of billions in investment and enable relevant entities to operate within their built systems. The essence of AI is the processing of information and data—the recombination of Bytes and Tokens; the essence of transportation is the movement of people and goods—the transportation of atoms.
Upon closer inspection, the much-discussed "one-person company" or "zero-person company" model has long existed in transportation: a single truck driver can independently complete a full set of commercial operations. The operational logic of express and logistics companies is highly similar to AI agents, offering many comparable points.
Regarding digital intelligence, adjustments in the Ministry of Transport’s terminology have created an essential distinction from informatization. Informatization centers on obtaining information—for example, content collected by cameras is mainly for human viewing and analysis. Digital intelligence, however, generates information to support machine decision-making. Over the past two decades, the industry has focused more on informatization; the future direction is to let machines make decisions autonomously.
In my view, there are two core pain points in transportation digitalization: "inaccurate perception" and "weak experience". "Inaccurate perception" targets Party A and owner units, referring to insufficient accuracy of perception data, a point mentioned by many experts. "Weak experience" targets travelers and private car users, meaning the public has not truly felt the effects of transportation digitalization.
The Ministry of Transport’s recent "Mobile+" concept aims to address "weak experience", allowing people to tangibly feel the changes brought by transportation digitalization investment. The development of multimodal AI will inevitably reshape the entire era of transportation digitalization, bringing changes to fields including enterprise internal digitalization, highway digitalization, traffic management digitalization, vehicle-road-cloud integration, "Mobile+", and smart parking.
The industry has basically reached a consensus: combining large and small models is the optimal solution to balance detection performance and cost.
With over a decade of deep experience in video detection, Sinoits has significantly improved detection accuracy in recent years by integrating large model technology, especially for long-tail scenarios such as traffic accidents and road debris. Currently, the system can accurately identify more than 20 types of traffic incidents (e.g., traffic accidents, landslides, road waterlogging) and over 10 types of traffic flow data.
The system performs 3D detection of targets, enabling automatic segmentation of road signs and markings and real-time analysis of road conditions; it can also identify over 100 dimensional features, which are crucial for vehicle identity recognition in the "Mobile+" system and support evidence collection for more than 38 types of traffic violations. In numerous field tests, Sinoits’ detection accuracy exceeds 99%. The remaining gap of under 1% will be closed jointly with industry clients by supplementing more data to continuously optimize large-model performance.
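The large-plus-small-model combination described above can be sketched as a confidence-gated cascade: a lightweight detector screens every frame, and only low-confidence or long-tail detections are escalated to a large model for review. The function names, labels, and threshold below are assumptions for illustration, not Sinoits internals.

```python
# Illustrative large+small model cascade: the small model runs on every frame;
# the expensive large model reviews only uncertain (long-tail) detections.
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float


def small_model_detect(frame_id: int) -> Detection:
    # Stand-in for a fast edge detector (e.g. a compact CNN).
    samples = {0: Detection("vehicle", 0.97),
               1: Detection("road_debris", 0.55)}
    return samples.get(frame_id, Detection("background", 0.99))


def large_model_review(det: Detection) -> Detection:
    # Stand-in for a slower multimodal large model confirming long-tail cases.
    if det.label == "road_debris":
        return Detection("road_debris", 0.96)
    return det


def cascade(frame_id: int, escalate_below: float = 0.8) -> Detection:
    det = small_model_detect(frame_id)
    if det.confidence < escalate_below:
        det = large_model_review(det)  # pay the large-model cost only here
    return det


print(cascade(0))  # confident small-model result, no escalation
print(cascade(1))  # long-tail case escalated to the large model
```

This is how the cascade balances cost and accuracy: the large model's per-inference expense is incurred only on the small fraction of frames the small model is unsure about.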
Furthermore, multimodal technology is driving transportation perception into an era of all-weather, all-factor, and all-situation coverage. Traditional traffic video detection performed poorly at night due to camera limitations, a challenge that companies like Hikvision, Dahua, and Gaoxin have been addressing.
Sinoits has taken an alternative approach: integrating visible light, thermal imaging, and millimeter-wave radar into a new detection device that achieves precise detection day and night. During the day, visible light cameras deliver excellent performance; at night, thermal imaging provides clearer images than visible light, and combined with the long-range detection capability of millimeter-wave radar, the fusion of the three data sources achieves far better results than any single-technology solution.
Multimodal AI also empowers in-vehicle transportation perception. Previously, in-vehicle applications mostly involved adding cameras, limited to small scenarios due to budget constraints.
Sinoits has developed a mobile law enforcement capture device for traffic police motorcycles ("Yunji"), enabling real-time recognition and application; it has been deployed in over 20 cities nationwide. With the rapid development of autonomous vehicles (road access granted in over 100 cities), Sinoits has partnered with Neolix to integrate detection devices into autonomous vehicles, creating mobile capture vehicles that free traffic police from on-site patrols and offer a new solution for roadside parking management.
Additionally, multimodal AI extends transportation perception to low-altitude areas, supporting the development of the low-altitude economy.
During the construction of Guangxi Rongfu Expressway, Sinoits deployed 10 drones (each covering ~10 km) for real-time inspections. Combining automatic perception, AR overlay, and road detection technologies, the system provided real-time feedback on construction progress and safety hazards to builders, enabling precise project monitoring.
Drones also empower expressway operation monitoring, with extensive deployment by Shandong Expressway and Hebei Transport Investment Group. Sinoits provides algorithm support to accurately identify road debris, congestion, and illegal parking.
Drones are the optimal solution for expressway emergency response. Sinoits has also developed tunnel robots for tunnel emergencies. In expressway scenarios, the system automatically links drones to conduct on-site dispersal and voice broadcasts after incidents—a solution already implemented on a domestic expressway, with Sinoits providing full hardware and software support.
While the previous discussion focused on perception, the first principle of transportation digitalization is always to serve travelers—thinking from their perspective and improving travel experience, a consensus in the industry.
First, from the perspective of Party A and owner units, multimodal AI can make transportation software interaction as simple as Doubao. In the past, many industry software products had hundreds of functions, of which only about 10 were commonly used. Large model technology can effectively change this.
Last year, Sinoits released the "Zhitong Zhuoshi" large model 2.0, which supports secondary review of traffic incidents, global situational awareness of road conditions, intelligent data querying, and automatic report generation. This year, it has been upgraded to version 3.0, integrating large model digital humans and speech recognition to create a privately deployed intelligent interaction system, bringing Doubao-like experience to the transportation industry (noting that while Doubao offers excellent experience, it is an internet service and does not support private deployment, with ByteDance having no such plans in the short term).
Multimodal AI also empowers transportation hubs and smart parking, with multiple Sinoits cases:
Vehicle identity recognition is a key industry concern, with many clients demanding 100% accuracy. I believe this goal is achievable by combining multiple technologies: besides license plates, vehicle micro-features (face, model, 3D structure) and even non-motor vehicle features can serve as recognition bases. Multimodal large models further improve recognition accuracy, empowering high-precision expressway recognition and the popular "Mobile+" business.
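The multi-feature recognition idea above can be sketched as a blended score: an exact license-plate match combined with appearance-feature similarity, so that a misread plate can still be recovered from vehicle appearance. The feature vectors, weights, and function names are illustrative assumptions only.

```python
# Hedged sketch of multi-feature vehicle identity matching: blend an exact
# plate match with appearance-embedding similarity. All values illustrative.
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def identity_score(plate_a: str, plate_b: str,
                   feat_a, feat_b, plate_weight: float = 0.6) -> float:
    """Blend exact plate match with vehicle appearance-feature similarity."""
    plate_match = 1.0 if plate_a == plate_b else 0.0
    appearance = cosine_similarity(feat_a, feat_b)
    return plate_weight * plate_match + (1 - plate_weight) * appearance


# Same plate, similar appearance embedding: near-certain match.
print(identity_score("苏A12345", "苏A12345",
                     [0.9, 0.1, 0.4], [0.88, 0.12, 0.41]))
# Plate misread by one character, but appearance still close:
# the score degrades gracefully instead of dropping to zero.
print(identity_score("苏A12345", "苏A12845",
                     [0.9, 0.1, 0.4], [0.88, 0.12, 0.41]))
```

The graceful degradation in the second case is the point of combining modalities: no single feature is a hard gate, so one misread does not break identity recognition.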
Sinoits has launched several core technologies in this field.
Sinoits has also built image search engines for clients, provided supporting tools, and participated in digital twin toll station projects, restoring lanes and vehicles with >95% accuracy.
Through multiple technologies, Sinoits fully empowers the "Mobile+" business. As early as 2015, Sinoits built a complete "license plate payment" system, which was not widely deployed due to national promotion of ETC. Today, the industry timing has returned, bringing new opportunities for video recognition enterprises.
Sinoits provides full-scenario empowerment for expressways, achieving all-situation and all-factor perception. Around the 10 key tasks of highway digital transformation, it has built a rich product system and an AI+ transportation full-scenario map, covering toll stations, tunnels, trunk roads, urban traffic, intersections, and traffic management big data platforms.
Given the limited funding in traffic management, Sinoits has formulated a dedicated strategy:
Sinoits also empowers police terminals with large models, developing smart helmets and partnering with smart glasses companies to create traffic police-specific smart glasses. In-vehicle, the "Yunji" product reduces costs and improves efficiency for traffic police.
