Classifying Vehicle Types from Video Streams for Traffic Flow Analysis Systems

Abstract: This paper proposes a vehicle type classification model that operates on video streams to improve Traffic Flow Analysis (TFA) systems. The Video Content-based Vehicles Classification (VC-VC) model supports the optimization of traffic signal control through online identification of vehicle types. The VC-VC model extends several methods to extract TFA parameters, including background image processing, object detection, object size measurement, attention to the area of interest, handling of object clashes or overlaps, and object tracking. The model comprises four main processing phases: pre-processing, segmentation, classification, and tracking. The main video and image processing methods are the Gaussian function, active contour, bilateral filter, and Kalman filter. The model is evaluated by comparing its classification output with the ground truth. Four formulas are applied to evaluate the VC-VC model's performance: error, average error, accuracy, and precision. The valid classifications are counted to show the overall results. For three tested videos, the model achieves a classification accuracy of 85.94% on average and a precision of 92.87%. The results show that videos 1 and 3 have more accurate vehicle classification results than video 2, which has a more difficult camera position and recording angle and more challenging scenarios than the other two. The results also show that it is difficult to classify vehicles based on object size measures: the apparent size of an object changes with the camera altitude and zoom setting, and this change affects the accuracy of vehicle classification.


I. INTRODUCTION
The increase in the population of big cities like Kuala Lumpur causes massive traffic congestion [1]. Intelligent traffic light control systems attempt to improve on traditional traffic light systems. These systems are experimentally implemented in Malaysia, the United Kingdom, Germany, the USA, Australia, and Romania to manage traffic lights more efficiently and with less human intervention. They integrate additional hardware sensors, digital traffic signs, and cameras, and apply artificial intelligence techniques to the data provided by the hardware to route vehicle traffic intelligently [2]. Such a system adjusts the phasing and timing of the traffic lights to improve traffic flow and reduce delays by efficiently adjusting light cycle times [3]. Intelligent traffic management systems have developed continuously and have produced traffic flow analysis (TFA) systems and traffic control and management systems.
Intelligent traffic management systems consider local and coordinative factors such as vehicle speed, vehicle count, vehicle type, traffic congestion level, and weather conditions in traffic management and control. The systems adapt to changes and provide optimized performance by reasoning over these factors locally and coordinatively. The local factors include the current local traffic stage, the average vehicle speed, the number of vehicles, the types of vehicles, and the location of the traffic light. The coordinative factors include the current global traffic stage, the global traffic state, the time of day, the traffic history, and the number of traffic lights in a route or particular region [4]. TFA systems aim to improve traffic flow, especially in congested areas. Traffic delays typically occur when the total volume of traffic exceeds road capacity. Congestion is a common problem, particularly in big cities [5], [6], where the number of road users is high and tolerance between drivers is lacking. The collection of vehicle type data helps in many traffic light analysis applications. Incidents usually occur at road interchanges where users are hasty and intolerant towards each other. Problems also arise when road users fail to move off quickly when the signal light changes to green. Such incidents result in traffic congestion, usually due to large vehicles rushing to cross the interchange. Consequently, a system is required to analyze and evaluate the number of vehicles entering an interchange, which often causes problems in controlling how vehicles exit under congested conditions.
The proposed system comprises three stages: pre-processing, segmentation, and classification with tracking. Pre-processing is implemented for image and video processing. Segmentation uses the active contour method, implemented in Java, to process video streams in a desktop application. Traffic routes are always crowded, and accidents occur frequently. This project aims to implement a vehicle classification model that operates on video streams for TFA systems and to support the optimization of traffic signal control by identifying the types of vehicles. The project is designed to assist the Road Transport Department in identifying problems on the road that arise from road users' mistakes. The system is intended to enhance road safety and reduce traffic congestion. In the event of traffic congestion, the system issues information about the specifications of each vehicle that passes through the area, so that traffic delays can be managed and congestion and incidents in the area can be prevented.
The study's limitations are aspects of the design or methodology that affect or influence the interpretation of the research results. Identifying them is part of finding bias during the development of this project. Another limitation is that the tests are conducted using the available dataset and not real-world scenarios.
Fig. 1 shows the main steps of the research methodology. It includes seven steps, starting with the rationale behind a case study and ending with the implementation of the required application model. The rationale behind the case study is to analyze a phenomenon, formulate hypotheses, and validate a method. The case study method is an acceptable method for research design. Many research papers have been studied to identify the methods that give the most accurate results to be applied in this project.
Fig. 1 The steps of the research methodology
A problem that is clearly defined is partly solved. The research process starts with discovering and identifying the issue, which is the first step toward its resolution. In most cases, phrasing the issue is more important than solving it, because the issue statement reveals the project's hidden requirements. This project is implemented for intelligent traffic management systems. The problem to be solved is to use a video stream to build a vehicle classification model for TFA systems and to evaluate the accuracy of the model's performance [9]. The issue description stresses that the goal of this initiative is to reduce traffic congestion and incidents. Its future goal is to identify traffic congestion and the incidents that often occur, especially when drivers remain at the interchange after the traffic light has changed color [10].

A. Research design
Research design lays out the techniques and processes for gathering and analyzing data. The required solution in this project may need to eliminate duplicate cars to get an appropriate vehicle categorization result. As a result, the keyframe extraction method is used to choose and extract a single frame.

B. Dataset
The project of Sochor et al. [11] includes the BrnoCompSpeed dataset. This dataset consists of ordinary video-stream recordings taken at six distinct sites. A total of 18 full-HD videos are included in the BrnoCompSpeed dataset, each about one hour long. The videos contain a total of 20 865 cars (cases), all of which are annotated using LiDAR and validated using several sources such as GPS traces. The dataset is concerned with the precision and errors of vehicle counts in various scenarios. The videos and the metadata for evaluation are included in the dataset, which is downloadable. A sample image from the dataset testing is presented in Fig. 2, which shows that the classification is implemented based on the size of the detected vehicles.

C. Methods
1) Bilateral filter: The bilateral filter is a non-linear, edge-preserving, noise-reducing smoothing filter for images [12]. It replaces the intensity of each pixel with a weighted average of the intensities of adjacent pixels. Its ability to decompose an image into various scales without producing haloes after modification has led to applications in computational photography, including tone mapping, style transfer, relighting, and denoising.
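The weighted average used by the bilateral filter can be illustrated with a minimal Python sketch. It is an illustration only and not the VC-VC implementation, which is described as a Java application. The function name `bilateral_filter` and the parameters `radius`, `sigma_space`, and `sigma_range` are assumptions introduced for this example. Each neighbor's weight is the product of a spatial Gaussian (distance in pixels) and a range Gaussian (difference in intensity), so pixels across a strong edge contribute almost nothing.

```python
import math


def bilateral_filter(image, radius=1, sigma_space=1.0, sigma_range=25.0):
    """Smooth a 2-D grayscale image (list of rows) while preserving edges.

    All parameter names and default values are illustrative assumptions.
    """
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            weighted_sum = 0.0
            weight_total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue  # skip neighbors outside the image
                    neighbor = image[ny][nx]
                    # Spatial weight: closer pixels count more.
                    spatial = math.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2))
                    # Range weight: pixels with similar intensity count more.
                    tonal = math.exp(-((neighbor - center) ** 2) / (2 * sigma_range ** 2))
                    weight = spatial * tonal
                    weighted_sum += weight * neighbor
                    weight_total += weight
            # weight_total is never zero because the center pixel has weight 1.
            output[y][x] = weighted_sum / weight_total
    return output
```

With a small `sigma_range`, a step edge between a dark road surface and a bright vehicle stays sharp, while small intensity fluctuations inside each region are averaged out.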
2) Active contour: The Video Content-based Vehicles Classification (VC-VC) system applies the active contour method by comparing the current image with a background image. The method is used to obtain deformable models or structures, subject to constraints and forces, in an image for segmentation. Contour models describe the object boundaries or other features of an image to form a parametric curve or contour. An active contour is defined as a segment with a width of one pixel and a length of one or more pixels [13]. For object representation and object identification, contour and boundary detection methods provide the critical information of an image. The length and shape of the contour pixels are used, for example, to distinguish objects from their backgrounds, compute object sizes, categorize shapes, and identify the feature points of objects. In addition, in graphics and vision, contour information may be used to preserve the shape of objects and restore them to their original forms for a variety of applications. As a result, numerous studies on contour-tracing algorithms for extracting and tracking an object's contour have been conducted. Four criteria are used to assess a contour-tracing method: the accuracy of contour tracing, the processing time, the data size needed to store the traced contour information, and the ability to correctly restore and enlarge the original contour using the stored data.

3) Payload-based classifier:
The most widely used payload-based method involves examining packet contents and comparing them against a deterministic set of signatures [14]. The results of this classification technique are highly precise. Payload inspection is used extensively in a variety of commercial and open-source technologies, such as the Linux kernel firewall implementation.

4) Kalman filter:
The Kalman filter is used to forecast a vehicle's location in the next frame. The velocity of its blob determines the vehicle's velocity. The vehicle's location in the next frame is anticipated using the vehicle's velocity, the vehicle's position in the current frame, and the time passed since the previous frame. The filter is also renowned for its ability to cope with sensor noise and is suitable for any dynamic system that undergoes frequent changes and does not need a large amount of data to be stored in memory [15].
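The prediction step described above can be sketched with a one-dimensional constant-velocity Kalman filter. This is a minimal Python sketch under stated assumptions, not the filter used in the VC-VC model: the class name `ConstantVelocityKalman1D`, the noise values, and the choice of a constant-velocity motion model are all assumptions made for illustration. One such filter per image axis would track the x and y coordinates of a blob centroid.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] along one image axis.

    The default noise values are illustrative assumptions, not tuned values.
    """

    def __init__(self, position=0.0, velocity=0.0, initial_variance=100.0,
                 process_noise=0.01, measurement_noise=1.0):
        self.position = position
        self.velocity = velocity
        # 2x2 state covariance, stored as nested lists.
        self.p = [[initial_variance, 0.0], [0.0, initial_variance]]
        self.q = process_noise
        self.r = measurement_noise

    def predict(self, dt):
        """Advance the state by dt and return the predicted position."""
        self.position += self.velocity * dt
        p = self.p
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I.
        p00 = p[0][0] + dt * (p[0][1] + p[1][0]) + dt * dt * p[1][1] + self.q
        p01 = p[0][1] + dt * p[1][1]
        p10 = p[1][0] + dt * p[1][1]
        p11 = p[1][1] + self.q
        self.p = [[p00, p01], [p10, p11]]
        return self.position

    def update(self, measured_position):
        """Correct the state with a position measured in the current frame."""
        p = self.p
        residual = measured_position - self.position
        s = p[0][0] + self.r          # innovation variance
        k0 = p[0][0] / s              # Kalman gain for position
        k1 = p[1][0] / s              # Kalman gain for velocity
        self.position += k0 * residual
        self.velocity += k1 * residual
        self.p = [[(1 - k0) * p[0][0], (1 - k0) * p[0][1]],
                  [p[1][0] - k1 * p[0][0], p[1][1] - k1 * p[0][1]]]
```

In each frame, `predict(dt)` is called with the time elapsed since the previous frame, and `update(z)` is then called with the measured centroid coordinate. Only the current state and a 2x2 covariance are kept in memory, which reflects the low storage requirement noted above.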

5) Gaussian mixture distribution:
Background subtraction is implemented using a Gaussian mixture distribution method [16]. This method isolates the vehicles from the background, allowing them to be counted and categorized more simply. The Gaussian distribution is the limiting distribution for the sum of many independent random variables. Although certain variables may be qualitatively characterized by their resemblance to parametric distributions such as the Gaussian (normal) or lognormal distribution, no universal theory can predict the shape of probability distributions for variables in reality.
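The idea of modeling each background pixel statistically can be sketched as follows. For brevity, this Python sketch keeps a single running Gaussian per pixel rather than a full mixture of Gaussians, so it is a simplification of the method cited above and not the VC-VC implementation. The class name `GaussianBackgroundModel` and the parameters `learning_rate`, `threshold`, `initial_variance`, and `min_variance` are assumptions introduced for the example.

```python
class GaussianBackgroundModel:
    """Per-pixel running Gaussian background model (single component).

    A simplification of a Gaussian mixture model; all parameter names and
    default values are illustrative assumptions.
    """

    def __init__(self, first_frame, learning_rate=0.05, threshold=2.5,
                 initial_variance=225.0, min_variance=4.0):
        self.mean = [[float(v) for v in row] for row in first_frame]
        self.var = [[initial_variance] * len(row) for row in first_frame]
        self.lr = learning_rate
        self.threshold = threshold
        self.min_variance = min_variance

    def apply(self, frame):
        """Return a binary mask (1 = foreground) and update the background."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, value in enumerate(row):
                diff = value - self.mean[y][x]
                # Foreground if the pixel lies more than `threshold`
                # standard deviations away from the background mean.
                is_foreground = diff * diff > (self.threshold ** 2) * self.var[y][x]
                if not is_foreground:
                    # Only background pixels adapt the model.
                    self.mean[y][x] += self.lr * diff
                    updated = (1 - self.lr) * self.var[y][x] + self.lr * diff * diff
                    self.var[y][x] = max(self.min_variance, updated)
                mask_row.append(1 if is_foreground else 0)
            mask.append(mask_row)
        return mask
```

The connected regions of foreground pixels in the returned mask correspond to the blobs that are later measured, classified, and tracked.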

D. Data Collection and Analysis
Data is the most important input in any decision-making process in a research project, since it provides the statistics that establish the study's significance. Data may be divided into two categories: primary and secondary data. The quality of knowledge is defined in terms of features (characteristics of an object), which may be guaranteed by using the correct data collection technique. Data analysis aims to understand the variations in the data and the sources related to any phenomenon, action, or event. Because variation is present in all phenomena, understanding it leads to better choices about the phenomena that generate the data. From this perspective, statistics allows the decision-maker to draw inferences about a large population based on data acquired from a sample. The datasets are collected using four videos. In this project, data collection includes a mix of real-world and ground truth data. The video is around five minutes long. The video was taken using a video camera on the left, in the center. The video records two lanes in order to determine the number of vehicles in each lane. The vehicles are then categorized into cars, vans, buses, lorries, and so on.

E. Evaluation Metrics
The results are analyzed by comparing the number of vehicles found by the model with the ground truth number [17], [18].
The percentage error is calculated by subtracting the number of vehicles found from the actual number of vehicles and multiplying by 100, as shown in Eq. (1). The error indicates whether the technique used classifies the number of vehicles accurately.
The average error is obtained after the percentage error is calculated for each video: the sum of the percentage errors is divided by the number of tested videos, n.
The accuracy is obtained by subtracting the average error from 100. It shows how closely the number of vehicles found in the video agrees with the ground truth.
The last formula applied is precision, shown in Eq. (4). The precision formula measures how precisely the number of vehicles found matches the ground truth.
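The four measures can be sketched in Python as follows. This is a hedged illustration, not a reproduction of Eq. (1) to Eq. (4): the normalization of the error by the ground truth count and the definition of precision as valid classifications divided by all classifications made by the model are assumptions, and the function names are introduced for this example only.

```python
def percentage_error(ground_truth, found):
    """Percentage error of one video.

    Assumption: the difference is normalized by the ground truth count.
    """
    return abs(ground_truth - found) / ground_truth * 100.0


def average_error(errors):
    """Sum of the percentage errors divided by the number of videos, n."""
    return sum(errors) / len(errors)


def accuracy(avg_error):
    """Accuracy is the average error subtracted from 100."""
    return 100.0 - avg_error


def precision(valid_classifications, total_classifications):
    """Assumed definition: share of the model's classifications that are valid."""
    return valid_classifications / total_classifications * 100.0
```

For a hypothetical video with 100 vehicles in the ground truth and 90 found by the model, `percentage_error(100, 90)` evaluates to 10, so a set of videos that each had this error would yield an accuracy of 90.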

F. Video Content-based Vehicles Classification (VC-VC) model
The design of the proposed model includes five main processing components: pre-processing, segmentation, detection, classification, and tracking, as shown in Fig. 3. They are explained in detail below.
1) Pre-processing: Pre-processing is the set of techniques applied before a data mining method is used. Because raw data is likely to be imperfect, containing inconsistencies and redundancies, it is not directly suitable for starting a data mining process [19]. The larger the amount of data collected, the more sophisticated the mechanisms required to analyze it [20]. Data pre-processing adapts the data to the requirements posed by each data mining algorithm, making it possible to process data that would otherwise be infeasible to handle [21], [22]. Pre-processing prepares raw data for further processing. Two types of pre-processing are used: image processing and video processing.
2) Video-processing: It is chosen because multiple-format video signal processing systems operate in conjunction with a display device timing system. It supplies synchronized video and timing signals to be used by a display device with a fixed horizontal scanning frequency. A clone is employed during this process to avoid redundant information about identical vehicles, so that each vehicle is classified only once.
3) Image processing: It is used in a wide variety of applications, primarily for two somewhat different purposes. The first one is to improve the visual appearance of images for a human observer, including their printing and transmission. The second purpose is to prepare images to measure and analyze the features and structures inside these images.
Segmentation: Vehicles must be correctly separated from the background using a method that is fast enough to run in real time. The method must be insensitive to lighting and weather conditions, and it must need only a small amount of additional data. Splitting or separating an image into sections, termed segments, is known as image segmentation. It is mainly helpful for compression and object recognition applications, since processing the whole image is wasteful in these cases [23]. Image segmentation is therefore used to separate the components of the image for further processing. The primary goal of segmentation is to simplify an image so that it can be represented in a meaningful and easy-to-understand manner. The detection technique is used throughout this research.
Detection: This part aims to segment vehicles from the video sequence in order to identify vehicle types. The method is vision-based, and it depends on the content of the image. Detection techniques include frame differencing, background subtraction, optical flow, and others [24]. Vehicle detection is the first step of a vision-based traffic monitoring process with one static camera. The algorithm that the detection technique uses is:

4) Classification:
Classification is crucial for calculating the percentages of the different vehicle classes that use state-funded streets and roads. The adoption of an automated method may result in a more precise pavement design with clear cost and quality benefits. Data on the classes of vehicles that use a specific roadway is required, especially in urban regions. A classification system such as the one presented here may offer crucial information for a specific design situation. A single camera can identify and classify vehicles traveling in several lanes.
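Since the classification is based on the size of the detected vehicles, the step can be sketched as a lookup of the bounding-box area against a set of thresholds. In the Python sketch below, the threshold values in `SIZE_CLASSES`, the ordering of the classes by size, and the function names are hypothetical and chosen only for illustration. In practice the thresholds would have to be calibrated per camera, because the apparent size depends on the camera altitude and zoom setting.

```python
from collections import Counter

# Hypothetical upper bounds on bounding-box area (in pixels) for each class.
# Real thresholds depend on camera altitude and zoom and need calibration.
SIZE_CLASSES = [
    (1500, "other"),   # too small to be a vehicle
    (6000, "car"),
    (10000, "van"),
    (18000, "lorry"),
]
LARGEST_CLASS = "bus"  # anything at or above the last bound (assumption)


def classify_by_area(width, height):
    """Return a class label for a detected blob from its bounding-box size."""
    area = width * height
    for upper_bound, label in SIZE_CLASSES:
        if area < upper_bound:
            return label
    return LARGEST_CLASS


def count_by_lane_and_class(detections):
    """Tally vehicles per (lane, class).

    `detections` is an iterable of (lane, width, height) tuples.
    """
    counts = Counter()
    for lane, width, height in detections:
        counts[(lane, classify_by_area(width, height))] += 1
    return counts
```

The resulting tally corresponds to TFA parameters such as the number of cars, vans, buses, and lorries in a particular lane.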

5) Tracking: A segment of video data is given as input. The video is analyzed, and the tracking result for its content (i.e., the detected vehicle object) is generated by associating a unique ID with the same vehicle in each video frame [21]. The sequence of positions in successive frames for a uniquely identified vehicle is known as a track. The method chosen for tracking is the Kalman filter, which is used to predict the position of a vehicle in the subsequent frame, to predict more accurately where the bounding box appears, and to extract useful information from the tracking process. One tracking cycle consists of the following steps:
Calculating vehicles' positions: Each vehicle path is traced around its present position using a heuristic that covers as many of the blobs connected to this vehicle as feasible. The result is assumed to represent the vehicle's real location.
Estimation: The preceding steps calculate the object measurements, which are positions in the image coordinate system. The prediction parameters are adjusted to minimize the error between the predicted and measured locations of the vehicle.
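The association of a unique ID with the same vehicle across frames can be sketched as below. This minimal Python sketch uses greedy nearest-neighbor matching and a plain constant-velocity extrapolation in place of the Kalman prediction, so it is a stand-in under stated assumptions rather than the tracking procedure of the VC-VC model. The class name `CentroidTracker` and the parameter `max_distance` are hypothetical.

```python
import math
from itertools import count


class CentroidTracker:
    """Assign persistent IDs to blob centroids by nearest-neighbor matching.

    The prediction is a simple linear extrapolation used here as a stand-in
    for a Kalman filter; `max_distance` is an illustrative gating radius.
    """

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance
        self.tracks = {}            # track ID -> list of (x, y) positions
        self._next_id = count(1)

    @staticmethod
    def _predict(history):
        """Extrapolate the next position from the last two positions."""
        if len(history) < 2:
            return history[-1]
        (x1, y1), (x2, y2) = history[-2], history[-1]
        return (2 * x2 - x1, 2 * y2 - y1)

    def update(self, detections):
        """Match the detections of one frame to tracks; return {ID: position}."""
        unmatched = list(detections)
        assigned = {}
        for track_id, history in self.tracks.items():
            if not unmatched:
                break
            predicted = self._predict(history)
            nearest = min(unmatched, key=lambda d: math.dist(d, predicted))
            if math.dist(nearest, predicted) <= self.max_distance:
                history.append(nearest)
                assigned[track_id] = nearest
                unmatched.remove(nearest)
        # Detections that match no existing track start new tracks.
        for detection in unmatched:
            track_id = next(self._next_id)
            self.tracks[track_id] = [detection]
            assigned[track_id] = detection
        return assigned
```

Each entry of `tracks` holds the sequence of positions of one uniquely identified vehicle, that is, its track. A vehicle that is tracked this way is counted and classified once, even though it appears in many frames.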

III. RESULT AND DISCUSSION
This paper proposes a Video Content-based Vehicles Classification (VC-VC) model that supports TFA based on vehicle classification data. TFA aims to enhance traffic management and road safety and to reduce traffic congestion. In the event of traffic congestion, the system issues information about the specifications of each vehicle that passes through the area, so that traffic delays and congestion can be controlled and incidents can be prevented in the region in which it operates. The VC-VC model identifies vehicle types by processing video streams.
The dataset used to test the proposed model belongs to the Brno University of Technology. It contains videos for which both the number of vehicles actually found and the ground truth number of vehicles of each type are available. The duration of the videos is around 15 minutes. The videos are recorded using a video camera situated at different angles. Two lanes are classified in each video to find the class of the vehicles in each lane, including cars, vans, buses, lorries, and others. The TFA parameters that the model extracts and collects are the number of vehicles in a particular lane, the number of cars, the number of vans, the number of buses, the number of lorries, and the number of others (any moving object that is not a vehicle). Table II shows samples of the data collected by the model during the testing phase.
The table shows that in video 1, cars are the most frequent vehicle type, followed by lorries and vans, with no buses. A similar result is observed in videos 2 and 3, in which cars are again the most frequent type, followed by lorries, vans, and buses. The car therefore has the highest frequency in all three videos compared to other vehicles. Table III shows the error, accuracy, and precision results for the three videos tested.
They are calculated from the ground truth and the valid classification results collected from the previous parameters for each video. The average is calculated over all the videos.

IV. CONCLUSIONS
This paper proposes a Video Content-based Vehicles Classification (VC-VC) model for classifying vehicles from video streams. The model extends several methods to extract Traffic Flow Analysis (TFA) parameters, including background image processing, object detection, object size measurement, attention to the area of interest, handling of object clashes or overlaps, and object tracking. The VC-VC model identifies vehicles more correctly and avoids certain common errors that lead to inaccurate classification. The increase in the number of vehicles worsens traffic problems and makes them harder to handle, and people take many alternative routes to avoid traffic congestion. The proposed application helps to reduce traffic congestion. The application achieves a classification accuracy of 85.94% on average over the three tested videos, with a precision of 92.87%. In future work, we shall consider extracting more related traffic flow parameters by improving the proposed model.