Hand Gesture Recognition Based on Continuous Wave (CW) Radar Using Principal Component Analysis (PCA) and K-Nearest Neighbor (KNN) Methods

Abstract: Human-computer interaction (HCI) is the study of how people and computers interact. Hand gesture recognition is one of the most critical branches of HCI, and most research on it concentrates on a single detection direction. A slight change in the angle of a hand gesture can cause the motion to be misclassified, degrading the performance of hand gesture detection. Therefore, to improve detection accuracy, this paper analyzes hand gestures based on the signals reflected in two directions: the front and side views. The radar system employed in this paper is equipped with two 24 GHz continuous wave (CW) monostatic radar sensors with a sampling rate of 44.1 kHz. Four different hand gestures, namely close hand, open hand, OK sign, and pointing down, are collected using the SignalViewer software. The data are stored in waveform audio file format (WAV), where each recording consists of 20 segments, and are then segmented in MATLAB. The classification system integrates principal component analysis (PCA) and k-nearest neighbor (KNN). The PCA results are depicted as Pareto charts and 2-D scatter plots for both radar directions. The Leave-One-Out (LOO) method is then used to verify the accuracy of the classification, which is represented in confusion matrices. The classification results indicate that both angles achieve near-perfect accuracy for most hand gestures.


I. INTRODUCTION
Human-computer interaction (HCI) is a field of study that focuses on how humans and computers interact. Gestures are an important form of human communication that occurs without physical contact and has contributed to the development of human language, so understanding them is central to HCI. Hand gesture recognition is therefore a critical component of HCI, and understanding human hand gestures has become increasingly important with technological advancement. Hand gesture recognition has proven beneficial in a variety of fields, including computer gaming and electronic device control. It has also benefited medical research, where it has been used to assist stroke patients [1], [2]. This technology has become ingrained in daily life because it enables non-physical interaction between humans and electronic devices, and human hand gestures have been studied as an interface for human-machine interaction over the past years. Methods of acquiring hand gestures fall into three types: vision-based, sensor-based, and radar-based approaches. In the sensor-based method, the sensor is mounted on a wearable device, which can cause discomfort and inconvenience to the user [3]-[8]. The vision-based method, which uses camera images, performs well but is susceptible to illumination changes and occlusion [9]-[16]. The radar-based method, in contrast, analyzes the signal reflected by the hand, so it is contactless and does not require images. Radar-based hand gesture recognition is advancing every year with the help of artificial intelligence (AI) deep learning and other machine learning algorithms.
Several types of radar have been employed in hand gesture recognition projects, such as Doppler radar [17]-[27]. Owing to advances in radio frequency (RF) technology, inexpensive Doppler radar sensors are becoming more common. Doppler radar determines the velocity of a hand gesture from the Doppler effect. The micro-Doppler signature of the hand motion is obtained from the reflected signal without range information [18]. However, when several scatterers, such as fingers, appear in the detection path, their signatures overlap in the time-frequency domain. Thus, a detailed investigation is required to isolate the signatures associated with each gesture.
Continuous wave (CW) radar has also been used in past studies, with various techniques. CW radar is suitable for hand gesture recognition because its Doppler-based detection is not affected by stationary objects. In hand gesture recognition, the hand is the only moving part targeted by the radar, and range information is not necessarily significant, depending on the selected gestures. Bannon et al. [28] presented a simple, extremely low-cost integrated CW radar module for human hand gesture recognition. In the same experimental setup, slightly higher classification accuracy was achieved with CW radar than with frequency modulated continuous wave (FMCW) radar.
Numerous deep learning and machine learning techniques have been applied to hand gesture recognition and classification in past studies. The deep learning approach does not require predefined characteristic parameters or features, because neural networks learn the features directly from the input signal during training. In [29], a convolutional neural network (CNN) was proposed as the classifier and trained on a three-dimensional (3D) tensor consisting of a sequence of Range-Doppler frames. The drawback of deep learning is that it requires many samples. Deep learning also typically needs a powerful graphics processing unit (GPU) to reduce the training time of the neural network.
The machine learning approach, on the other hand, requires specific predefined characteristic parameters or features to be extracted from the raw signals and used as input to the classification algorithm. These features can be extracted using unsupervised learning methods such as principal component analysis (PCA) [30], [31]. Sun et al. [32] employed k-nearest neighbors (KNN) and proposed five handcrafted micro-Doppler features for classification. That study achieved good recognition accuracy even though the hand gesture samples were mainly in the axial direction.
Hand gesture recognition combining radar and classification algorithms has progressed considerably in the last decade. For example, converting trajectory images to low-resolution Joint Photographic Experts Group (JPEG) images to train a neural network has been proposed [33]. However, existing systems are considered mature yet focus on a single angle of detection, and their advancement has nearly stagnated. Exploring different strategies that can benefit and complement the existing solutions is therefore worthwhile.
Hence, this paper explores the use of two 24 GHz CW radar sensors, each with its own transceiver. The captured signals are analyzed in MATLAB, and classification is performed by combining the PCA and KNN methods, in order to study how signals detected from two different angles, the front and the side, affect machine learning-based hand gesture classification.

II. MATERIAL AND METHOD
The whole process of radar-based hand gesture recognition is shown in Fig. 1. It includes hand gesture acquisition, gesture signal processing, feature extraction, segmentation, and classification of the hand gestures. The classification is performed on two types of input features: the original and the segmented spectrograms of the gesture signals.
The CW radar signals collected in this study are preprocessed to achieve high classification accuracy. Two radar modules collect the signal scattered by the target from two different directions: the front and the left side. Each radar produces in-phase (I) and quadrature (Q) components, and the time-domain output signals are generated in MATLAB.
These raw time-domain signals are sampled at 44.1 kHz. To reduce the computation time required by PCA and KNN, only the portions containing the desired movement are extracted from the original raw time-domain signals. The PCA step employs four principal component markers and generates a two-dimensional (2D) scatter plot of the hand gesture types. Finally, the KNN classifier classifies the hand gesture signals from both the front and left side radars.

A. Experimental Setup and Configuration
The experimental setup of the radars used in this experiment is shown in Fig. 2. The experiment uses two active RF beam radar modules (ST100 Starter kit) operating at 24 GHz. Each radar consists of a transceiver that transmits and receives RF signals through an antenna, and it generates the I and Q components of the signal reflected by the hand gesture. These two components are acquired from the radar at a sampling rate of 44.1 kHz, so the resulting spectrum can represent Doppler frequency components up to 22.05 kHz, half the sampling rate. One radar is placed at the front (labeled A), facing the participant and collecting the signal scattered from the front view of the hand gesture, while the other is placed on the left side (labeled B), collecting the signal scattered from the side view. The two radars are oriented perpendicular to each other, each at a distance of 25 cm from the participant's hand (labeled O). The radars are raised approximately 5 cm above the table surface to avoid reflections from the table. The participants are seated on a chair while performing the hand gestures. Because the room used in this experiment is small, approximately 100 ft², the reflected signal may contain some interference caused by a moving person. However, these conditions make the collected hand gesture samples more realistic.
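As a rough sanity check on the setup above, the Python sketch below computes the Nyquist limit of the 44.1 kHz sampling and the monostatic Doppler shift f_d = 2·v·f_c/c at 24 GHz. The radial hand speeds are illustrative assumptions, not measurements from this study, and the function names are hypothetical.

```python
"""Nyquist limit and expected Doppler shifts for a 24 GHz CW radar.

Illustrative only: the radial hand speeds below are assumed values,
not measurements from this study.
"""

C = 299_792_458.0   # speed of light in vacuum (m/s)
F_CARRIER = 24e9    # radar operating frequency (Hz)
FS = 44_100.0       # sampling rate of the I/Q outputs (Hz)


def nyquist_frequency(fs: float) -> float:
    """Highest frequency representable without aliasing at sampling rate fs."""
    return fs / 2.0


def doppler_shift(radial_speed: float, f_carrier: float = F_CARRIER) -> float:
    """Monostatic Doppler shift f_d = 2 * v * f_c / c for radial speed v (m/s)."""
    return 2.0 * radial_speed * f_carrier / C


if __name__ == "__main__":
    print(f"Nyquist limit: {nyquist_frequency(FS):.0f} Hz")
    for v in (0.1, 0.5, 1.0):  # assumed radial hand speeds (m/s)
        print(f"v = {v:.1f} m/s -> f_d = {doppler_shift(v):.1f} Hz")
```

Under these assumed speeds, a hand moving radially at 1 m/s produces a shift of roughly 160 Hz, far below the Nyquist limit, so the sampling rate leaves ample headroom for finger and hand motion.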

B. Data Acquisition
The ST100 Starter kit includes the SignalViewer software interface, shown in Fig. 3, for real-time viewing and analysis of Doppler signals. The software records the scattered signals from the radar and saves them in waveform audio file format (WAV). The y-axis division, representing voltage, is set to 0.5 V/div, and the x-axis division, representing time, is set to 2 ms/div. As listed in Table I, four distinct right-handed gestures are collected for this experiment. They represent common hand movements that participants can perform: close hand, open hand, OK sign, and pointing down. Ten male participants aged between 19 and 24 years are involved in the data collection. Each participant repeated each hand gesture type ten times during the session, with approximately two seconds between repetitions. Thus, 400 hand gesture samples (10 participants × 4 gestures × 10 repetitions) are gathered by a single radar, and 800 samples are collected in total for both radars. Before data collection, all participants were shown an example of each hand gesture type; they then performed the gestures freely at their own pace and style, which makes the data used in the learning process more diverse.
One WAV file is generated for each participant and gesture, containing the ten repetitions performed in a session. Thus, a total of 80 WAV files, containing 800 hand gesture signals from ten participants, are collected for both radars. Fig. 4 shows the block diagram of the data processing, which starts with loading and reading the raw WAV data in MATLAB. Each raw WAV signal contains 20 segments, as shown in Fig. 5, representing the ten hand gestures, including the initial position. The signals are then extracted manually using the Signal Analyzer application: only a segment containing the hand gesture Doppler signature is selected for analysis, while noise and other interference are ignored, as shown in Fig. 6.
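The recordings here were loaded in MATLAB and segmented by hand in Signal Analyzer. For readers working outside MATLAB, the sketch below uses only Python's standard-library `wave` and `struct` modules to read a recording and slice out a manually chosen segment. It assumes, since the paper does not say, that each WAV file is 16-bit PCM with I on the first channel and Q on the second; `read_iq_wav`, `extract_segment`, and `make_test_wav` are hypothetical helpers.

```python
import io
import struct
import wave


def read_iq_wav(source):
    """Return (fs, i_samples, q_samples) from a 2-channel, 16-bit PCM WAV.

    Assumed layout: channel 0 = I, channel 1 = Q. `source` may be a file
    path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected a 2-channel, 16-bit PCM WAV file")
        fs = wav.getframerate()
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)
    # Frames are interleaved little-endian int16 values: I0, Q0, I1, Q1, ...
    samples = struct.unpack(f"<{2 * n_frames}h", raw)
    i_samples = [s / 32768.0 for s in samples[0::2]]  # scale to [-1, 1)
    q_samples = [s / 32768.0 for s in samples[1::2]]
    return fs, i_samples, q_samples


def extract_segment(samples, fs, start_s, end_s):
    """Keep only the samples between start_s and end_s seconds.

    The boundaries stand in for a segment chosen by hand.
    """
    return samples[int(round(start_s * fs)):int(round(end_s * fs))]


def make_test_wav(i_vals, q_vals, fs=44_100):
    """Build an in-memory stereo WAV from integer I/Q values (for testing)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(fs)
        wav.writeframes(b"".join(struct.pack("<hh", i, q)
                                 for i, q in zip(i_vals, q_vals)))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    fs, i_s, q_s = read_iq_wav(make_test_wav([0, 16384, -16384], [0, 8192, -8192]))
    print(fs, i_s, q_s)
```

In practice `read_iq_wav` would be called with a file path, and `extract_segment` with the start and end times identified by inspecting the signal.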

C. Data Processing
The segmented time-domain signal is then transformed into the frequency domain using the fast Fourier transform (FFT) to reveal its frequency content. The frequency-domain signal is then converted into a power spectral density (PSD) signal, and the amplitude is normalized to one, bringing all variables into the same range and simplifying the signal for feature selection.
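A minimal NumPy sketch of the FFT, PSD, and normalization steps follows. It assumes the I and Q channels are combined into a complex signal x = I + jQ (the paper does not state how the channels are combined) and uses a plain, unwindowed periodogram |X(f)|²/(fs·N) as the PSD estimate; `normalized_psd` is a hypothetical name.

```python
import numpy as np


def normalized_psd(i_samples, q_samples, fs=44_100.0):
    """Return (freqs, psd) of the complex signal I + jQ, with the psd peak scaled to 1.

    Uses an unwindowed periodogram |X(f)|^2 / (fs * N); frequencies run from
    -fs/2 to +fs/2, so approaching and receding motion appear on opposite sides.
    """
    x = np.asarray(i_samples, dtype=float) + 1j * np.asarray(q_samples, dtype=float)
    n = x.size
    spectrum = np.fft.fftshift(np.fft.fft(x))
    freqs = np.fft.fftshift(np.fft.fftfreq(n, d=1.0 / fs))
    psd = np.abs(spectrum) ** 2 / (fs * n)
    peak = psd.max()
    if peak > 0:
        psd = psd / peak  # normalize so every PSD shares the [0, 1] range
    return freqs, psd


if __name__ == "__main__":
    fs = 1_000.0
    t = np.arange(1_000) / fs
    # Synthetic target producing a +50 Hz Doppler tone.
    freqs, psd = normalized_psd(np.cos(2 * np.pi * 50 * t),
                                np.sin(2 * np.pi * 50 * t), fs)
    print(freqs[np.argmax(psd)])
```

Normalizing each PSD by its own peak removes differences in overall reflected power between recordings, so the later steps compare spectral shape rather than signal strength.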
PSD signals can be used directly as neural network inputs. However, because the data are high-dimensional, the classification result is not optimal: the number of features is comparable to the length of the overall signal, which increases the computational time and the risk of over-fitting. Thus, principal component analysis (PCA) exploits the correlation between the features to reduce the dimension of the spectral feature vector. Selecting appropriate features matters more than selecting many, since a large number of features does not always yield satisfactory results in machine learning. The output of PCA is then used as either training or testing data in the classification system.
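The dimensionality reduction can be sketched with a singular value decomposition (SVD) of the mean-centered PSD matrix, one standard way of computing PCA; other tools typically return the same components up to sign. Each row is one sample's normalized PSD, the number of retained components is a free parameter here, and `pca_fit_transform` is a hypothetical helper.

```python
import numpy as np


def pca_fit_transform(features, n_components):
    """Project samples onto their leading principal components.

    features: (n_samples, n_features) array, e.g. one normalized PSD per row.
    Returns (scores, components, mean); a new sample x can be projected
    the same way with (x - mean) @ components.T.
    """
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA operates on mean-centered data
    # Rows of vt are orthonormal principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]
    scores = Xc @ components.T
    return scores, components, mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    psds = rng.random((40, 200))  # stand-in for 40 PSD vectors of length 200
    scores, components, mean = pca_fit_transform(psds, n_components=4)
    print(scores.shape)  # (40, 4): 40 samples x 4 PC scores
```

Plotting `scores[:, 0]` against `scores[:, 1]` for each gesture class gives the kind of 2-D scatter plot described in Section III.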

D. Data Classification
The KNN algorithm is a supervised learning method frequently used for classification and regression. KNN is also flexible enough to be used for resampling datasets and imputing missing values. Classification data are typically divided into training and testing sets. However, because of the small number of samples in this experiment, all hand gesture signals are used as training data. The number of nearest neighbors, k, is set to three, and the remaining parameters are left unchanged.
For this reason, the Leave-One-Out (LOO) method is used to verify the classification accuracy of KNN. This method suits small datasets because only one hand gesture signal is used as the testing data at a time, while all the remaining signals are used as training data. In the LOO method, the training data therefore outnumber the testing data by 399:1 for the front or left side dataset.
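The LOO procedure with k = 3 can be sketched in NumPy as follows. The Euclidean distance and the tie-breaking rule (among equally voted labels, the one with the nearest neighbor wins) are assumptions; the paper only states that k is three and that the other parameters were left unchanged. `knn_predict` and `leave_one_out_predictions` are hypothetical names.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest (Euclidean) to query."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    top = max(votes.values())
    # Tie-break: among the most-voted labels, take the one whose neighbor is closest.
    for i in nearest:
        if votes[train_y[i]] == top:
            return train_y[i]


def leave_one_out_predictions(x, y, k=3):
    """Hold out each sample in turn, fit on the other n - 1, and predict it."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    preds = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i  # boolean mask excluding sample i
        preds.append(knn_predict(x[keep], y[keep], x[i], k))
    return np.array(preds)


if __name__ == "__main__":
    # Two toy clusters standing in for PCA scores of two gesture classes.
    x = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    y = ["open", "open", "open", "close", "close", "close"]
    preds = leave_one_out_predictions(x, y, k=3)
    print((preds == np.array(y)).mean())  # LOO accuracy
```

With 400 samples per radar, this loop performs 400 classifications, each against 399 training samples. One design note: if PCA is fitted on all 400 samples before LOO, each held-out sample has influenced the projection; refitting PCA inside every fold avoids this at extra cost.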
In this work, KNN is a non-parametric, lazy learning algorithm: it makes no underlying assumption about the data distribution, and it keeps all data as training data, deferring computation until a sample is classified. Instead, the data themselves dictate the model structure. Because most real-world datasets violate theoretical mathematical assumptions, this property is very useful in practice. KNN is a well-established and simple non-parametric technique for classifying samples. It calculates the distances between the PCA-reduced PSD feature vectors and assigns each unlabeled point to the majority class among its k nearest neighbors.
The classification process algorithm is shown in Table II and consists of three main processes. The process starts with initializing the datasets. The time-domain Doppler signals of the four types of hand gesture measurements, stored in WAV format, are converted into their I and Q components, sampled at 44.1 kHz. Then, one segment out of the 20 segments is selected from each recording. The segment is converted to the frequency domain using the FFT and transformed into a PSD signal. The correlation between the PSD signals is exploited using PCA, and the output is divided into two groups: training and testing. The signals are then resampled, and missing dataset values are imputed using the KNN method. Lastly, the classification is verified using the LOO method.
Classification process:
a) Exploit the correlation using the PCA method.
b) Divide the output of PCA into two groups: training and testing.
c) Resample and impute missing dataset values using the KNN method.
d) Verify the classification using the LOO method.
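LOO predictions are usually summarized in a confusion matrix, from which the per-class accuracy is the fraction of each true class predicted correctly. The worked example below uses hypothetical counts: 100 samples per gesture, as in this dataset, with a single close hand sample misclassified as open hand (which class absorbs an error is an assumption made only for illustration). It shows how a 99% per-class figure arises from one error in 100 trials.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows = true class, columns = predicted class, entries = counts."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_accuracy(matrix):
    """Diagonal count divided by row total (the recall of each true class)."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


if __name__ == "__main__":
    classes = ["close hand", "open hand", "OK sign", "pointing down"]
    truth = [c for c in classes for _ in range(100)]
    pred = list(truth)
    pred[0] = "open hand"  # hypothetical single misclassification
    cm = confusion_matrix(truth, pred, classes)
    print(per_class_accuracy(cm))  # [0.99, 1.0, 1.0, 1.0]
    print(overall_accuracy(cm))    # 0.9975
```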

III. RESULT AND DISCUSSION
The power spectra of the four hand gesture signals for the front and left side radars are shown in Fig. 7 and Fig. 8, respectively. Each graph contains 100 spectra, representing ten repetitions from each of the ten participants. In Fig. 7, the spectral power differs significantly between the hand gesture types from 0 to 100 Hz, whereas Fig. 8 shows significant differences between the hand gestures in the range of 0 to 50 Hz. Beyond these ranges, the signals are essentially at the noise floor.
In addition, the two radar positions produced distinct spectra whose different frequency characteristics carry additional, easily distinguishable information.
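How many spectral values fall inside the informative 0 to 100 Hz (front) and 0 to 50 Hz (left) bands depends on the segment length N, because the FFT bin spacing is Δf = fs/N. The sketch below counts those bins for assumed segment durations of 1 s and 0.5 s; these durations are hypothetical, since the paper does not report segment lengths, and `band_bins` is a hypothetical helper. It does not imply that the authors truncated their PSDs to these bands.

```python
import math


def band_bins(n_samples, fs, f_low, f_high):
    """One-sided FFT bin indices k with f_low <= k * fs / N <= f_high.

    Returns (bins, df) where df = fs / N is the bin spacing in Hz.
    """
    df = fs / n_samples
    k_low = max(0, math.ceil(f_low / df))
    k_high = min(math.floor(f_high / df), n_samples // 2)
    return range(k_low, k_high + 1), df


if __name__ == "__main__":
    fs = 44_100.0
    for seconds in (1.0, 0.5):  # assumed segment durations
        n = int(fs * seconds)
        for f_high in (100.0, 50.0):
            bins, df = band_bins(n, fs, 0.0, f_high)
            print(f"{seconds} s segment: df = {df} Hz, "
                  f"{len(bins)} of {n // 2 + 1} one-sided bins in 0-{f_high:.0f} Hz")
```

For a 1 s segment, for instance, only 101 of 22,051 one-sided bins lie in the 0 to 100 Hz band, which illustrates how much of a full-length PSD vector lies outside the region where the gestures differ.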
The PSDs within each hand gesture class show slight differences due to the participants' varying speeds and styles of gesturing. The figures also show a significant difference between the spectra of different hand gestures, indicating that each hand gesture type has its own spectral pattern.
For the front side radar, Fig. 11 shows the variance explained by the first nine PCs, while for the left side radar, Fig. 12 shows the variance explained by the first three PCs. In both figures, the first PC explains more variance than the other PCs, creating a downward trend, because PCA packs as much variance as possible into the first component, then as much of the remaining variance as possible into the second, and so on. The difference in the number of PCs between the two radars reflects the gap in signal characteristics between the hand gesture types: the left side radar had difficulty distinguishing the hand gestures from its view. Moreover, the hand gestures performed in this experiment involved finger movements rather than larger wrist movements, so the captured signals carry only a small amount of information. This is further supported by the PSD signals for the left side radar in Fig. 10, which differ less from one another in the low-frequency range than the PSD signals for the front side radar in Fig. 9.
The 2-D scatter plots of the four hand gesture classes from the front and left side radars are depicted in Fig. 13 and Fig. 14, respectively. The figures show that the hand gesture signals captured by the front side radar exhibit less consistent and less accurate clustering than those captured by the left side radar. This scatter may result from each participant's speed and style of gesturing, which the front side radar captures sensitively.
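Figs. 11 and 12 show each PC's share of the total variance, in the style of a Pareto chart: bars for the individual shares and a line for the running total. A NumPy sketch of those quantities is given below. The 95% threshold in `n_components_for` is an illustrative assumption, since the paper does not state how many PCs were retained for classification; both function names are hypothetical.

```python
import numpy as np


def explained_variance(features):
    """Return (ratio, cumulative): each PC's share of total variance and the running sum.

    features: (n_samples, n_features) array, e.g. one normalized PSD per row.
    """
    X = np.asarray(features, dtype=float)
    Xc = X - X.mean(axis=0)
    singular_values = np.linalg.svd(Xc, compute_uv=False)  # sorted descending
    variances = singular_values ** 2  # proportional to each PC's variance
    ratio = variances / variances.sum()
    return ratio, np.cumsum(ratio)


def n_components_for(cumulative, threshold=0.95):
    """Smallest number of leading PCs whose cumulative share reaches threshold."""
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in data: one dominant direction plus small noise.
    base = rng.normal(size=(100, 1)) @ rng.normal(size=(1, 20))
    data = base + 0.01 * rng.normal(size=(100, 20))
    ratio, cumulative = explained_variance(data)
    print(ratio[:3].round(3), n_components_for(cumulative))
```

Data whose classes differ along many directions spread variance over more PCs, whereas nearly redundant spectra concentrate it in a few, which is one way to read the nine-versus-three contrast between the two radars.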
For the front side radar, the separation between the hand gesture classes is quite distinct, indicating that the training model performs well. For the left side radar, by contrast, the scores of the different hand gesture types merge, as the radar sensor barely distinguishes the hand gestures. The small differences between the power spectral signals at low frequencies likely hampered the separation of the clusters.
Even though the clustering in the 2-D scatter plots is not optimal, the average error of the model and the classification performance of both the front and left side radars are satisfactory. The predicted classes for the front and left side radars, obtained using the KNN algorithm with the LOO method, are shown in Fig. 15 and Fig. 16, respectively. The hand gesture signals from the front side radar (Fig. 15) are classified with high accuracy. For the left side radar (Fig. 16), only the close hand gesture achieves 99% accuracy, while the other gestures reach 100%. Misclassifications may occur when hand gestures are not properly aligned with the radar sensor, particularly when different gestures involve similar movements.
This study proposed the recognition of hand gesture signals from two distinct views using two CW radars. The classification results indicate that the hand gesture signals from both the front and left side radars achieved high accuracy, even though the scores did not cluster clearly in the 2-D scatter plots. Only one hand gesture class from the left side radar achieved a classification accuracy of 99% rather than 100%. Nonetheless, several adjustments and enhancements are possible in the future. The experiment could be conducted in a more controlled environment to avoid background noise and interference, which would help collect cleaner hand gesture signals for classification with simple machine learning techniques such as KNN.
Additionally, hand gesture signals from other views, such as the top, bottom, or right side, could be collected as multiple inputs to deep learning neural networks, potentially yielding more accurate and realistic classification results, since radars at different views convey different amounts of information about the hand gesture's characteristics.