ON INFORMATICS

Abstract. Many studies have identified lecture attendance as one of the success factors in the learning process. We propose a framework for a student attendance system that uses face recognition for authentication. Triplet loss embedding in FaceNet is suitable for face recognition systems because the architecture is highly accurate, fairly lightweight, and easy to implement in a real-time face recognition system. In our research, triplet loss embedding performs well at recognizing faces, and it can be used for real-time face recognition in the authentication step of an attendance recording system that uses RFID. In our study, face recognition using k-NN and SVM classification achieved accuracies of 96.2 ± 0.1% and 95.2 ± 0.1%, respectively. An attendance recording system that uses face recognition for authentication is expected to increase student attendance in lectures. The system should be difficult to fake: it validates the student who presents an RFID card using facial biometric marks. As a result, students are expected to attend lectures consistently, which in turn should improve the quality of the existing education process. In the future, the results could be improved by using a high-resolution camera. Facial expression recognition could also be added to strengthen the authentication process: users would be required to perform an expression instructed by the system, using a database and the YOLO method.


I. INTRODUCTION
Several studies identify the number of lectures a student attends as one of many success factors in the academic process. Student presence in lecture sessions can increase the absorption and dissemination of the course material delivered by lecturers, which in turn influences the success rate of student studies. Fadelelmoula [1] showed that students with a high attendance rate had better academic performance and higher test scores. Tetteh [2] found that the attendance rate and the amount of learning time significantly affect study outcomes. Bartanen [3] showed that the attendance rate of students in a course increases academic performance, in addition to how the teacher delivers the course. From these observations and studies, we conclude that the attendance rate may affect performance and success in academic fields. The problem, then, is how to make students attend lecture sessions consistently by providing a robust attendance system that is difficult to fake or falsify.
In many universities, student attendance rates at lecture sessions are very low, which results in low academic performance. To increase the attendance rate, we need a smart attendance system that is difficult to manipulate. The part of the system that ensures a student is actually present in class is the authentication process. Authentication can be done in various ways, such as signing on paper or using a Radio Frequency Identification (RFID) card as an identity. RFID can easily be manipulated and falsified, for example by bringing a friend's card into class. For that reason, we need another method that makes the authentication process in the attendance system difficult to manipulate or falsify. One such method is to use biometric marks from each student. Biometric identification, such as fingerprints, retina, and face, is quite difficult to manipulate because these marks are highly individual. Each method of authentication has advantages and disadvantages. Authentication using fingerprints or retina has a higher level of security against falsification, yet it is costly to implement. Another option is face recognition, which uses the biometric marks on the face. Zang et al. [4] demonstrated that facial recognition is a good authentication method. We therefore decided to implement facial recognition in the student attendance recording system to change students' behavior so that they consistently attend lessons. It can also raise academic teaching and learning systems to the level of an intelligent system.

A. Related Work
Face recognition has become popular as an authentication method because it has advantages over identity matching using other biometric features. Face recognition is natural and non-intrusive because the face image can be captured from a distance. Face recognition is defined as "... a biometric method of identifying a person based on a photograph of their face; biometric methods use biological traits to identify people" [5]. In general, facial recognition by computer is a process of pattern recognition on a face image captured by a camera. Face images can represent either 2-dimensional (2D) or 3-dimensional (3D) objects in varied lighting conditions, poses, and expressions [6].
Face recognition is used in daily life in several fields, such as:
• Identification: driver's license card, immigration, resident card, passport, voter registration.
• Access control: border crossing control, facility access, vehicle access, kiosk, ATM, computer access, program access, computer network access, online program access, online transaction access, distance learning access, online exam access, exam access (Ye & Hu, 2017), database access, and so on [7], [8].
• Security: terrorist warnings, secure boarding system, stadium visitor scanning, computer security, computer application security, database security, file encryption, intranet or internet security, medical history (medical records), terminal stock trading security, payment method, surveillance at nuclear power plants, park monitoring.
• Law enforcement: detention of suspects, recognition of shoplifters, recognition of criminals, tracking of suspects and investigations, background checks of suspects, environmental monitoring, identification of fraud at gambling establishments, post-event analysis, social security crimes (insurance fraud/social security) [9].
• Information retrieval: indexing databases by face, record searching (retrieval), automatic face labelling, face classification.
• Multimedia management: face-based search, face-based video segmentation, event detection.
• Human-computer interaction: interactive games, proactive computing.
• Authentication: smart home, smart office, smart city.
• Social security: identification of recipients of social funds, missing person search.
• Anti-criminal applications [10].
Many approaches to face recognition have been proposed so far, including linear appearance-based projection, non-linear projection, and neural networks. In linear appearance-based projection, features are projected into a linear subspace. Examples of this method are Principal Component Analysis (PCA) [11], [9], Independent Component Analysis (ICA) [12], Linear Discriminant Analysis (LDA) [13], [14], Two-Dimensional PCA (2DPCA) [15], Linear Regression Classification (LRC) [16], and Fisher Linear Discriminant [17].
Non-linear appearance-based projection approaches map input face images into a high-dimensional space in which the face manifold becomes linear and simpler. Examples of this method are kernel PCA (KPCA) and kernel LDA [18].
Other approaches use local appearance features: Local Binary Pattern (LBP) [19], [20], [10], Gabor Phase Pattern [21], Local Gabor Binary Pattern [22], Discriminative Common Vectors (DCV) [23], Support Vector Machine (SVM) [24], Neighborhood Preserving Projections (NPP) and Orthogonal Neighborhood Preserving Projections (ONPP) [25], a combination of Kernel Discriminant Analysis (KDA), K-Nearest Neighbor, and Support Vector Machine [26], and a modified LBPH algorithm based on the pixel neighborhood gray median (MLBPH) [27]. These methods use local patterns from the face and its surroundings to obtain unique facial features. Both the non-linear appearance-based projection and the local appearance feature approaches are more accurate than the classic methods, but they are still not accurate enough for use in an attendance system.
The newest approach in face recognition is to use neural networks, which provide more accurate results. There are two types of neural networks: shallow and deep. A shallow neural network has few layers in its architecture, whereas a deep neural network has many layers to extract features. An example of the shallow network approach can be found in [28], and examples of deep neural networks are Boltzmann machines [29], DeepFace [30], and DeepID2 [31]. A deep neural network has high accuracy, but the approach is not easy to implement and requires a high level of computation. Xu et al. [32] proposed a 3D-aided 2D face recognition system (3D2D-PIFR) that is robust to pose variations as large as 90°. Face recognition using a convolutional neural network was also proposed by Qiao & Ma [33], [34], and a convolutional neural network using faces at multiple distances was proposed in [35].
An approach that uses a neural network that is not too deep can still achieve quite high accuracy. It is also easy to implement, as in the method proposed by Schroff et al. [36], which uses triplet loss embedding as a feature that projects a face into a single point. Our research uses this method because it achieves high accuracy in recognizing faces.
Facial recognition has been implemented in many attendance systems. For example, Sharanya et al. [37] use the Haar cascade method to detect faces and the Local Binary Pattern (LBP) to recognize them. This method is not good enough to detect and recognize faces in an attendance system. Son et al. [38] used deep learning with ArcFace. The results were quite satisfying, but the response time was still not fast enough. Zhi-heng et al. [39] implemented an attendance system with face recognition using a multi-layer perceptron deep learning model, but they encountered a problem related to the number of faces recognized in one frame. Other implementations use Eigenface recognition [40], [41], but this method has low accuracy. Another approach, using triplet loss embedding, is FaceNet by Schroff et al. [36]. It uses the triplet loss embedding values to compute the distances of positive and negative pairs and thereby determine whether a pair of images shows the same person or different persons. This method has high accuracy, a simple architecture, and low computational requirements, and it can be implemented for real-time face recognition. Based on these advantages, we propose in this research a simple, fast, accurate, and robust attendance system framework that can run on a low-end computer, with a face recognition feature based on triplet loss embedding in FaceNet [36].

B. Research Method
To build a real-time attendance system with a fast and accurate authentication process, using triplet loss embedding face recognition based on the FaceNet architecture by Schroff et al. [36], we adapted code from the OpenFace program using the Python programming language and the Dlib and Keras libraries. For the model, we use the pre-trained nn4 small model, a model trained using the Keras library.
The real-time face recognition process has two stages: face detection and face classification. The face image is captured in the first stage and classified in the second stage. For face detection, we use the Haar cascade face detection method by Viola and Jones. This method is sufficient because, during authentication, our system does not need to detect and capture multiple faces at once, only one face at a time, and the user is cooperative. In addition, we only need a captured face image in a frontal pose, rotated by up to 45 degrees. The Haar cascade method is adequate for this.
For the reference dataset, we collected several photos of each identity with variations in pose and expression. We limited the poses to at most 45 degrees because of the limitations of the method. After collection, we divided the dataset into two parts, training data and test data, with a ratio of 5:1. The dataset contains 7550 images from 94 identities, with an average of 80 images per identity.
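The 5:1 split described above can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: the split is made per identity, images are shuffled with a fixed seed, and the file names are hypothetical.

```python
import random


def split_per_identity(images_by_identity, train_parts=5, test_parts=1, seed=0):
    """Split each identity's images into train/test lists at about train_parts:test_parts."""
    rng = random.Random(seed)
    train, test = {}, {}
    for identity, images in images_by_identity.items():
        shuffled = list(images)
        rng.shuffle(shuffled)
        # Share of images reserved for testing (1/6 for a 5:1 ratio).
        share = test_parts / (train_parts + test_parts)
        # Keep at least one test image, unless the identity has a single image.
        n_test = 0 if len(shuffled) < 2 else max(1, round(len(shuffled) * share))
        test[identity] = shuffled[:n_test]
        train[identity] = shuffled[n_test:]
    return train, test


# Hypothetical dataset: 2 identities with 12 images each.
dataset = {
    f"student_{i:02d}": [f"student_{i:02d}/img_{j:03d}.jpg" for j in range(12)]
    for i in range(2)
}
train, test = split_per_identity(dataset)
```

Splitting per identity, rather than over the pooled images, keeps every identity represented in both parts; whether the original experiment did this is not stated.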
Our framework begins by reading each face image from the reference dataset. The next step is face detection. After a face is detected, the inner eyes, bottom lip, outer eyes, and nose are localized as reference points for the alignment process. The alignment process uses facial landmarks based on cascade regressors [42] in the Dlib library; the face image is gradually transformed to a frontal position. Before alignment, each face is bounded by a box, cropped, and resized to 96 x 96 pixels so that the resulting triplet loss embedding follows a common standard, and the same standard is used when capturing images in real time. In real-time face recognition, the captured face is processed in the same way: it is aligned, transformed into a frontal position, bounded by a box, and resized to 96 x 96 pixels. Applying the same process ensures that images from the reference dataset and faces captured from the camera follow the same standard, which indirectly increases the accuracy and robustness of the attendance recording system framework.
To obtain the triplet loss embedding, FaceNet [36] uses deep convolutional networks based on the Zeiler and Fergus [43] style network and the more recent Inception network [44]. FaceNet employs a triplet loss that directly reflects the tasks of face verification, recognition, and clustering. The architecture learns an embedding $f(x)$ from an image $x$ into a feature space $\mathbb{R}^d$ such that, independent of imaging conditions, the squared distance between face images of the same identity is small, whereas the squared distance between a pair of images from different identities is large. The loss uses positive and negative pairs, as in [45], through the constraint in eq. (1).
Triplet loss from [45] is suitable for face verification: the loss encourages all faces of one identity to be projected onto a single point in the embedding space, and it tries to enforce a margin between each pair of faces of one identity and all other faces, so that one identity is projected onto one manifold while the distance to other identities is still enforced. The triplet loss embedding is represented by $f(x) \in \mathbb{R}^d$. It embeds an image $x$ into a $d$-dimensional Euclidean space, and the embedding is constrained to live on the $d$-dimensional hypersphere, i.e. $\|f(x)\|_2 = 1$. The loss is motivated by nearest-neighbor classification: it ensures that an image $x_i^a$ (anchor) of a specific person is closer to all other images $x_i^p$ (positive) of the same person than to any image $x_i^n$ (negative) of any other person:

$$\|f(x_i^a) - f(x_i^p)\|_2^2 + \alpha < \|f(x_i^a) - f(x_i^n)\|_2^2, \quad \forall \left(f(x_i^a), f(x_i^p), f(x_i^n)\right) \in \mathcal{T} \tag{1}$$

where $\alpha$ is a margin that is enforced between positive and negative pairs, and $\mathcal{T}$ is the set of all possible triplets in the training set and has cardinality $N$. The loss to be minimized is:

$$L = \sum_{i}^{N} \left[ \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha \right]_+$$

Generating all possible triplets would produce many triplets that easily fulfil the constraint in eq. (1). To ensure fast convergence, it is crucial to select triplets that violate the triplet constraint in eq. (1). This means that, given $x_i^a$, we want to select an $x_i^p$ (hard positive) such that $\arg\max_{x_i^p} \|f(x_i^a) - f(x_i^p)\|_2^2$ and, similarly, an $x_i^n$ (hard negative) such that $\arg\min_{x_i^n} \|f(x_i^a) - f(x_i^n)\|_2^2$.
FaceNet directly learns a mapping from face images to a compact Euclidean space where distance directly corresponds to a measure of face similarity. Once this space has been produced, the tasks of face recognition, verification, and clustering can easily be implemented using standard techniques with the embeddings as feature vectors. FaceNet uses triplets of roughly aligned matching and non-matching face patches, generated using an online triplet mining method based on LMNN [46]. The method learns a Euclidean embedding per image using a deep convolutional neural network, which is trained so that the squared L2 distance in the embedding space directly corresponds to face similarity.
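As a concrete illustration of the triplet loss and the hard positive/hard negative selection described above, the following minimal sketch evaluates both on toy 2-dimensional vectors. The margin value 0.2 and all vectors are assumptions chosen for illustration; real FaceNet embeddings are produced by the trained network.

```python
import math


def l2_normalize(v):
    """Scale v to unit length so that it lies on the hypersphere ||f(x)||_2 = 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Hinge term [d(a, p) - d(a, n) + alpha]_+ for one triplet (alpha is assumed)."""
    return max(0.0, squared_l2(anchor, positive) - squared_l2(anchor, negative) + alpha)


def hardest_triplet(anchor, positives, negatives):
    """Pick the farthest positive (argmax) and the closest negative (argmin)."""
    hard_pos = max(positives, key=lambda p: squared_l2(anchor, p))
    hard_neg = min(negatives, key=lambda n: squared_l2(anchor, n))
    return hard_pos, hard_neg


# Toy embeddings (assumed values, not real face embeddings).
anchor = l2_normalize([1.0, 0.0])
positives = [l2_normalize([0.9, 0.1]), l2_normalize([0.6, 0.8])]
negatives = [l2_normalize([0.0, 1.0]), l2_normalize([-1.0, 0.0])]

hard_pos, hard_neg = hardest_triplet(anchor, positives, negatives)
loss = triplet_loss(anchor, hard_pos, hard_neg)
```

In this toy case even the hardest triplet satisfies the margin, so its loss term is zero and it would contribute nothing to training; triplets that violate the margin give a positive loss.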
The FaceNet embedding has great representational efficiency, using only 128 bytes per face. Faces of the same person have small distances, and faces of different persons have large distances. Once the embedding has been created, face verification simply involves thresholding the distance between two embeddings; in the original FaceNet, recognition becomes a k-NN classification problem, and clustering can be achieved using k-means or agglomerative clustering.
After the alignment process, the triplet loss embedding vector of each image is calculated using the pre-trained model. The L2 distances between the embedding vectors are calculated and stored in an L2 distance matrix, and a matrix of positive and negative pairs is built for the images in the database. For each candidate threshold, pairs whose L2 distance is below the threshold are treated as the same identity, and the F1 score is calculated. In this paper, we use threshold values from 0.3 to 1.0 in steps of 0.01. The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0. For multi-class and multilabel cases, it is the average of the F1 score of each class, with weighting that depends on the averaging parameter. We also calculate the accuracy score for each threshold. From all F1 scores, we find the maximum value and take the threshold that produces it. We then calculate the accuracy score at that threshold.
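The threshold search can be sketched as below. This is an illustrative reading of the procedure, assuming that a pair is predicted "same identity" when its distance is strictly below the threshold and that the endpoint 1.0 is included in the sweep; the distances and labels are made-up toy values.

```python
def f1_and_accuracy(distances, is_same, threshold):
    """F1 and accuracy when pairs with distance < threshold are predicted 'same'."""
    tp = fp = tn = fn = 0
    for dist, same in zip(distances, is_same):
        predicted_same = dist < threshold
        if predicted_same and same:
            tp += 1
        elif predicted_same:
            fp += 1
        elif same:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, (tp + tn) / len(distances)


def best_threshold(distances, is_same, start=0.3, stop=1.0, step=0.01):
    """Sweep thresholds from start to stop and keep the first one with maximal F1."""
    n_steps = int(round((stop - start) / step))
    thresholds = [round(start + i * step, 2) for i in range(n_steps + 1)]
    scored = [(f1_and_accuracy(distances, is_same, t), t) for t in thresholds]
    (f1, accuracy), threshold = max(scored, key=lambda item: item[0][0])
    return threshold, f1, accuracy


# Toy pair distances and ground-truth labels (assumed values).
pair_distances = [0.10, 0.25, 0.40, 0.55, 0.80, 1.20, 1.50]
pair_is_same = [True, True, True, True, False, False, False]

threshold, f1, accuracy = best_threshold(pair_distances, pair_is_same)
```

When several thresholds reach the same maximal F1, this sketch keeps the smallest one; other tie-breaking rules are equally plausible.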
We classify each triplet loss embedding with an SVM classifier to decide which identity's region of the embedding space it falls into. If the triplet loss embedding of the captured image has a small distance to the triplet loss embedding of a known person, we can assume that the captured image has the same identity. This research also evaluates the prediction accuracy obtained when the triplet loss embedding values are classified using a k-NN classifier and an SVM classifier. We use the SVM classifier because it is very effective at separating face embedding vectors.
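The k-NN variant of this classification step can be sketched in a few lines. The sketch below is an assumption-laden illustration: the value of k, the optional rejection distance, and the 2-dimensional toy embeddings are not taken from the experiments, and the SVM variant is not shown because it would require a third-party library.

```python
from collections import Counter


def squared_l2(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knn_predict(query, reference, k=3, max_distance=None):
    """Majority vote among the k nearest reference embeddings.

    reference is a list of (embedding, identity) pairs. If max_distance is set
    and even the nearest neighbour is farther away, None is returned (unknown).
    """
    neighbours = sorted(reference, key=lambda item: squared_l2(query, item[0]))[:k]
    if max_distance is not None and squared_l2(query, neighbours[0][0]) > max_distance:
        return None
    votes = Counter(identity for _, identity in neighbours)
    return votes.most_common(1)[0][0]


# Toy reference set with two identities (assumed values).
reference = [
    ([1.0, 0.0], "A"),
    ([0.98, 0.2], "A"),
    ([0.0, 1.0], "B"),
    ([0.2, 0.98], "B"),
]
predicted = knn_predict([0.95, 0.31], reference)
unknown = knn_predict([-1.0, 0.0], reference, max_distance=0.5)
```

The rejection distance gives the classifier a way to answer "nobody in the database", which matters for authentication because a plain k-NN vote always returns some identity.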

A. Framework
In the proposed framework, the attendance system uses RFID combined with facial recognition as the authentication process. Before the lecture session begins, the student taps the RFID card on the terminal provided. The terminal reads the student ID contained in the card and sends it to the server, which validates it and retrieves other student information. If the server responds that the student is valid, the attendance system commands the terminal to begin the face recognition process.
During face recognition, the attendance system asks the student to face the camera so that a facial image can be captured. The image can be captured in camera (still image) mode or in video mode; at present, we use video mode. Face detection is run on the captured video to detect the presence of a face. If a face is detected, the next step is to bound the face with a box and align it to a frontal position.
When the system reads the image database, each image is projected into the embedding space. All faces of one identity are projected onto a single point in the embedding space. In this case, all faces in the database are projected onto 94 single points in the embedding space because the database contains 94 identities. Triplet loss tries to enforce a margin between each pair of faces of one identity and all other faces. The same identity is projected onto one manifold while the distance to the identities of other persons is enforced. The triplet loss embedding is represented by $f(x) \in \mathbb{R}^d$ with $d = 128$, so each embedded image is represented in a 128-dimensional Euclidean space, and the embedding lives on the 128-dimensional hypersphere. The embedding is classified using k-nearest neighbor or SVM classification, which relies on the image (anchor) of a specific person being closer to all other images (positive) of the same person and farther from any image (negative) of a different person. The embedding space built from the reference dataset is used as the reference for recognizing a person's identity.
The next process is authentication. Each student who attends a lecture session is required to record their presence by tapping the RFID card on the recording device in the classroom (see Figure 1).
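One way to picture the "one point per identity" reference space is to place every embedding on the unit hypersphere and summarise each identity by a re-normalised mean. The text does not state how the per-identity point is obtained, so the averaging below is purely an assumption for illustration, as are the toy 2-dimensional vectors.

```python
import math
from collections import defaultdict


def l2_normalize(v):
    """Scale v to unit length; a zero vector has no direction and is rejected."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def identity_centroids(labelled_embeddings):
    """Map each identity to the re-normalised mean of its unit-length embeddings."""
    grouped = defaultdict(list)
    for identity, embedding in labelled_embeddings:
        grouped[identity].append(l2_normalize(embedding))
    centroids = {}
    for identity, embeddings in grouped.items():
        dim = len(embeddings[0])
        mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
        centroids[identity] = l2_normalize(mean)
    return centroids


# Toy data: two images of identity "A" and one of identity "B" (assumed values).
samples = [("A", [1.0, 0.1]), ("A", [1.0, -0.1]), ("B", [0.0, 2.0])]
centroids = identity_centroids(samples)
```

With the 94 identities of the reference dataset, such a summary would contain 94 points on the 128-dimensional hypersphere.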

Fig. 1 Authentication Presence Process
After the RFID card ID has been recorded, the attendance system reads the student record, as well as the face images, from the database. After the student record details are loaded, face images are captured through the camera or video camera for 10 seconds. Each captured face image $x$ is represented in the 128-dimensional Euclidean space and projected onto the 128-dimensional hypersphere, and its triplet loss embedding is calculated in this space. The embedding is then classified using k-nearest neighbor or SVM classification, so that the captured image $x$ either falls onto the single-point embedding of a specific person or onto none at all. If the captured image $x$ falls onto a single-point embedding, it is likely to have the same identity as the one to which that embedding belongs. The next phase compares identities: the identity predicted from the face is compared with the identity obtained in the attendance recording phase. Authentication succeeds if the identities match, that is, if the captured image and the images from the reference dataset fall onto the same triplet loss embedding for the same person. If the captured face and the student's face are predicted to be the same person, the system records the student's attendance.
This authentication process is repeated until the image from the database is considered the same as the one caught on camera, or the check has been idle for 10 seconds, or there is no one in front of the camera. The attendance system does not capture images or videos automatically; capture is triggered by the RFID tap. This mechanism is expected to reduce the workload of the system, because face recognition is requested only after the student has tapped the card and the RFID identity has been found in the database, so the system does not perform facial recognition all the time. Authentication is declared valid if the identity read from the card and the identity obtained from the face recognition process are the same. Student attendance is then recorded in the attendance database, which completes the attendance recording process. To make this process clear, Figure 1 shows the activity diagram of how two images are compared through classification of their 128-D embeddings.
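The control flow of this RFID-triggered authentication can be sketched as follows. All names here are hypothetical stand-ins: `capture_embedding` represents camera capture, detection, alignment, and embedding, and `classify` represents the k-NN or SVM classifier; neither is an API of any particular library.

```python
import time


def authenticate_and_record(card_id, student_db, capture_embedding, classify,
                            attendance_log, timeout_s=10.0, clock=time.monotonic):
    """Return True and log attendance if the face matches the tapped RFID card."""
    # Step 1: the server validates the ID read from the RFID card.
    if card_id not in student_db:
        return False  # unknown card, so face recognition is never triggered

    # Step 2: keep trying to recognise the face until success or timeout.
    deadline = clock() + timeout_s
    while clock() < deadline:
        embedding = capture_embedding()  # hypothetical: None if no face is detected
        if embedding is None:
            continue
        # Step 3: the predicted identity must equal the identity on the card.
        if classify(embedding) == card_id:
            attendance_log.append(card_id)
            return True
    return False


# Toy run with stand-in callables (no real camera, card reader, or model).
students = {"S001": "Student One"}
log = []
ok = authenticate_and_record(
    "S001", students,
    capture_embedding=lambda: [0.6, 0.8],
    classify=lambda emb: "S001",
    attendance_log=log,
)
```

For simplicity, this sketch treats "no one in front of the camera" the same as any other failed attempt and simply waits for the 10-second window to expire. The injectable `clock` argument exists only so that the timeout can be exercised without waiting.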

B. Evaluation
In the first phase, we tested the system using face images contained in the dataset and images captured during real-time face recognition. We obtained a classification accuracy of 96.2 ± 0.1% using k-NN classification and 95.2 ± 0.1% using SVM classification, as shown in Table 1. This performance is below the OpenFace baseline (accuracy of 0.9963 ± 0.009), which is likely due to the different database that we used. In addition, we used the small pre-trained model (nn4 small), which is less accurate because it has fewer parameters; in exchange, it speeds up our face recognition process. SVM and k-NN are used as classifiers because our method uses triplet loss embedding features, which must be mapped to a vector space, and both classifiers are suitable for such features.
In the second phase, we tested the system using face images captured from a camera or video camera (at a resolution of 480 x 640 pixels), where the captured images were not in the database. The system recognizes almost everyone whose identity is recorded in the reference dataset. From 100 captured images/videos, the system achieves 95 ± 0.2% accuracy for normal faces without occlusion, 84% accuracy when the head is tilted by around 45 degrees, 94% accuracy for small detected faces (80 x 80 pixels), 75% accuracy with small occlusion on the face, and 20% accuracy with high occlusion.
In the third phase, we tested the system using face images captured from a web camera and added to the face image dataset as part of the database (7550 images + 100 images). In this test, we compared the embeddings of the sampled images. The system was able to recognize almost all photos that were randomly chosen. The accuracy when using face images in the database is 96 ± 0.2% using k-NN and 95 ± 0.1% using SVM. From this experiment, it can be concluded that the face recognition authentication process can be implemented in an attendance recording system, and that triplet loss embedding in FaceNet can be used as a robust feature for face recognition in the attendance system because it has high accuracy. From 100 captured images/videos, the system achieves 95 ± 0.2% accuracy for normal faces without occlusion, 84% accuracy when the head is tilted by around 45 degrees, 94% accuracy for small detected faces (80 x 80 pixels), 75% accuracy with small occlusion on the face, and 20% accuracy with high occlusion. This method is also suitable for real-time face recognition because the architecture is quite lightweight and easy to implement. Low accuracy under high occlusion is not a problem in this attendance system, because the student can repeat the process until authentication succeeds. An attendance recording system that combines RFID with facial recognition as the authentication process is expected to increase student attendance in lectures. The system is difficult to fake because the facial biometric marks used for authentication are very difficult to falsify. As a result, students are expected to attend lectures consistently, which in turn should improve the quality of the existing education process.
In the future, the results could be improved by using a higher-resolution camera. For a stronger authentication process, facial expression recognition could be added to the face recognition system. With this feature, users would be required to perform an expression instructed by the system, using a database and the YOLO method for better performance.