Published by azliza, 2022-09-13 20:00:12

E-proceeding PEERS'22


Future studies may expand the alumni dataset with more attributes and annotate them with additional information, such as correlation factors between current and previous alumni. Integrating datasets from different sources, for instance graduate profiles from the alumni organisations of the respective educational institutions, should also be considered. With such data in place, the researcher plans to introduce clustering as a pre-processing step to group the attributes before attribute ranking is performed. Other data mining techniques, such as anomaly detection or classification-based association, may also be applied to alumni employability in Malaysia.


Perceptual Aliasing Analysis Utilizing Bag of
Visual Words for Optimal Loop Closure
Detection

Talha Takleh Omar Takleh1*, Shuzlina Abdul-Rahman2,
Sofianita Mutalib3, Siti Sakira Kamaruddin4

Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA,
UiTM Shah Alam, Selangor, Malaysia

[email protected]*, [email protected],
[email protected]

4School of Computing, Universiti Utara Malaysia,
06010 UUM Changlun, Kedah, Malaysia
[email protected]

Abstract: Bag of visual words is a method that extracts visual words from an image and represents them in a histogram, providing an alternative approach when used for loop closure detection in areas such as Simultaneous Localisation and Mapping (SLAM) and Content-Based Image Retrieval (CBIR). However, this method is prone to perceptual aliasing, creating false positives when identifying a similar location during loop closure detection. This paper investigates the bag of visual words method, particularly the effect of the branching factor on perceptual aliasing during loop closure detection, and determines the branching factor value with the least perceptual aliasing effect. The experiment carried out in this paper uses several community datasets in Matlab to simulate an input image passing through a SLAM or CBIR solution that uses a bag of visual words and a similarity score for loop closure detection. Comparing the input and loop closure detection images in terms of their location, similarity score, strongest feature clusters, and putative points demonstrates the perceptual aliasing effect during loop closure detection. The results show that different branching factor values in the bag of visual words method influence the perceptual aliasing effect during loop closure detection, and that this effect can therefore be minimised. The optimal branching factor can improve the accuracy of solutions using the bag of visual words method.

Keywords: Bag of visual words, Branching factor, Loop closure, Perceptual
aliasing, SLAM

1. Introduction

Loop closure detection is the ability of a system or solution to detect
similar images within its storage when compared with an input image.
Examples of solutions utilising loop closure detection are Simultaneous
Localisation and Mapping (SLAM) and Content-Based Image Retrieval
(CBIR) (Hu et al., 2019; Xia et al., 2022). Concerning SLAM, loop closure
detection is used to assist in autonomous navigation by suggesting similar
images linked to the input image that can provide assumptions for the next
possible movement of the object, such as a self-driving vehicle. On the other
hand, in the case of CBIR, loop closure detection is used for retrieving
similar images from the input image, typically from vast storage of images.
Various methods can be utilised for loop closure detection. One of these
methods is called the bag of visual words, which has been utilised in various
SLAM and CBIR solutions with promising results (Huishen et al., 2018;
Wang et al., 2020).

A bag of visual words is a method that extracts global or local features from an image in the form of visual words and represents them in a histogram (Garcia-Fidalgo & Ortiz, 2018). The benefit of this representation is the significant reduction in data size when used for loop closure detection and storage, which allows more data to be processed. Together with an efficient loop closure detection method, the processing time can be reduced as well. These benefits allow the development of SLAM and CBIR solutions that can operate in real time, owing to the faster processing speed and the smaller size per stored image, which translates to more images being stored (Labbé & Michaud, 2013; Xu et al., 2019). However, solutions that use this method suffer from an accuracy problem in the form of perceptual aliasing.

Perceptual aliasing occurs when a location, usually represented as an image, is wrongly identified as similar to the input location during the loop closure detection process. The bag of visual words method is prone to this problem (Garcia-Fidalgo & Ortiz, 2015). The problem can be linked to the visual word descriptors within the bag of visual words method, such as its tree properties, because of the correlation between the tree properties and the total number of visual words represented in the histogram. The tree property investigated in this paper is the branching factor. This paper demonstrates the effect of the branching factor on perceptual aliasing during loop closure detection.

This paper investigates the problem by experimenting with the influence of the branching factor in the bag of visual words method on the perceptual aliasing effect during loop closure detection. The paper is organised as follows: Section 2 discusses background information on the bag of visual words method and perceptual aliasing during loop closure detection; Section 3 describes the experiment; Section 4 discusses the results and findings; and Section 5 presents the conclusion.

2. Background

As mentioned in the previous section, the bag of visual words is a method used in SLAM and CBIR solutions because of its ability to reduce the size of an image, representing it as a histogram that can be compared during loop closure detection. Owing to this advantage, more locations can be stored and analysed, resulting in a faster processing time (Bampis et al., 2018). The bag of visual words achieves this by treating image features, whether global or local, as visual words and quantising these visual words against a set of representative features known as the visual vocabulary. The quantisation assigns each descriptor (global or local) to the closest visual word for the respective image. Fig. 1 illustrates this idea.

Fig. 1 Bag of visual words representation
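As a concrete illustration of the quantisation step described above, the following Python sketch (a hypothetical stand-in for the Matlab bag-of-features workflow used in this paper; the descriptor values and dimensions are made up) builds a small visual vocabulary with k-means and quantises one image's descriptors into a normalised histogram:

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=10, seed=0):
    """Cluster training descriptors into k visual words (plain k-means sketch)."""
    rng = np.random.default_rng(seed)
    centres = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest centre
        d = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)
    return centres

def quantise(descriptors, vocabulary):
    """Map each descriptor to its nearest visual word and build a histogram."""
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()    # normalised visual word histogram

rng = np.random.default_rng(1)
train = rng.random((200, 8))    # toy 8-dimensional "CLD-like" descriptors
vocab = build_vocabulary(train, k=10)
image = rng.random((30, 8))     # descriptors extracted from one image
hist = quantise(image, vocab)
print(hist.shape, round(hist.sum(), 6))
```

The resulting histogram is the compact image representation that is stored and compared during loop closure detection.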

The visual vocabulary is determined by the branching factor and the
number of levels from the tree properties, as shown in Fig. 1. This visual
vocabulary will form the visual dictionary for the bag of visual words used
during the quantisation process, forming the image's histogram. The
characteristic of the bag of visual words mentioned above makes it suitable
for SLAM and CBIR applications. However, there is a drawback to this method in the form of perceptual aliasing; as mentioned in the previous section, perceptual aliasing produces false positives during loop closure detection. This is due to the quantisation of the extracted features (local or global) in the bag of visual words. Examples of feature extraction are the colour histogram for global features and Speeded-Up Robust Features (SURF) for local features. The visual vocabulary construction in the bag of visual words method can gather noisy words, lose the spatial relations between the visual words in the image, and quantise the visual words incorrectly (Angeli et al., 2008; Xia et al., 2022). The experiment in the next section demonstrates this issue.

3. Experiment

The experiment conducted in this paper aims to determine the
optimal branching factor value for the bag of visual words method during
the loop closure detection. One of the unique features of the bag of visual
words method is the customisation of the feature extraction method. Thus,
it allows the user to create a bag of visual words solution suitable for the
intended application. In this experiment, we utilised the Colour Layout
Descriptor (CLD) as the feature extractor in the bag of visual words method.

This choice is due to CLD's global descriptor characteristic, which is more efficient when searching image collections made up of scenes, which in our case are the scene images found in the NewCollege (NC) and CityCenter (CC) databases (Labbé & Michaud, 2019). Apart from this efficiency, CLD is also known for its quick feature extraction and is often used in fast browsing and search applications.

The experiment was conducted in Matlab, operating in macOS Big
Sur and performed in Mac Mini (M1) with the specification of an Apple M1
processor and 16 GB of memory. The image databases used in this
experiment (NC and CC) are the usual databases used in other literature
(Zhu & Huang, 2021; Kümmerle et al., 2013). The flow of the experiment
in simulating the loop closure detection is as follows. The experiment starts
with offline training of the images in the database. These images will go
through the bag of visual words method, which uses the CLD as the feature
descriptor with a customisable number of levels and branches for the tree
properties. The number of levels and branches determines the size of the visual vocabulary, which influences the loop closure detection performance in this experiment. Next, the experiment takes an input image, passes it through the bag of visual words method, indexes it, and produces the visual word histogram based on the visual vocabulary from the CLD.

The size of the visual vocabulary for an image depends on the number of levels and branches. Eq. (1) expresses this relationship, with b being the branching factor and L the number of levels. Next, the experiment goes through a loop closure detection scenario by comparing the input image histogram with the images in the database. The five most similar images are selected as the images forming the loop closure detection. Once the loop closure images are obtained, local features in the form of SURF are extracted from the input image and the loop closure images and analysed. Local features such as SURF are compared separately because using them from the beginning of the experiment would lead to a longer processing time. The flow of the experiment is summarised in Fig. 2.

Total number of visual words = b^L (1)

Fig. 2 Experiment methods for loop closure detection
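The comparison step above can be sketched in a few lines of Python (an illustrative simplification, not the authors' Matlab retrieveImages pipeline; cosine similarity of histograms is assumed as the similarity score, and the data are synthetic):

```python
import numpy as np

def vocabulary_size(b, L):
    """Eq. (1): total number of visual words for branching factor b and L levels."""
    return b ** L

def top_k_loop_closures(query_hist, database_hists, k=5):
    """Rank database histograms by cosine similarity to the query histogram."""
    db = np.asarray(database_hists, dtype=float)
    q = np.asarray(query_hist, dtype=float)
    scores = db @ q / (np.linalg.norm(db, axis=1) * np.linalg.norm(q) + 1e-12)
    order = np.argsort(scores)[::-1][:k]   # indices of the k best matches
    return order, scores[order]

print(vocabulary_size(100, 1))   # 100 visual words, as in the experiment
rng = np.random.default_rng(0)
db = rng.random((50, 100))                 # 50 stored image histograms
query = db[7] + 0.01 * rng.random(100)     # a slightly perturbed copy of image 7
idx, sims = top_k_loop_closures(query, db)
print(idx[0])   # image 7 should rank first
```

With a good branching factor the true revisited location dominates the ranking; perceptual aliasing corresponds to unrelated histograms scoring almost as high as the true match.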

4. Discussion

Based on the experiment results, we made several key observations. First, the total size of the visual vocabulary correlates directly with Eq. (1). In the experiment, we chose five values for the branching factor, 10, 50, 100, 500, and 1000, while keeping the number of levels, L, at 1. This is due to the significant increase in processing time when the number of levels is more than 1. It suggests that the number of levels is best kept at 1 if we want to improve the solution further to fit the online criteria, which require the solution to operate in real time within a limited processing time window. Some literature suggests a real-time constraint of around 30 Hz (Labbé & Michaud, 2013).

Next, in terms of the accuracy of the loop closure detection, based on the similarity score from Matlab's retrieveImages function and on visual inspection, we observed that the lowest and highest branching factors have the strongest perceptual aliasing effect. This means that both small and large visual vocabularies can have a noisy effect, causing inaccurate quantisation of the visual vocabulary during histogram creation. Although the lowest average similarity score belongs to the highest branching factor, 1000, visual inspection showed that the perceptual aliasing effect was worst for branching factors 50 and 1000. This observation is supported by examining the locations of the loop closure detection images relative to the input image: in most cases, the locations were far from the input image's location. These observations demonstrate the perceptual aliasing effect producing incorrect images during loop closure detection. The optimal branching factor with the least perceptual aliasing effect in this experiment is 100, which has the highest similarity score; visual inspection confirmed this. Thus, from the experiment, a low branching factor produces a small visual vocabulary whose words may not be properly distinguished during loop closure comparison, whereas a high branching factor produces a large visual vocabulary that may contain garbage values leading to incorrect comparisons.

Finally, owing to the CLD feature extraction used in this experiment, there were instances in which certain distinctive colours in the image, such as blue, contributed to the perceptual aliasing effect in the loop closure detection. In some cases, an input image containing a significant amount of blue, such as a blue car, was inaccurately matched during loop closure because of the high value of the visual vocabulary associated with the blue colour. Similar situations were observed in illuminated images, such as images with bright sunlight. Such illuminated environments are problematic for the bag of visual words, and a similar problem is reported in the literature (Lajoie et al., 2019). Table 1 summarises the results of this experiment.

Table 1. Bag of visual words experiment

Branching Factor          10        50        100       500        1000
No. of Clustering Steps   1         1         1         1          1
Strongest Features        0.8       0.8       0.8       0.8        0.8
No. of Clusters           10        50        100       500        1000
Avg. Similarity Score     0.8296    0.9534    0.9603    0.877      0.786
Processing Time (s)       61.3439   97.1183   91.4953   116.3781   101.4304
No. of Visual Dictionary  10        50        100       500        1000

5. Conclusion

In conclusion, the experiment conducted identifies the optimal branching factor with the least perceptual aliasing effect during loop closure detection. It gives an optimal quantisation of the visual vocabulary that can be represented in a histogram with the least likelihood of grouping noisy visual words. Together with normalisation methods such as term frequency-inverse document frequency (TF-IDF), these noisy visual words can be reduced further, giving a better spatial relationship between the visual vocabulary and translating to a more accurate histogram. This can improve the accuracy of detecting a closer, similar image during loop closure detection. The next step for this research is to combine the bag of visual words with a normalisation method to improve current loop closure detection for SLAM and CBIR solutions. Therefore, the optimal branching factor, which correlates with the visual vocabulary, can help reduce the detection of noisy words contributing to the perceptual aliasing effect during loop closure detection.
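The TF-IDF normalisation mentioned above can be sketched as follows (a minimal Python illustration of re-weighting visual word histograms, not the authors' implementation; the word counts are made up):

```python
import numpy as np

def tfidf_weight(histograms):
    """Re-weight visual word count histograms with TF-IDF.

    tf  = word count / total words in the image
    idf = log(N images / number of images containing the word)
    """
    counts = np.asarray(histograms, dtype=float)
    tf = counts / counts.sum(axis=1, keepdims=True)
    df = (counts > 0).sum(axis=0)                   # document frequency per word
    idf = np.log(len(counts) / np.maximum(df, 1))   # guard against empty columns
    return tf * idf

# Three images, four visual words; word 0 appears in every image (a noisy word).
hists = [[5, 1, 0, 0],
         [4, 0, 2, 0],
         [6, 0, 0, 3]]
w = tfidf_weight(hists)
print(w[0, 0])   # the ubiquitous word 0 gets weight 0, since log(3/3) = 0
```

Words that occur in every image receive zero weight, so they no longer contribute to the similarity score, which is exactly how TF-IDF suppresses noisy visual words.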

6. Acknowledgement

The authors would like to thank Universiti Teknologi MARA,
Malaysia and the Ministry of Higher Education Malaysia for the facilities
and financial support under the national grant 600-IRMI/FRGS 5/3
(461/2019).

7. References

Hu, Z., Qi, B., Luo, Y., Zhang, Y. & Chen, Z. (2019). Mobile robot V-
SLAM based on improved closed-loop detection algorithm. 2019
IEEE 8th Joint International Information Technology and Artificial
Intelligence Conference (ITAIC) (pp. 1150-1154). doi:
10.1109/ITAIC.2019.8785611.

Xia, Z., Jiang, L., Liu, L., Lu, & Jeon, B. (2022). BOEW: A Content-Based
Image Retrieval Scheme Using Bag-of-Encrypted-Words in Cloud
Computing. IEEE Transactions on Services Computing, vol. 15, no.
1 (pp. 202-214). doi: 10.1109/TSC.2019.2927215.

Huishen, Z., Ling, X., Huan, Y. & Liujun, W. (2018). An improved bag of
words method for appearance based visual loop closure

detection. 2018 Chinese Control and Decision Conference (CCDC)
(pp. 5682-5687). doi: 10.1109/CCDC.2018.8408123.
Wang, H., Xia, Z., Fei, J. and Xiao, F. (2020). An AES-Based Secure Image
Retrieval Scheme Using Random Mapping and BOW in Cloud
Computing. IEEE Access, vol. 8 (pp. 61138-61147). doi:
10.1109/ACCESS.2020.2983194.
Xu, Y., Zhao, X. and Gong, J. (2019). A Secure CBIR Method Based on
Bag of Visual Words Model Under Cloud Environment.
Proceedings of the 2019 3rd International Symposium on Computer
Science and Intelligent Control (ISCSIC 2019) (pp. 1-8). doi:
10.1145/3386164.3389099.
Garcia-Fidalgo, E. & Ortiz, A. (2018). iBoW-LCD: An Appearance-Based
Loop-Closure Detection Approach Using Incremental Bags of
Binary Words. IEEE Robotics and Automation Letters, vol. 3, no.
4, (pp. 3051-3057). doi: 10.1109/LRA.2018.2849609.
Labbé, M. & Michaud, F. (2013). Appearance-Based Loop Closure
Detection for Online Large-Scale and Long-Term Operation. IEEE
Transactions on Robotics, vol. 29, no. 3, (pp. 734-745). doi:
10.1109/TRO.2013.2242375.
Garcia-Fidalgo, E. & Ortiz, A. (2015). Vision-based Topological Mapping
and Localization Method: A Survey. Robotics and Autonomous
Systems, vol. 64, (pp.1-20), Feb. 2015. doi:
10.1016/j.robot.2014.11.009.
Bampis, L., Amanatiadis, A., & Gasteratos, A. (2018). Fast loop-closure
detection using visual-word-vectors from image sequences.
International Journal of Robotics Research 37(1) (pp. 62-82). doi:
10.1177/0278364917740639.
Angeli, A., Filliat, D., Doncieux, S. & Meyer, J. (2008). Fast and
Incremental Method for Loop-Closure Detection Using Bags of
Visual Words. IEEE Transactions on Robotics, vol. 24, no. 5 (pp.
1027-1037). doi: 10.1109/TRO.2008.2004514.
Xia, Z., Ji, Q., Gu, Q., Yuan, C. and Xiao, F. (2022). A Format-compatible
Searchable Encryption Scheme for JPEG Images Using Bag of
Words. ACM Transactions on Multimedia Computing,
Communications and Applications, Volume 18, Issue 3 (pp. 1-18).
doi: 10.1145/3492705.
Labbé, M. & Michaud, F. (2019). RTAB-Map as an open-source lidar and
visual simultaneous localization and mapping library for large-scale
and long-term online operation. Journal of Field Robotics 36(2) (pp.
416-446). doi: 10.1002/rob.21831.

Zhu, M. & Huang, L. (2021). Fast and Robust Visual Loop Closure
Detection with Convolutional Neural Network. 2021 IEEE 3rd
International Conference on Frontiers Technology of Information
and Computer (ICFTIC) (pp. 595-598). doi:
10.1109/ICFTIC54370.2021.9647341.

Kümmerle, R., Ruhnke, M., Steder, B., Stachniss, C and Burgard, W.
(2013). A navigation system for robots operating in crowded urban
environments. 2013 IEEE International Conference on Robotics
and Automation (pp. 3225-3232). doi:
10.1109/ICRA.2013.6631026.

Lajoie, P., Hu, S., Beltrame, G. & Carlone, L. (2019). Modeling Perceptual
Aliasing in SLAM via Discrete–Continuous Graphical Models.
IEEE Robotics and Automation Letters, vol. 4, no. 2 (pp. 1232-
1239). doi: 10.1109/LRA.2019.2894852.

Enhancing MAAD in Streaming Data

Muhammad Yunus Bin Iqbal Basheer1, Azliza Mohd Ali2,
Nurzeatul Hamimah Abdul Hamid3, Muhammad Azizi Mohd Ariffin4,

Rozianawaty Osman5, Sharifalillah Nordin6

Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA,
Shah Alam, Selangor, Malaysia

[email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected]

Abstract: Real-time processing allows live data to be analysed immediately and enables us to act or make decisions promptly. Offline processing is not enough to tackle issues such as crime, because real-time processing can handle real-time events, which is vital, rather than processing data from the past. Anomaly detection enables suspicious behaviour to be detected in any domain. Data typically change dynamically in a pipeline, especially with streaming data. Furthermore, the accumulation of data over time may consume memory, which needs to be used efficiently, since memory is limited. Therefore, this paper presents a new approach to enhance the multithreaded autonomous anomaly detection (MAAD) algorithm, which reportedly works very well in detecting anomalies and has characteristics suited to challenging streaming environments. MAAD allows asynchronous processing, which enables parallel processing that saves time and captures data continuously without stopping. However, in this paper, the MAAD algorithm does not use the pipeline architecture, since it wastes important time.

Keywords: AAD, Autonomous, MAAD, Streaming Data

1. Introduction

Real-time processing is very important, since it allows us to analyse live events. Analysing live events is difficult, since data arrive without stopping (Bomatpalli & Vemulkar, 2016). Furthermore, data accumulate over time, making the algorithm slower as time passes. In addition, data in a streaming environment change dynamically and are totally unpredictable (Rettig et al., 2015), and streaming data should be processed as soon as they arrive (Tellis & D'Souza, 2018). Detecting anomalies in streaming data is very important, since it can help detect suspicious behaviour in the data in real time. However, it is difficult, since building an algorithm that can work on streaming data is a tedious job. The algorithm should compute efficiently, because as time goes on, the algorithm may become slower. Parallel computing is also an important aspect of anomaly detection algorithms, because a mechanism is needed to capture every single data point as it arrives. For example, while an anomaly detection algorithm is computing over data that have already arrived, other data may become available to be inserted into the algorithm, but they cannot be processed while the anomaly detection process is still running. Therefore, parallel computing, or asynchronous processing, allows us to capture important data from time to time.

Multithreaded Autonomous Anomaly Detection (MAAD) was introduced in (Iqbal Basheer et al., 2021). However, that version of MAAD contains a pipeline architecture that sends and receives data between two parties, the data source and the MAAD algorithm. In this research, MAAD and the data source are independent: MAAD does not require communication with the data source, nor does the data source communicate with MAAD. MAAD is very suitable for streaming data. It does not make any assumptions, since streaming data change dynamically in the pipeline. It uses a recursive mechanism, which saves memory and is computationally efficient (Wang et al., 2017). It also utilises parallel processing, which enables it to receive data from the streaming data source without missing any. This paper presents MAAD without a communication pipeline. The paper starts with a brief introduction, followed by Section 2, which describes the simple data source architecture; Section 3, which describes MAAD built without a communication pipeline; and Section 4, which concludes the paper.
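The recursive mechanism mentioned above can be illustrated with a short sketch (a generic recursive mean and variance update, assumed here for illustration rather than taken from the MAAD source): each new point updates the running statistics in O(1) memory instead of storing the whole stream.

```python
class RunningStats:
    """Recursively updated mean and mean-square: O(1) memory per stream."""

    def __init__(self):
        self.k = 0          # number of points seen so far
        self.mean = 0.0
        self.mean_sq = 0.0  # running mean of squared values

    def update(self, x):
        self.k += 1
        # Recursive form: new estimate = old estimate + correction term.
        self.mean += (x - self.mean) / self.k
        self.mean_sq += (x * x - self.mean_sq) / self.k

    def variance(self):
        return self.mean_sq - self.mean ** 2

stats = RunningStats()
for x in [2.0, 4.0, 6.0]:   # a toy stream
    stats.update(x)
print(stats.mean, stats.variance())   # mean 4.0, population variance 8/3
```

Only two scalars are kept per stream, so the statistics never accumulate over time, which is the memory-saving property the paper attributes to AAD.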

2. Data Source Architecture

This research uses the internet of things (IoT) as a case study, which acts as a streaming data source. However, this time no pipeline architecture is used, because the pipeline architecture used in (Iqbal Basheer et al., 2021) may make the algorithm slow. For example, during the communication between the IoT device and MAAD, new data continue to be produced; as a result, the newly produced data may not be entered into MAAD. In this paper, a new approach is proposed in which the data obtained from the data source keep being sent to the MAAD algorithm, so no request-reply communications are carried out. Figure 1 shows the simple architecture of the data source.

Fig. 1 Data Source Architecture

Based on figure 1, the data source keeps publishing the data it reads to MAAD. MAAD does not request the next data; it only receives data from the data source. Therefore, the relationship between the data source and MAAD is independent. Furthermore, the architecture is simpler than that proposed in (Iqbal Basheer et al., 2021). The data source does not have to subscribe to any message from MAAD, and MAAD does not publish any data to the data source. As a result, the data source architecture is simplified to receive streaming data continuously. The data source may be any IoT device that is able to send or publish data to MAAD. In the next section, the new MAAD approach is explained thoroughly.

3. Multithreaded Autonomous Anomaly Detection (MAAD)

In this research, MAAD is built differently. MAAD does not request the next data after checking the second thread's status; it simply continues receiving new data. This is handled by the first thread. The architecture of the first thread is shown in figure 2.

Fig. 2 First Thread Architecture

In this thread, the algorithm connects to hive MQTT, which is an internet message broker. Hive MQTT receives each new data point read from the data source and sends it to MAAD. Based on figure 2, the first thread in MAAD first connects to hive MQTT. Then, it subscribes to a data source topic on which the data source publishes streaming data. If the total number of data points is more than 27, the first thread evaluates the second condition. This is because MAAD runs AAD (Gu & Angelov, 2017) in the background, and AAD only works when there are more than 27 data points. AAD consists of four different phases, which are shown in figure 3. After that, in figure 2, if the second condition is met, that is, the second thread has finished its process, the first thread sends the whole of the current data to the second thread and then continues to receive the next data. If the first or second condition is not satisfied, it simply continues to receive the next data, because streaming data cannot be paused; this prevents data from being missed by the MAAD algorithm. The first thread is then followed by the second thread. This is shown in figure 3.
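The first-thread logic, receive continuously and hand data to the second thread only when more than 27 points are available and the second thread is idle, can be sketched with the Python standard library (an illustrative simplification: the MQTT client and the AAD phases are stubbed out, and names such as on_message and THRESHOLD are assumptions of this sketch):

```python
import threading
import queue

THRESHOLD = 27                   # AAD needs more than 27 data points
buffer = []                      # data received so far
work = queue.Queue(maxsize=1)    # hand-off channel to the second (AAD) thread
done = threading.Event()
done.set()                       # the second thread starts out idle

def on_message(value):
    """First thread: called for every point published by the data source."""
    buffer.append(value)
    # Hand the current data to the second thread only when AAD has enough
    # points AND the previous AAD run has finished; otherwise keep receiving.
    if len(buffer) > THRESHOLD and done.is_set():
        done.clear()
        work.put(list(buffer))   # snapshot of the data so far

def second_thread():
    """Second thread: runs the (stubbed) AAD process on each snapshot."""
    while True:
        data = work.get()
        if data is None:
            break
        # ... the four AAD phases would run here on `data` ...
        done.set()               # signal that the next snapshot may be sent

worker = threading.Thread(target=second_thread)
worker.start()
for i in range(60):              # simulate 60 streamed points
    on_message(float(i))
work.put(None)                   # stop the worker
worker.join()
print(len(buffer))               # all 60 points were received, none missed
```

Because the first thread never blocks on the second, every published point lands in the buffer even while an AAD run is in progress, which is the asynchronous behaviour described above.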

Fig. 3 Second Thread Architecture

Based on figure 3, the second thread runs the AAD process developed by Gu and Angelov (2017). AAD performs anomaly detection starting with the Chebyshev inequality. Here, two quantities are computed, the data density and the local data density; these computations can also run in parallel (Angelov & Gu, 2017), as each is independent. From these two quantities, the least influential data points are chosen as potential anomalies, and all other points are declared normal by the algorithm. As a result, MAAD is computationally efficient, since it focuses only on potential anomaly data, which are confirmed in the next process, rather than on the whole data. After that, the declared potential anomalies enter a second stage named offline autonomous data partitioning (Angelov & Gu, 2017; Gu et al., 2018). Three types of autonomous data partitioning approach were proposed in (Angelov & Gu, 2017), but only one was chosen as suitable for the AAD algorithm (Gu & Angelov, 2017). In this stage, the potential anomalies are clustered. The clustering is also autonomous and runs based on empirical observation of the potential anomaly data. The final product of the clustering is a set of cluster centres, each with a number of members. Hypothetically, any cluster with a low number of members is considered anomalous, since the cluster has less influence than the available data. Then, the centre of each cluster enters the Voronoi tessellation phase, where data clouds are formed; the data clouds resemble Voronoi cells rather than circles. Finally, the anomaly detection phase declares any data cloud with low support as a confirmed anomaly, where support means the number of data points a cluster holds. Therefore, MAAD uses both the first thread and the second thread to detect anomalous data asynchronously. Furthermore, a system interface was also built in this research. Figure 4 shows the system interface (Iqbal Basheer et al., 2021).
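The Chebyshev-based first stage of AAD described above can be illustrated with a minimal sketch (a deliberate simplification, not the authors' AAD implementation): points whose standardised distance from the data mean exceeds an n-sigma level, which by the Chebyshev inequality at most a 1/n^2 fraction of the data can do, are flagged as potential anomalies.

```python
import numpy as np

def potential_anomalies(data, n_sigma=3.0):
    """Flag points far from the mean using an n-sigma Chebyshev condition.

    The Chebyshev inequality guarantees that at most 1/n^2 of the data lie
    more than n standard deviations from the mean, so such points are the
    least influential ones and become potential anomalies.
    """
    x = np.asarray(data, dtype=float)
    mu = x.mean(axis=0)
    sigma = x.std(axis=0) + 1e-12          # avoid division by zero
    dist = np.abs(x - mu) / sigma          # standardised distance per attribute
    return np.where((dist > n_sigma).any(axis=1))[0]

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(100, 3))   # 100 normal 3-D points
outlier = np.array([[25.0, 25.0, 25.0]])       # one obvious anomaly
data = np.vstack([normal, outlier])
print(potential_anomalies(data))               # index 100 is flagged
```

In AAD the flagged points then go on to the partitioning, Voronoi tessellation, and support-based confirmation phases; only this small candidate set, not the whole stream, is processed further.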

Fig. 4 System Interface
There are six sections in the system interface (see figure 4). The first section displays the currently detected anomaly, calculated after AAD finishes its process, that is, at the end of the second thread's process. The total differences detected and state change sections display the number of anomalous and normal data points detected, and the number of transitions between normal and anomalous data, respectively; these are updated in the first thread after it confirms that the second thread's process is complete. Then, there is a section displaying the total number of anomalies detected. The fifth section consists of X, Y, and Z, which represent the attributes contained in the data; this system interface is built specifically for three-dimensional data. Next, a three-dimensional scatter plot displays the data points, with blue representing normal data and red representing anomalies. The current result button shows the current result based on the data that have entered MAAD so far. The displayed data can be saved in a format such as comma-separated values (CSV) for future analysis. Finally, the terminate button ends the MAAD process; after a while, a data sheet appears showing the results for the data entered into MAAD, which can also be saved in any format for future analysis.

4. Discussion

Based on the algorithm proposed in this paper, MAAD has been
improved to work on streaming data without a communication pipeline. The
proposed MAAD does not publish any data to the data source, and data sources
do not subscribe to any data from the MAAD algorithm. As a result, MAAD and
the data source can be considered independent of each other. This eventually
surpasses the previous streaming anomaly detection algorithm proposed in
Iqbal Basheer et al. (2021). Previously, MAAD was evaluated in terms of
speed and reportedly outperformed the earlier AAD, which works only on
offline data. It is also known that AAD can only operate online when the
number of data points is greater than 27. The main bottleneck, however, was
the communication pipeline, which could be simplified to make processing
much faster. This paper therefore enhances the previous MAAD by eliminating
the pipeline architecture. As a result, MAAD was successfully built without
the heavyweight communication pipeline of Iqbal Basheer et al. (2021), and
it is believed to process streaming data faster than both the previous MAAD
and AAD. Furthermore, MAAD was built on the empirical data analysis (EDA)
mechanism proposed in Angelov et al. (2017). EDA relies on empirical
observation of the data and does not assume any data distribution, which
means the proposed algorithm uses no thresholds or distributional
assumptions. This is desirable because, as mentioned, data in a streaming
environment may depend on other data and may arrive dynamically, making its
existence unpredictable; a line therefore cannot simply be drawn to separate
normal from anomalous data in a streaming environment.
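The recursive, threshold-free flavor of the EDA quantities can be illustrated with a small sketch. This is a one-dimensional simplification of our own (the function name and the scaled form k·ξ = 1 + (μ − x)²/σ² are illustrative, not taken from the cited papers); the mean and variance are updated recursively, reusing stored values rather than accumulating the data.

```python
def scaled_eccentricities(points):
    """1-D sketch of a recursively updated, EDA-style eccentricity:
    k * xi_k(x) = 1 + (mu_k - x)^2 / var_k. Illustrative only."""
    mu, msq, out = 0.0, 0.0, []
    for k, x in enumerate(points, start=1):
        mu = (k - 1) / k * mu + x / k          # recursive mean
        msq = (k - 1) / k * msq + x * x / k    # recursive mean of squares
        var = msq - mu * mu                    # empirical variance
        out.append(1.0 if var <= 0 else 1 + (mu - x) ** 2 / var)
    return out

ecc = scaled_eccentricities([1.0, 1.1, 0.9, 1.0, 9.0])
# The outlier 9.0 (index 4) receives the largest scaled eccentricity:
print(max(range(len(ecc)), key=ecc.__getitem__))  # → 4
```

No distributional assumption or fixed cut-off is used; points are only compared by how eccentric they are relative to the data observed so far.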

5. Conclusion

In conclusion, this paper presents the MAAD algorithm, improved
from Iqbal Basheer et al. (2021). The previous MAAD was in turn enhanced
from AAD (Gu & Angelov, 2017). AAD employs a recursive mechanism that
consumes little memory and is computationally efficient, because the
recursion reuses previously stored variables without accumulating data over
time. The proposed MAAD algorithm combines the AAD and EDA mechanisms; EDA
makes no assumptions about the distribution of the data and works purely
from empirical observation. In addition, the proposed MAAD is completely
autonomous and requires no human intervention. This algorithm has the
potential to spur future researchers to apply parallel processing to
streaming data.

6. Acknowledgements

The authors would like to express their gratitude to the Ministry of
Higher Education for the FRGS-Racer Research Grant
(RACER/1/2019/ICT02/UITM//4) and to the Faculty of Computer and
Mathematical Sciences, Universiti Teknologi MARA, for all the given
support.

7. References

Angelov P, Gu X (2017) Empirical Approach to Machine Learning.

Angelov P, Gu X, Kangin D, Principe J (2017) Empirical Data Analysis. 2016
IEEE Int Conf Syst Man, Cybern (SMC 2016).

Bomatpalli T, Vemulkar GJ (2016) Blending IoT and Big Data Analytics. Int J
Eng Sci Res Technol 5:192–196. https://doi.org/10.5281/zenodo.48868

Gu X, Angelov P (2017) Autonomous anomaly detection. IEEE Conf Evol Adapt
Intell Syst 2017-May:1–8. https://doi.org/10.1109/EAIS.2017.7954831

Gu X, Angelov PP, Príncipe JC (2018) A method for autonomous data
partitioning. Inf Sci (Ny) 460–461:65–82.
https://doi.org/10.1016/j.ins.2018.05.030

Iqbal Basheer MY, Ali AM, Abdul Hamid NH, et al (2021) Detecting Anomaly in
IoT Devices using Multi-Threaded Autonomous Anomaly Detection. 2021 4th Int
Symp Agents, Multi-Agent Syst Robot 111–118.
https://doi.org/10.1109/isamsr53229.2021.9567894

Rettig L, Khayati M, Cudre-Mauroux P, Piorkowski M (2015) Online anomaly
detection over Big Data streams. Proc 2015 IEEE Int Conf Big Data
1113–1122. https://doi.org/10.1109/BigData.2015.7363865

Tellis VM, D’Souza DJ (2018) Detecting Anomalies in Data Stream Using
Efficient Techniques: A Review. 2018 Int Conf Control Power, Commun Comput
Technol (ICCPCCT 2018) 296–298.
https://doi.org/10.1109/ICCPCCT.2018.8574310

Wang X, Mohd Ali A, Angelov P (2017) Gender and age classification of human
faces for automatic detection of anomalous human behaviour. 2017 3rd IEEE
Int Conf Cybern (CYBCONF 2017) Proc.
https://doi.org/10.1109/CYBConf.2017.7985780

Malware Detection for Window Registry using
Machine Learning

Nurwahida Binti Misran1, Kamarul Ariffin Bin Abdul Basit2

Faculty of Computer and Mathematical Sciences,
Universiti Teknologi MARA,

40450 Shah Alam, Selangor, Malaysia
[email protected], [email protected]

Abstract: Nowadays, as the number of malware variants and new types of
malware grows, it is increasingly difficult to identify and respond to
malware. The Windows operating system has traditionally been the most
attractive target for malware developers. Malware often creates a key in the
registry so that it runs every time the computer starts; it infects the
computer system, consumes a large amount of memory to run, and slows
Windows down. Malware also evolves and becomes more sophisticated to avoid
detection. One method for detecting malware on Windows is to employ a
behavior-based machine learning approach. This technique uses training to
help the malware detection system identify new variants of previously
unseen malware. A machine learning algorithm provides more options and room
for developing an accurate model by considering the characteristics of
benign and malware samples. This paper performs malware detection for the
Windows Registry using machine learning. BayesNet, K-Nearest Neighbour
(KNN), and J48 were employed as machine learning classifiers to detect
malware. Additionally, this study compares the experimental results of the
three classifiers in terms of accuracy rate and false positive rate (FPR).
After analyzing the performance of the three classifiers, it was concluded
that the K-Nearest Neighbour classifier performed best for malware
detection, with 97.59 percent accuracy and a 0.041 false positive rate.

Keywords: Detection, Dynamic Analysis, Malware, Machine Learning,
Windows Registry.

1. Introduction

In 2021, new malware developed for Windows reached 72.91 million,
and this number continues to increase (av-test, 2021). New malware has
become a threat because of the massive amount of data and files involved.
Moreover, current malicious software has become hard to detect because it
is built to avoid detection (Granneman, 2013). Therefore, to overcome this
problem, Malware Detection for the Windows Registry File Using Machine
Learning is proposed in order to provide better malware detection for
Windows. Malware can gradually slow down information systems, encrypt
files, and render a system unusable. The development of malware detection
and analysis techniques is crucial to reducing potential security
vulnerabilities in information systems. Malware detection refers to the
process of inspecting a computer and its files for malware; it is effective
because it employs a variety of tools and methods. Today, two basic
techniques are used in malware detection and analysis (Singh & Singh,
2020): machine learning and traditional detection methods. The performance
of most machine learning algorithms depends on the accuracy of the
extracted features (Galal, 2015). Decision Tree (DT), K-Nearest Neighbour
(KNN), Naïve Bayes (NB), and Support Vector Machine (SVM) are the machine
learning algorithms most commonly used in malware detection (Xiao et al.,
2019). Operating system (OS) and program configuration information are
stored in the Windows registry, which was designed as a hierarchical
database to improve performance and store application information.
According to Sikorski and Honig (2012), malware often uses the registry for
configuration data, adding entries that allow it to run automatically when
the computer boots. The paper starts with this brief introduction; Section
2 describes malware detection, Section 3 describes behavior monitoring, and
the paper concludes in Section 4.

2. Malware Detection

Fig. 1 Malware Detection using Machine Learning
Fig. 1 shows malware detection using machine learning. The scope
of this study is limited to malware detection in the Windows registry file
using machine learning, and to the extraction of one behavioral feature:
registry changes. Malware samples were downloaded from virusshare.com and
then analyzed using the online malware analysis tool ANY.RUN. The extracted
features are used to train the malware detection tools and are then
classified into the main category of registry activities. The figure
represents the possible ways to develop behavior-based malware classifiers
using dynamic features; all the behavior-based techniques presented in this
section follow this architecture.

3. Behavior Monitoring
This process is done by submitting each sample to ANY.RUN, a free
online automatic dynamic analysis service (an online sandbox). It was used
in this study to monitor malware behavior and collect important data. In
the behavior-based approach, malware is detected on the basis of the
malicious activities it performs during execution. Submitting and executing
a binary in ANY.RUN generates a report file. All the CSV report files were
parsed to select the most relevant and important attribute values (feature
selection). Feature selection focused on registry keys, a run-time feature
that can provide significant information for developing more accurate
malware detection. From the registry key report generated by ANY.RUN, the
attribute values were analyzed and selected, and the terms were manually
keyed into a CSV file. Each registry key operation and the inserted
registry key values were recorded; examples of registry key operations are
regkey_read, regkey_opened, regkey_written, and regkey_deleted. The next
step was to conduct learning and classification based on the CSV files,
applying machine learning techniques in WEKA for the Windows OS version.
For malware detection using machine learning, the analysis was done by
selecting three test options available in Weka Explorer: ‘Use Training
Set’, ‘Cross Validation’, and ‘Percentage Split’. Each test option gives a
different result, so the test option plays a major role in determining the
accuracy rate of the classifier; using different test options in Weka
avoids overfitting and effectively evaluates model performance. After
analysis with each test option, the highest accuracy rate was used as the
final analysis result. These tests and experiments were conducted on a
binary-weight vector model with feature selection. Each data set was
applied to three classifier algorithms: K-Nearest Neighbour (KNN),
BayesNet, and J48. Fig. 2 shows the data extraction process overview.

Fig. 2 Data Extraction Process Overview
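The parsing step above, turning a per-sample registry report into a binary-weight feature vector, can be sketched as follows. The CSV column names and the example rows are hypothetical (the paper does not specify the exact report layout); only the four registry key operations are taken from the text.

```python
import csv, io

# Hypothetical CSV report: one registry operation per row.
report = io.StringIO(
    "operation,key\n"
    "regkey_written,HKLM\\Software\\Run\n"
    "regkey_read,HKCU\\Software\\App\n"
    "regkey_written,HKLM\\Software\\Run\n"
)

# Registry key operations named in the paper.
FEATURES = ["regkey_read", "regkey_opened", "regkey_written", "regkey_deleted"]

def binary_weight_vector(csv_file):
    # Binary weight: 1 if the sample performed the operation at least once.
    seen = {row["operation"] for row in csv.DictReader(csv_file)}
    return [1 if f in seen else 0 for f in FEATURES]

print(binary_weight_vector(report))  # → [1, 0, 1, 0]
```

One such vector per sample, together with a malware/benign label, forms the training data fed to the classifiers.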

3.1 Discussion

For malware detection using machine learning, analysis was
performed by selecting the three test options available in Weka Explorer.
Each test option gives a different result, so the test option plays a major
role in determining the accuracy rate of the classifier; using different
test options in Weka avoids overfitting and effectively evaluates model
performance. The highest accuracy rate was therefore used as the final
analysis result. Model classification refers to predicting the class of
given data points; for malware detection, each sample in the dataset is
classified as malware or benign.

In this study, three classification models, namely J48, BayesNet,
and KNN, are used. Model classification is important for correctly
identifying whether a sample is malware or benign, and from it we can
analyze which classifier offers better accuracy in malware detection.

Fig. 3 shows the accuracy rate results. From the graph, K-Nearest
Neighbour has the highest accuracy rate at 97.59%, followed by J48 at
96.22% and BayesNet at 94.85%. Fig. 4 shows the false positive rates when
using the training set; the lowest false positive rate found in this study
is 0.041 (KNN), and the highest is 0.077, for the J48 classifier. The ROC
area results are shown in Fig. 5: the classifier with the highest ROC area
is K-Nearest Neighbour, with a value of 0.999, and the lowest is J48, with
a value of 0.948.

Fig. 3 Accuracy Rate Result of Selected Classifier
(J48: 96.22%, BayesNet: 94.85%, KNN: 97.59%)

Fig. 4 False Positive Rate Result of Selected Classifier
(J48: 0.077, BayesNet: 0.065, KNN: 0.041)

Fig. 5 ROC Area Result of Selected Classifier
(J48: 0.948, BayesNet: 0.98, KNN: 0.999)

After analyzing the performance of the three classifiers, it was
concluded that the K-Nearest Neighbour classifier performed best for
malware detection, with 97.59 percent accuracy, a 0.041 false positive
rate, and a 0.999 ROC area value.
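To make the winning classifier concrete, the nearest-neighbour rule can be sketched in a few lines of plain Python. This is a minimal, from-scratch illustration of KNN on toy binary-weight registry vectors; it is not the WEKA implementation evaluated in the study, and the toy data are invented for illustration.

```python
def knn_predict(train_X, train_y, x, k=1):
    """Classify x by majority vote among its k nearest training points
    (squared Euclidean distance). Minimal sketch of the KNN rule."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

# Toy binary-weight vectors over (read, opened, written, deleted).
train_X = [(1, 0, 1, 0), (1, 1, 1, 1), (0, 0, 0, 0), (1, 0, 0, 0)]
train_y = ["malware", "malware", "benign", "benign"]
print(knn_predict(train_X, train_y, (1, 1, 1, 0), k=3))  # → malware
```

Because KNN stores the training samples and votes locally, it adapts well to feature vectors like these where similar registry behavior implies a similar label.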

This study pertains to the Windows registry and is limited, in
terms of behavioral feature extraction, exclusively to registry updates.
One weakness is that the study examined a diverse set of malware samples
from several malware families to ascertain their characteristics and the
damaged registry paths; as a result, several malware samples did not
install themselves in the registry. Additionally, the sandbox used in this
study is free and open source, with analysis of malware and benign samples
limited to 60 seconds, and Windows 7 was used throughout the procedure.
Another constraint concerns data-related issues such as large volume and
format conversion. Finally, this study focuses exclusively on three machine
learning classifiers: the J48 decision tree, K-Nearest Neighbour (KNN), and
BayesNet.

4. Conclusion

The purpose of this research is to offer a method for detecting
malware in the Windows Registry using machine learning. We developed this
technique by setting up a dynamic analysis environment and running malware
and benign samples through ANY.RUN, an interactive internet sandbox.
Behavior actions associated with registry modifications were then
retrieved. The extracted features were entered into a CSV file and sent to
Weka Explorer to determine the malware detection accuracy rate. In
conclusion, the K-Nearest Neighbour classifier gives the best performance
for malware detection in the Windows registry using machine learning.

5. References

av-test. (2021). Retrieved from
https://www.av-test.org/en/statistics/malware/

Galal, H. S. (2015, June). Behavior-based features model for malware
detection. Retrieved from
https://www.researchgate.net/publication/277667123_Behavior-based_features_model_for_malware_detection

Granneman, J. (2013, February). Retrieved from
https://www.techtarget.com/searchsecurity/feature/Antivirus-evasion-techniques-show-ease-in-avoiding-antivirus-detection

Singh, J. & Singh, J. (2020). A Survey on Machine Learning-Based Malware
Detection in Executable Files. Journal of Systems Architecture.

Sikorski, M. & Honig, A. (2012). Practical Malware Analysis. San
Francisco, CA: No Starch Press, Inc.

Xiao, F., Lin, Z., Sun, Y., & Ma, Y. (2019, January 21). Malware Detection
Based on Deep Learning of Behavior Graphs. Beijing, China.

Company Bankruptcy Prediction Using
Supervised Learning Techniques

Amin Iman Muhd Jelaini1, Azlin Ahmad2

1,2Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA,
40450 Shah Alam, Selangor, Malaysia

[email protected], [email protected]

Abstract: Company bankruptcy is an event that all businesses try to avoid.
It can be defined as the inability of a business to pay its liabilities to
creditors. A complex economy further increases the difficulty of sustaining
companies and institutions, including well-developed ones. If a recession
happens, small and medium enterprises (SMEs) and new businesses may face a
risk of bankruptcy. Thus, the objective of this study is to identify the
significant features in predicting company bankruptcy. Several machine
learning techniques were used and compared to select the best model for
this study: XGBoost, AdaBoost, Ridge Regression, and LASSO Regression. The
research is based on a dataset of Taiwanese companies obtained through
Kaggle, with 95 independent variables and one binary target variable.
AdaBoost was found to be the best model for the analysis because of its
consistency across all performance measurements. Its scores for accuracy,
sensitivity, specificity, and precision ranged from 94% to 99%, with the
highest score for sensitivity on training data (99.45%). The majority of
the variables chosen as significant came from the profitability and
solvency categories. Profitability and solvency are therefore essential in
the financial area, since these variables affect companies' planning for
future certain or uncertain events.

1. Introduction

Bankruptcy is defined by Investopedia as the inability of a
business or individual to pay its liabilities to creditors (Tuovila &
Brock, n.d.). Due to this inability, creditors can liquidate the business's
or individual's assets to repay the debts. Put simply, a bankrupt entity is
deeply in debt and unable to meet its obligations; its assets may be frozen
pending the next calculation or measurement outcome, which may result in
asset liquidation. Every country has its own rules on how an individual or
company can be declared bankrupt. The term commonly used in most countries
is the credit score, which acts as an indicator to determine whether an
individual or business can be declared bankrupt.

Sustaining a company, even a well-developed one, can at times be
difficult. While its finances can be well planned, uncertain events such as
COVID-19 or an economic recession can overturn the situation. The financial
aspect has become more challenging as the economy grows more complex
(Zoričák et al., 2019). Unplanned finances in the face of uncertainty can
lead to bankruptcy. On the other hand, efficient company management can
boost a country's economic development. A study by Rahayu (2020) found that
company bankruptcies can harm a country's economic growth: if most
companies in a country attain only marginal profitability, fewer are able
to pay taxes at high rates, so the country's income decreases and economic
growth is hampered. For companies to expand their business, taking loans to
secure funds is one option, but banks must then decide whether the company
can repay the loan. Having an investor is another option, yet investors
need to analyse many prospectuses to assess a company's future
profitability. It thus becomes the responsibility of companies to generate
high profits that can lead to more dividends for the investors.

Without uncertainty, few companies would face bankruptcy.
Anticipating a bankrupt state can prevent disturbances that would affect
business activities (Buzgurescu & Elena, 2020). Good management will
therefore strive to ensure that the business runs smoothly and generates
high profits. Data on companies that went bankrupt are rare, and companies
that experienced bankruptcy may be only a minimal percentage. The same
constraint was faced in this study, which found low numbers of bankrupt
companies compared with non-bankrupt companies.

In this study, company bankruptcy prediction was performed using
the following supervised machine learning techniques: Extreme Gradient
Boosting (XGBoost), Adaptive Boosting (AdaBoost), Ridge Regression, and
Least Absolute Shrinkage and Selection Operator (LASSO) Regression. The
analysis was performed with a high number of independent variables. Class
balancing of the target variable was implemented with the Synthetic
Minority Oversampling Technique (SMOTE) to observe the change in
performance relative to the unbalanced data. Since the number of
independent variables was high, feature selection was used to compare
prediction performance. Through supervised machine learning, important or
significant variables were identified, and several variables were selected
as essential features for the prediction.

2. Literature Review

Numerous studies cite Altman (1968) as the paper that prompted the
interest of researchers in developing and expanding knowledge of company or
corporate bankruptcy. Previous researchers (Ptak-Chmielewska, 2021; Perboli
& Arabnezhad, 2021; Hosaka, 2018) have treated the concepts of financial
ratios and discriminant analysis as the required prior understanding for
broadening bankruptcy knowledge. According to Altman (1968), the classic
approach is to determine the z-score by combining financial ratios; this
method helps analyze the financial health of companies or corporate
sectors. While dated, Altman's study pioneered subsequent work on the
subject, and more recent studies have also employed the main concept of
discriminant analysis. Vel & Zala (2019) demonstrated that these models
still work in their analysis, but noted that the method's weakness is that
a given range of z-scores cannot guarantee whether a company will stay in
the market or go bankrupt.

Many methods and models have been used in previous studies, since
obtaining the best prediction performance necessitates comparing different
types of machine learning. Every model or method has its own advantages and
disadvantages. For example, one study by Rahayu (2020) compared 13 methods
to obtain the best one. Another study (Aliaj et al., 2018) used nine
methods: a Naïve classifier, a Multinomial Bayesian classifier, Logistic
Regression, Gradient Boosting, Random Forest, Decision Tree, Bagging,
AdaBoost, and combination methods based on multiple classifiers. Other
studies used the term ensemble for a combination of methods or machine
learning models; Kim & Upneja (2021) ensembled just three methods, DT, NN,
and SVM.

Some previous works used a hybrid concept to increase performance.
A hybrid differs from an ensemble: an ensemble combines a number of models,
while the hybrid here refers to the method of handling imbalanced data (Le
et al., 2019). To counter imbalanced data in their work, Le et al. (2019)
used a hybrid of the under-sampling and oversampling concepts, which
minimizes the disadvantages of both and could therefore be considered for
this project. By combining their highest-accuracy classifier, CBoost, with
SMOTE-ENN, they achieved an AUC score of 87.1%; however, Le et al. (2019)
used only five different models. The work of Tumpach et al. (2020) used the
SMOTE algorithm instead of combined under- and oversampling; SMOTE was
applied only to the training set and not to the validation or testing sets.
In both papers (Le et al., 2019; Tumpach et al., 2020), handling imbalanced
data with such algorithms increased performance to more than 80% accuracy.

On the other hand, the study by Zoričák et al. (2019) not only
used balancing algorithms to handle the imbalance but also tuned the
classification threshold for each model, in contrast to previous studies
that handled imbalance with algorithms alone. The performance was
nevertheless good, with prediction scores from 76% up to 96% using
one-class Least-Squares Anomaly Detection, LSAD (Zoričák et al., 2019). In
another analysis, researchers combined structured and textual data to
produce a strong analysis and good interpretation for decision making (Mai
et al., 2018).

3. Methodology

The dataset was obtained from the open-source Kaggle platform
(retrieved from
https://www.kaggle.com/datasets/fedesoriano/company-bankruptcy-prediction).
It was compiled from the 1999 to 2009 issues of the Taiwan Economic
Journal, and corporate regulations of the Taiwan Stock Exchange were used
to define company bankruptcy. The dataset contains 6,819 records with 95
independent variables or features and one label variable (see the
Appendix). According to Liang et al. (2016), the 95 independent variables
can be divided into seven (7) financial ratio categories.

For data preparation, feature selection was used to reduce the
number of variables, since the dataset has a high number of them, and SMOTE
was used to balance the imbalanced target variable. The data preparation
was divided into four categories: 1. the dataset without SMOTE or feature
selection, i.e. the original dataset; 2. SMOTE without feature selection;
3. feature selection without SMOTE; and 4. both SMOTE and feature
selection. The data were then separated into two partitions, the training
and testing datasets. Feature selection was based on Ridge Regression and
LASSO Regression, chosen for their availability in Python's feature
selection libraries. Seventy percent of the data were chosen randomly for
training, and the remainder was used for testing. This procedure was
necessary because the training data were used to fit the model, whereas the
testing data were used to assess the model's effectiveness.
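The oversampling idea behind SMOTE can be sketched in plain Python: each synthetic point is an interpolation between a minority sample and one of its k nearest minority neighbours. This is a minimal illustration of the technique, not the imbalanced-learn implementation typically used with scikit-learn, and the toy "bankrupt" points are invented.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Minimal SMOTE sketch: synthesize new minority points by interpolating
    between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, x)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

bankrupt = [(0.2, 0.9), (0.25, 0.85), (0.3, 0.95)]  # toy minority class
new_points = smote(bankrupt, n_new=3)
print(len(bankrupt) + len(new_points))  # → 6
```

Because the synthetic points lie between existing minority samples rather than duplicating them, SMOTE enlarges the minority class without the information loss of undersampling, which matches how it is used in this study.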

The primary objective of this study is to develop a good model for
predicting company bankruptcy from large financial data. The prediction
techniques used are LASSO Regression, Ridge Regression, AdaBoost, and
XGBoost. The prediction models were built in Python; all four machine
learning algorithms were retrieved from the scikit-learn library. Ridge
Regression is one of the family of regression techniques and builds on the
concept of Ordinary Least Squares (OLS); geometrically, it can be
understood through contours and the OLS estimates. According to
Rachmawati, Sari, & Yohanes (2021), the LASSO method can shrink some
coefficient parameters and use them for variable selection, with the
shrinkage moving toward a central point such as the mean. The LASSO
regression method applies L1 regularisation, which imposes a penalty equal
to the absolute value of the magnitude of the regression coefficients.
AdaBoost is an abbreviation for Adaptive Boosting, one of the well-known
algorithms based on the concept of ensemble schemes. According to Wang &
Sun (2021), AdaBoost iteratively trains a set of weak classifiers on
weighted data to improve their performance; the outputs of all weak
classifiers are then combined into a weighted summation, which presents the
final boosted classifier. In Ma et al. (2021), XGBoost is described as a
combination of multiple machine learning techniques designed to achieve
better results in terms of performance; its algorithms are designed to be
more efficient and flexible, and to provide results from the best
integrated decision tree techniques. Although AdaBoost and XGBoost derive
from the same boosting concept, AdaBoost cannot be optimized for speed the
way XGBoost can (Nikulski, n.d.).
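The reweight-and-vote loop described for AdaBoost can be sketched with decision stumps as the weak classifiers. This is an illustrative from-scratch version, not the scikit-learn `AdaBoostClassifier` used in the study; the toy financial-ratio data are invented.

```python
import numpy as np

def adaboost_train(X, y, n_rounds=10):
    """Minimal AdaBoost sketch: each round fits the best decision stump on
    the weighted data, then upweights the samples it misclassified."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                 # uniform sample weights
    ensemble = []                           # (alpha, feature, threshold, sign)
    for _ in range(n_rounds):
        best = None
        for j in range(d):                  # exhaustive stump search
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = np.where(X[:, j] <= t, s, -s)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s, pred)
        err, j, t, s, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak learner
        w *= np.exp(-alpha * y * pred)          # upweight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def adaboost_predict(ensemble, X):
    # Final classifier: sign of the weighted sum of the weak learners' votes.
    score = sum(a * np.where(X[:, j] <= t, s, -s) for a, j, t, s in ensemble)
    return np.sign(score)

# Toy data: label +1 (bankrupt) iff both financial ratios are low.
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.8, 0.9],
              [0.9, 0.7], [0.15, 0.15], [0.7, 0.8]])
y = np.array([1, 1, -1, -1, 1, -1])
model = adaboost_train(X, y, n_rounds=5)
print((adaboost_predict(model, X) == y).mean())  # → 1.0
```

The weighted summation in `adaboost_predict` is exactly the "final boosted classifier" referred to above.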

This paper reports the best performance result for each machine
learning technique. The models used were XGBoost, AdaBoost, Ridge
Regression, and LASSO Regression. All findings derived from the execution
were analyzed and compared using confusion matrix performance
measurements: accuracy, sensitivity, specificity, precision, and error
rate.
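The five measurements are standard functions of the confusion matrix counts; a short sketch makes the definitions explicit (the example counts are invented, not taken from the study's results).

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity, precision and error rate (in %)
    computed from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy":    100 * (tp + tn) / total,
        "sensitivity": 100 * tp / (tp + fn),   # recall on the positive class
        "specificity": 100 * tn / (tn + fp),   # recall on the negative class
        "precision":   100 * tp / (tp + fp),
        "error":       100 * (fp + fn) / total,
    }

m = confusion_metrics(tp=90, fn=10, fp=5, tn=95)
print(round(m["accuracy"], 2), round(m["sensitivity"], 2))  # → 92.5 90.0
```

Note that accuracy and error rate always sum to 100%, which is why the error rows in Table 1 mirror the accuracy rows.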

4. Results
The main goal of the study is to determine which machine learning
technique has the best performance measurement scores and, through that, to
find the significant variables for predicting the outcome. The results
explain the effect of SMOTE on the imbalanced dataset, the best performance
for each machine learning technique, and the essential features, obtained
using Python programming.

Fig. 1 Comparison Between Imbalanced and Balanced data

As shown in Fig. 1, using SMOTE increased the number of records of
bankrupt companies drastically, to the same frequency as the records of
non-bankrupt companies. As mentioned in Tumpach et al. (2020), SMOTE should
perform better than undersampling techniques, because undersampling
decreases the frequency of the majority records. This allows the researcher
to compare the dataset categories for better performance in terms of
confusion matrix metrics.

Table 1. Best performances for every model, including its criteria (all
models: with SMOTE, no feature selection)

Performance (%)    XGBoost    AdaBoost   Ridge Regr.   LASSO Regr.
Training
  Accuracy         97.5861    98.3654    88.6014       63.3470
  Sensitivity      98.6415    99.4522    90.7537       60.0570
  Specificity      96.5554    97.3042    86.4998       66.5597
  Precision        96.5473    97.2990    86.7798       63.6849
  Error             2.4139     1.6346    11.3986       36.6530
Testing
  Accuracy         96.5909    96.4394    89.6970       63.4848
  Sensitivity      97.9361    98.2801    92.1376       60.6880
  Specificity      95.1688    94.4935    87.1169       66.4416
  Precision        95.5417    94.9668    88.3184       65.6566
  Error             3.4091     3.5606    10.3030       36.5152

Table 1 shows that all the models generally share the same
criteria: SMOTE is needed for consistent performance across all score
measurements. LASSO Regression was the worst technique, as all its
performance scores were unsatisfactory compared with those of the other
three machine learning techniques. AdaBoost recorded the best accuracy on
training data (98.37%) and XGBoost on testing data (96.59%); both recorded
high accuracy compared with Ridge Regression.

In terms of sensitivity, AdaBoost achieved the top training and
testing scores, with 99.45% and 98.28%, respectively. This finding
indicates that AdaBoost is excellent at predicting a company's bankruptcy.
For specificity and precision, AdaBoost scored slightly higher than XGBoost
on training data, while XGBoost recorded better specificity and precision
on testing data. Thus, the best modeling technique for predicting a
company's bankruptcy is AdaBoost, with the dataset balanced using the SMOTE
algorithm. For the analysis of variables, the AdaBoost feature importances
under the SMOTE criteria were used for further explanation. With the
dataset balanced by SMOTE, every measurement other than accuracy increased
to more than 80%, so the performance represents an optimal balance among
all confusion matrix measurements (Le et al., 2019).

Fig. 2 Variables for feature importance in AdaBoost using the original
dataset

Fig. 2 shows that non-industry income and expenditure/revenue,
from the profitability variable category, is the most important variable to
consider, recording the highest importance score (0.12). It is followed by
operating profit per person and borrowing dependency (sharing a score of
0.1), the second and third variables, which come from the others and
solvency categories, respectively. Income to total assets and persistent
EPS in the last four seasons come from the profitability category. Sixth
place goes to a cash flow variable, cash flow to equity. ROA(C) before
interest and depreciation before interest through equity to liability
received the same feature importance value of 0.04, and the 17th to 20th
variables, the last of the essential ones, received a score of only 0.02.

To summarize, of the 20 important variables shown in Fig. 2, seven were
selected from the profitability category, followed by solvency with six
variables. The capital structure, turnover, and others categories
contributed two variables each, and cash flow only one. This finding
suggests that profitability and solvency are the financial categories
requiring attention. These findings correspond to Hu et al. (2020) and
Liang et al. (2016), who also found that the profitability-and-solvency
category constitutes the essential variables in predicting a company’s
bankruptcy. The findings, therefore, have answered the study’s objective
of identifying significant features.
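The category tally above can be expressed as a short sketch; the labels
below simply mirror the counts summarised in the text, not the underlying
dataset:

```python
from collections import Counter

# Hypothetical category labels for the 20 top-ranked variables in Fig. 2,
# mirroring the counts reported in the text.
top20_categories = (
    ["profitability"] * 7 + ["solvency"] * 6 +
    ["capital structure"] * 2 + ["turnover"] * 2 +
    ["others"] * 2 + ["cash flow"]
)
counts = Counter(top20_categories)
# profitability and solvency together account for 13 of the 20 variables
```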

5. Conclusion and Recommendation

This study focused on utilizing gradient boosting (XGBoost and
AdaBoost) and two machine learning models from the regression family
(Ridge and LASSO) to predict company bankruptcy on a high number of
variables. By comparing the performances of the selected machine learning
models on every type of data preparation, the best model was chosen based
on high and consistent scores among the performance measurements.
AdaBoost was chosen as the best and most efficient model not only for its
accuracy but also for its sensitivity, precision, and specificity. The
precision recorded a high score on both the training and testing data.
Thus, the AdaBoost model could be used for other datasets.

Although the number of variables was high, the datasets were
categorized by certain financial terms. The essential features were
obtained by executing the feature importance routine on the best machine
learning model. From this point, profitability and solvency were important
since many of the significant variables came from those categories.
Boosting recorded excellent performance among the machine learning models,
and AdaBoost was selected as the best machine learning technique. The
study, therefore, could help bankruptcy researchers and financial analysts
better understand prediction concepts, particularly companies’
bankruptcy. Since this study used 95 independent variables, it was
difficult to analyze more advanced statistics for every variable,
including the correlation of every independent variable. Future
researchers, therefore, may focus on including additional criteria.

Since this study used only four different machine learning models,
other researchers may apply more advanced machine learning models in the
future. Analysing company bankruptcy can be challenging, particularly
during uncertain events; new data generated when such uncertainty occurs
can be an opportunity for researchers to make further predictions of
companies’ bankruptcy.

6. References

Aliaj, T., Anagnostopoulos, A., & Piersanti, S. (2018). Firms Default
Prediction with Machine Learning.

Altman, E. (1968). Financial Ratios, Discriminant Analysis and the
Prediction of Corporate Bankruptcy. The Journal of Finance, XXIII(4),
589–609.

Buzgurescu, O. L. P., & Elena, N. (2020). Bankruptcy Risk Prediction in
Assuring the Financial Performance of Romanian Industrial Companies.
104, 19–28. https://doi.org/10.1108/s1569-375920200000104003

Hosaka, T. (2018). Bankruptcy prediction using imaged financial ratios
and convolutional neural networks. Expert Systems with Applications.
https://doi.org/10.1016/j.eswa.2018.09.039

Hu, Y.-C., Jiang, P., Jiang, H., & Tsai, J.-F. (2020). Bankruptcy
prediction using multivariate grey prediction models. Grey Systems:
Theory and Application, 11(1), 46–62.
https://doi.org/10.1108/gs-12-2019-0067

Kim, S. Y., & Upneja, A. (2021). Majority voting ensemble with a decision
trees for business failure prediction during economic downturns. Suma de
Negocios, 6(2), 112–123. https://doi.org/10.1016/j.jik.2021.01.001

Le, T., Vo, M. T., Vo, B., Lee, M. Y., & Baik, S. W. (2019). A Hybrid
Approach Using Oversampling Technique and Cost-Sensitive Learning for
Bankruptcy Prediction.

Liang, D., Lu, C. C., Tsai, C. F., & Shih, G. A. (2016). Financial ratios
and corporate governance indicators in bankruptcy prediction: A
comprehensive study. European Journal of Operational Research, 252(2),
561–572. https://doi.org/10.1016/j.ejor.2016.01.012

Ma, M., Zhao, G., He, B., Li, Q., Dong, H., Wang, S., & Wang, Z. (2021).
XGBoost-based method for flash flood risk assessment. Journal of
Hydrology, 598, 126382. https://doi.org/10.1016/j.jhydrol.2021.126382

Mai, F., Tian, S., Lee, C., & Ma, L. (2018). Deep learning models for
bankruptcy prediction using textual disclosures. European Journal of
Operational Research. https://doi.org/10.1016/j.ejor.2018.10.024

Nikulski, J. (n.d.). The ultimate guide to AdaBoost, random forests and
XGBoost. Retrieved July 1, 2021, from
https://towardsdatascience.com/the-ultimate-guide-to-adaboost-random-forests-and-xgboost-7f9327061c4f

Perboli, G., & Arabnezhad, E. (2021). A machine learning-based DSS for
mid and long-term company crisis prediction. Expert Systems with
Applications, 174, 114758. https://doi.org/10.1016/j.eswa.2021.114758

Ptak-Chmielewska, A. (2021). Bankruptcy prediction of small- and
medium-sized enterprises in Poland based on the LDA and SVM methods.
22(1), 179–195. https://doi.org/10.21307/stattrans-2021-010

Rachmawati, R. N., Sari, A. C., & Yohanes. (2021). Lasso regression for
daily rainfall modeling at Citeko Station, Bogor, Indonesia. Procedia
Computer Science, 179, 383–390.
https://doi.org/10.1016/j.procs.2021.01.020

Rahayu, D. S. (2020). Ensemble Learning in Predicting Financial Distress
of Indonesian Public Company.

Tumpach, M., Surovičová, A., Juhászová, Z., Marci, A., & Kubaščíková, Z.
(2020). Prediction of the bankruptcy of Slovak companies using neural
networks with SMOTE. Ekonomicky Casopis, 68(10), 1021–1039.
https://doi.org/10.31577/ekoncas.2020.10.03

Tuovila, A., & Brock, T. (n.d.). Bankruptcy definition. Retrieved June
26, 2021, from https://www.investopedia.com/terms/b/bankruptcy.asp

Vel, R., & Zala, P. (2019). Bankruptcy Prediction using Multivariate
Discriminant Analysis - Empirical Evidence from Cases Referred to NCLT.
9, 13–17. https://doi.org/10.35940/ijitee.I7496.078919

Wang, W., & Sun, D. (2021). The improved AdaBoost algorithms for
imbalanced data classification. Information Sciences, 563, 358–374.
https://doi.org/10.1016/j.ins.2021.03.042

Zoričák, M., Gnip, P., Drotár, P., & Gazda, V. (2019). Bankruptcy
prediction for small- and medium-sized companies using severely
imbalanced datasets.

Review on the Evaluation of Digital Storytelling
Evaluation Model (DSEM) as a Learning Tool

Hayati Abd Rahman1*, Nor Izzati Nor Zamrai2

1, 2 Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA,
40450 Shah Alam, Selangor, Malaysia

[email protected]; [email protected]

Abstract: Digital storytelling has evolved as an effective tool for teaching
and learning involving teachers and students. It is a method that can be
applied in learning tools because it involves engagement and interaction.
Thus, it can be adopted as a learning tool in the educational curriculum.
Most digital storytelling applications have been assessed based on
practicality, availability, and usability; there is a lack of evaluation
models to evaluate digital storytelling as a learning tool. The Digital
Storytelling Evaluation Model (DSEM) is a proposed model adapted from the
Kirkpatrick Model and the CIPP Model. It inherits processes that can be
used in performing an evaluation of digital storytelling as a learning
tool. In this study, the evaluation model is improvised, and evaluation
attributes which suit the evaluation’s goal are presented.

Keywords: CIPP Model, Digital Storytelling, Digital Storytelling
Evaluation Model (DSEM), Kirkpatrick Model, Learning Tool

1. Introduction

Storytelling has traditionally been used as a pedagogical tool to
instil morals, love, and respect for other people’s cultures. In today’s
schools, how information is presented is just as important as the information
itself. When digital storytelling is utilised as a teaching tool, it first
attracts students’ attention, then stimulates their interest in learning
and transforms them into active participants in the projects they absorb
through technological tools. Moreover, Sadik (2008) has reported that

students are encouraged to arrange and communicate their thoughts and
knowledge in a personal and meaningful way through digital storytelling.
Creating a digital narrative is a creative process because it mixes
traditional stories with audio recording, editing, the use of a video
camera, and personal computer technology (Ohler, 2013). Furthermore,
digital storytelling benefits not only students but also teachers:
teachers who are more comfortable with technology will benefit more from
the digital storytelling process, and educators connected to technology
will gain more technical expertise through it (Seker, 2016). Many studies
have demonstrated that digital storytelling can help students improve their
skills in areas such as narration, writing, creativity, critical thinking, and
problem-solving (Wu et al., 2017).

The heart of making a digital story is writing the story, with
technology serving solely as a tool to help construct this story. According to
Gimeno-Sanz (2015), learning elements that can be extracted through digital
storytelling are pronunciation skills, speaking skills, listening skills, reading
comprehension, and writing skills. Moreover, students can utilise their
imagination and think about challenges from many viewpoints when taught
creatively and originally. Digital stories can increase student motivation to
read and write by allowing them to personalise the learning experience;
receive in-depth and meaningful reading experiences; and gain information
skills about the technical parts of the language.

Therefore, an evaluation model is required to determine whether
digital storytelling is suitable as a learning tool. The challenge is to
find a suitable evaluation model for evaluating digital storytelling as a
learning tool. The evaluation will be based on evaluation attributes such
as effectiveness, usability, learnability, and functionality. The
following section discusses these attributes for evaluating digital
storytelling as a learning tool.

2. Related Work

Many studies have examined digital storytelling as a learning tool.
Thus, the evaluation methods used seem to be an important element in
measuring the effectiveness of storytelling in learning. Seker (2016)
studied an evaluation using digital stories created for teaching social
studies. He used three (3) types of evaluation, namely the digital story
evaluation scale, a semi-structured interview form, and an individual

performance grade, to evaluate teacher performance after using digital
storytelling.

The digital story evaluation scale was designed for the evaluation of
long-term semester projects. Seven (7) categories have been identified as
parameters: project’s purpose; quality of the text; quality of audio narration;
quality of the photos; use of proper digital images; the fitness of the software
used; and choice of appropriate content. Each category was given five (5)
points, ranging from no score to poor, fair, good, and excellent. For the
semi-structured interview, open-ended questions were provided to students
to obtain their opinions about the learning. For the individual
performance grade, several open-ended questions were asked of the teacher
to evaluate his/her performance.

Wu et al. (2017) also conducted an evaluation to assess the usability of
digital storytelling as a teaching tool. The Digital Storytelling Teaching
System-University (DSTS-U) was developed to help students create stories
with a structural architecture and to allow variation of the contents by
using different story structures. The usability evaluation focused on
three (3) dimensions: teaching environment assessment, system features
evaluation, and self-story assessment. Three experts were invited to
evaluate the usability of DSTS-U.

For reflective learning, Jankins & Lonsdale (2007) compared Moon’s
map of learning and McDury and Alterio’s model. According to Seker (2016),
teachers and students are exposed to building digital storytelling and
then interviewed to gain feedback on that exposure.

However, none of the existing research studies used an evaluation
model to evaluate digital storytelling as a learning tool; the authors
evaluated using individual attributes, such as usability or effectiveness.
An evaluation model is important because it acts as an outline for the
evaluation processes.

2.1 Existing Evaluation Models

Educational evaluation encompasses various activities, such as
participant assessment, programme evaluation, school staff evaluation,
school accreditation, and curriculum review. The term evaluation is
frequently used ambiguously in conjunction with other terms, such as
assessment and testing. Although assessment instruments, such as tests,
can be used in an evaluation, evaluation does not refer to the same thing
as assessment and testing (Anh, 2018).
commonly used in educational evaluation. Different evaluators will use
different evaluation models to fulfil the goal of their evaluation studies. The
existing evaluation model used in education will be described in detail to
see how each evaluation model varies.

2.1.1 Kirkpatrick Evaluation Model

The Kirkpatrick assessment model has been widely recognised and
utilised since its introduction (Akbari et al., 2016). It has withstood intense
scrutiny and has become one of the most generally recognised and
influential models. Kirkpatrick developed a logical framework to analyse
outcomes and effects from individual and organisational performance
viewpoints (Reio et al., 2017). Kirkpatrick’s approach has been used for
evaluating technological communication goods and services, such as
assessing learning outcomes in higher education. In his approach,
Kirkpatrick offered four (4) levels of evaluation for training: response
criteria, learning criteria, behaviour criteria, and outcomes criteria.
Kirkpatrick’s four-level model has made important contributions to the
advancement of training assessment thought and practice (Alsalamah &
Callinan, 2021).

Fig. 1 Kirkpatrick Evaluation Model

Fig. 1 shows the Kirkpatrick Model, illustrating the four (4) levels
involved in conducting an evaluation. These levels are reaction, learning,
behaviour, and results.

Reaction

This level’s aim is straightforward. It examines how individuals react to the
training model by asking questions to solicit the learners’ ideas. It is

significant because favourable responses to a training session may motivate
participants to participate in subsequent programmes.

Learning

This level evaluates whether the participants gained the desired information,
skills, or attitudes because of their involvement in the training or
intervention (Paull et al., 2016). Learning measures are measurable markers
of learning that occurred after the training programme and are generally
measured by participants’ self-evaluation of their learning.

Behaviour

This level assesses how participants use their knowledge and skills in the
workplace (Alsalamah & Callinan, 2021). As a result, it is critical to
determine if the information, abilities, or attitudes gained in the programme
can be applied to the job or not.

Results

The results indicate if the training course fixed the existing problem and
aided in achieving corporate goals (Akbari et al., 2016). Level four,
commonly regarded as the programme’s major aim, assesses the overall
effectiveness of the training model by evaluating criteria.

2.1.2 CIPP Evaluation Model

The CIPP approach for curriculum assessment was introduced by
Daniel Stufflebeam (Aziz et al., 2018). This technique may be used to assess
the quality of instruction at school properly. The CIPP model is a continual
attempt to discover and fix errors in evaluation practice and a method to
create and test new processes for more successful practices (Anh, 2018). The
approach emphasises both summative and formative assessments. CIPP
evaluations are formative when collecting and reporting information for
improvements, but summative when assessing finished project or
programme activities or service performance (Anh, 2018).

Fig. 2 CIPP Evaluation Model (Poblete, 2014)

Fig. 2 shows the CIPP model consisting of four (4) steps: context,
input, process, and product. The diagram is also divided into two (2)
sections: formative and summative.

Context

Context relates to the school’s background, history, aims, and ambitions
(Aziz et al., 2018). Context evaluation is used to justify why a programme
or curriculum must be adopted.

Input

Inputs are the material and human resources required for the school to
function properly (Aziz et al., 2018). This assessment aims to give
information used to assess the resources used to achieve the programme’s
goals.

Process

The term process relates to the adoption of certain school procedures (Aziz
et al., 2018). The purpose of process evaluation is to offer feedback to the
individuals so that they may take responsibility for the actions of the
programme or curriculum.

Product

The term product refers to the outcomes obtained during or after the
conclusion of curriculum implementation (Bashri et al., 2020). Product
concerns the quality of the programme or tool for the participants and
whether they can apply what they have learned.

Formative evaluations are typically interim reports provided to
stakeholders and customers at various phases of an assessment project to
enlighten them. Formative evaluations are concerned with the programme’s
implementation and process phases. They are used to understand better the
school’s procedures, implementation, and operation, as well as to analyse
instructional materials, the structure of learning tasks, and courses for future
changes (Aziz et al., 2018). Formative evaluations are usually used before
a programme starts to test the knowledge and interest of the participants
towards the programme.

Summative assessment is used at the end of a programme to
determine the programme’s strengths and challenges after the curriculum
has been thoroughly created (Houston & Thompson, 2017). The judgement
on the merits of a completed product or programme is known as summative
evaluation. It essentially offers proof of good work or needed adjustments
and if the targeted goals are achieved (Aziz et al., 2018). This assessment is
usually used to test the understanding of the participants and whether they
have gained knowledge.

3. Method

The evaluation model plays an important role in outlining the
evaluation processes. This section discusses the evaluation model and
evaluation attributes for digital storytelling. The proposed evaluation model
for evaluating digital storytelling as a learning tool is an improvised model
derived from the Kirkpatrick Model and the CIPP Model consisting of four
(4) steps: input, knowledge application, feedback, and output. This
evaluation model will be used after the users experience the digital
storytelling application. Fig 3 shows the evaluation model for evaluating
digital storytelling as a learning tool. This model will be used as a guideline
in evaluating the usage of digital storytelling applications.

Fig. 3 Digital Storytelling Evaluation Model (DSEM)

3.1 Input

Input in the DSEM is to evaluate how the users understand the
application. It also tests the users’ reactions after experiencing the digital
storytelling application.

3.2 Knowledge Application

This step is to test users’ knowledge and see how they apply the
knowledge. In this step, the important part is to see the users’ understanding
level after learning through a digital storytelling application.

3.3 Feedback

This step is to know how the users feel about the digital storytelling
application. This is to ensure that the knowledge gained through the digital
storytelling application is useful for learning and study.

3.4 Output

This step captures the users’ results after experiencing the digital
storytelling application. It declares whether the digital storytelling
application can be helpful as a learning tool.

Formative and summative evaluations will be implemented in this
evaluation model for courseware purposes. Formative evaluation will be
implemented at the first stage of the digital storytelling evaluation. The main
purpose of formative evaluation is to collect information. It is used at the
input stage to evaluate the users’ reactions after using digital storytelling.

Thus, the information will be collected and compared with the summative
evaluation at the end of the digital storytelling evaluation model.

Summative evaluation will be implemented, starting from
knowledge application to the output of the evaluation model. This
evaluation is to investigate whether digital storytelling helps the users in
learning and studies. It is to see how the users experience digital storytelling
and finally align the results with the information gained from the formative
evaluation.

The evaluation model acts as the outline of the evaluation process.
However, the parameters or the rubrics to test the digital storytelling
application are based on the evaluation attributes. There are four (4)
evaluation attributes commonly used to evaluate the digital storytelling
application: usability, effectiveness, learnability, and reliability. These
evaluation attributes can be used for every step in the digital storytelling
application model.

3.5 Usability

Usability testing is used to evaluate the usefulness of a digital
storytelling system. Usability evaluation necessitates criteria addressing the
system’s product, service, process, or environment. The parameters used in
usability testing are related to awareness of the audience, the excitement of
the story, voice, intonation, the use of multimedia elements, fun elements,
interface, duration, and the aim of the story. A Likert scale ranging
from 1 to 4 is used, with each value described according to the
parameters.
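Scoring such a rubric reduces to averaging the 1–4 ratings per parameter.
A small sketch with hypothetical responses (the parameter names follow the
list above; the ratings are invented for illustration):

```python
# Hypothetical Likert responses (1 = poor ... 4 = excellent) from five
# evaluators, grouped by usability parameter.
responses = {
    "use of multimedia elements": [4, 3, 4, 2, 3],
    "interface":                  [3, 3, 2, 4, 3],
    "duration":                   [4, 4, 3, 4, 2],
}

# Mean score per parameter summarises the usability evaluation.
mean_scores = {param: sum(r) / len(r) for param, r in responses.items()}
```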

3.6 Effectiveness

Effectiveness is the extent to which something succeeds in
producing the anticipated consequences. The usefulness and appropriateness
of the result are the primary goals of effectiveness since they have
immediate effects on the client (Wagner & Deissenboeck, 2019). According
to Wu et al. (2017), the parameters used in evaluating the effectiveness of
digital storytelling require the following 16 dimensions: collaborative
learning, creativity and innovation, multiple representations, motivation,
cultural sensitivity, gender equality, cognitive effort, learner control,

