Published by azliza, 2022-09-13 20:00:12

E-proceeding PEERS'22


The Factors that Influence the Implementation
of Automation Technology Application Towards
Warehouse Productivity for Logistic Industry

Nur Ika Natasha Roslan1*, Fauziah Ahmad2, Nor Shahida Mohamad
Yusop3, Norjansalika Janom4

Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA,
UiTM 40450 Shah Alam, Selangor, Malaysia


Abstract: The Logistics 4.0 revolution mainly focuses on integrating
various types of automation technology to enhance the productivity and
effectiveness of the supply chain. However, implementation of automation
technologies in warehouses remains inadequate among logistics industry
players. As a result, potential efficiency gains in tracking and tracing are
not realized, delaying the Logistics 4.0 revolution. This study aimed to
investigate the factors that influence the implementation of automation
technology applications toward warehouse productivity in the logistics
industry. The paper uses the Technology-Organization-Environment (TOE)
framework to organize and conceptualize these factors. Systematic review
methods are used to identify, analyze, categorize, and map critical keywords
to form a conceptual model that explains the crucial factors influencing the
implementation of automation technologies for warehouse productivity in the
logistics industry. A total of 169 criteria were extracted from 74 selected
and filtered articles and grouped into nine factors. The overview of
digitalization in logistics and the identified influential factors can help
logistics industry players strategize the implementation of automation
technology more effectively, advancing their progress toward Logistics 4.0.

Keywords: Automation Technology, Industry 4.0, Logistic Industry, TOE
Model, Warehouse Productivity

1. Introduction

Logistics and supply chain management are among the most significant
business aspects of any company, particularly in industry. Facilitating trade
and transport is at the root of stimulating economic development, and many
nations have established detailed national logistics strategies. Well-
functioning domestic and international logistics are a precondition of
national competitiveness (Arvis et al., 2018). According to Amr et al. (2019),
logistics has evolved through four phases leading to Logistics 4.0. Logistics
1.0 focused on mechanizing transportation, shifting transport from manual
work to machines (Sternad et al., 2018). Logistics 2.0 focused on handling
mass output, particularly through the automation of cargo handling (Eljazzar
et al., 2018), and also engaged in the managerial coordination of processes
rather than operations alone. Logistics 3.0 began when the first industrial
robot was manufactured and introduced what is known as a pliant flow, an
evolution in logistics. Finally, Logistics 4.0 focuses on digitalization that
integrates various types of technology to improve supply chain productivity
and effectiveness, shifting organizations' attention to value chains and
enhancing the value provided to consumers by increasing competitiveness
(Tjahjono et al., 2017). Logistics 4.0 is related to the ninth Sustainable
Development Goal (SDG 9), which provides a roadmap for achieving the
industrial digital revolution, industrial automation for increased efficiency,
process innovation, and the creation of robust infrastructure and facilities
(McCollum et al., 2018).

However, on the path toward Logistics 4.0, implementation of automation
technologies in warehouses remains insufficient among industry players, who
appear to have difficulty implementing these technologies effectively in
their operations. Without integrating automation technology into warehouse
activities, potential efficiency gains in tracking and tracing will not be
realized, delaying the Logistics 4.0 revolution (Türkeș et al., 2019; Otles
et al., 2019). Furthermore, not all warehouses in the current logistics
industry have efficient access to goods, so optimizing warehouse
productivity remains crucial. Existing warehouse resources therefore face
significant challenges in increasing utilization, improving efficiency, and
lowering logistics costs (Hao et al., 2020).

This study aims to identify the factors that influence the
implementation of automation technology applications toward warehouse
productivity in the logistics industry. These factors arise from
technological, organizational, and environmental contexts. This study
therefore uses the Technology-Organization-Environment (TOE) framework to
examine how these three contexts influence the adoption and implementation
of technological innovations.

The remainder of this paper is organized as follows. Section 2
briefly explains the key automation technologies that influence warehouse
productivity and the factors affecting their implementation in the logistics
industry. Section 3 discusses the methodology: a systematic literature
review that identifies automation technologies and their influential factors
through the identification, analysis, categorization, and mapping of critical
keywords, with the Technology-Organization-Environment (TOE) framework
used to organize and conceptualize the factors. Section 4 presents the output
of this paper, a conceptual model that explains the factors influencing the
implementation of automation technology applications toward warehouse
productivity in the logistics industry. Section 5 concludes.

2. Logistic Industry in Supply Chain Management

According to Khairuddin et al. (2019), consumer demand has
increased substantially, so supply chain management has become a priority
for advancing logistics development. Supply chain management involves
integrating, organizing, and controlling the flow of raw materials into final
products and distributing those products to consumers. Logistics and the
supply chain play an essential role in any organization, especially in
industry (Amr et al., 2019). Supply Chain Management (SCM) is a business
theory that integrates and connects the operations, individuals, and services
involved from the starting point to the final destination of the supply chain
(Hendy et al., 2020).

2.1 Automation Technology

Industry 4.0 focuses on revolutionizing traditional manufacturing
and integrating the supply chain (Oleśków-Szłapka et al., 2018; Zifkovits et
al., 2020; Moldabekova et al., 2020). Following mechanization,
electrification, and computerization, the manufacturing industry has seen
increasing digitization and automation. According to Ratnasingam et al.
(2020), digitalization is expanding, particularly in the supply chain, where
all elements and systems are interconnected through real-time data
exchange. These automated and digitalized technologies have strong
potential to enhance sustainability and efficiency in the logistics industry.
As Salmela et al. (2012) and Marthauer et al. (2019) noted, logistics
providers commonly prefer to adapt existing technologies rather than adopt
more innovative and advanced technologies, such as logistics automation
and robotics, that could improve their performance comprehensively.

2.1.1 Logistic Automation Technology

Implementing automation technologies makes processes faster and
more flexible, mainly through industrial robots such as Unmanned Aerial
Vehicles (UAV) and Autonomous Mobile Robots (AMR). These robots can
help e-commerce businesses deliver and return packages to consumers in a
short time (Amiruddin et al., 2020). In the logistics industry, these
automation technologies affect supply chain management as a whole, not
only warehouses (Dekhne et al., 2019), through systems such as warehouse
management control systems, industrial robots programmed with task flows
tailored to logistics needs, and integrated automated systems for essential
logistics functions. By tracking and synchronizing data between physical
processes and cyber computational space, IoT and Cyber-Physical Systems
(CPS) can address the limitations of conventional supply chains and
logistics (Strandhagen et al., 2017; Efthymiou et al., 2019). Other
automation technologies with a tremendous impact on the logistics sector
include intelligent robots and autonomous vehicles, RFID and Quick
Response (QR) codes, sensors and conveyors, CPS, and Robotic Mobile
Fulfilment Systems (RMFS) (Edirisuriya et al., 2018; Efthymiou et al., 2019;
Boysen et al., 2019; Lamballais et al., 2020). Most researchers agree that
automation technology plays a crucial role in the logistics industry.

2.1.2 Internet of Things (IoT)

According to Khairuddin et al. (2019), the Internet of Things (IoT)
can solve existing tracking and tracing problems in logistics because it
provides real-time tracking of freight. The application of IoT in industry,
such as in logistics and its supply chain, is known as the Industrial IoT
(IIoT) and can enhance information management and operations (Lin et al.,
2016). IoT applications enhance innovation, production efficiency,
customer service, and safety in logistics operations and supply chain
management (Tu et al., 2018; Goncharova et al., 2018; Ivankova et al.,
2020).

IoT plays a crucial role in future technologies, particularly in the
development of smart logistics, which will transform the operation and
architecture of the logistics system to a great extent (Witkowski et al.,
2017). In Logistics 4.0 and smart logistics, IoT is applied to transportation,
warehousing, loading and unloading, carrying, packaging, and distribution
(Song et al., 2020). The IoT-based technologies used in automated logistics
include RFID, which identifies and captures important logistics data;
Wireless Sensor Networks (WSN), which monitor smart warehouses; and
wireless communication technologies (Zhou et al., 2011; Song et al., 2020).

2.1.3 Big Data

Big Data analytics helps process data of huge volume, velocity, and
variety to extract crucial insights that help organizations gain a competitive
advantage in the logistics industry (Wamba et al., 2015; Wamba et al., 2017).
According to Gravili et al. (2018), Big Data can strengthen decision-making
in Supply Chain Management (SCM). To improve overall supply chain
performance, comprehensive data-driven decision-making is essential for
improving processes, managing logistics procedures, and enhancing
inventory and cost management (Lamba et al., 2017). Logistics applications
of big data technologies include demand forecasting, equipment
maintenance, network setup and route planning, and supply chain risk
management (Ghost et al., 2015; Ayed et al., 2015; Karim et al., 2016; Wang
et al., 2018; Song et al., 2020).

2.1.4 Robotic Process Automation (RPA)

Robotic Process Automation (RPA) is one of the new elements in
the logistics and transportation sectors. These software robots are smart
engines that help businesses reduce the number of human activities and add
advanced expertise that could provide solutions for the transportation
industry (Lin et al., 2018). RPA can be highly impactful in warehouse
production and logistics administrative office tasks, indirectly enhancing the
value of shipment processes. According to Gružauskas et al. (2020),
Artificial Intelligence (AI) and RPA combine well to replace manual
processes with software robots: AI can analyze large volumes of data while
RPA automates manageable tasks. RPA is also related to Robotic Desktop
Automation, software that runs locally and repeats a human worker's steps
by interacting with other systems (Yusupbekov et al., 2017; Anagnoste et
al., 2017).

2.1.5 Warehouse Technologies

According to Krishnan et al. (2019), smart warehouses can be
crucial facilities because they integrate business processes across a cyber
network. The systems implemented in smart warehouses can adapt to various
businesses and are smart enough to conduct business operations with little
human intervention. According to Jabbar et al. (2018), to ensure that a smart
warehouse can perform at any marketplace level, it is designed to operate at
optimum productivity by combining only the best strategies, automation,
and innovations. Volodymyr et al. (2020) stated that it is now crucial to use
modern technologies and innovation in the development of warehousing
logistics. A few nascent technologies have been implemented in Malaysia's
warehouse sector, such as the Warehouse Management System (WMS),
scanners, the SAP system, and BRO excel (Krishnan et al., 2019). Table 1
summarizes the automation technologies and their elements found to be
related to warehouse productivity in the logistics industry:

Table 1. Summary of automation technologies and their elements

TECHNOLOGIES | ELEMENTS | REFERENCES

Logistics Automation Technology | Unmanned Aerial Vehicles (UAV);
Autonomous Mobile Robots (AMR); Cyber-Physical Systems (CPS); Robotic
Mobile Fulfilment Systems (RMFS) | Amiruddin et al., 2020; Dekhne et al.,
2019; Strandhagen et al., 2017; Efthymiou et al., 2019; Edirisuriya et al.,
2018; Boysen et al., 2019; Lamballais et al., 2020

Internet of Things (IoT) | RFID; Wireless Sensor Network (WSN); Wireless
Communication Technologies | Khairuddin et al., 2019; Lin et al., 2016; Tu
et al., 2018; Goncharova et al., 2018; Ivankova et al., 2020; Witkowski et
al., 2017; Song et al., 2020; Zhou et al., 2011

Big Data | Consumption Demand Forecasting; Equipment Maintenance
Prediction; Network Planning; Route Planning; Supply Chain Risk
Management | Wamba et al., 2015; Gravili et al., 2018; Lamba et al., 2017;
Ghost et al., 2015; Ayed et al., 2015; Karim et al., 2016; Wang et al., 2018;
Song et al., 2020

Robotic Process Automation (RPA) | Artificial Intelligence (AI); Robotic
Desktop Automation | Yusupbekov et al., 2017; Anagnoste et al., 2017;
Gružauskas et al., 2020; Lin et al., 2018

Warehouse Technologies | Warehouse Management System (WMS);
Scanners; SAP system; BRO excel | Krishnan et al., 2019; Jabbar et al., 2018;
Volodymyr et al., 2020

2.2 Technology Organization Environment (TOE) Model

The TOE model, developed by Tornatzky and Fleischer (1990), is
mainly used as a tool for measuring and evaluating the implementation of
new technology. The model describes how technological, organizational,
and environmental aspects influence the implementation and adoption of
technologies in an organization (Depietro et al., 1990; Tornatzky &
Fleischer, 1990). The technological aspect concerns how relevant the
technologies are to the firm (Jia et al., 2017). The organizational aspect
describes how organizational characteristics, such as firm size and human
assets, respond to new technologies. The environmental aspect describes
how the conditions of the industry, competitors, and government affect the
adoption of new technologies (Mahroof et al., 2019). The TOE model is
suitable for various organizations and types of innovation adoption, and it
allows their impact to be assessed from a comprehensive view.

3. Methodology

Because this study aims to identify the factors that influence the
implementation of automation technology applications toward warehouse
productivity in the logistics industry, an effective way to extract these
factors from past studies in existing academic and practitioner journals is
needed. The literature review therefore followed established systematic
review procedures recommended in prior work (Webster et al., 2002;
Tranfield et al., 2003; Cooper et al., 2009; Seuring et al., 2012; Saenz et al.,
2015; Garay-Rondero et al., 2019).

The online database selection phase focused on acquiring literature
on conceptual models, elements of automation technologies, Logistics 4.0,
and Industry 4.0. To ensure the quality, impact, and representativeness of
the papers, only established academic and industry journals were
considered. The databases identified as containing literature related to
automation technologies in the logistics sector were IEEE, ScienceDirect,
Scopus, Emerald Insight, and Google Scholar.

The systematic review analysis was used to construct a conceptual
model, derive comprehensive content, and address the objective of this
study. It consists of four phases: identification, analysis, categorization,
and mapping.

In the first phase, identification, the selected online databases were
searched for the following keywords in the title or content: "automation
technology," "logistic industry," "logistic 4.0," "industry 4.0," "warehouse
productivity," "conceptual model," and "industry revolution." This search
returned 86 papers, which were then subjected to an analytical selection
process. In all, 26 articles were rejected because they duplicated similar
works already found.
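As an illustration only, the keyword search in the identification phase can be expressed as a single boolean query that matches any of the listed phrases. The sketch below assumes a generic syntax with quoted phrases joined by OR; each database (IEEE, Scopus, and so on) has its own field codes and operators, and the helper name build_query is hypothetical.

```python
# Keywords listed in the identification phase of this review.
KEYWORDS = [
    "automation technology",
    "logistic industry",
    "logistic 4.0",
    "industry 4.0",
    "warehouse productivity",
    "conceptual model",
    "industry revolution",
]


def build_query(keywords: list[str]) -> str:
    """Join keywords into one OR query, quoting each multi-word phrase.

    Hypothetical helper: real databases add field restrictions
    (for example, title or abstract only), which are omitted here.
    """
    return " OR ".join(f'"{kw}"' for kw in keywords)


if __name__ == "__main__":
    print(build_query(KEYWORDS))
```

Quoting each phrase keeps multi-word terms such as "warehouse productivity" together, while OR reflects that a paper matching any one keyword entered the candidate pool.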

Next, the papers were analyzed using two analytical procedures. The
"rough cut" eliminated papers with no significant focus, anywhere in the
paper, on automation technology, the logistics industry, Logistics 4.0,
Industry 4.0, warehouse productivity, conceptual models, or the industrial
revolution. The "reading cut" discarded papers unrelated to models of
automation technology for warehouse productivity in the logistics industry.
In this phase, 12 papers were discarded because they were unrelated to, or
did not contribute to, the topic of this paper. After this phase, 74 papers
were identified as related to automation technology for warehouse
productivity in the logistics industry. A total of 184 criteria were found in
these articles; 15 were combined because they duplicated other criteria,
leaving 169 criteria. These 169 criteria were then analyzed: duplicated
criteria were combined, and similarly defined criteria were either combined
or retained as complements. Eventually, nine factors were finalized for the
next phase.
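The screening and merging steps described above can be sketched in code. This is a hypothetical illustration, not the authors' procedure: the Paper record, the substring test that stands in for the reviewers' judgement of "significant focus," and the case-and-whitespace normalization used to detect duplicate criteria are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    text: str  # abstract or full text used for screening (assumed available)


# Focus terms used by the "rough cut" described above.
FOCUS_TERMS = (
    "automation technology", "logistic industry", "logistic 4.0",
    "industry 4.0", "warehouse productivity", "conceptual model",
    "industrial revolution",
)


def rough_cut(papers: list[Paper]) -> list[Paper]:
    """Keep papers that mention at least one focus term anywhere.

    A plain substring match is an assumed stand-in for the
    reviewers' manual judgement of significant focus.
    """
    kept = []
    for p in papers:
        haystack = f"{p.title} {p.text}".lower()
        if any(term in haystack for term in FOCUS_TERMS):
            kept.append(p)
    return kept


def merge_duplicate_criteria(criteria: list[str]) -> list[str]:
    """Combine criteria whose names match after normalizing case and spacing.

    The first spelling of each criterion is kept, in original order.
    """
    seen: dict[str, str] = {}
    for c in criteria:
        key = " ".join(c.lower().split())
        seen.setdefault(key, c)
    return list(seen.values())
```

For example, merge_duplicate_criteria(["High cost", "high  cost", "Top management support"]) keeps two criteria. In the review itself, combining 15 duplicates among 184 criteria left 184 - 15 = 169; grouping similarly defined criteria into factors still required the authors' judgement, which simple string matching cannot replace.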

In the categorization phase, the finalized criteria were organized
according to the Technology-Organization-Environment (TOE) framework.
As an analytical tool, the TOE model clarifies the roles of technology,
organization, and environment in analyzing the factors that influence the
implementation of automation technology applications toward warehouse
productivity in the Malaysian logistics industry. The model provides a clear
and comprehensive picture of both the internal and external factors in
implementing technology in the logistics sector and explains them from
various points of view. When businesses consider implementing
technologies, they must assess the full impact of those technologies from a
comprehensive perspective, and TOE provides that point of reference.
Based on all identified criteria, the researchers classified the 169 criteria
into nine factors, namely Cost, Perceived Advantage, Compatibility, Firm
Size, Firm Scope, Organizational Support, IT Knowledge, Competitive
Pressure, and External Support. These nine factors were then mapped into
the TOE model. Three factors, Cost, Perceived Advantage, and
Compatibility, were placed under the technological context. Four factors,
Firm Size, Firm Scope, Organizational Support, and IT Knowledge, were
placed under the organizational context. The remaining two factors,
Competitive Pressure and External Support, were categorized within the
environmental context.
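The mapping reported above can be written down as a small lookup table, which makes it easy to check that every factor is assigned to exactly one context. This is an illustrative sketch only; the names TOE_FACTORS, context_of, and factor_counts are hypothetical and not part of the authors' work.

```python
from collections import Counter

# The nine factors and the TOE context each was mapped to, as reported above.
TOE_FACTORS: dict[str, list[str]] = {
    "Technological": ["Cost", "Perceived Advantage", "Compatibility"],
    "Organizational": [
        "Firm Size", "Firm Scope", "Organizational Support", "IT Knowledge",
    ],
    "Environmental": ["Competitive Pressure", "External Support"],
}


def context_of(factor: str) -> str:
    """Return the TOE context a factor belongs to; raise KeyError if unknown."""
    for context, factors in TOE_FACTORS.items():
        if factor in factors:
            return context
    raise KeyError(factor)


def factor_counts() -> Counter:
    """Count how many factors fall under each TOE context."""
    return Counter({ctx: len(fs) for ctx, fs in TOE_FACTORS.items()})
```

For example, context_of("IT Knowledge") returns "Organizational", and the per-context counts (3, 4, and 2) add up to the nine factors.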

Table 2 lists the nine factors obtained by filtering the criteria from
the selected articles, together with the challenges reported in the literature
for each factor. The influential factors were derived from an understanding
of the criteria and of the challenges that logistics firms face in implementing
automation technology.

Table 2. The factors that influence the implementation of automation
technology application towards warehouse productivity for logistic industry

FACTORS | CHALLENGES | REFERENCES

TECHNOLOGICAL CONTEXT

Cost | The cost of implementing automation technology is high. |
Ghobakhloo et al., 2019; Mahroof et al., 2019; Prause et al., 2019; Hao et
al., 2020; Sun et al., 2016

Perceived Advantages | Automation technology might not improve the
outputs of logistics industry players. | Ghobakhloo et al., 2019; Hsu et al.,
2014; Kuan et al., 2001; Krishnan et al., 2019; Prause et al., 2019; Hao et
al., 2020; Sun et al., 2016

Compatibility | Automation technology does not fit with existing
technology. Automation technology cannot adapt to and fit the current
environment. | Ghobakhloo et al., 2019; Liu et al., 2015; Kim et al., 2017;
Tu et al., 2018; Rahman et al., 2019; Rosli et al., 2012; Awa et al., 2016;
Prause et al., 2019; Sun et al., 2016; Masood et al., 2019

ORGANIZATIONAL CONTEXT

Firm Size | Smaller firms face barriers in implementing automation because
of limited knowledge and resources. | Lei et al., 2021; Awa et al., 2016; Hao
et al., 2020; Sun et al., 2016; Rosli et al., 2021; Maarof et al., 2019;
Rahman et al., 2019; Song et al., 2019; Won et al., 2020; Arnold et al., 2018

Firm Scope | Logistics firms with a smaller scope might not have enough
experience in implementing automation technology. | Lei et al., 2021; Awa
et al., 2016; Hao et al., 2020; Won et al., 2020; Maarof et al., 2019; Prause
et al., 2019

Organizational Support | Support from top management is low. Employees
show a lack of support for, and interest in, automation technology. | Won et
al., 2020; Ghobakhloo et al., 2019; Giotopoulos et al., 2017; Lee et al.,
2007; Wang et al., 2010; Thornton et al., 2016; Rosli et al., 2012; Maarof et
al., 2019; Prause et al., 2019; Sun et al., 2016; Krishnan et al., 2019

IT Knowledge | Staff lack the skills to use automation technology. A lack of
expertise in the organization might yield low benefits from implementing
automation technology. | Hennelly et al., 2019; Maarof et al., 2019; Won et
al., 2020; Ghobakhloo et al., 2019; Giotopoulos et al., 2017; Liviu et al.,
2009; Huo et al., 2015; Hsu et al., 2014

ENVIRONMENTAL CONTEXT

Competitive Pressure | Firms are pressured to implement automation
technologies to match their rivals. | Won et al., 2020; Ghobakhloo et al.,
2019; Krishnan et al., 2019; Awa et al., 2016; Mital et al., 2018; Hwang et
al., 2004; Wang et al., 2010; Quetti et al., 2012; Hsu et al., 2014

External Support | Low support from the external environment causes
discouragement and uncertainty for firms. | Awa et al., 2016; Tu et al.,
2018; Krishnan et al., 2019; Wang et al., 2010; Rahman et al., 2019; Masood
et al., 2019; Arnold et al., 2018

4. Discussion

Fig 1. Conceptual framework for the factors that influence the implementation of
automation technology application towards warehouse productivity for logistic
industry.

This section discusses the three contexts of the
Technology-Organization-Environment (TOE) framework: technological,
organizational, and environmental. As Figure 1 shows, in the technological
context, cost, perceived advantage, and compatibility affect the
implementation of automation technology applications toward warehouse
productivity. As Prause et al. (2019) noted, it is not appropriate for firms to
implement technologies whose high cost exceeds their budget; the lower the
cost of a technology, the more willing firms are to implement it. Automation
technology should offer substantial transformational advantages for the
firm's improvement (Hao et al., 2020; Ghobakhloo et al., 2019; Krishnan et
al., 2019). If using automation technologies brings limited benefits and
many risks, the technologies are not worth adopting. Compatibility
concerns how well automation technologies fit, and are consistent with, the
current technologies and environment of warehouses in the logistics
industry. There is a risk if significant automation technologies are not
compatible with the needs and current infrastructure of logistics firms
(Mahroof et al., 2019; Ghobakhloo et al., 2019; Prause et al., 2019).

In the organizational context, firm size also has a significant
impact on implementing automation technologies, because large firms have
stable financial resources and the capacity to use new technologies (Arnold
et al., 2018; Rahman et al., 2019). Thong and Yap (1995) reported that the
larger the firm, the easier the acceptance process. By implication, smaller
firms with a narrower scope may find it challenging to implement
automation technologies successfully. Furthermore, a firm's probability of
implementing technologies increases as organizational support increases. A
lack of support within logistics firms for adding technologies, specifically
from top management and workers, creates a major gap (Won et al., 2020;
Mahroof et al., 2019; Prause et al., 2019). Moreover, staff IT knowledge of
automation technologies significantly affects their implementation in
warehouses in the logistics industry. Logistics firms will not fully realize the
benefits of automation technologies if they lack the skills and expertise to
use them (Hennelly et al., 2019).

In the environmental context, competitors' implementation of
automation technology, together with pressure from business partners and
customers to digitalize operations, pushes firms to pursue competitiveness
and keep up with the Logistics 4.0 trend (Krishnan et al., 2019). According
to V-Baralles et al. (2010), extensive capabilities are critical to a company's
development in the current competitive business environment. However, an
intensely competitive environment can pressure firms into implementing
automation technologies without adequate preparation (Mital et al., 2018;
Ghobakhloo et al., 2019). External support reflects how far the surrounding
environment joins efforts to expand the implementation of automation
technologies in warehouses in the logistics industry. As Krishnan et al.
(2019) noted, government support could help firms reduce export and
import charges. A lack of support from the external environment may lower
motivation and create uncertainty for firms implementing automation
technologies (Tu et al., 2018).

5. Conclusion

This research centered on the inadequate implementation of
automation technologies among logistics industry players and the factors
influencing that implementation in current warehouse operations toward
Logistics 4.0 (Türkeș et al., 2019; Otles et al., 2019). To address this
situation, the research investigated the factors that influence the
implementation of automation technology applications toward warehouse
productivity in the logistics industry. The conceptual framework developed
in this study can serve as a guideline for industry players implementing
automation technologies in their firms. Because it is built on the TOE model,
it offers a comprehensive view that helps firms implement automation
technologies with the right strategy. However, the results are based only on
a literature review and cover only warehouse productivity in the logistics
industry. Future research can therefore cover other areas of the logistics
industry and collect empirical data from industry to validate the framework.

6. References

Alam, A., Bagchi, P. K., Kim, B., Mitra, S., & Seabra, F. (2014). The
mediating effect of logistics integration on supply chain
performance: a multi-country study. The International Journal of
Logistics Management.

Alfalla-Luque, R., Marin-Garcia, J. A., & Medina-Lopez, C. (2015). An
analysis of the direct and mediated effects of employee commitment
and supply chain integration on organisational performance.
International Journal of Production Economics, 162, 242-257.

Amiruddin, B. P., & Romdhony, D. R. (2020). A Study on Application of
Automation Technology in Logistics and Its Effect on E-Commerce.

Amr, M., Ezzat, M., & Kassem, S. (2019, October). Logistics 4.0: Definition
and historical background. In 2019 Novel Intelligent and Leading
Emerging Sciences Conference (NILES) (Vol. 1, pp. 46-49). IEEE.

Anagnoste, S. (2017, July). Robotic Automation Process-The next major
revolution in terms of back office operations improvement. In
Proceedings of the International Conference on Business
Excellence (Vol. 11, No. 1, pp. 676-686). Sciendo.

Arnold, C., Veile, J., & Voigt, K. I. (2018, April). What drives industry 4.0
adoption? An examination of technological, organizational, and
environmental determinants. In Proceedings of the International
Association for Management of Technology (IAMOT) Conference,
Birmingham, UK (pp. 22-26).

Arvis, J. F., Ojala, L., Wiederer, C., Shepherd, B., Raj, A., Dairabayeva, K.,
& Kiiski, T. (2018). Connecting to compete 2018: trade logistics in
the global economy. World Bank.

Awa, H. O., Ukoha, O., & Emecheta, B. C. (2016). Using TOE theoretical
framework to study the adoption of ERP solution. Cogent Business
& Management, 3(1), 1196571.

Ayed, A. B., Halima, M. B., & Alimi, A. M. (2015, May). Big data analytics
for logistics and transportation. In 2015 4th international conference
on advanced logistics and transport (ICALT) (pp. 311-316). IEEE.

Bae, H. S. (2017). Empirical relationships of perceived environmental
uncertainty, supply chain collaboration and operational
performance: analyses of direct, indirect and total effects. The Asian
Journal of Shipping and Logistics, 33(4), 263-272.

Boysen, N., De Koster, R., & Weidinger, F. (2019). Warehousing in the e-
commerce era: A survey. European Journal of Operational
Research, 277(2), 396-411.

Dekhne, A., Hastings, G., Murnane, J., & Neuhaus, F. (2019). Automation
in logistics: Big opportunity, bigger uncertainty. McKinsey Q, 1-12.

Depietro, R., Wiarda, E., & Fleischer, M. (1990). The context for change:
Organization, technology and environment. The processes of
technological innovation, 199(0), 151-175.

Drazin, R. (1991). The processes of technological innovation. The Journal
of Technology Transfer, 16(1), 45-46.

Edirisuriya, A., Weerabahu, S., & Wickramarachchi, R. (2018, December).
Applicability of lean and green concepts in Logistics 4.0: a
systematic review of literature. In 2018 International Conference on
Production and Operations Management Society (POMS) (pp. 1-8).
IEEE.

Efthymiou, O. K., & Ponis, S. T. (2019). Current status of Industry 4.0 in
material handling automation and in-house logistics. International
Journal of Industrial and Manufacturing Engineering, 13(10), 1370-
1374.

Eljazzar, M. M., Amr, M. A., Kassem, S. S., & Ezzat, M. (2018). Merging
supply chain and blockchain technologies. arXiv preprint
arXiv:1804.04149.

Emmer, C., Glaesner, K. H., Pfouga, A., & Stjepandić, J. (2017). Advances
in 3D measurement data management for Industry 4.0. Procedia
Manufacturing, 11, 1335-1342.

Ganbold, O., Matsui, Y., & Rotaru, K. (2020). Effect of information
technology-enabled supply chain integration on firm's operational
performance. Journal of Enterprise Information Management.

Ghobakhloo, M., & Ching, N. T. (2019). Adoption of digital technologies
of smart manufacturing in SMEs. Journal of Industrial Information
Integration, 16, 100107.

Ghosh, D. (2015, September). Big data in logistics and supply chain
management-a rethinking step. In 2015 International Symposium
on Advanced Computing and Communication (ISACC) (pp. 168-
173). IEEE.

Giotopoulos, I., Kontolaimou, A., Korra, E., & Tsakanikas, A. (2017). What
drives ICT adoption by SMEs? Evidence from a large-scale survey
in Greece. Journal of Business Research, 81, 60-69.

Goncharova, N. L., & Bezdenezhnykh, T. I. (2018). Employing the elderly
in the service sector in conditions of electronic and fourth
innovation and technology revolution: Industry 4.0. In Innovation
Management and Education Excellence through Vision 2020 (pp.
2330-2336).

Govindan, K., Cheng, T. E., Mishra, N., & Shukla, N. (2018). Big data
analytics and application for logistics and supply chain
management.

Gružauskas, V., & Ragavan, D. (2020). Robotic Process Automation for
Document Processing: A Case Study of A Logistics Service
Provider. Vadyba, (2), 119.

Hao, J., Shi, H., Shi, V., & Yang, C. (2020). Adoption of automatic
warehousing systems in logistics firms: a technology–organization–
environment framework. Sustainability, 12(12), 5185.

Hendy Tannady, R., Andry, J. F., & Marta, R. F. Exploring The Role of ICT
Readiness and Information Sharing On Supply Chain Performance
in Coronavirus Disruptions.

Hennelly, P. A., Srai, J. S., Graham, G., Meriton, R., & Kumar, M. (2019).
Do makerspaces represent scalable production models of
community-based redistributed manufacturing? Production
Planning & Control, 30(7), 540-554.

Hsu, P. F., Ray, S., & Li-Hsieh, Y. Y. (2014). Examining cloud computing
adoption intention, pricing mechanism, and deployment model.
International Journal of Information Management, 34(4), 474-488.

Huo, B., Ye, Y., Zhao, X., & Shou, Y. (2016). The impact of human capital
on supply chain integration and competitive performance.
International Journal of Production Economics, 178, 132-143.

Hwang, H. G., Ku, C. Y., Yen, D. C., & Cheng, C. C. (2004). Critical factors
influencing the adoption of data warehouse technology: a study of
the banking industry in Taiwan. Decision Support Systems, 37(1),
1-21.

Ivankova, G. V., Mochalina, E. P., & Goncharova, N. L. (2020, September).
Internet of Things (IoT) in logistics. In IOP Conference Series:
Materials Science and Engineering (Vol. 940, No. 1, p. 012033).
IOP Publishing.

Jabbar, S., Khan, M., Silva, B. N., & Han, K. (2018). A REST-based
industrial web of things’ framework for smart warehousing. The
Journal of Supercomputing, 74(9), 4419-4433.

Jacobs, M. A., Yu, W., & Chavez, R. (2016). The effect of internal
communication and employee satisfaction on supply chain
integration. International Journal of Production Economics, 171,
60-70.

Jajja, M. S. S., Chatha, K. A., & Farooq, S. (2018). Impact of supply chain
risk on agility performance: Mediating role of supply chain
integration. International Journal of Production Economics, 205,
118-138.

Jia, Q., Guo, Y., & Barnes, S. J. (2017). Enterprise 2.0 post-adoption:
Extending the information system continuance model based on the
technology-Organization-environment framework. Computers in
Human Behavior, 67, 95-105.

Karim, L., Boulmakoul, A., & Lbath, A. (2016, May). Near real-time big
data analytics for NFC-enabled logistics trajectories. In 2016 3rd
International Conference on Logistics Operations Management
(GOL) (pp. 1-7). IEEE.

Khairuddin, A. A., Akhir, E. A. P., & Hasan, M. H. (2019). A Case Study
to Explore IoT Readiness in Outbound Logistics. Int. J Sup. Chain.
Mgt Vol, 8(2), 947.

Kim, H. J. (2017). Information technology and firm performance: the role
of supply chain integration. Operations management research, 10(1-
2), 1-9.

Krishnan, E. R. K., & Wahab, S. N. (2019). A qualitative case study on the
adoption of smart warehouse approaches in Malaysia. In E3S Web
of Conferences (Vol. 136, p. 01039). EDP Sciences.

Kuan, K. K., & Chau, P. Y. (2001). A perception-based model for EDI
adoption in small businesses using a technology–organization–
environment framework. Information & management, 38(8), 507-
521.

Kumar, V., Mishra, N., Chan, F. T., & Verma, A. (2011). Managing
warehousing in an agile supply chain environment: an F-AIS
algorithm based approach. International Journal of Production
Research, 49(21), 6407-6426.

Lamba, K., & Singh, S. P. (2017). Big data in operations and supply chain
management: current trends and future perspectives. Production
Planning & Control, 28(11-12), 877-890.

Lamballais Tessensohn, T., Roy, D., & De Koster, R. B. (2020). Inventory
allocation in robotic mobile fulfillment systems. IISE Transactions,
52(1), 1-17.

Lei, Y., Guo, Y., Zhang, Y., & Cheung, W. (2021). Information technology
and service diversification: A cross-level study in different
innovation environments. Information & Management, 103432.

Lin, D., Lee, C. K. M., & Lin, K. (2016, December). Research on effect
factors evaluation of internet of things (IOT) adoption in Chinese
agricultural supply chain. In 2016 IEEE International Conference
on Industrial Engineering and Engineering Management (IEEM)
(pp. 612-615). IEEE.

Lin, S. C., Shih, L. H., Yang, D., Lin, J., & Kung, J. F. (2018, September).
Apply RPA (robotic process automation) in semiconductor smart
manufacturing. In 2018 e-Manufacturing & Design Collaboration
Symposium (eMDC) (pp. 1-3). IEEE.

Luff, P. (2017). The 4th industrial revolution and SMEs in Malaysia and
Japan: some economic, social and ethical considerations. Reitaku
International Journal of Economic Studies, 25, 25-48.

Mahroof, K. (2019). A human-centric perspective exploring the readiness
towards smart warehousing: The case of a large retail distribution
warehouse. International Journal of Information Management, 45,
176-190.

Masood, T., & Egger, J. (2019). Augmented reality in support of Industry
4.0—Implementation challenges and success factors. Robotics and
Computer-Integrated Manufacturing, 58, 181-195.

Mathauer, M., & Hofmann, E. (2019). Technology adoption by logistics
service providers. International Journal of Physical Distribution &
Logistics Management.

Mital, M., Chang, V., Choudhary, P., Papa, A., & Pani, A. K. (2018).
Adoption of Internet of Things in India: A test of competing models
using a structured equation modeling approach. Technological
Forecasting and Social Change, 136, 339-346.

Moldabekova, A., Philipp, R., Satybaldin, A. A., & Prause, G. (2021).
Technological readiness and innovation as drivers for logistics 4.0.
The Journal of Asian Finance, Economics, and Business, 8(1), 145-
156.

Oleśków-Szłapka, J., & Stachowiak, A. (2018, September). The framework
of logistics 4.0 maturity model. In International conference on
intelligent systems in production engineering and maintenance (pp.
771-781). Springer, Cham.

Otles, S., & Sakalli, A. (2019). Industry 4.0: The Smart Factory of the Future
in Beverage Industry. In Production and Management of Beverages
(pp. 439-469). Woodhead Publishing.

Prause, M. (2019). Challenges of industry 4.0 technology adoption for
SMEs: the case of Japan. Sustainability, 11(20), 5807.

Rahman, M., & Aydin, E. (2019). Organisational challenges and benefits of
e-hrm implementations in governmental organisations: Theoretical
shift from TOE model. Uluslararası İktisadi ve İdari İncelemeler
Dergisi, 127-142.

Ratnasingam, J., Lee, Y. Y., Azim, A. A. A., Halis, R., Liat, L. C., Khoo,
A., ... & Amin, M. N. Z. M. (2020). Assessing the awareness and
readiness of the Malaysian furniture industry for Industry 4.0.
Bioresources, 15(3), 4866-4885.

Rosli, K., Yeow, P. H., & Siew, E. G. (2012). Factors influencing audit
technology acceptance by audit firms: A new I-TOE adoption
framework. Journal of Accounting and Auditing, 2012, 1.

Salmela, E., Happonen, A., & Huiskonen, J. (2012). New concepts for
demand-supply chain synchronisation. International Journal of
Manufacturing Research, 7(2), 148-164.

Saukkonen, J., Kemell, K., Haaranen, M., & Svärd, E. (2020). Robotic
Process Automation as a Change Agent for Business Processes:
Experiences and Expectations. In Proceedings of the 2nd European
Conference on the Impact of Artificial Intelligence and Robotics.
Academic Conferences International.

Song, S., Shi, X., & Song, G. (2019). Supply chain integration in omni-
channel retailing: a human resource management perspective.
International Journal of Physical Distribution & Logistics
Management.

Song, Y., Yu, F. R., Zhou, L., Yang, X., & He, Z. (2020). Applications of
the Internet of things (IoT) in smart logistics: A comprehensive
survey. IEEE Internet of Things Journal.

Sternad, M., Lerher, T., & Gajšek, B. (2018). Maturity levels for logistics
4.0 based on NRW's Industry 4.0 maturity model. Business
Logistics in Modern Management.

Strandhagen, J. O., Vallandingham, L. R., Fragapane, G., Strandhagen, J.
W., Stangeland, A. B. H., & Sharma, N. (2017). Logistics 4.0 and
emerging sustainable business models. Advances in Manufacturing,
5(4), 359-369.

Sun, S., Cegielski, C. G., Jia, L., & Hall, D. J. (2018). Understanding the
factors affecting the organizational adoption of big data. Journal of
Computer Information Systems, 58(3), 193-203.

Thornton, L. M., Esper, T. L., & Autry, C. W. (2016). Leader or lobbyist?
How organizational politics and top supply chain manager political
skill impacts supply chain orientation and internal integration.
Journal of Supply Chain Management, 52(4), 42-62.

Tian, M., Huo, B., Park, Y., & Kang, M. (2021). Enablers of supply chain
integration: a technology-organization-environment view.
Industrial Management & Data Systems.

Tjahjono, B., Esplugues, C., Ares, E., & Pelaez, G. (2017). What does
industry 4.0 mean to supply chain? Procedia manufacturing, 13,
1175-1182.

Tornatzky, L. G., Fleischer, M., & Chakrabarti, A. K. (1990). Processes of
technological innovation. Lexington books.

Tu, M. (2018). An exploratory study of Internet of Things (IoT) adoption
intention in logistics and supply chain management: A mixed
research approach. The International Journal of Logistics
Management.

Türkeș, M. C., Oncioiu, I., Aslam, H. D., Marin-Pantelescu, A., Topor, D.
I., & Căpușneanu, S. (2019). Drivers and barriers in using industry
4.0: a perspective of SMEs in Romania. Processes, 7(3), 153.

Volodymyr, M., & Oksana, O. (2020). World Trends In Warehousing
Logistics. Intellectualization of logistics and Supply Chain
Management, (2).

Wamba, S. F., Akter, S., Edwards, A., Chopin, G., & Gnanzou, D. (2015).
How ‘big data’ can make big impact: Findings from a systematic
review and a longitudinal case study. International Journal of
Production Economics, 165, 234-246.

Wamba, S. F., Gunasekaran, A., Akter, S., Ren, S. J. F., Dubey, R., &
Childe, S. J. (2017). Big data analytics and firm performance:
Effects of dynamic capabilities. Journal of Business Research, 70,
356-365.

Wang, Y. M., Wang, Y. S., & Yang, Y. F. (2010). Understanding the
determinants of RFID adoption in the manufacturing industry.
Technological forecasting and social change, 77(5), 803-815.

Won, J. Y., & Park, M. J. (2020). Smart factory adoption in small and
medium-sized enterprises: Empirical evidence of manufacturing
industry in Korea. Technological Forecasting and Social Change,
157, 120117.

Yu, Y., Huo, B., & Zhang, Z. J. (2021). Impact of information technology
on supply chain integration and company performance: evidence
from cross-border e-commerce companies in China. Journal of
Enterprise Information Management.

Zhao, L., Huo, B., Sun, L., & Zhao, X. (2013). The impact of supply chain
risk on supply chain integration and company performance: a global
investigation. Supply Chain Management: An International Journal.

Zsidisin, G. A., Hartley, J. L., Bernardes, E. S., & Saunders, L. W. (2015).
Examining supply market scanning and internal communication
climate as facilitators of supply chain integration. Supply Chain
Management: An International Journal.

Exploring Latent Dirichlet Allocation for Topic
Modelling in Facebook for Mental Health on
COVID-19 Pandemic

Nurzulaikha Khalid1, Shuzlina Abdul-Rahman2, Wahyu Wibowo3

2Research Initiative Group of Intelligent Systems
1,2Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA,
40450 Shah Alam, Selangor, Malaysia

3Institut Teknologi Sepuluh Nopember (ITS), Surabaya,
East Java, Indonesia, 60111

Abstract: Social media sites are a primary source of information about
people's attitudes and sentiments on various topics and situations. People
spend hours on social media each day sharing their thoughts, views, and
responses with others. The harmful influence of social media on mental
health has recently become more apparent as COVID-19 has affected Malaysia.
This research explores Facebook data concerning COVID-19-related
discussions. The study involves three phases: data collection, data
pre-processing, and topic modelling. This research proposes the Latent
Dirichlet Allocation (LDA) topic modelling approach to identify the topics
discussed. The three topics discovered are “COVID-19 Cases with Hospital
Quarantine”, “Lockdown and Mental Health”, and “Vaccination with COVID-19
Cases”. The study then focuses on the topic with a mental health element,
Topic 2, to discuss its insights further. Under Topic 2, “Lockdown and Mental
Health”, most commenters regard factories as the principal source of the
rising number of COVID-19 cases and disagree with the policymakers' decision
to allow factories to operate during the lockdown. As the lockdown has been
extended, many people fear losing their jobs and income.

Keywords: Facebook, Social Media, COVID-19, Latent Dirichlet
Allocation, Mental Health

1. Introduction

The COVID-19 pandemic has had major repercussions on public health, the
economy, politics, and culture (Cheng et al., 2020). The problems posed by
this epidemic are significant for many people, companies, corporations, and
governments, and its impact on daily life is unlike anything most people
have experienced before. The sectors most affected by the pandemic include
the economy (Maliszewska et al., 2020), education (Schleicher, 2020), and
public mental health (Campion et al., 2020). Scientists therefore moved
quickly to assess several aspects of the epidemic, including its possible
mental health consequences. However, survey research is time-consuming and
expensive (Gualano et al., 2020), and the need to validate survey
instruments makes real-time results difficult to obtain (Coughlan et al.,
2009), especially when quickly changing news cycles shape pandemic-related
discussion. In the absence of survey data, social media offers a potentially
significant data source for researching emerging social concerns, including
their impact on behaviours and social mood (Berry et al., 2018).

Repeated tracking of social media data could provide a diachronic
perspective on public morale and collective attitude, as individuals
actively contribute to narratives, offering unprompted and varied
understandings of various events (Cho et al., 2018). Individuals have also
sought out crisis-related news at greater levels during the COVID-19
pandemic (Bento et al., n.d.), resulting in a collective increase in
worldwide social media use. As a result, social media data on the COVID-19
pandemic is a valuable source for drawing real-time inferences about overall
social well-being during a critical public health outbreak. In this study, a
topic modelling approach is used to determine the topics of conversation on
social media related to the COVID-19 pandemic, and the topic that contains
mental health elements is discussed further. Insights are derived from
Facebook comments about COVID-19 in Malaysia.

2. Related Work

In text mining, documents such as blog posts or news articles often
need to be gathered and then organised into topics so that people can
understand them clearly. Topic modelling is one of the most commonly used
unsupervised learning approaches for text classification in text mining
(Chauhan, 2017). It is applied to latent data exploration and to discovering
connections between data and text documents, because it can identify the
terms and phrases that characterise a series of documents (Wu et al., 2020).

Topic modelling is a statistical text mining tool for identifying
possible (hidden) trends in a data corpus and grouping the main words in a
corpus into topics. It is a quick and straightforward way to start examining
data because it does not require labelled training data (Du, 2021). The
fundamental purpose of a topic model is to cluster documents in a text
domain: each document has a topic probability distribution, and documents
with a high probability for the same topic may be clustered together (Jiang
et al., 2019). As a result, unlike traditional clustering, a topic model
allows a document to belong to several clusters instead of just one. In
their review of topic modelling, Kherwa and Bansal (2019) state that in LDA
the top words of each topic form crisp topics that are clearly separated
from one another and cohesive enough to convey the nature of each topic.
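The contrast with traditional clustering can be made concrete with a small sketch. It takes a hypothetical document-topic table, where each row is one document's topic probability distribution of the kind an LDA model estimates, and applies two assignments. A hard assignment puts each document under its single dominant topic. A soft assignment keeps every topic whose probability reaches a threshold. The probabilities, the 0.25 threshold, and the function names are illustrative assumptions and do not come from this study.

```python
# Hypothetical document-topic distributions: each row gives P(topic | document)
# for three topics and sums to 1, as a fitted topic model would output.
doc_topics = {
    "doc_a": [0.80, 0.15, 0.05],
    "doc_b": [0.10, 0.45, 0.45],
    "doc_c": [0.30, 0.30, 0.40],
}


def dominant_topic(probs):
    """Hard assignment: index of the single most probable topic (first on ties)."""
    return max(range(len(probs)), key=lambda t: probs[t])


def soft_membership(probs, threshold=0.25):
    """Soft assignment: every topic whose probability reaches the threshold."""
    return [t for t, p in enumerate(probs) if p >= threshold]


for name, probs in doc_topics.items():
    print(name, dominant_topic(probs), soft_membership(probs))
```

Here `doc_b` is split evenly between two topics. A hard clustering has to drop one of them, while the soft view keeps both, which is the property the paragraph attributes to topic models.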

Several studies use LDA topic modelling. For example, Kwok et al. (2021)
collected 31,100 English tweets from Australian Twitter users containing
COVID-19 vaccine-related keywords between January and October 2020, and
examined them by visualizing high-frequency word clouds and word token
correlations. Kaila and Prasad (2020) investigated tweets associated with
#coronavirus. Both studies developed an LDA topic model to classify widely
discussed topics in a broad sample of tweets. Xue et al. (2020) used machine
learning techniques to analyze 1.9 million English-language tweets about
COVID-19 posted between 23 January and 7 March 2020. They determined the
most suitable number of topics using the Gensim coherence model. Because 11
topics gave the highest coherence score on their dataset, they selected 11
as the number of topics returned by LDA.

3. Methodology

3.1 Data Collection and Pre-processing

This study uses Facebook as its data source. The researchers collect
posts from the public Facebook page of “Kementerian Kesihatan Malaysia”
(Ministry of Health Malaysia). The analysis covers the period from 1 June
2021 to 31 August 2021. The start date of 1 June 2021 was chosen because it
is the day the Full Movement Control Order (FMCO) started in Malaysia (Nga
et al., 2021). A total of 74,266 comments were scraped from the “Kementerian
Kesihatan Malaysia” Facebook page. The extracted dataset contains 40
metadata fields, such as username, time, number of likes, number of
comments, shares, reactions, post text, and full comments. Although all
these metadata are available, the analysis concentrates only on the comment
text in order to study the sentiment expressed by each commenter.

After data collection, the researchers filter the dataset so that it
contains only COVID-19 pandemic posts, manually checking each post for the
keyword “COVID-19”. Various data cleaning methods are then applied: removing
irrelevant comments, translating, removing special characters, converting
words to lowercase, tokenization, deleting stop words from the corpus, and
lemmatization. These are critical tasks in text analytics. A total of 8,890
comments remain once the cleaning process is completed.
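The cleaning steps can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline, and it makes four assumptions. The comments are taken to be already translated into English. The stop-word list is a tiny hypothetical sample. The suffix-stripping `lemmatize` function is a crude placeholder for a real lemmatizer. The `post_text` field name is likewise hypothetical.

```python
import re

# Hypothetical, deliberately small stop-word list for illustration; a real
# pipeline would use a full list (for example from NLTK or Gensim).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "at",
              "i", "we", "it", "this", "for", "be", "have", "has"}


def keep_covid_posts(posts, keyword="covid-19"):
    """Keep posts whose text mentions the keyword (case-insensitive).

    Each post is assumed to be a dict with a hypothetical "post_text" field.
    """
    return [p for p in posts if keyword in p["post_text"].lower()]


def lemmatize(token):
    """Crude stand-in for a lemmatizer: strips a plural 's' only."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def clean_comment(text):
    """Lowercase, remove special characters, tokenize, drop stop words, lemmatize."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # special characters -> spaces
    tokens = text.split()                      # simple whitespace tokenization
    return [lemmatize(t) for t in tokens if t not in STOP_WORDS]


def clean_corpus(comments):
    """Clean every comment and drop those left empty (e.g. emoji-only replies)."""
    cleaned = (clean_comment(c) for c in comments)
    return [tokens for tokens in cleaned if tokens]


posts = [{"post_text": "Latest COVID-19 update for Selangor"},
         {"post_text": "Happy holidays to all"}]
print(len(keep_covid_posts(posts)))
print(clean_corpus(["The cases are still HIGH!!! Stay at home.", "!!!"]))
```

Stop words are removed before lemmatization. This ordering stops the naive suffix rule from turning words like "this" into non-words that then slip past the stop-word filter.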

3.2 Topic Modelling

This study uses Latent Dirichlet Allocation (LDA) for topic
modelling. LDA topic models are unsupervised machine learning algorithms
that use probabilistic inference to group large amounts of text into
understandable themes (Blei et al., 2003). The researchers use the Gensim
coherence model to determine the most appropriate number of topics for the
data, choosing the number of topics that yields the highest coherence value.
The higher the coherence score, the easier it is to comprehend the subject
represented by a topic's word distribution (Alash & Al-Sultany, 2020).
Candidate numbers of topics k range from 2 to 15. The coherence score for
each k is calculated with CoherenceModel from Gensim, a topic modelling
library in Python. Three topics are extracted for further analysis and
discussion because this number gives the highest coherence score, 0.5606.
An LDA model library in Python is then used to extract keywords for each of
the three topics. After extracting the keywords, the researchers validate
and label the topics manually by referring to the high-frequency keywords.
Manual labelling is needed because topic discovery is an unsupervised
learning process, so topics cannot be labelled automatically. Human
judgement and intervention are required to examine the coherence and
meaningfulness of the topics and then label them (Chang et al., 2009). Next,
a 2D plane of intertopic distance is displayed using the pyLDAvis package.
It visualizes the distance between the selected topics and the frequency of
the terms mentioned in each topic. Lastly, the topic related to mental
health is discussed further.
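The model-selection procedure can be sketched end to end with the Python standard library. This is a minimal illustration under stated assumptions, not the authors' code, and it swaps in two stand-in techniques:

- In place of the LDA library used in the study, it fits LDA by collapsed Gibbs sampling.
- In place of Gensim's CoherenceModel, it uses the UMass coherence measure.

Both stand-ins need only token counts. The paper does not state which coherence measure produced 0.5606, and UMass scores are on a different scale, so the two are not comparable. The hyperparameters `alpha`, `beta`, `iterations` and `seed`, and all function names, are hypothetical.

```python
import math
import random
from collections import Counter


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return k Counters (word -> count)."""
    rng = random.Random(seed)
    v = len({w for doc in docs for w in doc})       # vocabulary size
    topic_word = [Counter() for _ in range(k)]      # n(topic, word)
    topic_total = [0] * k                           # n(topic)
    doc_topic = [[0] * k for _ in docs]             # n(doc, topic)
    assign = []                                     # topic of every token
    for d, doc in enumerate(docs):                  # random initial topics
        z_doc = []
        for w in doc:
            t = rng.randrange(k)
            z_doc.append(t)
            topic_word[t][w] += 1
            topic_total[t] += 1
            doc_topic[d][t] += 1
        assign.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = assign[d][i]                    # remove token from counts
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                doc_topic[d][t] -= 1
                # Full conditional p(z = j | all other assignments), unnormalised.
                weights = [(doc_topic[d][j] + alpha)
                           * (topic_word[j][w] + beta) / (topic_total[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                assign[d][i] = t                    # add token back
                topic_word[t][w] += 1
                topic_total[t] += 1
                doc_topic[d][t] += 1
    return topic_word


def top_words(topic_word, n=10):
    """Highest-count words per topic, ignoring words with a zero count."""
    return [[w for w, c in counts.most_common() if c > 0][:n] for counts in topic_word]


def umass_coherence(topics, docs):
    """Mean UMass coherence: sum of log((D(w_m, w_l) + 1) / D(w_l)) over l < m."""
    doc_sets = [set(doc) for doc in docs]

    def df(*words):                                 # documents containing all words
        return sum(1 for s in doc_sets if all(w in s for w in words))

    scores = []
    for words in topics:
        score = 0.0
        for m in range(1, len(words)):
            for l in range(m):
                score += math.log((df(words[m], words[l]) + 1) / df(words[l]))
        scores.append(score)
    return sum(scores) / len(scores)


def select_num_topics(docs, k_values=range(2, 16), n_top=10):
    """Fit one model per k and return the k with the highest coherence."""
    scores = {}
    for k in k_values:
        topics = top_words(lda_gibbs(docs, k), n=n_top)
        scores[k] = umass_coherence(topics, docs)
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Run on the cleaned comments, `select_num_topics(corpus)` fits one model for each k from 2 to 15. It returns the k with the highest mean coherence together with every score. Refitting `lda_gibbs` with that k and calling `top_words` then gives the keyword lists that are labelled by hand. Pure-Python Gibbs sampling is slow on thousands of comments, so the sketch is for understanding the procedure rather than for production use.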

4. Result and Discussion

Three topics are extracted from the keywords: “COVID-19 Cases with
Hospital Quarantine”, “Lockdown and Mental Health”, and “Vaccination with
COVID-19 Cases”, as illustrated in Table 1. Ten extracted keywords
contribute to Topic 1: “case”, “covid”, “day”, “hospital”, “people”, “new”,
“please”, “state”, “quarantine”, and “time”. These keywords are sorted by
their respective weights. For example, in Topic 1 the weight of “case” is
0.031 and that of “covid” is 0.021. The topic labels in the third column of
Table 1 are defined manually by referring to the high-frequency keywords.
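The keyword column of Table 1 uses the `weight*"word"` notation found in Gensim's `print_topics` output. A small parser turns such a string back into (word, weight) pairs ranked by weight. The helper `parse_topic` is illustrative and not part of any library. Its input is Topic 1's keyword string as reported in Table 1.

```python
import re

# Matches one 'weight*"word"' term, e.g. 0.031*"case".
TERM = re.compile(r'([0-9]*\.?[0-9]+)\*"([^"]+)"')


def parse_topic(topic_string):
    """Return [(word, weight), ...] sorted by descending weight (stable on ties)."""
    pairs = [(word, float(weight)) for weight, word in TERM.findall(topic_string)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


topic_1 = ('0.031*"case" + 0.021*"covid" + 0.017*"day" + 0.012*"hospital" + '
           '0.011*"people" + 0.011*"new" + 0.011*"please" + 0.010*"state" + '
           '0.010*"quarantine" + 0.010*"time"')

ranked = parse_topic(topic_1)
print(ranked[:3])
print(round(sum(w for _, w in ranked), 3))
```

The weights are word probabilities within the topic. Summing the ten listed weights therefore shows how much of the topic's probability mass the displayed keywords cover: 0.144, or about 14%, for Topic 1.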

Table 1: Topics extracted

Topic | Keywords | Topic extracted from keywords
1 | 0.031*"case" + 0.021*"covid" + 0.017*"day" + 0.012*"hospital" + 0.011*"people" + 0.011*"new" + 0.011*"please" + 0.010*"state" + 0.010*"quarantine" + 0.010*"time" | COVID-19 Cases with Hospital Quarantine
2 | 0.054*"people" + 0.023*"case" + 0.023*"stay" + 0.023*"home" + 0.022*"government" + 0.017*"mental" + 0.016*"work" + 0.015*"covid" + 0.014*"sop" + 0.014*"factory" | Lockdown and Mental Health
3 | 0.041*"vaccine" + 0.025*"case" + 0.024*"still" + 0.024*"covid" + 0.022*"people" + 0.019*"already" + 0.016*"day" + 0.015*"factory" + 0.014*"dose" + 0.014*"high" | Vaccination with COVID-19 Cases

The topic distances and the 2D plane of intertopic distance are
presented in Fig. 1. Each bubble on the left represents a topic: Topic 1
(COVID-19 Cases with Hospital Quarantine), Topic 2 (Lockdown and Mental
Health), and Topic 3 (Vaccination with COVID-19 Cases). The bubble centres
are determined by computing the distances between topics. All three bubbles
are of reasonable size, which indicates that all three topics are prevalent.
Furthermore, the bubbles do not overlap and are spread across the chart,
indicating that the three topics are well separated from one another. On the
right is a list of the most frequently used terms for each topic and their
frequency of occurrence. Based on the result, the word “people”, followed by
“vaccine”, is mentioned most often in the dataset. The words “home”,
“stay”, “government”, and “factory” are also mentioned frequently.
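To our understanding, pyLDAvis positions the bubbles by default in two steps. It computes pairwise Jensen-Shannon divergences between the topic-word distributions, then projects them onto two dimensions with principal coordinate analysis (its `mds='pcoa'` option). The sketch covers only the first step, and its inputs are three hypothetical topic-word distributions over a toy five-word vocabulary invented for illustration. Larger values mean topics lie further apart. With natural logarithms the divergence ranges from 0 to ln 2.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 vanish."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic-word distributions over the toy vocabulary
# ["case", "hospital", "home", "mental", "vaccine"]; each row sums to 1.
topics = {
    "Topic 1": [0.50, 0.30, 0.10, 0.05, 0.05],
    "Topic 2": [0.10, 0.05, 0.40, 0.40, 0.05],
    "Topic 3": [0.30, 0.05, 0.05, 0.05, 0.55],
}

names = list(topics)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(js_divergence(topics[a], topics[b]), 3))
```

Unlike KL divergence, Jensen-Shannon divergence is symmetric and always finite, which makes it suitable as a distance between topics before the 2D projection.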

Fig. 1: 2D Plane of Intertopic Distance

“The government should make a lockdown like the first time it was made in
March, it is said that lockdown is not like now. I am mentally stress, wonder how
much longer we have to be locked up like this, please take appropriate action do
something that can benefit many people died.”
“The people are mentally tired of taking care of sop but the government takes it
easy on the general manager for permission to operate factories even though the
clusters are many from workplaces but the government points the finger at cross
state clusters why not mention the factory cluster is this a double standard”
“Until when I have to stay locked at the house but the case is still high. I mentally
stress”
“Pity us, at the time of lockdown we cannot work. people who can still work are
grateful. People cannot work for a long time can be mentally ill”

Fig. 2: Examples of comments that fall under Topic 2

Based on Fig. 2, some commenters express great displeasure with
the situation, believing that the outbreak cannot be managed when
policymakers ignore the pandemic's severity. They blame factories for the
rise in COVID-19 cases and attribute the increase to the policymakers'
incompetence in allowing factories to operate, which in turn prolongs the
lockdown. Lockdown, on the other hand, requires individuals to stay home
for an extended period while also following the SOP (Standard Operating
Procedure). Spending extra time at home can be exceedingly stressful if the
individual lives in a toxic home environment (Shanmugam et al., 2020).
After the government announced the Movement Control Order (MCO), the
Women's Aid Organisation and Talian Kasih reported increases of 44% and 57%
in calls, respectively, with domestic violence among the stated reasons
(Lee, 2020). Staying with the family and having greater contact at home can
be disastrous for a patient suffering from post-traumatic stress disorder
(PTSD) caused by past family trauma. As a result of the COVID-19 pandemic,
the boundary between safety and compulsion becomes increasingly blurred
(Shanmugam et al., 2020).

People are also becoming more stressed as COVID-19 cases do not
appear to recede despite the lockdown, and many have lost their jobs because
of it. Losing a job may affect one's emotions and cause instability and
uncertainty, which can lead to mental health issues such as anxiety and
depression. Financial troubles set in rapidly, and many groups among the
general population, particularly those in the B40 and M40 income
categories, have either lost or are on the verge of losing their source of
income (Shanmugam et al., 2020). The COVID-19 pandemic has added to their
distress, compounded by the burden of rising living expenses. The MCO
affects daily life, especially financially. The government has introduced
stimulus packages to alleviate the financial hardships faced by many
Malaysians. Even so, many small and medium enterprises (SMEs) in the
country have been forced to cut wages, reduce the number of employees, and
enforce unpaid leave for an indefinite period because of the country's
economic uncertainty.

5. Conclusion

In recent years, social media has produced a large amount of data
that can be utilized for data-driven or information-driven decision making.
This study set out to identify the topics related to the COVID-19 pandemic
discussed on social media using the Latent Dirichlet Allocation (LDA)
approach.

Based on the results in the previous section, the topic modelling method
discovers three topics: “COVID-19 Cases with Hospital Quarantine”,
“Lockdown and Mental Health”, and “Vaccination with COVID-19 Cases”. These
labels are defined manually by referring to the high-frequency keywords.
After the topic modelling process is completed, the researchers choose the
mental health-related topic for further examination. Based on the topic
modelling results, Topic 2 is the topic most closely related to mental
health. Under Topic 2, “Lockdown and Mental Health”, most individuals
disagree with the policymakers' decision to allow factories to operate
during the lockdown, as they regard factories as a primary source of the
rising number of COVID-19 cases. As the lockdown is extended, many are
concerned about their job security and fear losing their source of income.

This work can be extended by performing sentiment classification on
the topics obtained in this study. In future work, the researchers aim to
use lexicon-based approaches to estimate text polarity and to construct a
model using a machine learning classifier. Finally, the researchers aim to
assess the performance of the lexicon-based approach using accuracy,
precision, recall, and the F-measure. This assessment would verify the
experimental results and increase the credibility of sentiment
identification on the subject.
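The planned evaluation can be sketched briefly. The block combines a toy lexicon-based polarity labeller with the four named metrics computed for a positive class. Everything in it is a hypothetical stand-in for work the authors have not yet done: the mini-lexicon, the rule that a zero score counts as negative, and the gold labels. A real study would use an established sentiment lexicon and annotated comments.

```python
# Hypothetical mini-lexicon; a real study would use an established lexicon.
LEXICON = {"grateful": 1, "good": 1, "safe": 1,
           "stress": -1, "tired": -1, "fear": -1, "ill": -1}


def lexicon_polarity(tokens):
    """Label a tokenized comment 'pos' if its summed score is positive, else 'neg'."""
    score = sum(LEXICON.get(t, 0) for t in tokens)
    return "pos" if score > 0 else "neg"  # zero scores treated as negative (arbitrary)


def evaluate(gold, predicted, positive="pos"):
    """Accuracy, precision, recall and F-measure with respect to one positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


comments = [["grateful", "work"], ["mental", "stress"],
            ["stay", "home"], ["good", "fear", "tired"]]
gold = ["pos", "neg", "pos", "neg"]  # hypothetical annotations
predicted = [lexicon_polarity(c) for c in comments]
print(evaluate(gold, predicted))
```

The third comment contains no lexicon words, so it is labelled negative and counts as a false negative. This lowers recall while precision stays perfect, which shows why the F-measure, the harmonic mean of the two, is reported alongside accuracy.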

6. References

Alash, H. M., & Al-Sultany, G. A. (2020). Improve topic modeling
algorithms based on Twitter hashtags. Journal of Physics:
Conference Series, 1660(1).
https://doi.org/10.1088/1742-6596/1660/1/012100

Bento, A. I., Nguyen, T., Wing, C., Lozano-Rojas, F., Ahn, Y., & Simon, K.
(n.d.). Information Seeking Responses To News Of Local COVID-19
Cases: Evidence From Internet Search Data.

Berry, N., Emsley, R., Lobban, F., & Bucci, S. (2018). Social media and its
relationship with mood, self‐esteem and paranoia in psychosis. Acta
Psychiatrica Scandinavica, 138(6), 558–570.

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation.
Journal of Machine Learning Research, 3, 993–1022.

Campion, J., Javed, A., Sartorius, N., & Marmot, M. (2020). Addressing the
public mental health challenge of COVID-19. The Lancet
Psychiatry, 7(8), 657–659.
https://doi.org/10.1016/S2215-0366(20)30240-6

Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009).
Reading tea leaves: How humans interpret topic models. Advances
in Neural Information Processing Systems, 288–296.

Chauhan, P. (2017). Sentiment Analysis: A Comparative Study of
Supervised Machine Learning Algorithms Using Rapid miner.
International Journal for Research in Applied Science and
Engineering Technology, V(XI), 80–89.
https://doi.org/10.22214/ijraset.2017.11011

Cheng, X., Cao, Q., & Liao, S. S. (2020). An overview of literature on
COVID-19, MERS and SARS: Using text mining and latent
Dirichlet allocation. Journal of Information Science,
0165551520954674.

Cho, S. E., Jung, K., & Park, H. W. (2018). Social media use during Japan’s

2011 earthquake: How Twitter transforms the locus of crisis

communication. March 2015.

https://doi.org/10.1177/1329878X1314900105

Coughlan, M., Cronin, P., & Ryan, F. (2009). Survey research: Process and

limitations. International Journal of Therapy and Rehabilitation,

16(1), 9–15.

Du, Y. (2021). A Deep Topical N-gram Model and Topic Discovery on

COVID-19 News and Research Manuscripts. Electronic Thesis and

Dissertation Repository. https://ir.lib.uwo.ca/etd/7797

Gualano, M. R., Lo Moro, G., Voglino, G., Bert, F., & Siliquini, R. (2020).
Effects of Covid-19 Lockdown on Mental Health and Sleep Disturbances in
Italy.

Jiang, H., Zhou, R., Zhang, L., Wang, H., & Zhang, Y. (2019). Sentence level
topic models for associated topics extraction. World Wide Web, 22(6),
2545–2560.

Kaila, R. P., & Prasad, A. V. K. (2020). Informational Flow on Twitter -
Corona Virus Outbreak – Topic. International Journal of Advanced Research in
Engineering and Technology (IJARET), 11(3), 128–134.

Kherwa, P., & Bansal, P. (2019). Topic Modeling: A Comprehensive Review. EAI
Endorsed Transactions on Scalable Information Systems, 7(24), 1–16.

Kwok, S. W. H., Vadde, S. K., & Wang, G. (2021). Tweet topics and sentiments
relating to COVID-19 vaccination among Australian Twitter users: Machine
learning analysis. Journal of Medical Internet Research, 23(5).
https://doi.org/10.2196/26953

Lee, H. (2020). Implement emergency response to domestic violence amid
COVID-19 crisis. Women’s Aid Organisation.
https://wao.org.my/implement-emergency-response-to-domestic-violence-amid-covid-19-crisis/

Maliszewska, M., Mattoo, A., & van der Mensbrugghe, D. (2020, April). The
Potential Impact of COVID-19 on GDP and Trade: A Preliminary Assessment.
https://doi.org/10.1596/1813-9450-9211

Nga, J. L. H., Ramlan, W. K., & Naim, S. (2021). Covid-19 pandemic and
unemployment in Malaysia: A case study from Sabah. Cosmopolitan Civil
Societies, 13(2), 73–90. https://doi.org/10.5130/ccs.v13.i2.7591

Schleicher, A. (2020). The impact of COVID-19 on education: Insights from
education at a glance 2020. OECD Journal: Economic Studies, 1–31.
https://www.oecd.org/education/the-impact-of-covid-19-on-education-insights-education-at-a-glance-2020.pdf

Shanmugam, H., Juhari, J. A., Nair, P., Chow, S. K., & Ng, C. G. (2020).
Impacts of COVID-19 Pandemic on Mental Health in Malaysia: A Single Thread
of Hope. Malaysian Journal of Psychiatry Ejournal, 29(1), 78–84.
https://www.mjpsychiatry.org/index.php/mjp/article/view/536/415

Wu, Y.-C., Chen, C.-S., & Chan, Y.-J. (2020). The outbreak of COVID-19: An
overview. Journal of the Chinese Medical Association, 83(3), 217.

Xue, J., Chen, J., Chen, C., Zheng, C., Li, S., & Zhu, T. (2020). Public
discourse and sentiment during the COVID-19 pandemic: Using latent Dirichlet
allocation for topic modeling on Twitter. PLoS ONE, 15(9), 1–12.
https://doi.org/10.1371/journal.pone.0239441

Information Requirement for Plants in
Augmented Reality Application using
Participatory Design

Nabihah Yusof1, Rozianawaty Osman2

Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA,
40450 Shah Alam, Selangor, Malaysia

[email protected], [email protected]

Abstract: Learning about plants could be more interesting when
technologies like augmented reality are used. However, before the
application's functions can be designed, the information it will present on a
mobile device must be identified. Thus, this study identifies the appropriate
plant information to be displayed through augmented reality on a mobile
application. Participatory design with the PICTIVE technique was used during
data collection. The resulting information requirements provide insight into
the appropriate plant information required for the application. The results
of this study show that augmented reality applications can display plant
information on mobile devices more attractively.

Keywords: Augmented reality, Participatory design, Mobile application,
Plant learning, Information requirement

1. Introduction

Malaysia is well suited to ecotourism as it has many beautiful and tranquil
natural places, especially its National Park. The National Park also plays a
major role in the tourism industry. It contains some of Malaysia's most
valuable assets, namely its flora and fauna, which is one of the reasons why
the park should be preserved. As stated by Tang (2020), establishing a
natural park system could promote natural ecological protection, create a
beautiful country and promote harmonious coexistence between humans and
nature (as cited in Yang et al., 2020).

According to Yew et al. (2020), static images of these plants are
frequently displayed in books, picture walls, posters and brochures to
showcase these natural riches. Although static graphics serve their purpose,
they are a less appealing way to convey information. In many cases, only
limited space is available to present information in these media, so the
audience cannot learn more about the plants. One possible solution is to
introduce technology such as Augmented Reality (AR) so that users can view
more information about any plant displayed on flyers, brochures and posters.

The paper is organised as follows: Section 2 reviews the literature, Section 3
describes the methodology used in the study, Section 4 presents the results,
and Section 5 concludes the paper.

2. Literature Review

In education, augmented reality helps students learn through rich visuals
and immersion in the subject matter. Textbooks, tangible forms, posters and
printed brochures may all be replaced by augmented reality. According to
Billinghurst (2002), AR technology has progressed to the point that it can be
used in a far broader range of applications, and education is one area where
it may be particularly useful. Information and communication technologies
are entwined with society's daily life in a variety of ways, and they play an
active part in educational processes (Martín-Gutiérrez et al., 2015). In
short, AR-based eLearning appeals to a key human sense for acquiring
information.

According to Chang et al. (2010), augmented reality can increase learners'
motivation to engage in realism-based educational practices. AR has
significant potential to deliver both compelling, contextual, on-site learning
experiences and serendipitous exploration and discovery of the interrelated
nature of knowledge in the real world (Johnson et al., 2010). Billinghurst
and Dunser (2012) examined the effectiveness of adopting AR in educational
settings and found that the high degree of engagement AR offers improves
learners' kinesthetic, visual/spatial and collaborative problem-solving skills
while also raising their motivation levels.

Although augmented reality has been theorised and used for over 50 years,
it has only recently become a widely accessible technology as a result of the
development and commercialisation of mobile technology (Sommerauer &
Müller, 2014). AR technology allows users to interact directly with objects.
According to Chen et al. (2014), virtual information or objects are overlaid in
real time on images of the actual environment to enable context-aware,
integrated multimedia user interaction. The application's use of 3D visuals
allows for realistic and natural interactions (Benko et al., 2012). Users like
interactive apps because they can interact with the information source
rather than receive information in only one direction (Chen et al., 2014).
Furthermore, interactive applications allow users to choose what they want
to see or hear, as well as how the information is presented.
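The real-time overlay described by Chen et al. (2014) depends on registering virtual content to the camera image. One common way to do this is the pinhole camera model, which projects a 3D anchor point (for example, the position of a virtual plant label on a poster) into pixel coordinates. The sketch below is a minimal illustration under assumed values: the focal lengths, principal point and camera pose are hypothetical, and real AR frameworks estimate the pose continuously by tracking.

```python
# Minimal pinhole-camera sketch: project a 3D anchor point into pixel coordinates
# so that a virtual label can be drawn over the camera frame at that spot.
# All intrinsics (fx, fy, cx, cy) and pose values here are hypothetical.

def project(point_world, rotation, translation, fx, fy, cx, cy):
    """Return (u, v) pixel coordinates, or None if the point is behind the camera."""
    # World -> camera coordinates: X_c = R @ X_w + t
    xc = [sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
          for i in range(3)]
    if xc[2] <= 0:
        return None  # behind the camera, so there is nothing to draw
    # Perspective division, then scale by focal length and shift by principal point
    u = fx * xc[0] / xc[2] + cx
    v = fy * xc[1] / xc[2] + cy
    return (u, v)

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Anchor 0.1 m right of and 0.05 m above the optical axis, 0.5 m away (image y points down)
print(project([0.1, -0.05, 0.5], identity, [0.0, 0.0, 0.0],
              fx=800, fy=800, cx=320, cy=240))
```

Recomputing the camera pose for every frame and re-projecting the anchor is what keeps the label attached to the printed plant image as the phone moves.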

According to Wright and McCarthy (2010), participatory design aims to
increase research participants' involvement in technology and process design
in order to optimise the project's impact. Incorporating participatory design
into ICT-based interventions may be a more effective way to engage
participants as active contributors (Xu & Maitland, 2019). Participatory design
can also be used to create low-cost resources that can be used in any physical
space, as well as to help individuals gain self-confidence and analyse
difficult issues.

3. Methodology

This section describes the research methodology used to achieve the
research objective, which is to identify the appropriate plant information to
be displayed to users. The methodology has two parts: Part 1 is an
introduction and Part 2 is an information display task.

In Part 1, participants are first told the purpose of the project and the tasks
required of them (Information Sheet), and their consent is then obtained.
Next, the following information is collected: demographic information (age,
gender, education, occupation) and experience (mobile applications,
augmented reality).

Part 2 is the information display task, whose objective is to identify the
appropriate plant information to be displayed through augmented reality.
Participants are provided with an image of a plant and asked what more they
want to know about it. Throughout the tasks, participants' responses are
video recorded, and the plant information they suggest is analysed across
participants.
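Analysing the suggested plant information across participants can be as simple as counting how many participants requested each item. The sketch below assumes each participant's video-recorded responses have already been coded into short category labels; the participant data and the "at least half of participants" shortlisting rule are hypothetical, not the study's actual data or decision rule.

```python
from collections import Counter

# Hypothetical coded responses: the information items each participant asked for.
responses = {
    "P1": {"name", "scientific name", "benefits"},
    "P2": {"name", "origin", "benefits"},
    "P3": {"name", "soil", "temperature"},
    "P4": {"scientific name", "benefits", "colour"},
}

def tally(responses):
    """Count how many participants mentioned each information item."""
    counts = Counter()
    for items in responses.values():
        counts.update(items)  # one set per participant, so each person counts once per item
    return counts

def shortlist(counts, n_participants, min_share=0.5):
    """Items requested by at least min_share of participants (an assumed rule)."""
    return sorted(item for item, c in counts.items() if c / n_participants >= min_share)

counts = tally(responses)
print(shortlist(counts, len(responses)))
```

Storing each participant's answers as a set means repeated mentions by the same person do not inflate an item's count.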

This study also uses participatory design and the PICTIVE technique to
collect data from participants. In the participatory design session, the
researcher gives participants the tools to generate and design mockups. The
PICTIVE technique was used because it is intended for developing an
interface from scratch. Figure 1 shows the setting of the participatory design
session and Figure 2 shows the design surface used with the PICTIVE
technique.
[Figure: participatory design setting with a design surface and a video recorder]
Fig. 1 The setting of participatory design

[Figure: PICTIVE design surface materials: Post-it notes, A4 paper, coloured pens, pens and pencils, a mobile device, a poster and scissors]
Fig. 2 Design surface using the PICTIVE technique

4. Results and Findings

This section presents the results of the participatory design sessions that
used the PICTIVE technique described earlier. Face-to-face interviews using
the PICTIVE technique were conducted with seven participants, and the
results are shown in Figure 3. They show the information that needs to be
displayed using augmented reality.

Fig. 3 Results from Participants

Based on the participants' results, the researchers summarised the plant
information that needs to be displayed: the plant's name, scientific name,
type, origin, lifetime, benefits, appropriate temperature, appropriate soil,
colour and appropriate forest.
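The ten information items identified above could serve directly as the content schema of the AR application. The Python sketch below is a hypothetical data model that is not part of the study: the field names, the placeholder plant record and the `overlay_lines` helper are illustrative assumptions showing how one plant's record might be turned into the text panel shown over a poster or brochure.

```python
from dataclasses import dataclass, fields

@dataclass
class PlantInfo:
    """Hypothetical record holding the ten information items identified in the study."""
    name: str
    scientific_name: str
    plant_type: str
    origin: str
    lifetime: str
    benefits: str
    temperature: str
    soil: str
    colour: str
    forest: str

# Display labels for each field, in the same order as the dataclass definition.
LABELS = {
    "name": "Name", "scientific_name": "Scientific name", "plant_type": "Type",
    "origin": "Origin", "lifetime": "Lifetime", "benefits": "Benefits",
    "temperature": "Suitable temperature", "soil": "Suitable soil",
    "colour": "Colour", "forest": "Forest type",
}

def overlay_lines(plant: PlantInfo) -> list[str]:
    """Format each field as a 'Label: value' line for the AR information panel."""
    return [f"{LABELS[f.name]}: {getattr(plant, f.name)}" for f in fields(plant)]

# Placeholder values only; a real app would load curated records from a database.
example = PlantInfo("Example plant", "Genus species", "Shrub", "<origin>", "<lifetime>",
                    "<benefits>", "<temperature range>", "<soil type>", "<colour>",
                    "<forest type>")
for line in overlay_lines(example):
    print(line)
```

Keeping the content in one typed record makes it easy to check that every plant entry supplies all ten items before it is shown in AR.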

5. Conclusion

In conclusion, this research focused on its objective of identifying the
appropriate plant information to be displayed to visitors. The study is
concerned specifically with the information required for an augmented
reality plant application, and it implemented participatory design using the
PICTIVE technique to gather these requirements directly from participants.

6. References

Benko, H., Jota, R., & Wilson, A. (2012, May). MirageTable: Freehand
interaction on a projected augmented reality tabletop. In Proceedings of the
SIGCHI Conference on Human Factors in Computing Systems (pp. 199–208).

Billinghurst, M. (2002). Augmented reality in education. New Horizons for
Learning, 12(5), 1–5.

Billinghurst, M., & Dunser, A. (2012). Augmented reality in the classroom.
Computer, 45(7), 56–63. https://doi.org/10.1109/MC.2012.111

Chang, G., Morreale, P., & Medicherla, P. (2010). Applications of augmented
reality systems in education. In D. Gibson & B. Dodge (Eds.), Proceedings of
Society for Information Technology & Teacher Education International
Conference 2010 (pp. 1380–1385). Chesapeake, VA: AACE.

Chen, C. Y., Chang, B. R., & Huang, P. S. (2014). Multimedia augmented
reality information system for museum guidance. Personal and Ubiquitous
Computing, 18(2), 315–322.

Johnson, L., Levine, A., Smith, R., & Stone, S. (2010). Simple augmented
reality. In The 2010 Horizon Report (pp. 21–24). Austin, TX: The New Media
Consortium.

Martín-Gutiérrez, J., Fabiani, P., Benesova, W., Meneses, M. D., & Mora, C. E.
(2015). Augmented reality to promote collaborative and autonomous learning
in higher education. Computers in Human Behavior, 51, 752–761.
https://doi.org/10.1016/j.chb.2014.11.093

Sommerauer, P., & Müller, O. (2014). Augmented reality in informal learning
environments: A field experiment in a mathematics exhibition. Computers &
Education, 79, 59–68. https://doi.org/10.1016/j.compedu.2014.07.013

Wright, P., & McCarthy, J. (2010). Experience-centered design: Designers,
users, and communities in dialogue. Synthesis Lectures on Human-Centered
Informatics, 3(1), 1–123.

Xu, Y., & Maitland, C. (2019, January). Participatory data collection and
management in low-resource contexts: A field trial with urban refugees. In
Proceedings of the Tenth International Conference on Information and
Communication Technologies and Development (pp. 1–12).

Predictive Analytics of Alumni Employability

Tuan Nur Najwa Tuan Mohammad1, Sofianita Mutalib2* and Ariff Md Ab Malik3

2Research Initiative Group of Intelligent Systems, Universiti Teknologi MARA,
40450 Shah Alam, Selangor, Malaysia
[email protected]

1Faculty of Computer and Mathematical Sciences,
Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia

[email protected]

3Faculty of Business and Management, Universiti Teknologi MARA,
40450 Shah Alam, Selangor, Malaysia
[email protected]

Abstract: The issue of employability among graduates has received much
attention from the government. The role of higher education institutions is
to ensure that all graduates can fit into the highly competitive Malaysian and
global job market and to avoid a mismatch between graduate qualifications
and labour market demand. Data mining was used in this study because it is
the discipline of discovering novel and potentially relevant information from
large amounts of data. This study aims to propose a suitable classification
model for predicting graduates’ careers and whether or not they will be
employed. The results of the predictive analysis can help various parties plan
graduate job opportunities. The analysis was carried out on the data of 20399
alumni from 2014 to 2021, extracted from a Malaysian public university.
Before classification, an oversampling technique known as SMOTE was
applied to treat the class imbalance in the target variable. Three machine
learning algorithms, namely Artificial Neural Network (ANN), Support Vector
Machine (SVM) and Random Forest, were applied to predict graduates’
careers and whether or not the graduates will be employed, with “Kelayakan
Akademik” (academic qualification) as the target class. The classification
models were compared based on misclassification rate, accuracy, precision,
recall, F1-score and Receiver Operating Characteristic (ROC) Area Under the
Curve (AUC) score. The results indicate that Random Forest outperformed
the other classifiers, with the highest accuracy and the lowest
misclassification rate. This predictive model could facilitate the planning,
development and implementation of programmes that add value to graduates
and further improve their marketability and employability.

Keywords: Artificial Neural Network (ANN), Data Mining, Employability,
Support Vector Machine (SVM), Random Forest.

1. Introduction

Employability plays an important part in driving national economies around
the world. It is also a significant benchmark of the current labour market
environment (International Labour Organization [ILO], 2020). Employability
is thus a critical strategy and has become part of government agendas for
encouraging citizens, especially the younger generation, to work. The United
Nations has also called on each region to (1) encourage the transformation of
educational institutions into a profession, (2) examine and review its
education systems, vocational training policies and employment policies, and
(3) offer young people in the workplace benefits and goals (Organisation for
Economic Co-operation and Development [OECD], 2018). Students already
pursuing higher education are concerned about unemployment and the
difficulties recent graduates face in finding work (Fernandez-Chung & Ching,
2018). Competition for jobs has intensified as the number of public and
private higher education institutions, and of students enrolled in them, has
increased. In terms of human capital growth, the expansion of higher
education institutions is a good measure, since it creates further options for
the population to receive tertiary education, thus increasing the country’s
human capital accumulation (Wan et al., 2018).

Reliable prediction requires a good predictive model, which can in turn help
increase the employability rate of a given group of graduates. The growing
volume of data that can be processed efficiently using machine learning has
also made scientific research in this area more complex. Given this problem,
this study aims to analyse a suitable classification model that can be used to
predict graduates’ careers and whether or not they will be employed.
Predictive analysis and data mining can help different parties plan graduate
employment opportunities. This study also focuses on data mining techniques
to analyse an alumni employability dataset using several predictive models,
namely Artificial Neural Network (ANN), Support Vector Machine (SVM)
and Random Forest.

2. Related Work

Data mining for educational purposes has expanded to cover student
performance, staff actions and administrative decisions (Unal, 2020). Data
mining is also known as knowledge discovery from data (Alomari et al., 2019).
Machine learning, statistics, information technology, artificial intelligence,
data retrieval and visualisation all contribute to data mining (Arunachalam
& Velmurugan, 2018). Educational data mining is a new topic in the field of
data mining. In today’s competitive environment, educational institutions
use data mining technologies to investigate and evaluate student
performance, predict outcomes to prevent dropouts, and identify high- and
low-performing students. Educational data mining is one method of
enhancing education quality, and modern educational institutions rely on
data mining for their strategies and future aims (Zoric, 2020).

In their study, A’rifian et al. (2019) examined three prediction models:
Decision Tree (DT), Logistic Regression (LR) and Artificial Neural Network
(ANN). Data mining was used to find correlations and patterns that support
better conclusions. The study found that ANN was the best model for
predicting whether graduates are employed in the public or the private sector.
ANN scored the highest accuracy (81.52%) and the lowest error rate
(18.48%). Overall, ANN was also the best model for predicting the negative
target (private-sector graduates), as its specificity was higher than its
sensitivity. This finding can be used to forecast whether graduates are hired
in the public or private sector.
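The metrics in the paragraph above relate to one another through the confusion matrix: the error rate is 1 - accuracy (81.52% + 18.48% = 100%), while sensitivity and specificity measure how well the positive and negative classes are each recognised. The counts below are hypothetical, chosen only to illustrate the formulas with "public sector" treated as the positive class; they are not A’rifian et al.'s data.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Accuracy, error rate, sensitivity and specificity from a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,      # share of misclassified instances
        "sensitivity": tp / (tp + fn),   # recall of the positive class
        "specificity": tn / (tn + fp),   # recall of the negative class
    }

# Hypothetical counts: positive = employed in the public sector, negative = private sector.
m = binary_metrics(tp=30, fn=20, fp=10, tn=90)
print(m)
```

Here specificity (0.9) exceeds sensitivity (0.6), so such a model would recognise the negative (private-sector) class better than the positive one, even though its overall accuracy is 0.8.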

Another study, by Vinutha and Yogisha (2020), predicted the employability
and mapping of graduates using machine learning algorithms based on their
academic achievements, employability and industry demands. The study
employed several machine learning methods, including Logistic Regression,
Decision Tree, k-nearest neighbour, Support Vector Machine and Naïve
Bayes, to develop the model. The ANN classifier achieved the highest
precision (87.42%). This research may benefit various organisations,
including government and private companies, particularly in predicting
students’ employability.

3. Methodology

Following the Cross-Industry Standard Process for Data Mining (CRISP-DM),
this study involved four phases to build a suitable classification model for
predicting graduates’ careers and whether they will be employed. The
sequence of these iterative phases is shown in Figure 1.

[Figure: Data Collection → Data Pre-processing → Classification → Evaluation]

Fig. 1 Research Framework

Data Collection. Data collection is the first phase of the model development
methodology. The study used secondary data acquired from one of the IPTAs
(Institusi Pengajian Tinggi Awam, Malaysian public higher education
institutions). The data consist of 79566 instances (rows) and 106 attributes,
covering eight years of data from 2014 to 2021.

Data Pre-processing. Prior to evaluation, the data were pre-processed in the
training database. This stage involved data cleaning, data transformation,
data discretisation and data normalisation. For data cleaning, the researcher
removed records lacking essential attributes, identified outliers, corrected
inconsistent data, and deleted duplicate records and records with missing
values. After cleaning, 20399 of the 79566 raw instances remained and were
ready for mining. In this study, data transformation was used to convert
attributes between continuous, nominal and numeric forms and onto specific
scales, while data discretisation was used to divide continuous numeric
attributes into nominal intervals on a specified scale. These approaches are
intended to facilitate data analysis. The final phase of data preparation was
min-max normalisation, which rescales each attribute's values to the range 0
to 1 using the attribute's minimum and maximum values. After
pre-processing, a complete dataset was obtained that could be used for
experiments. For feature selection, WEKA's attribute evaluators were used;
InfoGainAttributeEval was chosen to assess each attribute by measuring its
information gain with respect to the class (the target variable). Because a
balanced dataset is important for the classification model, if the dataset is
imbalanced, the SMOTE technique is applied before the modelling phase to
help generate accurate models.
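Two of the pre-processing steps above can be made concrete. Min-max normalisation rescales a value x as x' = (x - min) / (max - min), and information gain, the criterion behind WEKA's InfoGainAttributeEval, is the entropy of the class minus the class entropy that remains after splitting on an attribute. The following standard-library Python sketch illustrates both on a tiny hypothetical sample; it is not the WEKA implementation, and the attribute values and labels are invented.

```python
import math
from collections import Counter

def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(attribute, labels):
    """H(class) - H(class | attribute) for a nominal attribute."""
    n = len(labels)
    remainder = 0.0
    for value in set(attribute):
        subset = [y for a, y in zip(attribute, labels) if a == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical mini-sample: CGPA values and a nominal attribute vs. the target class.
cgpa = [2.50, 3.00, 3.50, 4.00]
faculty = ["A", "A", "B", "B"]
target = ["NO", "NO", "YES", "YES"]
print(min_max(cgpa))
print(info_gain(faculty, target))
```

An attribute that separates the two classes perfectly, like `faculty` in this toy sample, reaches the maximum gain of 1 bit for a balanced two-class target; an attribute unrelated to the class scores 0.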

Classification. In this study, classification and regression are used as
predictive data mining tasks on the alumni dataset. The methods of data
analysis used are Artificial Neural Network (ANN), Support Vector Machine
(SVM) with a Gaussian RBF kernel and Random Forest. The dataset was split
into a training set (70%) and a test set (about 30%) to ensure the accuracy of
the experimental results and enhance the credibility of the graduates' career
predictions. The split also helps the researcher guard against over-fitting or
under-fitting and optimise the model. The classification rules learned from
the training data are then evaluated on the remaining data. This study
explores the predictive ability of multiple classification algorithms in
determining graduates' careers. The target class, “Kelayakan Akademik”
(academic qualification), indicates whether an alumnus's employment
matches their academic qualifications, does not match them, or has no
information recorded on this. This target class is the attribute of greatest
interest for fulfilling the objectives of this study.
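The 70/30 hold-out split described above can be sketched as follows. Stratifying by the target class keeps the YES / NO / NO INFORMATION proportions similar in the training and test sets; stratification and the fixed random seed are assumptions of this sketch, since the text does not say how the split was drawn. The records reuse the class sizes from Table 1 with all features omitted.

```python
import random
from collections import defaultdict

def stratified_split(records, label_of, train_share=0.7, seed=42):
    """Split records into (train, test), taking train_share of each class for training."""
    by_class = defaultdict(list)
    for r in records:
        by_class[label_of(r)].append(r)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        cut = round(len(members) * train_share)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

# Hypothetical records using the class sizes reported in Table 1 (features omitted).
records = ([(i, "YES") for i in range(3149)]
           + [(i, "NO") for i in range(1481)]
           + [(i, "NO INFORMATION") for i in range(15769)])
train, test = stratified_split(records, label_of=lambda r: r[1])
print(len(train), len(test))
```

Splitting each class separately means even the smallest class (NO, 1481 records) keeps about 70% of its records in training and 30% in testing.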

Evaluation. In this stage, the misclassification rate, accuracy, precision,
recall, F1-score and Receiver Operating Characteristic (ROC) Area Under the
Curve (AUC) score of the models were compared. The best algorithm was
selected based on this comparison to fulfil the second objective of this study.
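The evaluation measures named above can all be derived from predictions on the test set. The sketch below computes the misclassification rate, accuracy and support-weighted precision, recall and F1 for a multi-class target, plus a binary ROC AUC using the rank (Mann-Whitney) formulation. Weighted averaging is an assumption, since the text does not state how per-class scores were averaged; note that weighted recall always equals accuracy. The labels and scores are hypothetical.

```python
from collections import Counter

def weighted_metrics(gold, pred):
    """Accuracy, misclassification rate and support-weighted precision/recall/F1."""
    n = len(gold)
    accuracy = sum(g == p for g, p in zip(gold, pred)) / n
    precision = recall = f1 = 0.0
    for cls, n_cls in Counter(gold).items():
        tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
        predicted = sum(p == cls for p in pred)
        p_cls = tp / predicted if predicted else 0.0
        r_cls = tp / n_cls
        f_cls = 2 * p_cls * r_cls / (p_cls + r_cls) if p_cls + r_cls else 0.0
        weight = n_cls / n  # weight each class by its share of the test set
        precision += weight * p_cls
        recall += weight * r_cls
        f1 += weight * f_cls
    return {"accuracy": accuracy, "misclassification": 1 - accuracy,
            "precision": precision, "recall": recall, "f1": f1}

def roc_auc(labels, scores):
    """Binary ROC AUC: chance a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

gold = ["YES", "NO", "NOINFO", "NOINFO", "YES", "NO"]
pred = ["YES", "NOINFO", "NOINFO", "NOINFO", "YES", "NO"]
print(weighted_metrics(gold, pred))
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))
```

For a three-class target such as “Kelayakan Akademik”, a common approach is to compute the AUC one-vs-rest for each class from predicted class probabilities and then average the results.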

4. Results and Discussion

4.1 Data Management

Data management is an administrative process that ensures the reliability
and accessibility of data. In this study, data management was done by
splitting the alumni dataset by the target class for predicting graduates’
careers and whether the graduates will be employed, “Kelayakan
Akademik”. Microsoft Excel was used for data management. Table 1 shows
the number of observations in each class of “Kelayakan Akademik”, 20399
in total.

Table 1. Number of Observations for Target Class

Target Class | Description | Number of Observations for Each Class
Kelayakan Akademik | Does the current job match your academic qualifications? | 1 = “YES”: 3149; 2 = “NO”: 1481; 0 = “NO INFORMATION”: 15769

4.2 Analysis of Imbalanced Alumni Dataset With SMOTE

For the target “Kelayakan Akademik”, 15769 of the 20399 alumni records
contained no information on whether the alumni's employment followed their
academic qualifications. In other words, 77.3% of the alumni in the sample,
the majority class, did not provide information on whether their employment
matched their academic qualifications. Figure 2 shows the class distribution
of the imbalanced alumni dataset for “Kelayakan Akademik”. The x-axis
represents the class, which indicates whether or not the alumni's employment
matched their academic qualifications, or whether this information is
missing. The y-axis represents the number of alumni in each class. The blue
bar, which is hardly visible, represents alumni whose employment does not
match their academic qualifications (a minority class). Figure 3 shows a
graphical representation of the imbalance ratio, where the minority class
accounts for 7.26% of the total dataset of 20399 alumni.
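The class shares quoted above follow directly from the counts in Table 1:

```latex
\[
\frac{15769}{20399} \approx 0.773, \qquad
\frac{3149}{20399} \approx 0.154, \qquad
\frac{1481}{20399} \approx 0.0726,
\]
\[
\text{imbalance ratio (majority : smallest minority)} = \frac{15769}{1481} \approx 10.6 .
\]
```

The three shares sum to 1 (77.3% + 15.4% + 7.26% ≈ 100%), so the majority class is roughly ten times the size of the smallest class.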

Fig. 2 Imbalanced Alumni Dataset for “Kelayakan Akademik”

The next phase of the experiment was to apply data-level approaches to the
alumni dataset. To counteract the effect of the class imbalance, the
supervised Synthetic Minority Over-sampling Technique (SMOTE) was
applied. SMOTE was used to over-sample the minority instances until they
equalled the majority class in number. The numeric value 0 indicates that
there is no information on whether the alumni's employment followed their
academic qualifications; a value of 1 indicates that the alumni's employment
matched their academic qualifications (YES), and a value of 2 indicates that
it did not (NO). The minority classes were increased to the number of records
in the majority class, resulting in an equal number of records for each class.
Therefore, the “YES” and “NO” classes of “Kelayakan Akademik”, which
together held 4630 records, were each increased to 15769 records, as shown
in Figure 3.
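The core idea of SMOTE is to create each synthetic minority record at a random point on the line segment between a real minority record and one of its k nearest minority-class neighbours. The standard-library sketch below oversamples one class until it reaches a target count. It is a simplified illustration (numeric features only, Euclidean distance, k = 5 as a common default) and not the WEKA filter the study may have used; the sample points are hypothetical.

```python
import math
import random

def smote(minority, target_count, k=5, seed=0):
    """Return the minority samples plus synthetic ones, up to target_count in total."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    while len(minority) + len(synthetic) < target_count:
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority-class neighbours of `base`, excluding itself
        neighbours = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(base, m),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return list(minority) + synthetic

# Hypothetical two-feature minority class (e.g. normalised CGPA and income).
minority = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.3, 0.25)]
balanced = smote(minority, target_count=10)
print(len(balanced))
```

In the study's setting, `target_count` would be the majority-class size of 15769. A common recommendation is to oversample only the training portion, so that synthetic records derived from test instances cannot leak into evaluation.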

Fig. 3 Balanced Alumni Dataset for “Kelayakan Akademik”

4.3 Analysis of Performance Comparison of The Models
Three prediction classifiers, Support Vector Machine (SVM) with a Gaussian
RBF kernel, Random Forest and Artificial Neural Network (ANN), were
compared. Table 2 shows the prediction performance of each classifier for
“Kelayakan Akademik” as the target class of the alumni dataset.

Table 2. Prediction performance of “Kelayakan Akademik” based on three
prediction classifiers

Classifier | Accuracy | ROC AUC | Precision | F1-score | Recall | RMSE
SVM_RBF | 0.63 | 0.80 | 0.62 | 0.61 | 0.63 | 0.98
Random Forest | 0.92 | 0.98 | 0.92 | 0.92 | 0.92 | 0.41
ANN | 0.71 | 0.88 | 0.71 | 0.70 | 0.71 | 0.84

Random Forest achieved the best prediction performance (0.92 for accuracy,
precision, F1-score and recall). Its accuracy was 0.21–0.29 higher than that
of the other classifiers. The RMSE of Random Forest, 0.41, was lower than
that of both SVM with Gaussian RBF kernel and ANN. For this target class,
Random Forest therefore produced high accuracy (about 0.92) with a low
RMSE. This result suggests that the classifier was better at representing the
dataset for a machine learning model predicting graduates’ careers based on
whether graduate employment matched academic qualifications. It also
implies that Random Forest was able to predict with low errors. Overall, the
prediction performance of Random Forest was substantially better than that
of the other classifiers. Random Forest can also train efficiently, because
each split considers only a random subset of features, and as an ensemble of
decision trees it retains the strengths of tree-based models, which could
effectively predict graduates' careers and whether or not they will be
employed.

Table 3 presents the importance of each predictor, measured by Gini
importance in the Random Forest classifier, for “Kelayakan Akademik”, which
determines graduates’ careers and whether or not they will be employed.
Based on Table 3, the most important feature in predicting graduates’
careers is “Program”. Arguably, the programme graduates choose affects
their employment choices relative to their academic qualifications. The
findings of this study also suggest that graduates are not interested in jobs
outside their field of study, which may be one reason why many graduates
struggle to find work after graduation. Accordingly, universities should
encourage students not to be overly selective in choosing a career,
particularly when the labour market is shrinking. Table 3 also shows that
“Status Konvo” (convocation status) is the least important feature.

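Table 3. Feature Importance for “Kelayakan Akademik”

Feature (English gloss) | Importance Score
Program | 0.182
Sektor Pekerjaan Dikemas Kini (updated employment sector) | 0.132
Fakulti (faculty) | 0.122
CGPA | 0.074
Jumlah Pekerjaan Sekarang (number of current jobs) | 0.074
Sektor Ekonomi Dikemas Kini (updated economic sector) | 0.073
Keputusan SPM BI (SPM English language result) | 0.066
Sektor Ekonomi (economic sector) | 0.065
Keputusan SPM BM (SPM Malay language result) | 0.054
Sektor Pekerjaan (employment sector) | 0.048
Status Pekerjaan (employment status) | 0.037
Status Pekerjaan Sekarang (current employment status) | 0.023
Taraf Pekerjaan (employment type) | 0.021
Pendapatan (income) | 0.015
Jantina (gender) | 0.015
Status Konvo (convocation status) | 0.000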
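Gini importance (mean decrease in impurity) credits a feature with the weighted drop in Gini impurity at every split where that feature is used, summed over all trees and normalised so that the scores add up to 1; this is consistent with the scores in Table 3 summing to about 1. The sketch below computes the impurity decrease for one hypothetical split and then normalises two raw totals; the class counts and the second feature's value are invented for illustration.

```python
def gini(counts):
    """Gini impurity 1 - sum(p_k^2) for a list of class counts at a node."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def impurity_decrease(parent, left, right, n_total):
    """Weighted Gini decrease of one split; Gini importance sums these per feature."""
    n = sum(parent)
    child = (sum(left) / n) * gini(left) + (sum(right) / n) * gini(right)
    return (n / n_total) * (gini(parent) - child)

def normalise(importances):
    """Scale raw per-feature totals so that they sum to 1."""
    total = sum(importances.values())
    return {name: value / total for name, value in importances.items()}

# Hypothetical root split on a 'Program'-like feature: [YES, NO] counts per node.
decrease = impurity_decrease(parent=[50, 50], left=[40, 10], right=[10, 40], n_total=100)
print(round(decrease, 3))
print(normalise({"Program": decrease, "Jantina": 0.06}))
```

A feature that is never chosen for a split, as “Status Konvo” appears to be, accumulates no impurity decrease and therefore scores 0.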

5. Conclusion

This study compared three classification algorithms to identify which would
give the best result in predicting graduates’ careers. Based on the prediction
accuracy and classification error results, the researcher concluded that the
Random Forest classifier gives the best performance and is the most
efficient in learning and classification. Random Forest achieved the highest
accuracy (0.92) with an ROC AUC score of 0.98, outperforming the other
classifiers in classifying whether alumni employment matched their academic
qualifications, did not match them, or had no information recorded. Random
Forest also obtained the highest F1-score (0.92) and the lowest error (RMSE
of 0.41). With the highest precision (0.92), the Random Forest model
returned a larger proportion of relevant results among its predictions than
the other classifiers. These findings fulfil the research objectives.

