COMPUTER APPLICATIONS IN
LINGUISTIC STUDIES
SKBL2113
The Nature of Linguistic Data
and the Requirements to Be Met
in a Computing Environment
for Linguistic Research
SKBL2113 Computer Applications in Linguistics
Introduction
The power of the electronic computer is an extraordinary benefit as a tool for linguists.
Existing commercial software does not meet all the needs of linguists, and the linguistic community has not yet been able to develop software that fills this gap.
Success in developing software for linguists depends on the methods used to model the nature of the data we want to manipulate.
Five essential characteristics of linguistic data must be understood when developing software for linguists.
How is language studied?
Studies of its structure
Identify the structural units and classes of a language
Describe how smaller units can be combined to form larger grammatical units
Studies of its use
How speakers and writers exploit the resources of their language
Study the actual language used in naturally occurring texts
Focus on a particular linguistic structure
A similar structure occurs in different contexts and serves different functions
Multiple structures that are very similar in meaning and grammatical function
Focus on the language of a text or a group of speakers/writers
The language of women compared with the language of men?
The language of an individual author compared with the language used by his contemporaries?
Focus on comparing the language of different texts or groups of texts
Describing the characteristics of registers
How can the patterns in a particular register be found?
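One common way to look for patterns in naturally occurring text is a keyword-in-context (KWIC) listing. The sketch below is only an illustration: the function name `kwic`, the sample sentences, and the simple regular-expression tokenizer are assumptions, not part of the course material.

```python
import re

def kwic(text, keyword, width=3):
    """List each occurrence of `keyword` with `width` words of context per side."""
    # Naive tokenizer (an assumption): lowercase runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Hypothetical sample text in an academic register.
sample = "The results show that the method works. We show the data in Table 1."
hits = kwic(sample, "show", width=2)
for line in hits:
    print(line)
```

Running the same listing over texts from different registers makes it easier to compare the contexts in which a structure occurs.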
What data do linguists use to investigate linguistic phenomena?
Data gained by intuition
The researcher’s own intuition (“introspection”)
Other people’s (“informants’”) intuition (accessed, for example, by elicitation tests)
Naturally occurring language
Randomly collected texts or occurrences (“anecdotal evidence”)
Systematic collections of texts (“corpora”)
The Nature of Linguistic Data
Multilingual nature
Sequential nature
Hierarchical nature
Multidimensional nature
Highly integrated nature
Multilingual nature
The data that linguists work with typically include information in many languages
Example: a bilingual or multilingual dictionary
Example: a Malay text that quotes a paragraph in English
Language is a fundamental property of textual data
It is often viewed as a special-character problem
The computing environment must be able to keep track of what language each datum is in, and then display and process it accordingly.
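The requirement above can be sketched by storing a language tag next to every piece of text. This is a minimal illustration: the `Datum` class, the helper names, and the sample sentence are hypothetical, and the two-letter codes ("ms", "en") are assumed to follow ISO 639-1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Datum:
    text: str
    lang: str  # language code, assumed ISO 639-1 ("ms" = Malay, "en" = English)

# A Malay sentence that quotes an English phrase, stored as tagged segments.
passage = [
    Datum("Beliau berkata", "ms"),
    Datum("language is a system", "en"),
    Datum("dalam kuliahnya.", "ms"),
]

def segments_in(passage, lang):
    """Process by language: return only the text written in `lang`."""
    return [d.text for d in passage if d.lang == lang]

def render(passage, main_lang="ms"):
    """Display by language: mark every segment that is not in the main language."""
    parts = [d.text if d.lang == main_lang else f"[{d.lang}: {d.text}]" for d in passage]
    return " ".join(parts)

print(render(passage))
```

Because the language travels with the datum, sorting, spell checking, or font selection could each be chosen per segment rather than per document.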
Sequential nature
The stream of speech is a succession of sounds that unfolds in temporal sequence
Written text is sequential in nature, as word follows word and sentence follows sentence
Changing the order of constituents can change the meaning of the text.
Word processors excel at modeling the sequential nature of text
The computing environment must be able to represent the text in proper sequence
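A small illustration of why sequence must be preserved: the two orderings below contain exactly the same constituents, yet they are different texts. The example sentence and variable names are assumptions chosen for illustration.

```python
from collections import Counter

# An ordered list keeps the constituents in their proper sequence.
original = ["the dog", "bit", "the man"]
reordered = [original[2], original[1], original[0]]

# An unordered view (a bag of constituents) cannot tell the two apart.
same_constituents = Counter(original) == Counter(reordered)
# An ordered comparison can.
same_text = original == reordered

print(" ".join(original))
print(" ".join(reordered))
print(same_constituents, same_text)
```

Any representation that discards order, such as a set or a frequency table alone, loses the distinction between the two sentences.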
Hierarchical nature
Hierarchy is a fundamental characteristic of data structures in linguistics
Example: syntactic analysis, where a sentence may contain clauses, which contain phrases, which contain words.
Text analysis
The structure of a lexicon
Meanings
Solution: Standard Generalized Markup Language (SGML); see also HTML
https://www.w3schools.com/html/html_editors.asp
The computing environment must be able to build hierarchical structures of arbitrary depth.
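A recursive tree is one simple way to model hierarchy of arbitrary depth. The sketch below is illustrative only: the `Node` class, the labels, and the sample sentence are assumptions, not a prescribed representation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)  # child Node objects, any depth

def depth(node):
    """Number of levels from this node down to its deepest leaf."""
    return 1 + max((depth(c) for c in node.children), default=0)

def words(node):
    """Collect the leaf labels (the words) in left-to-right order."""
    if not node.children:
        return [node.label]
    out = []
    for child in node.children:
        out.extend(words(child))
    return out

# A sentence containing a clause, which contains phrases, which contain words.
tree = Node("sentence", [
    Node("clause", [
        Node("phrase", [Node("the"), Node("linguist")]),
        Node("phrase", [
            Node("records"),
            Node("phrase", [Node("the"), Node("text")]),  # a phrase inside a phrase
        ]),
    ]),
])

print(depth(tree), words(tree))
```

Because a node may hold further nodes, nothing in the structure limits how deeply phrases can be embedded, and the leaves still come out in sequence.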
Multidimensional nature
The stream of speech has form and meaning in many simultaneous dimensions
The meaning of the text
Interlinear text processing systems
Database managers
The computing environment must be able to attach many kinds of analysis and interpretation to a single datum
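Interlinear text can be sketched as words that each carry several parallel tiers of analysis. The tier names, the Malay example, and the glosses below are illustrative assumptions rather than a standard scheme.

```python
# Each word is a single datum with several simultaneous dimensions of analysis.
sentence = [
    {"form": "saya", "gloss": "1SG", "pos": "PRON"},
    {"form": "membaca", "gloss": "AV-read", "pos": "V"},
    {"form": "buku", "gloss": "book", "pos": "N"},
]

def interlinear(words, tiers):
    """Return one aligned line per tier, with columns padded to the widest cell."""
    widths = [max(len(w[t]) for t in tiers) for w in words]
    lines = []
    for t in tiers:
        cells = [w[t].ljust(width) for w, width in zip(words, widths)]
        lines.append(" ".join(cells).rstrip())
    return lines

lines = interlinear(sentence, ["form", "gloss", "pos"])
for line in lines:
    print(line)
```

Adding another dimension, for example a phonetic transcription, only requires another key on each word; the existing tiers are untouched.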
Highly integrated nature
The computing environment must be able to store and follow associative links between related pieces of data
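Associative links can be sketched as records that refer to one another by identifier. The record layout, the identifiers such as "lex:baca", and the `follow` helper are hypothetical choices made for this illustration.

```python
# A tiny store in which lexicon entries and text sentences point at each other.
records = {
    "lex:baca": {"form": "baca", "gloss": "read", "links": {"examples": ["txt:1"]}},
    "lex:buku": {"form": "buku", "gloss": "book", "links": {"examples": ["txt:1"]}},
    "txt:1": {
        "text": "Saya membaca buku.",
        "links": {"lexemes": ["lex:baca", "lex:buku"]},
    },
}

def follow(records, rec_id, relation):
    """Follow the named relation from one record to the records it links to."""
    targets = records[rec_id]["links"].get(relation, [])
    return [records[t] for t in targets]

# From a sentence to its lexical entries, and from an entry back to its examples.
print([r["form"] for r in follow(records, "txt:1", "lexemes")])
print([r["text"] for r in follow(records, "lex:baca", "examples")])
```

Storing the link rather than a copy means a correction to the lexicon entry is seen from every text that refers to it.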
The separation of information from format
The computing environment must be able to present conventionally formatted displays of the data.
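Keeping information separate from format can be sketched by storing an entry once and producing each display from it on demand. The entry fields and the two renderer functions below are assumptions for illustration only.

```python
from html import escape

# The information: stored once, with no formatting mixed in.
entry = {"headword": "buku", "pos": "n", "gloss": "book"}

def as_dictionary_line(e):
    """Conventional plain-text dictionary layout."""
    return f"{e['headword']} ({e['pos']}.) {e['gloss']}"

def as_html(e):
    """The same information presented with HTML markup."""
    return (
        f"<p><b>{escape(e['headword'])}</b> "
        f"<i>{escape(e['pos'])}.</i> {escape(e['gloss'])}</p>"
    )

print(as_dictionary_line(entry))
print(as_html(entry))
```

Changing the house style of the printed dictionary then means editing a renderer, not the data.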
Reference
Simons, Gary F. 1998. The nature of linguistic data and the requirements of a computing environment for linguistic research. In John M. Lawler and Helen Aristar Dry (eds.), Using Computers in Linguistics: A Practical Guide. London and New York: Routledge. Pages 10-25.