Published by , 2016-02-29 23:13:55

UWA_Brochure_20160301

Data
Intensive
Discovery
at UWA

Data Intensive Discovery at UWA
The University of Western Australia,
December 2015. All rights reserved.

Image credits:
Brock Hogan (Cover), UWA Image (Inside Cover),
Alesandro14/Shutterstock (p.4-5), pixelparticle/Shutterstock (p.7),
Sashkin/Shutterstock (p.8-9), Djorgovski et al. (Caltech) (EoR image);
Casey Reed (Pulsar image); NASA/JPL-Caltech/SSC (Galaxy evolution image, NGC 3190 Field);
NASA/Stanford-Lockheed Institute for Space Research's TRACE Team (Cosmic Magnetism image, Sun’s Corona);
NASA/JPL-Caltech (Cradle of life image) (p.12-13),
233304076/Shutterstock (p.14-15), D. Kucharski K. Kucharska/Shutterstock (p.17),
paulista/Shutterstock (p.18-19), Sergey Nivens/Shutterstock (p.20-21),
McIek/Shutterstock (p.22-23), UWA Image (p.25),
Ken Mulvaney, reproduced with permission of Murujuga traditional owners (p.26-27),
wacomka/Shutterstock (p.28-29), Idea Studio/Shutterstock (p.31),
UWA Image (p.33), iStock_000004759130XX (p.35)

02

03

Introduction

Data Intensive Discovery at the University of Western Australia

Professor Robyn Owens
University of Western Australia
Deputy Vice-Chancellor (Research)

We live in an age of almost incomprehensible amounts of data. Vast expanses of it are produced every second of every day. IBM estimates that we create 2.5 quintillion bytes of information daily. So much so that 90 per cent of data in the world today has been created in the past two years alone.

Data comes from everywhere,
generated from new advances in
technologies and systems such as
mobile phone interactions, social
media posts, online transactions
and searches, as well as sensor
technologies and machine-to-
machine interactions and the Internet
of Things.

As the power of analytic tools
increases, the ability to identify crucial
patterns is also becoming faster and
more accurate, meaning that we
are able to discover the answers to
questions that in the past would have
been impossible and generate new
questions, previously unthinkable.

This explosion of data has led to a
whole new field of Data Science, the
creation and application of powerful
new methods to collect, curate, store
and analyse data, and make new
discoveries.

04

It was US management theorist Russell Lincoln Ackoff who first combined the terms data, information, knowledge and wisdom into a single formula back in 1989 with his ‘knowledge pyramid’, and while this has sparked much debate across the past few decades, there is no doubt about the role that expertise, skill and innovation play in bringing ‘big data’ to life.

Figure: the knowledge pyramid. Axes: Context independence, Understanding. Labels: Data, Information, Understanding relations, Knowledge, Understanding patterns, Wisdom, Understanding principles.

Advancements in data science driven by university research are revolutionising industry, government and society, and The University of Western Australia (UWA) has world-leading capabilities and expertise in data intensive discovery across the areas of analytics, infrastructure and applications.

At UWA, with an international reputation for excellence in teaching, learning and research, we understand that data intensive discovery is not just big science but has the power to shape every aspect of our world.

Our ability to apply a multi-disciplinary approach to harness and mine this data is ensuring that exciting new ground is being covered in this field in every way.

From helping to control the spread of insects causing widespread famine in East Africa, to climate analytics, to tracking the spread of infectious diseases, to advising on engineering designs for billion-dollar offshore oil and gas developments, to radio astronomy and leading a step change in the development of new software tools and techniques to create new ways of processing and accessing large research data, UWA is providing global solutions with big data.

The potential opportunities cut across every sector in our community, from health, to financial services, energy, education, government, manufacturing and transport.

While the sheer volume and complexity of data mean there are also challenges, we see these as an opportunity to discover new knowledge and form new partnerships across the world.

Here we share some of our stories and I would encourage you to come and talk to us, should you wish to find out more.

05

World class infrastructure

We have the
infrastructure to
support our data
intensive discovery

Governments, industry and academia have all identified data science as the next 'cutting-edge' sector of the knowledge economy. Advancements in data science can help improve decision-making in such fields as healthcare, manufacturing, energy efficiency, environmental sustainability, education and job training, transportation, government operations, and research and development. Achieving this, however, requires not only the ‘people power’ and brains of some of the world’s best thinkers but also the right high-tech and cutting-edge infrastructure, something that we understand at UWA.

The Pawsey Supercomputing Centre

The Pawsey Supercomputing Centre is an internationally significant supercomputing facility, situated only 11 km from UWA. Established in 2000 as a joint venture between WA's four public universities and CSIRO, it is the longest running and most successful venture of its type in Australia.

Named in honour of Dr Joseph Pawsey, the father of Australian radio astronomy, the Pawsey Supercomputing Centre stands at the forefront of Australia's most important scientific disciplines, handling computational challenges of the highest scale.

The Centre services key scientific areas such as radio astronomy, bioinformatics, resources science and energy research, ensuring Australia remains internationally competitive in sectors of national significance.

Housing Magnus, the most powerful public research supercomputer in the Southern Hemisphere, the Pawsey Supercomputing Centre is a state-of-the-art facility delivering cutting edge science for Australia’s future.

With its focus on innovation, Pawsey contributes to a range of national projects, connecting scientists and providing them with technologies such as Virtual Laboratories and the Pawsey Supercomputing Centre Research Cloud.

The centre also supports the national Research Data Storage Infrastructure (RDSI) project to create an Australia-first archive to preserve past and present research data for future generations.

It is an integral part of the Square Kilometre Array (SKA), the largest scientific project in history, cementing Australia's place as an important global hub in science and radio astronomy.

06

The collection and interpretation of
Big Data cuts across different types of
technologies and industries and refers to
information which cannot be processed
or analysed in a timely or cost-effective
manner using traditional techniques.

The SKA Project

The SKA Project is an international effort to build the world's largest radio telescope, with a square kilometre (one million square metres) of collecting area. The most capable radio telescope ever built, it will expand our understanding of the Universe and drive technological development worldwide.

The scale of the SKA represents a huge leap forward in both engineering and research and development towards building and delivering a radio telescope, and will deliver a correspondingly transformational increase in science capability when operational.

Deploying thousands of radio telescopes, in two unique configurations, it will enable astronomers to monitor the sky in unprecedented detail and survey the entire sky thousands of times faster than any system currently in existence.

The SKA telescope will be co-located in Australia and Africa. It will have an unprecedented scope in observations, exceeding the image resolution quality of the Hubble Space Telescope by a factor of 50, whilst also having the ability to image huge areas of sky in parallel.

With a range of other large telescopes in the optical and infrared range being built and launched into space over the coming decades, the SKA will perfectly augment, complement and lead the way in scientific discovery.

The International Centre for Radio Astronomy Research (ICRAR)

The International Centre for Radio Astronomy Research (ICRAR) is a joint venture between UWA and Curtin University with additional funding from the State Government of Western Australia (WA).

ICRAR was designed to be a multi-skilled institute of astronomers, engineers and data specialists that could support the build up of the SKA in Australia through design, construction and ultimately operations.

In 2014, Deloitte Access Economics identified ICRAR as being one of the top five centres of its kind in the world.

ICRAR’s Data Intensive Astronomy (DIA) team, located at the Centre’s UWA node, comprises researchers from astronomy and industry who have led the development of data and operations systems for billion-Euro astronomical infrastructures in Europe and South America.

When the SKA comes online it will produce a relentless stream of science data products that are exascale in terms of their storage and processing requirements.

ICRAR’s DIA team is now leading the international effort to address the challenges surrounding the flow of data within the SKA observatory.

07

Centre for Applied Statistics (CAS)

The UWA Centre for Applied Statistics (CAS) in the Faculty of Engineering, Computing and Mathematics provides consulting services to a broad range of clients, both internal and external to UWA, in addition to the teaching and learning of statistical concepts and methodology.

Led by Director Dr Kevin Murray and made up of between 10 and 15 group members, CAS offers collaborative services to researchers as well as free consultations to UWA doctoral students. Much of the work conducted by the group is carried out in collaboration with commercial and research institutions including nearby Sir Charles Gairdner Hospital and the Telethon Kids Institute.

The primary focus of the group involves the propagation of statistical rigour in research output from UWA. More recently CAS has adjusted to accommodate the increased availability of ‘big data’ throughout the academic and commercial communities.

The expertise among CAS team members is extensive and varied, with current work being carried out within areas including Bayesian methods, bootstrapping, neural networks, and support vector machines.

With the present focus on big-data-related applications, current areas of research within CAS include:

−− Investigation of the effectiveness of mixed models in determining the association between rare genetic variants and complex disease (with the Centre for Genetic Origins of Health and Disease);

−− Estimation of the burden of emergency department utilisation using the extensive linked historical admission and separation database in WA (with the Collaborative for Health Analysis and Statistical Modelling);

−− Modelling of links between community-based palliative care for people with disabilities and associated emergency department in the last years of life (a linked data project, utilising the Mental Health Information System of WA in collaboration with Curtin University);

−− Development of a mathematical framework to model the progression of Alzheimer’s disease (with the Commonwealth Scientific and Industrial Research Organisation, CSIRO);

−− Development and validation of improved algorithms for the modelling and alignment of complex sleep states (with the Centre for Sleep Science);

−− Development of improved analytical models of childhood growth patterns (with the Telethon Kids Institute); and

−− Numerous supervised research projects, including simulation studies testing alternative statistical methods for fishery catch rate standardisation and the (speculative) modelling of AFL results.

The unique position that CAS holds as a link between commercially available data and high quality research has solidified with the advent of increasing data and computer literacy outside of the research environment.

08

Centre for Microscopy Characterisation and Analysis (CMCA)

UWA’s $45 million world-class Centre for Microscopy Characterisation and Analysis (CMCA) is an inspiring collaborative research facility with more than 40 instrument platforms located on five sites.

Supported by an intellectual hub of expert scientists, CMCA is a unique analytical facility that supports cutting-edge research in biological, biomedical, geo-environmental and physical sciences, with strong relevance to the energy and minerals sector.

The Centre’s application in earth and environmental sciences focuses on analysing and characterising minerals, rocks and soils. Its high-impact capability supports Australian industry in a broad range of areas.

CMCA has facilitated atomic-level analysis to understand new generation steels, contributed to alumina-based research and development (R&D) projects, helped identify and validate new mining exploration methods, and assisted in ‘cracking’ methane gas.

The Institutional Research Data Store (IRDS)

The Institutional Research Data Store (IRDS) at UWA provides researchers and research groups with a secure, scalable storage facility for their digital research data.

It provides scalability, flexibility, and is designed for working with the large datasets commonly used within research. IRDS supports the requirements of researchers in terms of confidentiality, integrity, availability, security and ownership of data.

It cuts across many disciplines including Architecture, Landscape & Visual Arts, Education, Arts, Engineering, Computing and Mathematics, Medicine, Dentistry and Health Sciences, Science, Law and Business, and the UWA Cultural Precinct.

The IRDS features a dedicated high speed fibre optic connection that ensures reliable high-performance bandwidth, with speeds of up to 10 Gbps to the UWA network core.

A second dedicated high speed fibre optic connection ensures reliable high-performance bandwidth, with speeds of up to 10 Gbps to the UWA network.

The IRDS can expand well beyond the initial storage capacity (2.5 petabytes) to accommodate UWA's future research needs.

The IRDS is available to all UWA researchers, including academic and professional staff and Higher Degree by Research (HDR) Students. Proposed future releases include collaboration with national and international researchers via authentication methods such as the Australian Access Federation (AAF).

Research observations, findings or outcomes in digital form can be stored in the IRDS. The data may be numerical, descriptive or visual and raw or analysed, experimental or observational.

09

Data Intensive Science across UWA

Living sustainably in our
ancient environment

Page 16

Indigenous knowledge

Page 26

Long term living standards

Page 32

Australia within the World

Page 22

Health and wellbeing

Page 18

10

Food production and quality

Page 16

Natural resources

Page 28 / 30

Human creativity and culture

Page 20

Advancing data intensive science

Page 12 and 24
11

Advancing data intensive science Data Processing, Access and Storage

Reaching
for the
stars

Key Contacts:
Professor Peter Quinn
International Centre for Radio Astronomy Research
Email: [email protected]
Telephone: +61 8 6488 4553

Professor Andreas Wicenec
International Centre for Radio Astronomy Research
Email: [email protected]
Telephone: +61 8 6488 7847

Exploring the entire Universe through space and time, from now to the very first stars and galaxies that existed more than 10 billion years ago, is an unparalleled feat of human scientific endeavour. The volume of data generated worldwide by new and planned observatories is currently doubling every six to 12 months.

This expansion and globalisation of astronomical research poses major technical and organisational challenges which are being addressed through new collaborative alliances of organisations. Managing, exploring and sharing the huge volumes of digital information flowing from these new global facilities is focusing and leading the international discussion.

The biggest and most capable radio telescope ever built, The Square Kilometre Array (SKA) observatory, is under development in Australia and South Africa.

In Australia, the design, construction and ultimately operation of SKA is supported by the International Centre for Radio Astronomy Research (ICRAR), a multi-skilled institute of astronomers, engineers and data specialists working collaboratively across UWA and Curtin University.

ICRAR’s Data Intensive Astronomy (DIA) team is based at UWA and is leading the international effort to address the challenges surrounding the flow of data within the SKA.

“Each year the SKA will produce 100 times more data than the world’s internet traffic in 2010 and traditional methods of processing and accessing data are not equipped to manage such a huge volume,” ICRAR Executive Director, Professor Peter Quinn said.

Large data sets and computational resources of the future are likely to be concentrated at data centres located around the world with end users transparently interacting with them.

Data will no longer be migrated from one place to another but rather accessed, processed and explored remotely across a distributed network. To manage this environment and coordinate these interactions, new software tools and techniques are needed.

12

“The Square Kilometre Array
(SKA) will be the biggest
and most capable radio
telescope ever built. It will
expand our understanding
of the Universe and drive
technological development
worldwide."

13

ICRAR’s DIA team comprises researchers from astronomy and industry who have led the development of data and operations systems for billion-Euro astronomical infrastructures in Europe and South America. The team is developing new software tools and techniques to process and access large research data.

The Next Generation Archive System (NGAS) is one example. NGAS was originally created under the leadership of Professor Andreas Wicenec, Director of ICRAR’s Data-Intensive Astronomy program, while he was working with the European Southern Observatory. ICRAR has since produced an optimised version of the NGAS to deal with far higher data rates than originally possible.

The NGAS infrastructure is currently used to control many petabytes of data stored in hundreds of millions of individual files, distributed and mirrored across four continents, while providing transparent access for end users.

It has also been adapted to run in both supercomputing and cloud based environments, and to support the integrated usage of cloud resources for scientific analysis.

“NGAS’s unique architecture has the advantage of embedding the processing for archive data management within the storage subsystems, allowing for near-infinite scalability with commodity component storage hardware. This in turn has allowed for an exceptional price/performance value proposition to be realised at a time of ever increasing budget pressure,” Professor Wicenec said.

“ICRAR also operates a data and computing lab to provide system support to its science teams, and the DIA team at UWA directly supports the various survey science activities using data gathered from telescopes imaging the distant universe across a broad spectrum of wavelengths.

“We provide expertise for a large variety of algorithms and data formats to achieve optimal deployment and usage of the available IT infrastructure.”

The ICRAR team at UWA also contributes to the SKA Science Data Processor (SDP) work package which is responsible for the data reduction, long term archiving and dissemination of the vast data streams delivered by the SKA frontend signal processing systems.

“We are responsible for the SDP ‘Data Layer’ which will handle all aspects of data management, from receiving the data streams through to distributing them to thousands of individual compute nodes, triggering the processing steps, collecting the intermediate and final results and then providing access to those results for the global astronomical community.

“Essentially we are connecting an enormous sensor network directly in to a configurable and flexible HPC system and streaming data through a Wide Area Network (WAN) from hundreds of kilometres away into memory and high performance non-volatile storage,” Professor Wicenec said.

ICRAR is a joint venture between Curtin University and The University of Western Australia, with additional funding from the State Government of WA.

Through our collaborations with high performance computing providers and vendors, ICRAR often has access to new and exciting technologies that are trialled and evaluated and which open up new opportunities to work collaboratively with industry partners who are facing similar challenges as they explore the natural resources of our planet.

Major Collaborators:
The SKA Science Data Processor Consortium, consisting of 18 full and 10 associate partner organisations from 12 countries in five continents. The ICRAR DIA team is one of the four major contributors to this consortium. Also the University of Cambridge, Netherlands Institute for Radio Astronomy, SKA South Africa and University of Cape Town, Oxford University, Massachusetts Institute of Technology, National Radio Astronomy Observatory, Victoria University of Wellington (NZ), Raman Research Institute (India), National Astronomical Observatory of China, CSIRO and The University of New South Wales.

14

“Many of the challenges we
face are shared by those
exploring our planet to find
new resources.”

15

Living sustainably in an ancient fragile environment Data Interpretation

Controlling whitefly
to give millions of
people food to eat

Key Contact:
Dr Laura Boykin
UWA School of Chemistry and Biochemistry
Email: [email protected]
Telephone: +61 8 6488 4488

Bemisia tabaci, more commonly known as the whitefly, is a worldwide agricultural pest costing global agriculture billions of dollars a year.

Its effect is particularly devastating in East Africa where the sap-sucking insects destroy entire crops of the cassava plant, a crucial food source for the region, causing widespread economic hardship and famine.

A team of researchers at UWA, led by Dr Laura Boykin, is focusing their research efforts there, working with the Pawsey Supercomputing Centre’s Cray® XC40™ “Magnus” supercomputer.

“It’s a massive problem,” Dr Boykin said. “I’m one of 15 principal investigators working on a new project whose mission is to give farmers a cassava plant that’s resistant to the viruses and the whiteflies.”

After decades of thinking they were battling a single silverleaf whitefly species, scientists now know there are at least 34 morphologically indistinguishable species.

“It’s only been in the last seven years or so that people have started to do sampling of the region,” Dr Boykin said. “The more we sample, the more we realise there are tons more species of whitefly in Africa than we ever thought.”

Dr Boykin and her team are using genomics, supercomputing and evolutionary history to understand the sap-sucking insects' genetic differences. The computational challenge is in making sense of the

16

“Magnus is changing
the world in agricultural
development.”

vast amount of genomic data their sequencing machines produce.

“We have the task of trying to make sense out of billions of base pairs, billions of As, Ts, Gs and Cs at a time,” she said.

Using Magnus, a petascale Cray XC40 supercomputer at the Pawsey Supercomputing Centre, the team is generating phylogenetic trees of whitefly species from around the world. Phylogenetic trees represent evolutionary relationships, or genealogy, among species.

For this project, the genetic datasets involved thousands of base pairs. Even with only 500 whiteflies in a dataset, the possible relationships between these flies run into the octillions (a 1 followed by 27 zeros), a calculation impossible without a supercomputer.

The team is running MrBayes, a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. It uses Markov chain Monte Carlo methods to sample the massive evolutionary tree space. (A Markov chain is a random process that undergoes transitions from one state to another on a state space.)

Given the large size of the genetic datasets and sophisticated computing techniques involved, the project is highly computationally intensive.

“Magnus is well suited for this kind of problem,” Dr Boykin said. “With multiple processor technologies, a high-performance network, distributed operating system and a productive programming environment, the Cray XC40 supercomputer excels at large-scale computations and reduces processing times.”

So far, the team has analysed an entire genetic region for all the global samples. Between 16 and 80 Markov chains traversed the tree space on Magnus in just under 96 hours.

Dr Boykin and her team are now making meaningful progress toward distinguishing damaging whiteflies from others and arming scientists with the information they need to develop management strategies.

“Magnus is changing the world in agricultural development,” she said. “Controlling whiteflies in East Africa will give 700 million people more food to eat.”
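
For readers curious about the Markov chain Monte Carlo idea mentioned above, the following is a minimal sketch only. It is not MrBayes and does not touch phylogenetic trees: it runs a Metropolis sampler (one common MCMC method, chosen here for illustration) over five made-up states arranged in a ring. The function name `metropolis_chain` and the `weights` values are hypothetical. It shows how a chain that hops between neighbouring states ends up visiting each state roughly in proportion to its target weight.

```python
import random


def metropolis_chain(weights, steps, seed=0):
    """Random walk over states 0..len(weights)-1 arranged in a ring.

    `weights` are unnormalised target probabilities (illustrative values only).
    Returns how many times each state was visited.
    """
    rng = random.Random(seed)
    n = len(weights)
    state = 0
    counts = [0] * n
    for _ in range(steps):
        # Propose a move to the left or right neighbour (symmetric proposal).
        proposal = (state + rng.choice((-1, 1))) % n
        # Accept with probability min(1, w[proposal] / w[state]); otherwise stay put.
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        counts[state] += 1
    return counts


# Hypothetical target: the middle state is four times as likely as the end states.
weights = [1.0, 2.0, 4.0, 2.0, 1.0]
steps = 200_000
counts = metropolis_chain(weights, steps)

estimated = [c / steps for c in counts]
target = [w / sum(weights) for w in weights]

for s, (e, t) in enumerate(zip(estimated, target)):
    print(f"state {s}: visited {e:.3f} of the time, target {t:.3f}")
```

In a phylogenetic setting each state would be a candidate tree rather than an integer, and the weight would come from how well that tree explains the genetic data, which is why the state space becomes so large.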

17

Health and well being Data Management and Access

World-famous
pregnancy intensive
lifetime study

Key Contacts:
Professor Peter Eastwood
School of Anatomy, Physiology and Human Biology
and
Ms Jenny Mountain
School of Population Health
Email:
[email protected]
[email protected]
Telephone:
+61 8 9346 1706
+61 8 6488 6957

The Western Australian (Raine) Study follows the lives of 3000 people from early pregnancy through to adulthood and has made significant contributions to medicine and public health.

It is one of the largest successful cohorts (or groups of people to be studied) of pregnancy, childhood, adolescence and now early adulthood to be carried out anywhere in the world.

“It was designed to find out how early-life factors during fetal development impact on child and adult health,” said Scientific Director of the study, Professor Peter Eastwood.

The Raine Study families have provided environmental, developmental and health information over the past 25 years which is being used to answer hundreds of research questions across a wide range of health and developmental areas.

“Over the past 24 years, the 2868 children born into the Raine Study, and their parents, have generously participated in 11 reviews, at the ages of 1, 2, 3, 5, 8, 10, 14, 17, 18, 20 and 22 years,” Professor Eastwood said.

“From these reviews we have generated a phenotypic dataset which currently contains more than 70,000 measures and over 20 million genetic variants on each cohort participant.

“Stored biological samples include antenatal blood, cord blood, placenta, milk teeth, blood, saliva, urine and DNA. DNA has been obtained from 2000 participants, as well

18

The Raine Study is an
internationally unique and
rich resource for the study of
environmental and genetic
factors affecting health and
development.

as 1500 mothers and 900 fathers. Genome-wide genotyping data has been obtained for the children and mothers.”

As a result, the Raine Study represents an invaluable resource for researchers:

“The Raine Study now encompasses 25 dedicated research groups, 150 researchers and participates in numerous multinational research consortia,” said Professor Eastwood.

“They bring expertise from 25 broad areas of research including asthma and allergy, cardiovascular and metabolic health, childhood developmental growth, dental health, diabetes, genetic epidemiology, gastroenterology, infection and immunity, mental health, musculoskeletal development, nutrition, physical activity, ophthalmology, pregnancy and birth, reproductive health, sleep and risk taking behaviour.”

The Raine Study’s extensive national and international research collaborations continue to develop and add value to the cohort.

“The Raine Study is a member of 14 consortia established to amalgamate genome wide association data,” said Professor Eastwood.

“A huge challenge lies in continuing to successfully manage the growth of the dataset in relation to organisation, access, quality control, security and distribution of data as well as successful translation and dissemination of research results.”

To achieve this, the Raine Study is investigating state-of-the-art, more streamlined approaches to data storage, management and access.

It is currently in discussion with IBM to determine whether its new IBM Watson Health platform could be used to store, manage, and analyse the cohort’s data, and to provide participants with selected access to their own health information.

The resource value of the cohort and the need for state-of-the-art data management and access is set to increase even further, as the Raine Study is looking to expand its participant base to include the participants’ parents, grandparents and offspring.

19

Human creativity and culture Data Analytics

Putting
Shakespeare’s
works under the
microscope with Big
Data

Key Contact:
Dr Brett Hirsch
UWA School of Humanities
(English and Cultural Studies)

Email:
[email protected]

Telephone:
+61 8 6488 1173

Major Funding Sources:
Australian Research Council and The
Leverhulme Trust.

Major Collaborators:
Professor Hugh Craig, Director of the
Centre for Literary and Linguistic
Computing, University of Newcastle and
Professor Gabriel Egan, Director of the
Centre for Textual Studies, De Montfort
University.

20

He is the English poet and playwright
credited with up to 38 plays, and 154
sonnets whose works have been translated
into every major living language and are
performed more than those of any other
dramatist in the world.

However, as UWA’s Dr Brett Hirsch explained, the authorship of Shakespeare’s works attracts significant academic and public attention because of doubts over which he wrote on his own and which were workshopped with others.

“To meet the demand for new plays and to revise old ones, early modern playwrights typically worked with one another,” Dr Hirsch said. “Scholars acknowledge that a number of Shakespeare’s plays are such collaborations, but there is little consensus on the precise shares of each collaborator.”

Working with the Centre for Literary and Linguistic Computing at the University of Newcastle, with up to 10 academics and postgraduate students, Dr Hirsch researches authorship and literary style through the application of computational and corpus-based quantitative techniques.

“Key to authorship attribution studies is the generation of authorial models and profiles,” he said. “That is, distinctive patterns in the use of formal linguistic features, such as preferences for certain grammatical constructions or lexicon.

“Authorship attribution proceeds on the basis of comparing and contrasting such models and profiles generated for each authorial candidate with the texts under consideration.”

Much of the work being carried out by Dr Hirsch and his colleagues has only become possible recently, with the increasing availability of a substantial quantity of early modern writing, including the works of Shakespeare’s contemporaries, prepared in machine-readable structured data formats.

“While new methods for corpus-based textual analysis are introduced, it is also common to adopt techniques developed for other purposes as varied as bioinformatics, data compression, and cryptography to the question of stylistic analysis and authorship attribution,” Dr Hirsch said.

“As part of a larger international team, our researchers have conducted authorship attribution analysis that has fed directly into the next major scholarly edition of Shakespeare’s complete works, the New Oxford Shakespeare, to be published by Oxford University Press in 2016.”

According to Dr Hirsch, while the previous edition of the Oxford Shakespeare remains the industry standard text, praised for its meticulous scholarship and editorial rigour, the New Oxford Shakespeare also represents the first complete works edition of Shakespeare to conduct fresh inquiry on the question of authorship.

“As a species of data mining, techniques of corpus-based stylistic analysis and authorship attribution are already used outside of literary studies, such as in forensic document examination, sentiment analysis, and the personalisation and customisation of Facebook and Google advertising content,” he said.

“As existing techniques become more robust and new methods are developed, the precision and granularity with which authorship attributions are made will increase. In literary studies, this is of particular interest in cases where only a short fragment of text remains, since existing techniques produce unreliable results.”
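
To make the idea of authorial profiles concrete, here is a minimal sketch. It is not the method used by Dr Hirsch's team. It assumes a profile is simply the relative frequency of a handful of common function words, and that a disputed text is attributed to the candidate whose profile is nearest by Manhattan distance. The word list, the function names and the sample texts are all illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative feature set (an assumption): a few very common function words.
FUNCTION_WORDS = ("the", "and", "of", "to", "in", "that", "with", "but")


def profile(text, features=FUNCTION_WORDS):
    """Relative frequency of each feature word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in features]


def manhattan(p, q):
    """Sum of absolute differences between two equal-length profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def attribute(disputed_text, candidates):
    """Return the candidate name whose profile is closest to the disputed text.

    `candidates` maps an author name to a sample of that author's writing.
    """
    target = profile(disputed_text)
    return min(
        candidates,
        key=lambda name: manhattan(target, profile(candidates[name])),
    )


if __name__ == "__main__":
    samples = {
        "A": "the cat and the dog of the town",
        "B": "but that was with him and but that is with her",
    }
    print(attribute("the king of the land and the sea", samples))
```

Real studies use far larger feature sets and corpora. With only a short fragment, the frequencies become too noisy to separate candidates, which is the limitation the article points to.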


Australia within the World Data Analytics

Detecting related
trading in two
financial securities

Key Contact:
Dr Keith Godfrey
UWA Business School

Email:
[email protected]

Telephone:
+61 8 6488 5839

Finance research in High Frequency Trading requires analysis of vast amounts of market trading data. There are tens of thousands of securities traded around the world each day, many of which are traded hundreds of thousands of times during the day.

Imagine analysing that data for patterns that could, for example, show where market manipulation or illegal insider trading is taking place.

UWA’s Dr Keith Godfrey spent his PhD studying pairs of securities to investigate the way they are traded together.

“Pairs trading is a strategy which can generate profits from the difference in price movements without being affected by their movements together and with the market,” he said. “Most of the related trading in two shares is legal and indeed can benefit society by keeping stock prices in check and improving the efficiency of financial markets.

“However, some trading activities overstep the law, for example in the lead up to company merger announcements when traders act on inside information about a forthcoming deal and trade the two companies before the deal is made public.”

Dr Godfrey said hedge fund traders rely on the premise their actions cannot be detected easily.

“Trading on stock exchanges is anonymous, and trades in two or more securities are difficult to match precisely,” he said. “As a result, there has been little empirical documentation of pairs trading despite the anecdotes that it is widespread and can be profitable.


“If we can overcome the difficulty in matching up the trades, we can investigate all forms of pairs trading including those that benefit society and those that may be indicative of market manipulation and/or illegal insider trading.”

Dr Godfrey said the size of the data to be analysed is enormous, with a six-month test of the S&P 500 index constituent stocks involving 1,048,326,510,036,240,846 possible pairings.

“Implementing a 100 second limit between trades reduced this to 87,529,904,232,756 which is still a very large number – but this becomes manageable with parallel computing,” he said.

“My insight to analysing the data involves studying patterns in the time differences between trades. The idea is each trader will typically take a particular amount of time, typically a few milliseconds, between executing two trades.

“If we see a particular time difference occurring more frequently than could be expected from randomness, we can infer the presence of related trading.”

Dr Godfrey has used the approach to study the twin securities of BHP Billiton, where an Australian stock and a UK stock are traded simultaneously in New York. He also analysed the trading between all 124,750 pairs of the S&P 500 index constituent stocks.

“It turns out there is a pattern in the BHP Billiton pairs, where traders bring the prices closer together if they move too far apart from an average over about 40 minutes, but push them apart if they move closer together,” he said.

“The evidence of the prices being pushed apart goes against the ‘law of one price’ which suggests the two shares should trade at very similar prices.

“More generally I have also been able to confirm empirically the long standing hypotheses that traders are pairs trading securities from similar industries such as two banks, two integrated circuit manufacturers, or two energy companies.

“One of the avenues being explored further is the trading ahead of merger and acquisition announcements,” he said. “Looking for improper trading of this kind is like looking for a needle in a haystack, yet it is of enormous interest to regulators and others with an interest in stable financial markets.”
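
The time-difference idea described in this article can be illustrated with a small sketch. It is an illustration under stated assumptions, not Dr Godfrey's implementation: timestamps are integer milliseconds, unrelated trades are assumed to spread roughly evenly across lags, and a crude Poisson-style z-score stands in for "more frequently than could be expected from randomness". All function names, the threshold and the synthetic data are hypothetical. The helper `count_pairs` reproduces the pair count quoted for the 500 index constituents.

```python
import bisect
import math
import random
from collections import Counter


def count_pairs(n_securities):
    """Number of unordered pairs among n securities (n choose 2)."""
    return math.comb(n_securities, 2)


def lag_histogram(times_a, times_b, max_lag_ms):
    """Count time differences (t_b - t_a) within +/- max_lag_ms.

    Both inputs must be sorted lists of integer millisecond timestamps.
    """
    counts = Counter()
    for t_a in times_a:
        # Binary search limits the inner loop to trades inside the window.
        lo = bisect.bisect_left(times_b, t_a - max_lag_ms)
        hi = bisect.bisect_right(times_b, t_a + max_lag_ms)
        for t_b in times_b[lo:hi]:
            counts[t_b - t_a] += 1
    return counts


def unusual_lags(counts, max_lag_ms, z_threshold=5.0):
    """Lags whose count sits far above the average count per lag bin."""
    n_bins = 2 * max_lag_ms + 1
    mean = sum(counts.values()) / n_bins
    if mean == 0:
        return []
    return sorted(
        lag for lag, c in counts.items()
        if (c - mean) / math.sqrt(mean) > z_threshold
    )


def synthetic_trades(seed=0, span_ms=600_000, n_trades=2000,
                     n_linked=300, lag_ms=5):
    """Made-up data: some trades in B follow a trade in A by a fixed lag."""
    rng = random.Random(seed)
    times_a = sorted(rng.sample(range(span_ms), n_trades))
    linked = [t + lag_ms for t in rng.sample(times_a, n_linked)]
    noise = [rng.randrange(span_ms) for _ in range(n_trades)]
    return times_a, sorted(linked + noise)


if __name__ == "__main__":
    print(count_pairs(500))
    a, b = synthetic_trades()
    hist = lag_histogram(a, b, max_lag_ms=100)
    print(unusual_lags(hist, max_lag_ms=100))
```

Restricting the search to a window around each trade, as the 100 second limit in the article does at a much larger scale, is what keeps the number of candidate pairings manageable. Each pair of securities can also be processed independently, which is why the problem suits parallel computing.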


Advancing data intensive science Data Analytics

How Big Data is
helping the fight
against disease

Key Contact:
Professor Michael Small
UWA School of Mathematics and Statistics

Email:
[email protected]

Telephone:
+61 8 6488 3877

Major Funding Sources:
Health, Welfare and Food Bureau of the Hong Kong Government

It is not a link you would automatically make: mapping big data sets to understand and help prevent the spread of infectious and often catastrophic diseases such as Severe Acute Respiratory Syndrome (SARS), Avian Influenza and Ebola.

However, UWA’s Complex Data Modelling group, comprising 18-20 world-class mathematicians, statisticians, computer scientists and engineers who apply a wide range of mathematical and computational modelling techniques to better understand large and complex data sets, has been doing just that.

Professor Michael Small said that while mathematical modelling of disease transmission was ostensibly solved in 1927, the solution assumed a well-mixed and widespread contagion.

“Emerging diseases such as SARS and Ebola don’t conform to these assumptions,” he said. “In each case, transmission is highly community dependent in the same way that computer viruses or rumour and opinion on social networks can be quickly spread.

“In all these cases the agent of infection is less important than the nature of the network that the infectious agent travels on. Data intensive science allows us to map interaction between individuals across entire communities, across computers communicating on the Internet and between users of social media.”

Professor Small said the team started by looking at the particular problem of transmission of SARS both in Hong Kong and across the globe,



analysing the predicted transmission pattern and how the spread could be controlled.

“Key to understanding a problem like this is being able to map the observed (and available but latent) data onto a mathematical model of the population and transmission,” he said.

“Distribution of the population within a city, transportation and movement of people and living environment all need to be abstracted. The model then provides a proxy for the community and we can test the effect of transmission and transmission control measures.

“In the case of SARS we were able to show that disease spread was critically dependent on transmission within hospitals and that clusters of outbreaks could be explained by social rather than epidemiological factors.

“Hence, control of the disease focussed on hospital infection and restricting or limiting social interaction,” Professor Small said.

Alternative techniques - using detailed computational models - have been developed by other members of the group to model the spread of Dengue and Influenza, and also the spread of bushfire.

“Our work is mathematical and computational modelling informed by and tested on real data and real-world problems,” he said.

“This sort of work has natural extensions to Engineering for Remote Operations, and in particular Complex Engineering Systems because there are similar challenges in modelling the flow of materials in engineering networks. One can use very similar techniques to optimise flow of information or resources through a telecommunication or infrastructure network.

“We are looking at distribution system for things such as water and power as well as transport of resources. Applications are also arising in the safe, efficient and reliable operation of large-scale engineering systems – such as Floating Liquefied Natural Gas (FLNG) platforms.”

This work is now being extended to look at particular problems of stability, robustness and reliability in the design of Complex Engineering Systems – how can one build structures like a FLNG platform that are optimally efficient, but also robust.
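
Returning to the well-mixed solution of 1927 mentioned earlier in this article: it is presumably the Kermack-McKendrick compartment model, usually written as an SIR (susceptible, infected, recovered) system, although the article does not name it. The sketch below is a minimal Euler-step version under that assumption. The parameter values are illustrative only and are not fitted to SARS or any other disease. It shows what "well-mixed" means in practice: every susceptible person is equally likely to meet every infected person, so no hospital or community structure appears anywhere in the equations.

```python
def simulate_sir(beta, gamma, population, infected0, days, dt=0.1):
    """Euler integration of the well-mixed SIR model.

    beta  : transmission rate per day (assumed value, not from the article)
    gamma : recovery rate per day (assumed value, not from the article)
    Returns a list of (susceptible, infected, recovered) tuples per step.
    """
    s = float(population - infected0)
    i = float(infected0)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(int(round(days / dt))):
        # Well-mixed assumption: contacts are proportional to s * i.
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # beta / gamma above 1 gives an outbreak; below 1 it fades out.
    for beta in (0.5, 0.2):
        final_s, final_i, final_r = simulate_sir(beta, 0.25, 1000, 1, 200)[-1]
        print(beta, round(final_r))
```

A network-based model of the kind the group describes would replace the single `s * i` term with contacts along the edges of a graph of individuals, which is where mapped interaction data comes in.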


Indigenous knowledge Data Collection and Storage

Keeping track of
rock art

Key Contact:
Professor Jo McDonald
UWA Centre for Rock Art Research and Management

Email:
[email protected]

Telephone:
+61 8 6488 4306

Rock art, or human-made markings on natural stone, provides a unique visual archive of Australia’s history going back at least 15,000 years with its study helping to bring the human landscape to life.

Australia is home to more than 100,000 known and documented rock art sites, with many more remaining unrecorded, and WA features some of the nation’s most spectacular rock art galleries.

Few landscapes offer as much tangible evidence of human history as the Pilbara, Kimberley and Western Desert regions, presenting archaeologists and rock art researchers with an extraordinary opportunity to learn more about the rich visual histories associated with this art form.

The UWA Centre for Rock Art Research and Management (CRAR+M) databases are the largest repositories of rock art data in Australia.

As part of the Archaeology discipline group, the Centre has six rock art researchers working in Western Australia, China, North America and Africa and another five archaeologists undertaking broader multidisciplinary archaeological work.

The Pilbara Rock Art Database and the Canning Stock Route Database are online rock art repositories with rock art sites and associated data from across the Pilbara and Western Desert regions, with the information accessible to UWA staff and students and associated institutions for research purposes.

Partner Aboriginal communities are also able to access the data, and this is achieved by developing appropriate cultural protocols to ensure the cultural safety of


communities using this information. The databases are crucial tools aiding the repatriation of rock art information to traditional owners.

“The Pilbara Database holds over 50,000 photographs, 1,200 documents, and data on more than 2,800 rock art sites (11,000 motifs),” Centre Director, Professor Jo McDonald said.

“The Canning Stock Route Database holds more than 15,000 photographs, 600 documents and 800 rock art and ethnographic sites.

“A lot of the Pilbara site information is from the Dampier Archipelago – which is a National Heritage Listed Place – as well as being a focus for mining and industrial infrastructure in the Pilbara.

“The data on rock art sites allows researchers to help the Murujuga Aboriginal Corporation manage the immense cultural resource that is the Murujuga National Park and in adjacent lands where there are development pressures.”

Professor McDonald said the ongoing challenge to collecting, managing and maintaining information about cultural sites lies in the efficient collection of field data – and the subsequent mobilisation of information to researchers, Indigenous owners and Rangers and other users.

“We’ve been developing mobile apps to assist in the consistent collection of geospatially referenced data, which allows for new and updated information to be used by Rangers and researchers visiting sites to monitor conditions,” she said. “The move from paper to mobile devices has improved efficiency and accuracy of data usage by field researchers. The development of online databases has allowed researchers at the University to share information with remote partner Aboriginal communities – providing much faster exchanges of information and knowledge sharing.”

Professor McDonald said CRAR+M is in the process of increasing its geographic capacity to include rock art sites from other regions of Australia, and legacy data collections by significant historic rock art researchers, photographers and recorders.

“We intend to keep developing our systems and capacity to make this database relevant to our growing partner communities, industry collaborators and research colleagues,” she said.


Natural resources Data Collection and Processing

Measuring the
movement of the
world’s oceans

Key Contact:
Dr Nicole Jones
School of Civil, Environmental and Mining Engineering
UWA Oceans Institute

Email:
[email protected]

Telephone:
+61 8 6488 3074

Major Funding Sources:
Australian Research Council

Major Collaborators:
Oliver Fringer, Stanford University and Cynthia Bulteau, Matthew Rayson and Jeffrey Book from the Naval Research Laboratory.

Predicting the movement of the ocean might sound like something out of science fiction but for UWA’s Ocean Dynamics Research Group, co-led by Professor Greg Ivey and Dr Nicole Jones, the use of data modelling is allowing them to do just that.

The knowledge is making it possible for them to identify hotspots for internal waves, in particular on Australia’s Northwest Shelf, an extensive and globally important gas and oil region.

Internal waves are similar to surface waves but instead of travelling along the top of the water, they travel along internal surfaces of constant water density. They occur when cold water from the bottom of the ocean is pushed up underwater mountains or continental slopes by the surface tide.

“Internal waves play a vital role in the transport and distribution of marine sediments and nutrients,” Dr Jones said. “Through the ocean mixing they provide, they also affect global ocean circulation and therefore climate change prediction.

“The strong currents and turbulence created by breaking internal waves can impose huge forces on seabed structures, so the safe design and operation of offshore infrastructure can depend on our understanding of them.”

UWA’s understanding of internal waves leads to safe (or sometimes less conservative) offshore infrastructure design, better climate models and improved knowledge of the transport and distribution of marine sediments and nutrients.

“We are using a combination of field observations from various locations around Australia and specialised ocean numerical models to improve our knowledge and ultimately the prediction of internal waves,” Dr Jones explained.

Measuring mixing requires sampling at a very rapid rate to capture all eddies of different length scales. Capturing internal waves accurately in ocean numerical models requires very high resolution in both the horizontal and vertical dimensions across the region of interest as the waves are generated, modified and dissipated by the bathymetry.

“We’ve been able to identify some
of the important processes that
influence the generation, propagation
and ultimate dissipation of internal
waves,” she said.
The ultimate goal of this work is to be
able to predict the location, size and
mixing of internal waves.

“Due to the vast variation in the ocean
conditions around the world, internal
wave fields can vary dramatically.

However, through gaining a process
understanding of them we can more
rapidly apply our knowledge to gain
predictive ability in new regions.”

Natural resources Data Interpretation

New techniques to
help build Australia’s
resource wealth

Key Contact:
Professor Eun-Jung Holden
UWA Centre for Exploration Targeting

Email:
[email protected]

Telephone:
+61 8 6488 5806

Major Funding Sources:
Rio Tinto, Geological Survey of Western Australia and Australian Research Council.

Major Collaborators:
Rio Tinto, Geological Survey of Western Australia, Geosoft and Advanced Logic Technology.

The resource industry invests heavily in collecting various types of data throughout all of its exploration, estimation, and production stages.

Geoscientific datasets, or geodata, come from diverse observations including: geophysical and other remotely sensed responses collected from drill holes, ground, air and even space; geochemical responses of rock specimens; and field observations.

Despite significant investment in data collection, the industry faces significant challenges with building robust subsurface geology models and extracting relevant geological information from data.

“The main challenge is dealing with the complex nature of geology which has evolved through many geological events in the distant past, where interpreters must make educated guesses based on limited observations,” said UWA’s Professor Eun-Jung Holden.

“This leads to highly biased and inconsistent interpretation outcomes and furthermore, the inability to validate and improve the outcomes due to unavailable ground-truth on subsurface geology is problematic.

“Another challenge is that although the data are collected at significant expense, limited time and resources are typically allocated for processing and analysing large volumes of these data, leading to a bottleneck in the interpretation workflow.

“Nevertheless, data interpretation outcomes are the basis of decisions with significant economic and social implications for the resource industry and for society in general.”

The Geodata Algorithms Team at UWA’s Centre for Exploration Targeting (CET) within the School of Earth Sciences focuses on working with the resources industry and government agencies to provide the industry with new data analytics tools to improve the efficiency and robustness of their data interpretation.

“The uniqueness of this team is in the
ability to develop end-user focused
advanced geodata analytics tools for the
resource industry with a specific aim to
maximize geological knowledge gain in their
data interpretation workflow.”

The team is made up of researchers trained in computer science, engineering and applied mathematics. Equipped with a strong capability in computational algorithm design and implementation, they’ve built renowned expertise in understanding the nature of geodata and their use in industry workflows by collaborating with geoscientists in academia and industry.

The uniqueness of this team is in the ability to span the boundaries of computational science and geoscience and the boundaries of academia and industry.

The innovative end-user focused research of the team can be seen in their continuing success in industry engaged research, and delivery of their research outcomes to industry through commercialisation to maximise their impact in industry practice.

Three software products have been developed through industry engagement and support and commercialised through world-leading software vendors in spatial geophysics data analysis platform (Geosoft’s Oasis Montaj) and in televiewer image analysis (Advanced Logic Technology’s WellCAD):
1) The CET Grid Analysis extension for Geosoft’s Oasis Montaj;
2) The CET Porphyry Analysis extension for Geosoft’s Oasis Montaj; and
3) Image & Structure Interpretation Workspace (automated structure picking and analysis methods) for WellCAD by Advanced Logic Technology.

These tools are widely used by the mining and petroleum companies around the world to improve the understanding and modelling of subsurface for resources exploration and extraction.

The Geodata Algorithms Team won various research and student awards internally and externally to UWA, and was the winner of UWA Vice Chancellor’s Research Award in Impact and Innovation, 2015.

Professor Holden, who established and leads The Geodata Algorithms Team at CET, also led the commercialisation processes for these products.


Long term living standards Data Processing, Access and Storage

Clever clovers and
computer simulation
revolutionising plant
breeding

Key Contacts:
Professor William Erskine
and
Dr Parwinder Kaur
UWA Centre for Plant Genetics and Breeding

Emails:
[email protected]
[email protected]

Telephone:
+61 8 6488 1903

“The genomics toolbox has profound implications for the breeding and molecular marker development for subterranean clover that can be easily translated to related species.”

Subterranean clover (Trifolium subterraneum), also known as ‘subclover’ is a key pasture species in Australia. Grown commercially for animal fodder it can thrive in poor-quality soil where other clovers cannot survive and has revolutionised farming practice, converting many struggling farms into successful livestock holdings.

While conventional breeding methods have led to increased yield and quality, further gains are expected if breeders exploit new traits, and biologists are joining the big-data club to bring this about.

Professor William Erskine, Director of UWA’s Centre for Plant Genetics and Breeding, explained: “The recent advent of next generation sequencing platforms and technologies has led to a massive increase in available raw sequence data, which has the potential to provide information about the genetic basis of important traits vital for the future genetic improvement of pasture legumes.”

He pointed out that extracting meaning from this huge volume of available raw sequence data poses new informatics challenges and difficulties.

“Life scientists are not only encountering challenges with handling, processing and moving information that were once the domain of astronomers and high-energy physicists, but biological data mining is much more heterogeneous than data from physics,” he said.

“Biological data stem from a wide range of experiments that spit out many types of information, such as genetic sequences or interactions of proteins.

“The complexity is daunting, and getting the most from the data

“Using whole, living
organisms is fast becoming
a thing of the past as
we harness computer
simulation technologies to
optimise plant breeding”

requires interpreting them in light of all the relevant prior knowledge which means scientists have to store large data sets, and analyse, compare and share them – definitely not a simple task.”

To tackle these challenges the UWA Centre for Plant Genetics and Breeding uses the latest supercomputing technologies, methods and equipment.

“As a first step, we invested in subterranean clover de novo genome sequencing initiative to generate a reference genome scaffold.

“The availability of complete genome sequence for subterranean clover now provides new perspectives on solving genetic problems that are difficult to tackle with conventional approaches, such as identifying the molecular basis of multigenic and complex traits like methanogenic potential, seed dormancy, red-legged earth mite resistance (RLEM), root disease resistance (especially Phytophthora) and early root growth.

“To take this further we will proceed with a new platform, HapMap (Haplotype Mapping) using the world first Core Collection of subterranean clover.”

HapMap techniques, recently developed for human genomics, offer very precise means of fingerprinting the LD (linkage disequilibrium) regions in the genome by aligning against the reference genome.

“For subterranean clover such alignments can be carried out for any trait we are interested in and markers can be designed to assist in rapid screening of cultivars,” Professor Erskine said.

“This will allow breeding jobs for subterranean clover to progress rapidly using this genomics data generated for the World CORE collection of subterranean clover.

“With even a single sequenced subterranean genome taking up around 580 megabytes, the development of this platform takes huge computational resources and storage facilities.

“Our use of the Pawsey Centre’s state-of-the-art computational framework and processing power has enabled us to reduce processing time significantly, by the ability to run many jobs in parallel. It is nothing less than a step change to enable life scientists moving from vivo to silico to explore big data sets for answering important scientific questions about life.”


Services

Highlights of some
of the available UWA
courses advancing
data-intensive Science

The University of Western
Australia is developing
the leaders of tomorrow
who are able to meet the
global challenges and seize
opportunities in a rapidly
changing world.


Our case studies illustrate the key role played by data-intensive science computation and technologies across all disciplines, and this is reflected in both UWA’s undergraduate and postgraduate courses.

At the undergraduate level, UWA offers a major in Data Science, open to students enrolled in any of UWA’s Bachelor degree courses.
For more information, visit
http://handbooks.uwa.edu.au/majors/majordetails?code=MJD-DATSC

At the postgraduate level, UWA offers a number of degree courses which focus on the development of new tools, technologies and skills relevant across all disciplines working with huge volumes of data, including:

Scientific and High Performance Computing:
Graduate Certificate
www.studyat.uwa.edu.au/courses/graduate-certificate-in-scientific-and-high-performance-computing

Information Technology
Master by coursework
www.studyat.uwa.edu.au/courses/master-of-information-technology-coursework

Physics: Computational Physics
Master by coursework
www.studyat.uwa.edu.au/courses/master-of-physics-computational-physics

Business Information and Logistics Management
Graduate Certificate
www.studyat.uwa.edu.au/courses/graduate-certificate-in-business-information-and-logistics-management

Business Information and Logistics Management
Master by coursework
www.studyat.uwa.edu.au/courses/master-of-business-information-and-logistics-management-coursework

In addition, many of UWA’s postgraduate degree programs include units on the latest discipline-specific big-data developments, technologies and equipment:

Geographic Information Science
Master by coursework or coursework and dissertation
www.studyat.uwa.edu.au/courses/master-of-geographic-information-science

Ore Deposit Geology
Master by coursework
www.studyat.uwa.edu.au/courses/master-of-ore-deposit-geology

Emergency Medicine Research
Graduate Certificate
www.student.uwa.edu.au/courses/graduate-certificate-in-emergency-medicine-research

Population Health Studies
Graduate Certificate
www.studyat.uwa.edu.au/courses/graduate-certificate-in-population-health-studies

Public Health
Graduate Diploma
http://handbooks.uwa.edu.au/courses/coursedetails?id=c115
Master programs (3)
http://www.studyat.uwa.edu.au/search?q=master+of+public+health

Biotechnology
Master by coursework (Genetics and Genomics)
www.studyat.uwa.edu.au/courses/master-of-biotechnology-genetics-and-genomics
Master by coursework (Genetics and Breeding)
www.studyat.uwa.edu.au/courses/master-of-biotechnology-genetics-and-breeding

Agricultural Science: Genetics and Breeding
Master by coursework
www.studyat.uwa.edu.au/courses/master-of-agricultural-science-genetics-and-breeding

Strategic Communication
Graduate Certificate
www.studyat.uwa.edu.au/courses/graduate-certificate-in-strategic-communication
Graduate Diploma
www.studyat.uwa.edu.au/courses/graduate-diploma-in-strategic-communication
Master by coursework
www.studyat.uwa.edu.au/courses/master-of-strategic-communication

Professional Engineering
Master of Professional Engineering – Preliminary
www.studyat.uwa.edu.au/courses/master-of-professional-engineering-preliminary
Master of Professional Engineering
www.studyat.uwa.edu.au/courses/master-of-professional-engineering

For more information, please contact:

Professor Robyn Owens
Deputy Vice-Chancellor (Research)

Address:
35 Stirling Highway
Perth, WA 6009

Telephone:
+61 8 6488 2460

Fax:
+61 8 6488 1013

Email:
[email protected]

