Zhiyi Huang and Paul Werstein
There are many problems that cannot be solved in reasonable time without the
use of supercomputers. An alternative is to use a group of standard,
off-the-shelf personal computers to form a powerful
cluster computer.
We have a number of projects including:
- Distributed shared memory using TreadMarks (see the sketch after this list).
- Linux kernel support for cluster computing.
- Network support for cluster computing.
- Single system image - making a cluster computer appear as a single powerful computer.
- Programming environments for distributed computing.
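To give the flavour of the shared-memory model that distributed shared
memory extends across cluster nodes, here is a minimal single-machine
sketch. TreadMarks itself is a C library for sharing memory between
machines; the Python multiprocessing version below is an analogy only,
and the numbers are invented:

    # Single-machine analogy for distributed shared memory: worker
    # processes update one shared cell, serialised by a lock -- the same
    # discipline a DSM program needs when nodes share an address space.
    from multiprocessing import Array, Lock, Process

    def worker(shared, lock, start, end):
        partial = sum(range(start, end))   # sum this worker's slice
        with lock:                         # += on shared memory is not atomic
            shared[0] += partial

    if __name__ == "__main__":
        shared = Array("l", [0])           # one shared integer
        lock = Lock()
        procs = [Process(target=worker,
                         args=(shared, lock, i * 250, (i + 1) * 250))
                 for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(shared[0])                   # 499500 == sum(range(1000))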
Nathan Rountree and Ian McDonald
We often get large amounts of data where each object that is
represented falls into a particular group depending on certain
features. For instance, a particular latitude and longitude may be
associated with land rather than sea, or with high oxygen content
rather than low. Sometimes it is useful to build models that condense
this pattern into a brief but salient piece of "knowledge": for
example, a rule expressing the relationship between levels of bacteria
and the diagnosis of a disease. Developing that knowledge can be very
difficult, especially when there are a lot of data (or a lot of
features). Some methods seem to be more accurate than others---that
is, they model the relationship between features and predicted group
with a greater chance of predicting the correct group. Data mining projects
aim to make the process of generating new knowledge from data
faster, more accurate, and applicable to new fields of knowledge.
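As a concrete illustration of the kind of rule described above, the
sketch below learns a one-feature threshold rule ("diagnose disease if
the bacteria level exceeds t") by picking the threshold that classifies
the most training examples correctly. The feature name and the data are
invented for illustration:

    # A minimal sketch of rule induction: pick the threshold on one
    # feature ("bacteria level", hypothetical data) that best separates
    # the two groups in the training set.
    def learn_threshold_rule(samples):
        """samples: list of (bacteria_level, diagnosed) pairs."""
        best_t, best_correct = None, -1
        for t, _ in samples:               # each observed level is a candidate
            correct = sum((level > t) == diagnosed
                          for level, diagnosed in samples)
            if correct > best_correct:
                best_t, best_correct = t, correct
        return best_t

    data = [(120, False), (340, True), (95, False), (410, True), (280, True)]
    t = learn_threshold_rule(data)
    print(f"rule: diagnose disease if bacteria level > {t}")   # > 120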
Current and previous projects include:
- Embedding prior knowledge in neural networks to make them learn faster.
- Building parts of a model in parallel on lab machines.
- Applying data mining techniques to COMP103 data to identify
students most likely to pass an introductory programming course.
- Identifying high risk patients from the university's surgical
audit database.
Paul Werstein and Ian McDonald
There are many problems that are most conveniently solved by storing
(and retrieving) data in (and from) a relational database. To get
fast access to your data, the database keeps an index on some
attributes of the data---allowing fast access by either student
identification number or name or perhaps some other feature.
However, sometimes we need to retrieve data by more than one feature;
e.g., by latitude, longitude, and time all together. The problem is
that once data has been retrieved by longitude, the resultant dataset
may still be very large, yet has no index on it for the other
features.
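One way to index more than one attribute at a time is a grid (or k-d
tree) index. The sketch below is a minimal grid index over latitude and
longitude, so a range query touches only a few cells instead of
scanning a large single-attribute result; the same idea extends to time
as a third dimension. The class, cell size, and coordinates are
invented for illustration:

    # A minimal multi-attribute (grid) index: points are bucketed by
    # (latitude, longitude) cell, and a range query visits only the
    # cells that overlap the query rectangle.
    from collections import defaultdict

    CELL = 1.0  # degrees per grid cell (illustrative choice)

    def cell_of(lat, lon):
        return (int(lat // CELL), int(lon // CELL))

    class GridIndex:
        def __init__(self):
            self.cells = defaultdict(list)

        def insert(self, lat, lon, record):
            self.cells[cell_of(lat, lon)].append((lat, lon, record))

        def query(self, lat_lo, lat_hi, lon_lo, lon_hi):
            # Visit only the cells overlapping the query rectangle,
            # then filter exactly within each cell.
            for ci in range(int(lat_lo // CELL), int(lat_hi // CELL) + 1):
                for cj in range(int(lon_lo // CELL), int(lon_hi // CELL) + 1):
                    for lat, lon, rec in self.cells.get((ci, cj), []):
                        if lat_lo <= lat <= lat_hi and lon_lo <= lon <= lon_hi:
                            yield rec

    idx = GridIndex()
    idx.insert(-45.87, 170.50, "Dunedin sample")
    idx.insert(-36.85, 174.76, "Auckland sample")
    print(list(idx.query(-46.0, -45.0, 170.0, 171.0)))  # ['Dunedin sample']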
Our experiments have shown that standard commercial and
non-commercial databases cannot cope with certain reasonably modest
problems without some sort of new indexing system. Even
purpose-built databases benefit from data structures that index more
than one attribute at a time. Current projects in this area include:
- Benchmarking databases on retrieval of the kind of data generated
by Formula 1 racing.
- Building a new database to deal with race data efficiently.
- Development of a new indexing system so as to retrieve race data in
real-time.
- Development of a visualisation system for verifying the retrieval
of real-time race data.