Zhiyi Huang and Paul Werstein
There are many problems that cannot be solved in reasonable time without the
use of supercomputers. An alternative is to use a group of standard,
off-the-shelf personal computers to form a powerful
cluster computer.
We have a number of projects including:
- Distributed shared memory using TreadMarks (see the sketch after this list).
- Linux kernel support for cluster computing.
- Network support for cluster computing.
- Single system image - making a cluster computer appear as a single powerful computer.
- Programming environments for distributed computing.
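To give the flavour of the shared-memory model that distributed shared
memory extends across cluster nodes, here is a minimal single-machine
sketch. TreadMarks itself is a C library for sharing memory between
machines; the Python multiprocessing version below is an analogy only,
and the numbers are invented:

    # Single-machine analogy for distributed shared memory: worker
    # processes update one shared cell, serialised by a lock -- the same
    # discipline a DSM program needs when nodes share an address space.
    from multiprocessing import Array, Lock, Process

    def worker(shared, lock, start, end):
        partial = sum(range(start, end))   # sum this worker's slice
        with lock:                         # += on shared memory is not atomic
            shared[0] += partial

    if __name__ == "__main__":
        shared = Array("l", [0])           # one shared integer
        lock = Lock()
        procs = [Process(target=worker,
                         args=(shared, lock, i * 250, (i + 1) * 250))
                 for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(shared[0])                   # 499500 == sum(range(1000))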
Nathan Rountree and Ian McDonald
We often get large amounts of data where each object that is
represented falls into a particular group depending on certain
features. For instance, a particular latitude and longitude may be
associated with land rather than sea, or with high oxygen content
rather than low. Sometimes it is useful to build models that condense
this pattern into a brief but salient piece of "knowledge": for
example, a rule expressing the relationship between levels of bacteria
and the diagnosis of a disease. Developing that knowledge can be very
difficult, especially when there are a lot of data (or a lot of
features). Some methods seem to be more accurate than others---that
is, they model the relationship between features and predicted group
with a greater chance of predicting the correct group. Data mining projects
aim to make the process of generating new knowledge from data
faster, more accurate, and applicable to new fields of knowledge.
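As a concrete illustration of the kind of rule described above, the
sketch below learns a one-feature threshold rule ("diagnose disease if
the bacteria level exceeds t") by picking the threshold that classifies
the most training examples correctly. The feature name and the data are
invented for illustration:

    # A minimal sketch of rule induction: pick the threshold on one
    # feature ("bacteria level", hypothetical data) that best separates
    # the two groups in the training set.
    def learn_threshold_rule(samples):
        """samples: list of (bacteria_level, diagnosed) pairs."""
        best_t, best_correct = None, -1
        for t, _ in samples:               # each observed level is a candidate
            correct = sum((level > t) == diagnosed
                          for level, diagnosed in samples)
            if correct > best_correct:
                best_t, best_correct = t, correct
        return best_t

    data = [(120, False), (340, True), (95, False), (410, True), (280, True)]
    t = learn_threshold_rule(data)
    print(f"rule: diagnose disease if bacteria level > {t}")   # > 120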
Current and previous projects include:
- Embedding prior knowledge in neural networks to make them learn faster.
- Building parts of a model in parallel on lab machines.
- Applying data mining techniques to COMP103 data to identify
students most likely to pass an introductory programming course.
- Identifying high risk patients from the university's surgical
audit database.
Paul Werstein and Ian McDonald
There are many problems that are most conveniently solved by storing
(and retrieving) data in (and from) a relational database. To get
fast access to your data, the database keeps an index on some
attributes of the data---allowing fast access by either student
identification number or name or perhaps some other feature.
However, sometimes we need to retrieve data by more than one feature;
e.g., by latitude, longitude, and time all together. The problem is
that once data has been retrieved by longitude, the resultant dataset
may still be very large, yet has no index on it for the other
features.
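One way to index more than one attribute at a time is a grid (or k-d
tree) index. The sketch below is a minimal grid index over latitude and
longitude, so a range query touches only a few cells instead of
scanning a large single-attribute result; the same idea extends to time
as a third dimension. The class, cell size, and coordinates are
invented for illustration:

    # A minimal multi-attribute (grid) index: points are bucketed by
    # (latitude, longitude) cell, and a range query visits only the
    # cells that overlap the query rectangle.
    from collections import defaultdict

    CELL = 1.0  # degrees per grid cell (illustrative choice)

    def cell_of(lat, lon):
        return (int(lat // CELL), int(lon // CELL))

    class GridIndex:
        def __init__(self):
            self.cells = defaultdict(list)

        def insert(self, lat, lon, record):
            self.cells[cell_of(lat, lon)].append((lat, lon, record))

        def query(self, lat_lo, lat_hi, lon_lo, lon_hi):
            # Visit only the cells overlapping the query rectangle,
            # then filter exactly within each cell.
            for ci in range(int(lat_lo // CELL), int(lat_hi // CELL) + 1):
                for cj in range(int(lon_lo // CELL), int(lon_hi // CELL) + 1):
                    for lat, lon, rec in self.cells.get((ci, cj), []):
                        if lat_lo <= lat <= lat_hi and lon_lo <= lon <= lon_hi:
                            yield rec

    idx = GridIndex()
    idx.insert(-45.87, 170.50, "Dunedin sample")
    idx.insert(-36.85, 174.76, "Auckland sample")
    print(list(idx.query(-46.0, -45.0, 170.0, 171.0)))  # ['Dunedin sample']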
Our experiments have shown that standard commercial and
non-commercial databases cannot cope with certain reasonably modest
problems without some sort of new indexing system. Even
purpose-built databases benefit from data structures that index more
than one attribute at a time. Current projects in this area include:
- Benchmarking databases on retrieval of the kind of data generated
by Formula 1 racing.
- Building a new database to deal with race data efficiently.
- Development of a new indexing system so as to retrieve race data in
real-time.
- Development of a visualisation system for verifying the retrieval
of real-time race data.