This page details my current and recent research, which I've tried to organise into coherent categories: ARSpectator; Culture & Heritage; Robotics; Learning & Recognition; 3D Reconstruction; and Interaction.
My contact details are to the right, but email is generally the best option. If you wish to arrange a meeting time, it's best to check my calendar first.
Selected publications are listed with each topic below, but for a more complete (and often more up-to-date) list, see my Google Scholar page.
| Email: | steven@cs.otago.ac.nz |
| --- | --- |
| Phone: | +64 3 479 8501 |
| Office: | Room 245, Owheo Building |
Over the last couple of years, we have seen many major advancements in sports broadcasting, as well as in the interactivity of sports entertainment. Twenty-five years ago, the first real-time graphics animation of a sporting event was broadcast on television for the America's Cup, driven by NZ innovation. Nowadays spectators can remotely follow the same event live in real time using their mobile devices. However, spectators at live sporting events often miss out on this enriched content that is available to remote viewers through broadcast media or online.
The main idea of this project is to extend NZ's lead in this field by visualising game statistics in a novel way on the mobile devices of on-site spectators, giving them access to information about the sporting event. We will provide spectators with an enriched experience like the one seen in a television broadcast. Our plan is to use new technologies such as Augmented Reality to place event statistics, such as scores, penalties, team statistics, and additional player information, into the field of view of spectators based on their location within the venue. While we currently focus on delivering data to spectators, this approach could easily be extended to support coaches and team analysts. Our research will bring sports events closer to the audience, as well as bringing the spectators closer to the events and the teams.
Computer vision finds application in a wide variety of areas, and I am particularly interested in heritage and cultural applications. This has included working with David Green on an installation for the Art & Light exhibition, as well as more traditional academic applications of computer vision. The collaborative nature of this work appeals to me, as does the value that these areas have to society.
Working with archaeologists, I am investigating the manufacture of pre-European Māori stone tools. We are analysing the shape of incomplete tools, and the fragments left behind during manufacture. At this stage we are investigating ways to automate manual processes in the archaeological analysis of these artifacts, but in the longer term we hope to discover new insights into the toolmaking process.
I am also investigating the analysis of historic documents, in particular the Marsden Online Archive. Character recognition techniques that have been successful on printed text don't work on these documents. We are exploring whole-word rather than character-level recognition, drawing a parallel with recent advances in machine learning that have proven successful for object recognition.
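To make the whole-word idea concrete, here is a minimal sketch (illustrative only, not our actual system): each transcribed word in a lexicon is treated as its own class, and word images are classified much as one would classify objects. The network architecture and lexicon size below are placeholder assumptions.

```python
# Illustrative sketch only: classify whole word images, treating each word
# in a (hypothetical) lexicon as its own class, much like object recognition.
import torch
import torch.nn as nn

class WordClassifier(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d((4, 16)),  # cope with variable word-image sizes
            nn.Flatten(),
            nn.Linear(64 * 4 * 16, vocab_size),  # one score per word in the lexicon
        )

    def forward(self, word_images):  # (batch, 1, height, width) greyscale crops
        return self.classifier(self.features(word_images))

model = WordClassifier(vocab_size=5000)  # e.g. a 5000-word lexicon
scores = model(torch.rand(8, 1, 48, 160))  # a dummy batch of word images
```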
3D reconstruction pipelines take collections of images and produce 3D models of the world. While this task has become almost routine, there are many steps in the pipeline, and many areas that can still be explored and improved. The models produced find a wide variety of applications, including augmented reality, digital culture and heritage, medical imaging and surgery, and robotics.
One of the main areas I am exploring is the exploitation of feature scale and orientation in this context. Most structure-from-motion pipelines are based on point features, but the most successful feature detectors (such as SIFT and ORB) estimate the size and orientation of the features as well as their location. This additional information is often discarded, but can be used to identify unreliable feature correspondences between images, or to accelerate RANSAC-based camera pose estimation.
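As an illustration of how scale and orientation can flag unreliable matches, here is a rough sketch using OpenCV's SIFT. This is a generic heuristic rather than the specific method from our papers: the orientation differences and scale ratios of correct matches tend to cluster, so matches far from the modal values can be discarded before RANSAC. The image file names are placeholders.

```python
# Rough sketch: use SIFT keypoint scale and orientation to pre-filter matches
# before RANSAC. Correct matches between two views tend to share a similar
# orientation difference and scale ratio, so matches far from the modal
# values are likely to be outliers.
import cv2
import numpy as np

def filter_by_scale_and_orientation(kp1, kp2, matches, angle_bins=36, scale_bins=8):
    d_angle = np.array([(kp2[m.trainIdx].angle - kp1[m.queryIdx].angle) % 360
                        for m in matches])
    log_scale = np.array([np.log2(kp2[m.trainIdx].size / kp1[m.queryIdx].size)
                          for m in matches])

    # Histogram both quantities and keep matches in or next to the modal bin.
    a_hist, a_edges = np.histogram(d_angle, bins=angle_bins, range=(0, 360))
    s_hist, s_edges = np.histogram(log_scale, bins=scale_bins)
    a_idx = np.clip(np.digitize(d_angle, a_edges) - 1, 0, angle_bins - 1)
    s_idx = np.clip(np.digitize(log_scale, s_edges) - 1, 0, scale_bins - 1)
    a_dist = np.abs(a_idx - np.argmax(a_hist))
    a_dist = np.minimum(a_dist, angle_bins - a_dist)  # orientation wraps around
    s_dist = np.abs(s_idx - np.argmax(s_hist))
    keep = (a_dist <= 1) & (s_dist <= 1)
    return [m for m, k in zip(matches, keep) if k]

# Usage: detect, match, filter, then run the usual RANSAC pose estimation.
sift = cv2.SIFT_create()
img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder images
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
good_matches = filter_by_scale_and_orientation(kp1, kp2, matches)
```

With fewer gross outliers in the match set, RANSAC typically needs far fewer iterations to reach a consensus pose.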
Another area of interest is exploiting parallel computing to handle large data sets for vision processing. 3D reconstruction methods scale up to several thousand images, but larger scenes are often processed in parts. Even on smaller data sets there is a lot of processing to be done, and exploiting parallel hardware can yield significant speed increases.
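As a small, generic example of this kind of parallelism (not code from any particular project), per-image work such as feature extraction is independent for each image and so divides cleanly across CPU cores; the image paths below are placeholders.

```python
# Small generic example: per-image feature extraction is independent work,
# so it parallelises cleanly across CPU cores.
from multiprocessing import Pool
import cv2

def extract_descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, descriptors = cv2.SIFT_create().detectAndCompute(img, None)
    return path, descriptors

if __name__ == "__main__":
    image_paths = [f"images/{i:04d}.jpg" for i in range(1000)]  # placeholder paths
    with Pool() as pool:
        # Each worker processes a share of the images; results are gathered here.
        descriptors_by_image = dict(pool.map(extract_descriptors, image_paths))
```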
Vision is one of the main ways we learn about the world, so tasks of learning and recognition are another interest of mine. Deep networks have become dominant in this field, and I am interested in applying them, but also in learning more about how they work. Applications we have explored include medical segmentation and document analysis, and we've also examined more closely the assertion that these networks work the same way as the human visual system.
Deep learning approaches are not, however, always applicable. Sometimes more analytic techniques such as PCA or manifold learning can be applied, and in other cases there is insufficient training data for current convolutional network models. An example of the latter is fine-grained recognition - telling the difference between similar sub-classes of objects. In the picture above, for example, the middle image is of a Kea, as is one of the others. The other image is a Kaka, but telling these species apart is much more difficult than distinguishing a bird from a cat.
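For illustration, a minimal analytic baseline of the kind mentioned above might look like the following sketch (scikit-learn, with placeholder data files): project the images onto a low-dimensional PCA subspace and classify with nearest neighbours, which needs far fewer labelled examples than training a convolutional network from scratch.

```python
# Minimal sketch of an analytic baseline: PCA for dimensionality reduction
# followed by nearest-neighbour classification. Data files are placeholders;
# X_* are flattened images (n_samples x n_pixels) and y_train holds labels
# such as "kea" or "kaka".
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X_train, y_train = np.load("X_train.npy"), np.load("y_train.npy")
X_test = np.load("X_test.npy")

model = make_pipeline(PCA(n_components=50), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```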
Robots rely on sensors to build models of the world around them, and vision is a rich and adaptable source of such information. Robotics imposes additional constraints on vision tasks, such as real-time processing and the need for robust models of the world. I was involved as a researcher on the recently-completed MSI/MBIE funded Vision-Based Automated Pruning project, led by Richard Green at the University of Canterbury. We worked with Tom Botterill to develop vision systems to build 3D models of grape vines and direct a robotic pruning arm to remove unwanted canes.
I am also involved in a recently-started project, Karetao Hangarau-a-Mahi: Adaptive Learning Robots to Complement the Human Workforce, led by Armin Werner (Lincoln Agritech), Will Browne (Victoria University), and Johan Potgieter (Massey University). The project aims to develop robots that can safely and effectively work alongside people. I am part of a national team of researchers developing new and robust ways to sense and model dynamic environments for this challenge.
As well as the ARSpectator project, I am interested in other forms of interaction - generally when cameras are used to track or interpret people's interactions. One of the main things I'm interested in is our AR Sandbox, which is based on an open-source project at UC Davis. We have used this for teaching in Surveying, and for various outreach activities, but are interested in its potential in other areas as an intuitive, tangible, and responsive interface.
Other interaction related work that I've been involved with includes tracking faces for interactive displays; identifying power lines for interaction with a mobile app; interactive art installations; and navigation tools for the blind.