LCD Report of activities, 1999-2008

 

Previous CFI-funded Infrastructure

 

            The Laboratory of Computational Discovery (LCD) has received two CFI grants.  In March 1999, the LCD received a CFI Research Development Fund grant of $1.32 million (H. Hamilton, lead researcher; $529,693 from CFI).  This grant was used to purchase Grendel, a Silicon Graphics Onyx2 supercomputer with 24 CPUs, 12 GB of RAM, 540 GB of hard disk space, and two 24-inch high-speed graphical displays.  The machine arrived on campus in 1999, has been available to all LCD members, and remains in operation to this day.

            In June 2002, the LCD received its second CFI grant, a CFI New Opportunities Fund grant of $142,230 (A. East, lead researcher; $56,892 from CFI).  This grant was used to purchase Cadmium, a Parallel Quantum Solutions supercomputer with 48 CPUs, 24 GB of RAM, and 2240 GB of hard disk space.  The machine, dedicated to the exclusive use of the computational chemists (A. East and collaborators), arrived on campus in 2003 and remains in operation to this day.  In the past 3 years, Cadmium has received upgrades in memory and disk capacity, as well as the addition of 8 faster CPUs.

 

Research

 

            The computational discoveries have been in the areas of computational chemistry, data mining, econometrics, computer graphics, and mathematics, as described below.

            Data Mining I.  This research area looks for hidden patterns and trends in large groups of data. Howard Hamilton, of the Department of Computer Science, is leading a team researching new data mining techniques for the MegaMiner software system, which contains a set of general tools for discovering previously unknown relationships in scientific and commercial data.  Graduate students Kamran Karimi and Linhui Jiang created components for the system using Grendel, a computer capable of providing fast access to large databases, either maintained permanently on the machine or created dynamically as intermediate databases summarizing data from web sources.  Extensive testing has been performed on a component for finding frequent substrings in long strings of text. The data mining algorithms were improved to handle hundreds of millions of tuples (data table entries), instead of just millions of tuples. Graduate student Xuewei Wang used Grendel for data preparation and analysis for an intelligent agent based on Web mining and Web usage mining. Projects on web mining include PagePrompter and WebUsageMiner, which benefit from Grendel’s capacity to handle huge amounts of web usage data. (38 publications)

            Data Mining II.  Robert Hilderman and his M.Sc. student Terry Peckham studied a new fine-grained approach to the generation of statistically significant contrast sets from large, dense datasets.  Their new approach identified differences between groups and explained what the differences were.  They developed a data structure that allowed the search space to be built incrementally, provided a mechanism for pruning uninteresting parts of the search space, and could be processed in parallel on Grendel's architecture.  Hilderman and his Ph.D. student Mondelle Simeon have extended this work with discretization of quantitative attributes.  Determining the cut points for the best discretization is compute-intensive, and various techniques utilizing Grendel have been evaluated.  Hilderman discovered that, when using data structures called domain generalization graphs with multiple attributes, all possible summaries in the search space could be easily generated from a database.  He introduced and tested serial and parallel versions of algorithms for traversing these graphs, and found that, given a parallel architecture, it was practical for data mining tasks to generate and analyze hundreds of thousands of summaries in a space- and time-efficient manner. (16 publications)

            Computational Chemistry.  Allan East and his team of several students and collaborators have applied computational chemistry methods to the study of several chemical reaction mechanisms.  Their most productive work has been in the area of petroleum cracking mechanisms.  In 2001-2003 they made several discoveries regarding the Brønsted alpha-scission mechanism: (i) the centre of a straight-chain alkane is most vulnerable, (ii) branched alkanes are more vulnerable than straight-chain alkanes, and (iii) alpha-scission is a high-energy process that cannot be the dominant step in petroleum cracking.  In 2004-2006 they made several discoveries regarding the popular chemisorption version of the beta-scission mechanism:  (i) different catalysts produce qualitatively different intermediates for it, (ii) there is a general mechanism which explains all variations of it, and (iii) it is also a high-energy process that cannot be the dominant step in petroleum cracking.  In 2007 they turned to molecular dynamics simulations to explore cracking mechanisms, and became the first research group to observe carbocation branching in a simulation.  (19 publications)

            Consumer Economics.   Household spending is one of the largest components of economic activity, with significant implications for economic growth. Greater understanding of this economic impetus is critical for achieving superior economic performance. Chris Nicol is leading a team that is using Grendel in econometric research on economic theories of consumer behavior, at the household and aggregate levels. Using microdata sets of household spending in the USA, Nicol successfully analyzed the regularities present in underlying behavioral relationships that generate these data. Nicol also successfully developed new shrinkage estimator techniques to sift through the “noise” (irrelevant or misleading data) of real-world data. (5 publications)

            Factorization Algorithms. Richard McIntosh is applying Grendel's computational power to the search for large prime numbers with specific properties. He is working with Richard Brown, Richard Crandall, Karl Dilcher and Alan Powell on the search for Wieferich, Wall-Sun-Sun (sometimes called Fibonacci-Wieferich), Wilson and Wolstenholme primes, and prime factors of Fermat numbers.  He recently discovered a 23-digit prime factor of the 18th Fermat number.  He is testing and modifying some of the parameters in Richard Crandall's elliptic curve factoring program to improve the performance.  Number theory has found new applications in cryptography and physics in the last few years.  (1 publication)
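
            For readers unfamiliar with these classes of primes, the defining congruences can be illustrated with a minimal sketch.  It is not the software used in the searches described above; the function names (primes_below, is_wieferich, is_wilson, divides_fermat) are hypothetical and chosen only for illustration, and the brute-force approach shown here is practical only for very small numbers.

```python
"""Illustrative tests for the defining congruences (not the LCD search code)."""


def primes_below(n):
    """Return all primes < n using a simple sieve of Eratosthenes."""
    if n < 3:
        return []
    sieve = bytearray([1]) * n
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # Cross out every multiple of i starting at i*i.
            sieve[i * i::i] = bytearray(len(range(i * i, n, i)))
    return [i for i in range(n) if sieve[i]]


def is_wieferich(p):
    """True if the prime p satisfies 2^(p-1) = 1 (mod p^2)."""
    return pow(2, p - 1, p * p) == 1


def is_wilson(p):
    """True if the prime p satisfies (p-1)! = -1 (mod p^2)."""
    modulus = p * p
    factorial = 1
    for k in range(2, p):
        factorial = factorial * k % modulus
    return factorial == modulus - 1


def divides_fermat(n, q):
    """True if q > 1 divides the Fermat number F_n = 2^(2^n) + 1."""
    # q divides F_n exactly when 2^(2^n) = -1 (mod q).
    return pow(2, 2 ** n, q) == q - 1


if __name__ == "__main__":
    print([p for p in primes_below(4000) if is_wieferich(p)])
    print([p for p in primes_below(600) if is_wilson(p)])
    print(divides_fermat(5, 641))
```

            Run over these small ranges, the sketch recovers the two known Wieferich primes (1093 and 3511), the three known Wilson primes (5, 13 and 563), and Euler's factor 641 of the 5th Fermat number.  Searches at the scale described above require far more sophisticated arithmetic and factoring methods than this direct check.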

            Computer graphics.  Xue Dong Yang and co-workers have made use of Grendel’s special fiber-optic-connected graphics head and fast processing power to successfully model the motion of hair.  (4 publications)

            Schur indices.  When data can be modeled by an algebraic structure, the representation theory of finite groups applies.  The structure can be categorized by Schur indices, which measure the dimensions of particular generated algebras.  Schur indices are difficult to compute, however.  Allen Herman and his group have used the GAP software system on Grendel with early success: in 2003 they developed two algorithms for Schur index calculation based on deep theories of character correspondences and equivalence of G-algebras, which made it possible to formulate a conjecture concerning Schur indices that inspired further research by a European research group.  Herman later collaborated with this group, extending their GAP subroutines (for Wedderburn decomposition of a group algebra) to include calculation of Schur indices in 2007.  GAP also made possible calculations in group rings and number fields that led to a series of papers with other collaborators (Yuanlin Li and Mike Parmenter).  As well, Herman and collaborator Rahnamai Barghi made use of character table, eigenvalue, and number field computations done using GAP in their recent project on characters of association schemes. (4 publications)

            Bayesian networks.  In computer science, Cory Butz recently developed the first join tree propagation algorithm that precisely articulates the probability information being propagated in a network in terms of conditional probability tables (CPTs). The significance is that CPTs are a special case of the probability distributions traditionally thought to be propagated. Having a clearer picture of the probability information being propagated in a network allows one to make more efficient decisions during join tree propagation. Butz and coworkers have empirically demonstrated that their approach performs inference faster than the leading techniques on numerous real-world and benchmark Bayesian networks. This work has caught the eye of Google, NSERC, and both national and international academics in the field. (1 publication)

           

Research Team

 

            The Laboratory of Computational Discovery, together with its CFI-funded infrastructure, originally housed scientists in the fields of computer science, geography, engineering, and economics.  This infrastructure spawned a second laboratory, now called the TERRA geomatics computational facility, which gave the members from the Department of Geography a more specialized facility.  Our team has also lost three original members to other universities, including two professors who are now deans elsewhere: Chris Nicol is currently Dean of Arts and Science at the University of Lethbridge, and Nick Cercone is currently Dean of Science at York University.  The connection with the Faculty of Engineering was lost as those researchers migrated to smaller computer workstations within their faculty.

            The activities we lost in geography (to the spin-off TERRA facility), economics (with the departure of C. Nicol), and engineering were superbly replaced by new research in the fields of mathematics and chemistry.  Richard McIntosh (joined LCD in 1999) and Allen Herman (2002) found the CFI-funded infrastructure, the “Grendel” supercomputer, to be quite useful for their mathematics research, and Allan East (2000) found it indispensable for his burgeoning research in computational chemistry.  Other computer scientists also joined the team; the most recent additions are Y. Yao (2006), J.T. Yao (2007), and C. Butz (2007), along with mathematician K. Meagher (2007).  Of these additions in personnel, two were international recruits: A. East did his Ph.D. at Stanford University, USA, and J.T. Yao did his at the National University of Singapore.

            Our past performance has certainly benefited from student participation.  Three chemistry undergraduates have won poster awards at the Canadian Society for Chemistry national conference, based on their research performed on the CFI-funded supercomputers.  The research in informatics and computer science has boosted the reputation of several members to the level of being book editors and international conference program committee members.  This performance has also spawned a number of collaborations, particularly in chemistry research, where A. East has formed new collaborations with the National Research Council, the University of Maryland, the University of Southern California, the University of Vienna, the Saskatchewan Research Council, and other members of his own department (A. Wee, L. Mihichuk, and B. Sterenberg).  Recent work in Bayesian networks by C. Butz has attracted attention from Google, Inc.

            The results in the field of chemistry have allowed A. East to continue to rise in reputation and prominence, publishing papers in leading American scientific journals and impressing collaborators from the USA and Austria.  He was able to capitalize on the cutting-edge computational power and the stability of the facility to efficiently explore numerous examples of carbon-carbon bond cracking, building up his knowledge to the stage where he can advance a field that had stagnated badly 8 years ago.  The computer scientists on our team also gained recognition from this CFI-funded research success, with invitations to write book chapters and to serve on conference organizing committees.

           

Training

 

            The CFI facility allowed us to recruit an outstanding system administrator, John Jorgensen, from Montreal (originally from Regina) in 2003.  John has gained a great deal of high-performance-computing expertise by working on these two CFI-funded facilities, and in 2007 was asked to assist in the installation of another CFI-funded supercomputer at the University of Regina.

            Chemistry students who have used the facilities include 1 Ph.D. student, 4 M.Sc. students, and 8 undergraduates.  All of these students are either furthering their education in chemistry or medicine, or have begun careers in science.  In addition, approximately 5 math students and 40 computer science students have used the facility.

            Training of students is most often done directly by LCD members, thanks to the benefits of being at a smaller university.  Students developed computer skills in a multiuser environment, including telnet and file-transfer skills, as well as data production and processing, and file management.  File processing and management on large computer servers will be a useful skill for our HPC trainees to have, even if they do not become researchers, because many businesses and web-based applications rely on large file servers.  Such trainees should prove valuable to Canada.

           

Collaborations

 

            Several of our computer science professors have participated in conference program committees; for example, R. Hilderman has been on 10 such committees since 2002.  As mentioned earlier, the facility has produced research outcomes sufficient for A. East to attract new collaborations with the National Research Council, the University of Maryland, the University of Southern California, the University of Vienna, the Saskatchewan Research Council, and other members of his own department.  L. Mihichuk has also entered into a new collaboration with J. Weil of the University of Saskatchewan, thanks to preliminary results he produced with the CFI-funded computers.

            Future partnerships in Phase II of the Laboratory of Computational Discovery are anticipated with software/internet companies, such as Loose Foot Computing (Regina) and Google (California), and petroleum researchers/consultants, such as NRCan (federal government) and Hatfield Consultants (Vancouver).

           

Capacity

 

            Grendel’s CPU usage was routinely over 90% for several years.  In the past year, it has had relatively few users, as it is too slow to accommodate our newest projects, which await replacement infrastructure.  Cadmium’s CPU usage is also routinely over 90%, and by 2009 it will be too slow to keep our computational chemists competitive with other researchers around the world.

            Since 1999, the LCD has evolved from a university institute (the Institute for Informatics Research and then the Institute of Computational Discovery) to a more efficient multidisciplinary laboratory within the Faculty of Science.  Membership has also evolved over the past 8 years, with some members moving to new universities and several new members taking their place, attracted to the Laboratory and its fine CFI-funded HPC facility.  Eighty-seven scientific publications to date have directly arisen from these two CFI infrastructure investments.  In terms of translational technology transfer, this represents highly successful developmental science (Phase I), using these discoveries to establish new scientific principles as well as to build reputation and attract collaborators.  This puts the LCD in an excellent position for Phase II, in which it shifts towards market-driven applications to attract interest from engineering and industry.

 

Benefits to Canada

 

            The benefits to Canada achieved so far lie in the establishment of expertise and prestige, as well as the development of new software technology and nascent intellectual property (computer software).  The work to date has fuelled the “technology pipeline”, i.e. provided solid basic research that will lead to technology development, by us, in Phase II of our work.

            The discoveries in the chemistry research have led to a knowledge base that is vital for advancing to the next phase of understanding chemical catalysis of hydrocarbon reactions.  This, in turn, will hopefully lead to new “green chemistry” technologies in petroleum/oil sands refining that will keep Canadian companies like Suncor and Syncrude ahead of foreign ones.  The ideal goal would be to lower the costs to the refiner, the consumer, and the environment.

            Our success could not have been achieved without the previous investments by CFI.  The CFI-funded supercomputers allowed us to compete with the best in the world, and to begin to carve out our research niche.  We hope, with this Leading Edge Fund request, that CFI will maintain its “partnership” with us and allow our economical knowledge farm in Regina to continue to build its reputation, as we stride into the next stages of technology transfer and build further external interest in our work.