Laboratory for Advanced Computing

Recent News

November 2007

NCDM receives SC|07 Conference Award.

First Place at the SC 07 Analytics Challenge Awarded to ANGLE, the New Approach for Protecting Cyber-infrastructure

Reno, NV, November 15, 2007 - A new approach for protecting cyber-infrastructure won first place at the Third Annual Analytics Challenge at the SC 2007 conference in Reno, NV.

Cyber-infrastructure refers to the Internet-based infrastructure that allows businesses, consumers and the government to use the Internet and Internet-based applications. There is a growing awareness that protecting cyber-infrastructure from interference by criminals and other threats is becoming a national priority.

A team led by the National Center for Data Mining (NCDM) at the University of Illinois at Chicago and including participants from Northwestern University, the University of Chicago, Argonne National Laboratory, and the University of Southern California developed an application, called Angle, to protect cyber-infrastructure.

Given the high volume of the data that is transported over the Internet, methods for identifying attacks on cyber-infrastructure can produce so many alerts that analysts monitoring the infrastructure are often overwhelmed. In these circumstances, it is common for analysts to miss new behavior that might be the beginning of new types of attacks. The Angle application developed by the team introduced a new algorithm for identifying possibly malicious activity for further study.

Since the Internet is distributed, so is the data that must be analyzed to protect it. With today's supercomputers, the data must be collected, transported to the supercomputer, and then transported back. For large data, the time required to do this can be a significant fraction of the total time required by the analysis.
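As a rough illustration of why transport time matters, the sketch below estimates the ideal time to move a data set over links of different speeds. The 1 TB data set size and the link speeds chosen are assumptions for illustration only, and the calculation ignores protocol overhead, disk speed, and congestion.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second


ONE_TERABYTE = 1e12  # assumed data set size (decimal terabyte)

# Assumed link speeds, from a typical Internet path up to a 10 Gbps lightpath.
for label, rate in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
    hours = transfer_seconds(ONE_TERABYTE, rate) / 3600
    print(f"{label:>8}: {hours:.2f} hours one way")
```

Because the data must travel to the supercomputer and back, the one-way figures apply to each leg separately.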

One of the innovations of the Angle project was the use of a data and compute cloud so that the data could be left in place and computation performed over the data. Although cloud computing has been used in the past several years by companies such as Google, Yahoo, Amazon and Microsoft to provide their services, these cloud infrastructures, by and large, are based on the standard Internet. In contrast, the Sector data cloud used by the Angle Project was a second-generation data cloud that is based on wide area high performance networks. These high performance networks enabled the large data sets produced by the project to be handled easily.

"Winning the Analytics Challenge shows the potential that second generation data and compute clouds have for changing the way we manage and compute with large distributed data," said Robert Grossman, Director of the National Center for Data Mining (NCDM) at the University of Illinois at Chicago and Managing Partner of Open Data Group.

The Angle Project was sponsored in part by CDAR, a Chicago-based research consortium that is developing new technologies and methodologies for analyzing large, complex and distributed data.

The National Center for Data Mining has led teams that have won two of the first three Analytics Challenges (at SC 05 and SC 07).


Second Generation Data Cloud Announced at SC07

Reno, NV, November 12, 2007 - A second generation Data Cloud called Sector was announced this week at the SC 2007 conference in Reno, NV.

Cloud computing is a critical piece of the infrastructure that allows companies such as Google, Yahoo, Amazon and Microsoft to provide their services.

A cloud provides computing resources or services over the Internet. A storage cloud provides storage services; a data cloud provides data management services; and a computing cloud provides computational services. Often these are layered to create a stack of cloud services that provide a computing platform for developing cloud-based applications.

Until now, data clouds all used the standard Internet to link distributed computing resources.

At SC 07, the National Center for Data Mining (NCDM) at UIC announced a second generation data cloud called Sector that uses high performance, wide area 10 Gbps networks.

The foundation for the Sector Data Cloud is the 10 Gbps Teraflow Testbed, a joint project of the NCDM and the International Center for Advanced Internet Research (iCAIR) at Northwestern University.

"Data clouds have emerged as the preferred platform for distributed computing when working with large amounts of data," said Robert Grossman, Director of the National Center for Data Mining at the University of Illinois at Chicago and Managing Partner of Open Data Group. According to Grossman, "Sector is the first of a second generation of data clouds that are based on new network protocols designed to work with the very large data sets that are common in e-science and that are beginning to become more common in e-business."

Sector is an open source data cloud based on the NCDM-developed UDP-based Data Transfer (UDT) protocol, which enables even very large data sets to be transported efficiently over high performance wide area networks.

"We have extensively used the Sector Data Cloud and the Teraflow Testbed to distribute multi-terabyte astronomical datasets to the whole world. We are also working to implement large-scale streaming queries across large astronomical archives to support the users of the National Virtual Observatory," said Alexander Szalay, Professor of Astrophysics and Computer Science at the Johns Hopkins University.

About the National Center for Data Mining

The National Center for Data Mining (NCDM) at the University of Illinois at Chicago (UIC) was founded in 1998 as a national resource for high-performance and distributed data mining and data intensive computing. NCDM performs research, hosts standards, operates testbeds, and engages in outreach. NCDM coordinates the development of the Predictive Model Markup Language (PMML), a standard for statistical and data mining models, and operates the Teraflow Testbed, a network for distributing large e-science datasets. For more information about NCDM, see http://www.ncdm.uic.edu.


Teraflow Testbed - A High Performance Facility for Distributing and Sharing Large E-Science Data Sets Announced at SC07

Reno, NV, November 12, 2007. This week, at the SC 2007 conference in Reno, NV, a consortium of researchers announced the Teraflow Testbed (TFT). The Teraflow Testbed is a unique international facility for working with, and for sharing, large remote and distributed data.

The Teraflow Testbed is the first advanced network dedicated to linking together large e-science data sets so that they are easier to integrate with each other and easier to share with colleagues.

The Teraflow Testbed employs specialized transport protocols and dedicated lightpaths using 1 Gbps, 10 Gbps and multiple 10 Gbps data streams that connect Teraflow Testbed sites around the world. With the ability to move the data at 10 Gbps and higher, the Teraflow Testbed provides as much bandwidth between its distributed sites as most grid computers have between their nodes that are in the same room.

The design and implementation of the Teraflow Testbed is being led by the National Center for Data Mining (NCDM) at the University of Illinois at Chicago and the International Center for Advanced Internet Research (iCAIR) at Northwestern University. Other members of the consortium include StarLight, an international communications facility in Chicago, and the National Lambda Rail.

"This facility is the first dedicated facility for distributing and sharing large e-science data sets," said Robert Grossman, Director of the National Center for Data Mining at the University of Illinois at Chicago and Managing Partner of Open Data Group. "Until today, most high performance network testbeds have been used for connecting supercomputers, not for changing the way people work with data," according to Grossman.

"The ability to share large amounts of distributed, federated data and stream between sites to support search, analysis, and visualization requires reliable high-bandwidth, low-latency networking at 10Gbps to 40Gbps over unconstrained lightpaths," said Henry Dardy, Chief Scientist for the Center for Computational Science at the Naval Research Laboratory. "Our research today with the Teraflow Testbed deploys Infiniband as a single wire hardware interconnect of processing, storage and network assets along with open source software to demonstrate virtualization of the enterprise."

"This facility will support multiple advanced applications, including many advanced prototypes that cannot be sustained by traditional technology infrastructures," said Joe Mambretti, Director of the International Center for Advanced Internet Research (iCAIR) and Co-Director of the StarLight facility, one of the world's largest optical network exchanges for national and international research and education networks, which is located in Chicago.

The initial leg of the Teraflow Testbed uses a dedicated 10 Gbps lightpath connecting a Teraflow Testbed cluster at the StarLight facility in Chicago and a Teraflow Testbed cluster in McLean, Virginia. From McLean, the Teraflow Testbed connects to clusters at NASA Goddard in Greenbelt, Maryland, Johns Hopkins University in Baltimore, Maryland, and the Naval Research Laboratory in Arlington, Virginia.

The Teraflow Testbed also connects with Teraflow Testbed clusters in Daejeon, Korea, Tokyo, Japan, and Amsterdam, The Netherlands using shared 10 Gbps networks.

Over the next year, it will be extended to multiple other sites nationally and internationally.

The Teraflow Testbed is sponsored in part by the National Science Foundation, the US Army, the Department of Energy, and the University of Illinois at Chicago.

For more information, see http://www.teraflowtestbed.net.

About the International Center for Advanced Internet Research, Northwestern University (iCAIR)

iCAIR accelerates leading-edge innovation and enhanced global communications through advanced Internet technologies, in partnership with the international community and national partners. The Center, which was created in partnership with a number of major high tech corporations, designs and implements large scale infrastructure and applications (metro, regional, national, and global). The Center has designed multiple advanced research testbeds, which are used to develop new communications architecture, services and technology. iCAIR also participates in the operations of advanced networks and facilities, such as StarLight, a unique global network exchange in Chicago. See http://www.icair.org for more information.

About the National Lambda Rail

National LambdaRail, Inc. (NLR) is a major initiative of U.S. research universities and private sector technology companies to provide a national scale infrastructure for research and experimentation in networking technologies and applications. NLR puts the control, the power and the promise of experimental network infrastructure in the hands of our nation's scientists and researchers. For more information, see http://www.nlr.net.

About StarLight (sm)

StarLight is an advanced optical infrastructure and proving ground for network services optimized for high-performance, large scale national and global applications. Operational since summer 2001, StarLight has 1GE and 10GE switch/router facilities and true optical switching for wavelengths. StarLight is being developed by the Electronic Visualization Laboratory (EVL) at the University of Illinois at Chicago (UIC), the International Center for Advanced Internet Research (iCAIR) at Northwestern University, and the Mathematics and Computer Science Division at Argonne National Laboratory, in partnership with Canada's CANARIE and the Netherlands' SURFnet. See http://www.startap.net/starlight for more information.


The Second Annual Data Mining Practice Prize winner was announced at KDD-2007:

Data Quality Models for High Volume Transaction Streams: A Case Study

authors:

Robert L. Grossman
Open Data Group
River Forest IL USA
& National Center for Data Mining
University of Illinois at Chicago
Chicago IL USA
rlg1 at opendatagroup.com

Joseph Bugajski
Visa International
Foster City, CA USA
JBugajsk at visa.com

Chris Curry, David Locke & Steve Vejcik
Open Data Group
River Forest IL USA
{ccurry, dlocke, vejcik} at opendatagroup.com

August 28, 2007

"Detecting Changes in Large Data Sets of Payments Cards Data: A Case Study"
- by Robert Grossman is available in a video lecture format.

July 22, 2007

ACM SIGKDD 2007 Service Award to Robert Grossman

Award Acceptance Video

ACM SIGKDD is pleased to announce that Robert Grossman is the winner of its 2007 Service Award. Robert Grossman is recognized for his key role in the development of open and scalable architectures and standards for the SIGKDD and Global KDD Communities.

The ACM SIGKDD Service Award is the highest service award in the field of data mining and knowledge discovery. It is given to one individual or one group who has performed significant service to the data mining and knowledge discovery field, including professional volunteer services disseminating technical information to the field, leading organizations or projects that contribute technically to the field as a whole, furthering KDD education, or increasing funding to the KDD community.

The previous SIGKDD Service Award winners were Gregory Piatetsky-Shapiro, Ramasamy Uthurusamy, Usama M. Fayyad, Xindong Wu, the Weka team led by Ian Witten and Eibe Frank, and Won Kim.

The award includes a plaque and a check for $2,500, to be presented at KDD-2007 (The 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining) Opening Plenary Session on August 12, 2007 in San Jose, CA.

Grossman was one of the founders of the Data Mining Group in 1998, which develops the Predictive Model Markup Language (PMML). He has been its Chair since it was started, and during this time it has released nine versions of PMML. PMML has seen widespread adoption by the KDD community, in part because:

  • PMML supports the sharing of statistical and data mining models in a platform and application independent fashion.
  • PMML supports architectures in which one application produces PMML models (called the PMML Producer) and another application, which may not even be a data mining application, consumes PMML models (called the PMML Consumer or scoring engine).
  • PMML supports KDD service oriented architectures.
  • PMML facilitates the storing of models in model repositories.
  • PMML supports applications in which models must be audited for compliance and other regulatory requirements.
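The producer/consumer split described in the list above can be sketched as follows. This is a minimal illustration, not a schema-valid PMML document: it uses PMML-style element names for a one-variable regression model but omits required parts such as the version attribute, Header, and MiningSchema. The function names `produce_model` and `score` are hypothetical.

```python
import xml.etree.ElementTree as ET


def produce_model(intercept: float, slope: float) -> str:
    """Producer: serialize the model y = intercept + slope * x as XML."""
    pmml = ET.Element("PMML")
    dictionary = ET.SubElement(pmml, "DataDictionary")
    for name in ("x", "y"):
        ET.SubElement(dictionary, "DataField", name=name,
                      optype="continuous", dataType="double")
    model = ET.SubElement(pmml, "RegressionModel", functionName="regression")
    table = ET.SubElement(model, "RegressionTable", intercept=str(intercept))
    ET.SubElement(table, "NumericPredictor", name="x", coefficient=str(slope))
    return ET.tostring(pmml, encoding="unicode")


def score(document: str, inputs: dict) -> float:
    """Consumer: parse the XML and evaluate the model on one record."""
    table = ET.fromstring(document).find("RegressionModel/RegressionTable")
    result = float(table.get("intercept"))
    for predictor in table.findall("NumericPredictor"):
        result += float(predictor.get("coefficient")) * inputs[predictor.get("name")]
    return result


# The consumer needs only the XML text, not the tool that built the model.
document = produce_model(intercept=1.5, slope=2.0)
print(score(document, {"x": 3.0}))  # evaluates 1.5 + 2.0 * 3.0
```

The consumer here knows nothing about how the model was fitted, which is the property that lets a scoring engine sit outside the data mining application.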

For the past 10 years, Grossman has led two international testbeds for high performance and distributed data mining, which have been used by over fifty different organizations and groups to test, benchmark, and develop innovative technology for high performance and distributed data mining and knowledge discovery. The testbeds have also been used to develop and benchmark grid and service oriented technologies for mining large remote and distributed data sets. The first testbed was called the Terabyte Challenge and operated from 1995 to 1999, when working with a terabyte of data was still relatively rare. The second testbed, called the Teraflow Testbed, was started in 2004 and will operate until at least 2008. Today, when most distributed data mining takes place at 1-100 Mbps, the Teraflow Testbed can be used to mine data at 1-10 Gbps over wide area high performance networks.

Grossman has a long history of serving the KDD community. He was the Industrial Track Co-Chair for KDD 2006, the General Chair of KDD 2005, the Sponsorship Chair for KDD 2000 and 2001, and the co-chair of the First and Second SIAM International Conferences on Data Mining (SDM-01 and SDM-02).

Grossman has published over 140 research and technical papers in international conferences and journals. In 2005, he led the team that won the first annual High Performance Analytics Challenge at the ACM/IEEE International Conference for High Performance Computing and Communications (SC 2005). He also led teams that won prizes involving high performance data mining and related areas at SC 2006, SC 1999, SC 1998, SC 1996, and SC 1995.

Grossman is the Director of the National Center for Data Mining at the University of Illinois at Chicago and the Managing Partner of Open Data Group.

ACM SIGKDD is pleased to present Grossman its 2007 Service Award for his significant service and contributions to the global KDD community.

2007 ACM SIGKDD Awards Committee:

Ramasamy Uthurusamy (General Motors, USA), Chair
Jerome Friedman (Stanford University, USA)
Jiawei Han (University of Illinois Urbana-Champaign, USA)
Vipin Kumar (University of Minnesota, USA)
Heikki Mannila (University of Helsinki, Finland)
Rajeev Motwani (Stanford University, USA)
Ramakrishnan Srikant (Google, USA)
Ian H. Witten and Eibe Frank (University of Waikato, New Zealand)
Xindong Wu (University of Vermont, USA)


June 5, 2007

UIC's National Data Mining Center Enables Fast Data Transfer of Terabyte-sized Scientific Datasets. Press Release.


NCDM wins Bandwidth Challenge at SC06

November 16, 2006
Tampa, FL

The National Center for Data Mining at UIC wins the HPC Bandwidth Challenge at SuperComputing '06


Debbie Montano presents the SC06 Bandwidth Challenge Winner award
to Dr. Robert Grossman and Dr. Yunhong Gu of NCDM.


SuperComputing 2006 Bandwidth Challenge Award

Bandwidth Challenge 06: End-to-End Achievement

The National Center for Data Mining at UIC has won the HPC Bandwidth Challenge at SC06 in Tampa, FL, sponsored by Qwest. Nine institutions participated in the competition. NCDM won by sustaining a data transfer rate of 8Gb/s over a 10Gb/s link, with a peak rate of 9.18Gb/s during the competition window. NCDM uses its own open source software products, UDT and SECTOR, to transfer large datasets efficiently at high speeds on optical networks.
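To put the reported rates in context, the short calculation below derives link utilization and the data volume moved per hour from the figures above. The rates are taken from the text; treating the sustained rate as constant for a full hour is an assumption made for illustration.

```python
LINK_GBPS = 10.0       # link capacity reported above
SUSTAINED_GBPS = 8.0   # sustained rate reported above
PEAK_GBPS = 9.18       # peak rate reported above


def utilization(rate_gbps: float, capacity_gbps: float) -> float:
    """Fraction of the link capacity in use at the given rate."""
    return rate_gbps / capacity_gbps


def terabytes_per_hour(rate_gbps: float) -> float:
    """Data volume moved in one hour at a constant rate (decimal units)."""
    bytes_per_second = rate_gbps * 1e9 / 8
    return bytes_per_second * 3600 / 1e12


print(f"sustained utilization: {utilization(SUSTAINED_GBPS, LINK_GBPS):.1%}")
print(f"peak utilization: {utilization(PEAK_GBPS, LINK_GBPS):.1%}")
print(f"volume at sustained rate: {terabytes_per_hour(SUSTAINED_GBPS):.1f} TB/hour")
```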

This year the Bandwidth Challenge focused on a specific facet of networking: End-to-End achievement. Competitors were asked to fully utilize one 10 Gig path, end-to-end, disk-to-disk, from SC06 in Tampa back to their home institution, using the actual production network back home. Participants were required to realize, demonstrate and publish all the configuration, troubleshooting, tuning and policies used. The SC06 show floor was connected with the major US research networks, specifically: Abilene, ESnet, NLR PacketNet, NLR FrameNet, and HOPI. The US research networks provided transit for the international networks with which they peer.

NCDM transferred Sloan Digital Sky Survey (SDSS) data between the SC06 show floor in Tampa, FL and its lab in 4223 SEL at the University of Illinois at Chicago. It used SECTOR, the newly developed distributed data space management system. SECTOR transparently manages file location and data movement, while the NCDM-developed UDT software is used for the actual data transfer. The data transfer was disk to disk over one 10Gb/s shared routed link between SC06 and UIC, via StarLight.

Bandwidth Challenge Competitors: Winner
National Center for Data Mining (NCDM) at UIC, Northwestern Univ., Johns Hopkins Univ., "Transporting Sloan Digital Sky Survey Data Using SECTOR".

Honorable Mention
1) CalTech, CERN, Univ. of Florida, Univ. of Michigan, "High Speed Data Gathering, Distribution and Analysis for Physics Discoveries at the Large Hadron Collider"

2) Indiana Univ., Pittsburgh SuperComputing Center, Oak Ridge National Laboratory, "All in a Day's Work: Advancing Data Intensive Research with the Data Capacitor"

Additional teams: 1) Japanese Aerospace Exploration Agency 2) Pacific Northwest National Laboratory 3) Purdue University 4) Internet2, Univ. of Washington 5) Texas A&M Univ., Univ. of Delaware 6) Univ. of Tokyo

For more information on the HPC Challenge see SC06 BWC page.

March 2006

UDT

As of March 2006, 5,519 users had downloaded UDT (UDP-based Data Transfer Protocol) from our SourceForge page. UDT is a high performance data transport protocol for distributed data intensive applications, developed in our lab.

March 2006

UIC Chicago Alumni News features an article about Stuart Bailey, the founder of Infoblox. Read More.

February 2006

Lecture: "GenDB - A Genome Annotation System for Prokaryotes"
Author: Dr. Folker Meyer
Date: Friday, February 10th, 2006
Time: 1pm-2pm
Location: 636 SEO

November 2005

LAC receives SC|05 Conference Awards.

NCDM won two awards at SC05 in Seattle, WA this year, including the first-ever Tri-Challenge. The Tri-Challenge was a combination of the HPC Analytics, HPC Bandwidth, and StorCloud Challenges. NCDM entered all three competitions and received the highest combined score for its entries, winning first place. This competition is not expected to be run again, so NCDM is proud to be the first, and only, winner of the SuperComputing Tri-Challenge Award.

SuperComputing Conference High Performance Computing Challenges:
Bandwidth Challenge Winner (2004, Pittsburgh) Award
Application Foundation Award (2003, Phoenix)
Best Use of Emerging Infrastructure (2002, Baltimore) Award
1st runner-up, Outstanding (2000, Dallas) Award
High Performance Communication Award (1999, Portland) Award
Most Innovative of Show (1998, Orlando) Award
Gold Medal for Innovation (1996, Pittsburgh) Award
1st Place (1995, San Diego) Award

Links of Interest

Selected press clippings.

telephone (312) 996-0305
e-mail staff@teraflowtestbed.net
address 700 SEO MC 249, 851 S. Morgan St. Chicago, IL. 60607