For the Workshop on "Information Visualization Software
Infrastructures" at IEEE 2004 Visualization,
Organized by Katy Börner,
Indiana University, USA and Jean-Daniel Fekete, INRIA, France
I.1) What functionality should a general InfoVis infrastructure provide?
'Infrastructure' has many different meanings and connotations. In this
discussion we try to use the term in its broadest and most inclusive sense: a
set of tools, protocols, and policies that support a common goal. In this case,
the common goal is, of course, software and support for InfoVis. The tools in
an infrastructure are the software applications and libraries, databases, and repositories
that make up and manage the hard assets. Protocols address the requirements for
the software and data components of the infrastructure, from how they interact
with each other to what languages should be used. The policies help direct the
social aspects of the infrastructure, e.g. who maintains software, minimum
levels of implementation and documentation required to be part of the infrastructure,
versioning and release management, and licensing issues.
Tools
A Web-based central information repository, modeled after the successful
biological and software source code repositories (e.g. NCBI, FlyBase,
SourceForge, CPAN), should make up the core of the
infrastructure.
This repository would provide:
Protocols
Visualization software presents many unique challenges to implementers and
integrators. Often, tools are written in different languages or use different
interfaces for data access and user interaction. At one extreme, some tools
exist as standalone applications that provide full support for data I/O and
interaction, but little support for integration with other tools (e.g.
Spotfire). Other tools provide software frameworks for extending and embedding
visualizations along with extensible I/O toolkits (e.g. Matlab, IDL). Finally,
some packages exist only as libraries, meant to be used as plugins in new
and existing applications.
The InfoVis infrastructure should not preclude any of these models, but should
provide a common set of component protocols for guiding new development towards
libraries to ensure the most flexibility for InfoVis researchers and end users.
To encourage the use of the protocols, the infrastructure should provide a
reference implementation of a container framework that provides support for
common tasks, such as data I/O and user interaction, that can be used by anyone
developing new InfoVis tools (e.g. Katy Börner's IVC). The component protocols
should follow the general philosophy of Microsoft's COM framework, enabling
different levels of interface support and allowing the dynamic discovery of new
interfaces.
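The idea of COM-style interface discovery can be illustrated with a minimal sketch. It is written in Python purely for brevity and is not a proposed API: the interface names (DataSource, Layout), the Component base class, and the query_interface method are all hypothetical, chosen only to mirror the way a COM client asks an object which interfaces it supports.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical interface: a component that can supply rows of data."""

    @abstractmethod
    def rows(self):
        ...


class Layout(ABC):
    """Hypothetical interface: a component that can place items in 2-D."""

    @abstractmethod
    def positions(self):
        ...


class Component:
    """Plays the role of COM's IUnknown (an analogy, not a port of COM)."""

    def query_interface(self, interface):
        # Return the component itself if it supports the interface, else None,
        # so a container can probe for capabilities at run time.
        return self if isinstance(self, interface) else None


class CsvTable(Component, DataSource):
    """Minimal component: supports data access only."""

    def __init__(self, text):
        self._text = text

    def rows(self):
        return [line.split(",") for line in self._text.splitlines() if line]


class GridLayout(Component, DataSource, Layout):
    """Richer component: supports data access and a simple grid layout."""

    def __init__(self, items, columns):
        self._items = list(items)
        self._columns = columns

    def rows(self):
        return [[item] for item in self._items]

    def positions(self):
        # Item i goes to column i % columns, row i // columns.
        return {item: (i % self._columns, i // self._columns)
                for i, item in enumerate(self._items)}


def describe(component):
    """Container-side discovery: names of the known interfaces a component offers."""
    return [iface.__name__ for iface in (DataSource, Layout)
            if component.query_interface(iface) is not None]
```

A container built this way can load a component it has never seen, call describe on it, and enable only the features the component supports, which is the "different levels of interface support" described above.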
The reference implementation and component framework should be implemented
in one language. Given that the InfoVis community has adopted Java, Java would
be a good choice for this language. However, if it is important to make the
InfoVis tools accessible outside of academia, .NET bindings should be
considered.
Component Protocols (along with implementation suggestions) should include,
but not be limited to:
Note that components need not be implemented entirely in Java. As long as they expose themselves via the component interfaces, they can use any language that can talk to Java. This is important for supporting large datasets, since Java memory usage does not scale well (e.g. the Boost Graph Library (BGL) integration).
The component interfaces should evolve early in their lifecycles. While this
may present some compatibility issues, the flexibility will be important as
design issues get worked out.
Policies
The infrastructure policies are designed to provide a basic set of
guidelines that encourage people to use the framework and prevent people from
abusing it.
They should:
Unlike the Java or C++ standard libraries, the InfoVis tools do not need to
be industrial strength. Both tools under active development and complete tools
should be encouraged, to help share ideas and promote collaboration. A few
minimum requirements should be met by projects. These include:
This requirement ensures that anyone can use the tool 'out of the box'.
This is only a partial list, but is meant to address many common problems
people encounter with unknown and poorly supported software. Since this
repository will be how most people interact with the InfoVis community, it
should provide a pleasant experience!
I.2) What do you see as the main technical challenges for creating a
central but flexible and universally useful (information) visualization
software infrastructure (as opposed to 100 different ones)?
For the infrastructure described above, there are very few technical
challenges. Techniques for developing large repository-style Web sites are well
known and most of the common approaches (PHP/MySQL, Java/EJBs,
Plone/content management, hybrid approaches, etc.) would be fine for the
repository.
Developing a flexible set of component protocols will be more challenging,
but as long as a careful, iterative design process is employed, there is no
reason it will not be successful. The same holds for the reference framework.
Plugin-based applications are common, and Java provides good support for
developing easily extended applications.
One technical area that will be challenging is providing support for
scalable components that support large datasets. While Java works well for
small and medium-sized datasets, it tends to degrade quickly when the number of
elements reaches the tens and hundreds of millions. Since large datasets are
important for many modern applications (e.g. genomics and Homeland Security),
care should be taken to design the component system in a way that encourages
the use of more appropriate languages for large-scale and high-performance
visualizations.
While the technical challenges are fairly tractable, the cultural ones will
be far more important to the overall success of the infrastructure. Some
cultural challenges that will arise and some suggested solutions are:
Please describe the (information) visualization software infrastructure you
are working on.
II.1) Project Name and Web Address
The Boost Graph Library
(BGL) is the main visualization infrastructure component we are developing.
The BGL is a high-performance graph library written in C++ that we are integrating
into Katy Börner's InfoVis
CyberInfrastructure (IVC).
In addition to the BGL, we are also in the early stages of exploring
large-scale and high-performance visualization systems for Bioinformatics.
II.2) Core Team Members (Please list in order, Role of Project Member,
Full Name, E-mail. e.g.: Developer, John Doe, jdoe@univ.edu)
Big Chief, Dr. Andrew Lumsdaine, lums at osl dot indiana dot edu
Lead Developer/Researcher, Dr. Douglas Gregor, dgregor at cs dot indiana dot
edu
Developer/Researcher, Christopher Mueller, chemuell at cs dot indiana dot edu
II.3) Project Start Date
1997 (sequential BGL), June 2004 (parallel and distributed BGL)
II.4) Targeted User Group
Infrastructure developers, graph algorithm developers
II.5) Supported User Tasks
High performance data structures and algorithms for graphs
II.6) Major Features of the System Architecture
Algorithms are generic, allowing them to operate efficiently on any
graph-like data structure. For instance, we have simple bindings permitting BGL
algorithms to operate on LEDA graphs, and
could even use the Java
Native Interface (JNI) to allow BGL algorithms to operate on JUNG graphs.
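The adapter idea behind this can be sketched briefly. The BGL achieves genericity with C++ templates resolved at compile time, which is where its efficiency comes from; the Python sketch below shows only the structural point, that one algorithm can run unchanged over differently stored graphs as long as each exposes the same small interface. The class names and the two-method interface (vertices, neighbors) are assumptions made for this illustration and are not BGL names.

```python
from collections import deque


class AdjacencyDictGraph:
    """Adapter over a dict mapping each vertex to a list of its neighbours."""

    def __init__(self, adjacency):
        self._adj = adjacency

    def vertices(self):
        return iter(self._adj)

    def neighbors(self, v):
        return iter(self._adj[v])


class EdgeListGraph:
    """Adapter over a list of undirected (u, v) pairs, a different storage layout."""

    def __init__(self, edges):
        self._edges = list(edges)

    def vertices(self):
        seen = []
        for u, v in self._edges:
            for x in (u, v):
                if x not in seen:
                    seen.append(x)
        return iter(seen)

    def neighbors(self, v):
        for a, b in self._edges:
            if a == v:
                yield b
            elif b == v:
                yield a


def connected_components(graph):
    """Label every vertex with a component id using breadth-first search.

    Only graph.vertices() and graph.neighbors(v) are used, so any object
    offering those two methods can be passed in.
    """
    label = {}
    next_id = 0
    for start in graph.vertices():
        if start in label:
            continue
        label[start] = next_id
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in graph.neighbors(u):
                if w not in label:
                    label[w] = next_id
                    queue.append(w)
        next_id += 1
    return label
```

Supporting a third graph library would then mean writing one more small adapter rather than another copy of the algorithm.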
II.7) Algorithms Provided
Various shortest paths (single-source and all-pairs), minimum spanning tree,
connected components, maximum flow, isomorphism, betweenness centrality,
Kamada-Kawai layout, etc. See the table of contents of the BGL
documentation for a more complete list.
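As an illustration of one algorithm from this list, the following is a small sketch of betweenness centrality for unweighted, undirected graphs using Brandes' algorithm, a standard method for this measure. It is not BGL code and makes no claim about BGL internals; the function name and the dict-of-lists input format are assumptions made for the example.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Betweenness of every vertex in an undirected, unweighted graph.

    adjacency maps each vertex to a list of its neighbours; every edge
    must appear in both directions.
    """
    centrality = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Phase 1: BFS from s, counting shortest paths (sigma) and
        # recording predecessors on those paths.
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    for v in centrality:
        centrality[v] /= 2.0
    return centrality
```

For the path a-b-c, only b lies between another pair, so it scores 1.0 and the endpoints score 0.0.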
II.8) Snapshot of the Interface
Not applicable, because the BGL is a low-level graph manipulation library.
II.9) Development Platform
Standard C++
II.10) Supported Operating Systems
Any.
II.11) Software Dependencies/Required Libraries
Requires a standards-compliant C++ compiler. Requires (and is part of) the Boost C++ Libraries.
II.12) Current License
The Boost Software
License, which is a BSD-style license.
II.13) Number of Users/Downloads
The BGL is part of the larger Boost C++
Libraries, a collection of free, peer-reviewed libraries. Boost itself has
had approximately 194,000 downloads (currently averaging around 1,000 per day)
and its developer mailing list consists of 2,200 developers and users. The
Boost libraries are highly regarded in the C++ community, with several
libraries having been accepted
for standardization by the ANSI/ISO C++ committee. However, it is difficult
to characterize how much traffic is due to the BGL itself.
II.14) Pros and Cons
The BGL is a fast, stable, extensible library for manipulation of graphs.
Its implementation in C++ is both a pro and a con: there is essentially no
unnecessary overhead in a BGL graph or graph algorithm, even when adapting to
other graph libraries. For instance, the (sequential) BGL computes betweenness
centrality between 20 and 40 times faster than JUNG and requires much less memory. The
parallel BGL implementation of betweenness centrality then scales linearly with
the number of processors, permitting additional speedups.
On the other hand, the learning curve for the BGL is much steeper than
for, e.g., JUNG, and it may prove
too steep for casual developers. Also, integration of C++ into a Java project
is workable, and probably necessary for large data sets, but complicates
development, maintenance, and release management.
II.15) Planned Work
Our current research involves improvements to the existing BGL (new
algorithms, etc.), parallel and distributed enhancements to the BGL (for
computation on very large graphs), large graph visualization (eventually), and
integration of both the sequential and parallel BGL into an InfoVis framework
(currently working with IVC).
Please describe your main interest in participating in the workshop
We are interested in researching high-performance and scalable tools for
data processing and visualization.
Send the completed paper by Sept. 30, 2004
to katy@indiana.edu
and Jean-Daniel.Fekete@inria.fr.