* Phoenix Grid Computing Project *
Welcome to the Phoenix Project!
The Phoenix Grid Computing and Tools Project seeks programming models
and convenient tools suitable for the Grid environment.
For programming models, we design a general message passing model
called Phoenix. In this model, one can express a wide variety of
parallel algorithms using the familiar message passing paradigm. At
the same time, applications written in Phoenix can add nodes to and
remove nodes from a running computation. In addition, the
implementation of the Phoenix message passing system is
``Grid-Enabled'' in the sense that it works under typical network
configurations of today, including firewalls, NAT, and DHCP.
For tools, we design several tools that support operations that occur
frequently when working in the Grid environment. Two such operations
are job (command) submission and file copying. They are simple
operations, yet they can become troublesome when the resources
involved are numerous and distributed over many subnets. We aim to
provide easy-to-install solutions that streamline Grid operations.
Research Goals
Specific research goals include:
- Programming models supporting general parallel algorithms on a
dynamically changing number of resources.
Phoenix is a low-level message passing model that supports nodes
joining and leaving. When trying to add a node/process to, or delete
one from, a running message passing application, we face a difficulty
fundamental to message passing models: the destination of a message
is specified by a node name that is permanently bound to a
node/process. This makes it difficult to guarantee that messages
arrive ``at the right place'' when nodes join or leave.
Phoenix thus abandons the ``one name for one node/process'' principle
seen in regular message passing models (e.g., MPI). Instead, an
application can define a large name space convenient for that
particular application and send a message to a name in that space. A
message destined for a name will be delivered to whichever
node/process is currently ``responsible for'' that name.
See our Phoenix design paper for details.
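The idea of addressing names rather than nodes can be illustrated with a small sketch. This is not the Phoenix API: the class `NameSpace`, its methods, and the choice of an integer interval as the name space are assumptions made purely for illustration. The point is that senders only ever use names, while the mapping from names to nodes changes as nodes join and leave.

```python
from bisect import bisect_right


class NameSpace:
    """Toy virtual name space [0, size) split into contiguous ranges.

    Hypothetical illustration only; not the Phoenix API.
    """

    def __init__(self, size, first_node):
        self.size = size
        # starts[i] is the first name of the range owned by owners[i].
        self.starts = [0]
        self.owners = [first_node]

    def owner(self, name):
        """Return the node currently responsible for `name`."""
        if not 0 <= name < self.size:
            raise ValueError("name outside the name space")
        return self.owners[bisect_right(self.starts, name) - 1]

    def join(self, new_node, donor):
        """`new_node` takes over the upper half of `donor`'s first range."""
        i = self.owners.index(donor)
        lo = self.starts[i]
        hi = self.starts[i + 1] if i + 1 < len(self.starts) else self.size
        if hi - lo < 2:
            raise ValueError("range too small to split")
        mid = (lo + hi) // 2
        self.starts.insert(i + 1, mid)
        self.owners.insert(i + 1, new_node)

    def leave(self, node, heir):
        """`node` hands all of its ranges to `heir` before leaving."""
        self.owners = [heir if o == node else o for o in self.owners]
```

With `space = NameSpace(1024, "A")`, a message sent to name 700 goes to node A. After `space.join("B", donor="A")` the same name resolves to B, and after `space.leave("A", heir="B")` every name resolves to B. The sender's code does not change in any of these cases.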
- Efficient implementation of Grid-Enabled message passing systems.
Message passing systems in the Grid must have application-level
routing to overcome communication restrictions imposed in wide-area
networks. The sources of these restrictions include firewalls, NAT,
and DHCP. Firewalls and NAT block connections between nodes in
different clusters/LANs. DHCP nodes make it difficult to ``specify''
participating nodes offline. Batch schedulers introduce problems
similar to those of DHCP, in that the user cannot know the names of
the resources involved before a job starts.
In Phoenix systems, all of these problems are uniformly solved by
having a fully dynamic application-level resource discovery and
routing layer. Nodes connect to each other where possible, and
dynamically build and maintain a routing table.
See our Phoenix implementation paper for details.
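One way to picture such a routing layer: given the connections that could actually be established, each node derives a next hop for every reachable destination. The sketch below does this with a breadth-first search over a hypothetical `links` map. The function name, the data layout, and the node names in the example are assumptions, and the actual Phoenix routing protocol may work differently.

```python
from collections import deque


def build_routing_table(links, source):
    """Return {destination: next_hop} for every node reachable from `source`.

    `links` maps each node to the set of nodes it holds a connection with
    (connections are assumed to be usable in both directions once made).
    """
    table = {}
    visited = {source}
    queue = deque()
    # Direct neighbors are their own next hop.
    for neighbor in sorted(links.get(source, ())):
        visited.add(neighbor)
        table[neighbor] = neighbor
        queue.append(neighbor)
    while queue:
        node = queue.popleft()
        for neighbor in sorted(links.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                # Reach `neighbor` through the first hop already used for `node`.
                table[neighbor] = table[node]
                queue.append(neighbor)
    return table
```

For example, if `a1` sits behind a firewall and can only connect to a gateway `gw`, while `gw` also connects to `b1` and `b1` to `b2`, then `build_routing_table(links, "a1")` sends everything through `gw`. When a node joins or a connection drops, recomputing the table from the updated `links` is enough for this toy model.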
- High-performance computing (HPC) applications under background
loads and high latencies.
Today's ``demonstration'' applications on the Grid are typically
High-throughput computing (HTC) applications that are, from the point
of view of parallel computing, characterized by coarse-grained tasks
requiring little or no communication between tasks. We envision that,
with network bandwidth advancing even more rapidly than CPU speed,
the Grid environment will become a realistic pool of resources for
HPC applications. The main differences from today's cluster
environments are high latency and dynamically changing resources.
To this end, we develop systems and applications that are highly
resilient to latency and loaded resources. Our LU factorization
achieved performance close to that of HPL under no-load conditions.
It surpasses HPL when some of the nodes are loaded and/or large
message latencies are inserted.
See our LU factorization paper for details.
- Scalable parallel command invocations to many nodes in the Grid.
Interactive command invocation is one of the most frequent
operations on any kind of computer, and the Grid environment is no
exception. For file manipulation, diagnosis, troubleshooting, and
program debugging, interactively invoking commands for immediate
feedback is indispensable to efficient development of Grid
programs. This simple task becomes troublesome when we deal with
many nodes, some of which may be behind firewalls and NAT routers.
We developed GXP, a tool that automates logging in to many nodes,
maintaining connections, and submitting commands to them at the same
time. Unlike other similar tools, it is designed from the beginning
with the assumption that nodes may be in different clusters, some
nodes may be down, and many users are unwilling to spend hours
installing such tools.
We also developed VGXP, a monitoring system for many nodes that
provides a real-time, graphical view of resource usage. Parallel
command submission and monitoring are integrated because VGXP also
works as a graphical front end to GXP, which enhances our daily
computing experience with many nodes.
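The basic pattern of submitting one command to many nodes while tolerating nodes that are down can be sketched as follows. This is not how GXP is implemented: GXP keeps connections open, whereas this sketch starts a fresh `ssh` process per host for each command. The function names `ssh_runner` and `run_everywhere` are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(host, command, timeout=10):
    """Run `command` on `host` through the system `ssh` client."""
    done = subprocess.run(
        ["ssh", host, command],
        capture_output=True, text=True, timeout=timeout,
    )
    return done.returncode, done.stdout


def run_everywhere(hosts, command, runner=ssh_runner):
    """Run `command` on all hosts concurrently.

    Returns {host: (status, output)}. A host whose runner raises (for
    example on timeout) is reported with status None instead of aborting
    the whole run.
    """
    hosts = list(hosts)

    def attempt(host):
        try:
            return runner(host, command)
        except (OSError, subprocess.SubprocessError) as exc:
            return None, str(exc)

    # One worker per host so that a slow node does not delay the others.
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        return dict(zip(hosts, pool.map(attempt, hosts)))
```

Passing the runner as a parameter keeps the fan-out logic independent of the transport, so the same function can be exercised with a stub in place of `ssh`.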
- Efficient, adaptive, and fault-tolerant file synchronization to
many nodes.
File replication and synchronization are another common task when
developing Grid applications. Since network file systems often run
only in a single LAN, explicitly copying files is necessary more
often in the Grid than in a LAN. Even with wide-area file systems, it
is often preferred and more efficient to explicitly copy/sync data
among all nodes occasionally. Explicit and simultaneous copying among
many nodes allows us to build an efficient data transfer pipeline so
that data do not cross WAN links many times and no node is overloaded
by feeding other nodes. Based on this observation, we develop a
self-optimizing file replication tool called NetSync. It
automatically builds an efficient transfer pipeline (a tree in
general) among nodes.
See our NetSync paper for details.
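The shape of such a transfer tree can be sketched under simplifying assumptions. Here cluster membership is given up front, whereas NetSync is described as self-optimizing; the function name `build_transfer_tree` and the host names in the example are hypothetical. Cluster heads form a chain across the WAN and nodes inside each cluster form a local chain, so data enters each remote cluster exactly once and no node feeds more than two others.

```python
def build_transfer_tree(source, clusters):
    """Return {node: parent}; data flows from parent to node.

    `clusters` maps a cluster name to the list of its nodes, and `source`
    must appear in one of them. Membership is assumed to be known here.
    """
    if not any(source in nodes for nodes in clusters.values()):
        raise ValueError("source does not belong to any cluster")
    # Visit the source's own cluster first (stable sort keeps the rest in order).
    ordered = sorted(clusters.values(), key=lambda nodes: source not in nodes)
    parent = {}
    previous_head = None
    for nodes in ordered:
        if not nodes:
            continue
        if source in nodes:
            chain = [source] + [n for n in nodes if n != source]
        else:
            chain = list(nodes)
        if previous_head is not None:
            # The only transfer into this cluster that crosses the WAN.
            parent[chain[0]] = previous_head
        previous_head = chain[0]
        # Inside a cluster, each node forwards to the next one.
        for upstream, downstream in zip(chain, chain[1:]):
            parent[downstream] = upstream
    return parent
```

With clusters `{"tokyo": ["t1", "t2", "t3"], "kyoto": ["k1", "k2"]}` and source `t2`, only the edge from `t2` to `k1` crosses between clusters; `k2` then receives its copy from `k1` over the local network.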