* Phoenix Grid Computing Project *

Welcome to the Phoenix Project!

The Phoenix Grid Computing and Tools Project develops programming models and convenient tools suitable for the Grid environment.

For programming models, we design a general message passing model called Phoenix. In this model, one can express a wide variety of parallel algorithms based on the familiar message passing paradigm. At the same time, applications written in Phoenix can add nodes to and delete nodes from a running application. In addition, the implementation of the Phoenix message passing system is ``Grid-Enabled'' in the sense that it works under typical network configurations of today, including firewalls, NAT, and DHCP.

For tools, we design several tools that support common operations that frequently occur when working in a Grid environment. Two such operations are job (command) submission and file copying. They are simple operations, yet they can become troublesome when the number of resources involved becomes large and the resources are distributed over many subnets. We aim to provide easy-to-install solutions that streamline these Grid operations.

Research Goals

Specific research goals include:
  • Programming models supporting general parallel algorithms on a dynamically changing set of resources.

    Phoenix is a low-level message passing model supporting joining/leaving nodes. When trying to add a node/process to, or delete one from, a running message passing application, we face a difficulty fundamental to message passing models: the destination of a message is specified by a node name that is permanently bound to a node/process. This makes it difficult to guarantee that messages arrive ``at the right place'' in the presence of joining/leaving nodes.

    Phoenix thus abandons the ``one name for one node/process'' principle of regular message passing models (e.g., MPI). Instead, an application can define a large name space convenient for that particular application and send a message to a name in that space. A message destined for a name is delivered to whichever node/process is currently ``responsible for'' that name.

    See our Phoenix design paper for details.
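    To make the naming idea concrete, the following sketch (in Python, with hypothetical names; the real Phoenix API differs) shows nodes assuming and releasing responsibility for ranges of a virtual name space, with messages routed to the current owner of a name:

```python
class VirtualNameSpace:
    """Sketch of Phoenix-style naming (hypothetical API, not the real
    Phoenix interface): nodes own contiguous ranges [lo, hi) of a large
    virtual name space, and a message addressed to a name goes to
    whichever node currently owns that name."""

    def __init__(self):
        self.owners = {}  # node -> (lo, hi), a half-open range of names

    def assume(self, node, lo, hi):
        # node declares responsibility for names in [lo, hi)
        self.owners[node] = (lo, hi)

    def release(self, node):
        # a leaving node gives up its names; another node must re-assume
        # them before messages to those names can be delivered again
        self.owners.pop(node, None)

    def owner_of(self, name):
        for node, (lo, hi) in self.owners.items():
            if lo <= name < hi:
                return node
        return None  # currently unowned; a real system would buffer


# Two nodes split a 2^20-name space; a third later takes half of A's share.
ns = VirtualNameSpace()
ns.assume("A", 0, 2**19)
ns.assume("B", 2**19, 2**20)
ns.assume("C", 2**18, 2**19)  # C joins...
ns.assume("A", 0, 2**18)      # ...and A shrinks its range accordingly
```

    The point is that senders keep addressing application-level names; only the name-to-node mapping changes as nodes join and leave.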

  • Efficient implementation of Grid-Enabled message passing systems.

    Message passing systems in the Grid must perform application-level routing to overcome communication restrictions imposed in the wide area, including firewalls, NAT, and DHCP. Firewalls and NAT block connections between nodes in different clusters/LANs. DHCP makes it difficult to ``specify'' participating nodes offline. Batch schedulers introduce a problem similar to DHCP, in that the user cannot know the names of the involved resources before a job starts.

    In Phoenix systems, all of these problems are solved uniformly by a fully dynamic application-level resource discovery and routing layer. Nodes connect to each other where possible, and dynamically build and maintain a routing table.

    See our Phoenix implementation paper for details.
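    As a minimal illustration of the routing layer (a hypothetical sketch, not the actual Phoenix implementation), suppose each node knows the point-to-point connections that could actually be established; a next-hop table can then be derived by breadth-first search, so that messages between mutually unreachable nodes are relayed at the application level:

```python
from collections import deque

def build_routes(node, links):
    """Compute a next-hop routing table for `node` by BFS over `links`,
    the connections that could actually be established (firewalls and
    NAT block the rest). Illustrative sketch, not the Phoenix code."""
    next_hop = {}
    seen = {node}
    queue = deque()
    for peer in links.get(node, ()):
        seen.add(peer)
        queue.append((peer, peer))  # (destination, first hop toward it)
    while queue:
        dest, hop = queue.popleft()
        next_hop[dest] = hop
        for nxt in links.get(dest, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hop))
    return next_hop

# A--B--C--D: only B is directly reachable from A, so everything
# beyond it is relayed through B.
links = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
```

    In the real system the table must in addition be maintained dynamically, as connections appear and disappear at run time.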

  • High-performance computing (HPC) applications under background loads and high latencies.

    Today's ``demonstration'' applications on the Grid are typically high-throughput computing (HTC) applications that are, from the point of view of parallel computing, characterized by coarse-grained tasks requiring little or no communication between tasks. We envision that, with network bandwidth advancing even more rapidly than CPU speed, the Grid environment will become a realistic pool of resources for HPC as well. The main differences from today's cluster environments are high latencies and the dynamicity of resources.

    To this end, we develop systems and applications that are highly resilient to latency and loaded resources. Our LU factorization achieves performance close to that of HPL under no-load conditions, and surpasses HPL when some of the nodes are loaded and/or large message latencies are introduced.

    See our LU factorization paper for details.

  • Scalable parallel command invocations to many nodes in the Grid.

    Interactive command invocation is one of the most frequent operations on any kind of computer, and the Grid environment is no exception. For file manipulation, diagnosis, troubleshooting, and program debugging, interactively invoking commands for immediate feedback is indispensable to the efficient development of Grid programs. This simple task becomes troublesome when we deal with many nodes, some of which may be behind firewalls and NAT routers.

    We developed GXP, a tool that automates logging in to many nodes, maintaining connections to them, and submitting commands to all of them at once. Unlike other similar tools, it is designed from the outset with the assumptions that nodes may be in different clusters, some nodes may be down, and many users will not spend hours installing such tools.

    We also developed VGXP, a monitoring system that provides a realtime, graphical view of resource usage on many nodes. Parallel command submission and monitoring are integrated: VGXP also works as a graphical front-end of GXP, which enhances our daily computing experience with many nodes.
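    The core of such a tool can be sketched as follows (Python; the local execution below is a stand-in for a per-node `ssh` invocation, and GXP itself instead keeps persistent connections and relays through gateway nodes to reach firewalled clusters):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_everywhere(nodes, argv, timeout=30):
    """Run the same command for many nodes concurrently and collect
    (node, exit status, output) triples. Simplified sketch: a real
    deployment would prepend ["ssh", node] to argv, and GXP avoids
    per-command login overhead by keeping connections alive."""
    def run_on(node):
        try:
            proc = subprocess.run(argv, capture_output=True, text=True,
                                  timeout=timeout)
            return node, proc.returncode, proc.stdout.strip()
        except subprocess.TimeoutExpired:
            return node, None, ""  # treat a hung node as down, not fatal
    with ThreadPoolExecutor(max_workers=min(len(nodes), 64)) as pool:
        return list(pool.map(run_on, nodes))
```

    Collecting per-node exit statuses, rather than aborting on the first failure, matches the assumption above that some nodes may be down at any time.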

  • Efficient, adaptive, and fault-tolerant file synchronization to many nodes.

    File replication and synchronization is another common task in developing Grid applications. Since network file systems often run only within a single LAN, explicitly copying files is necessary more often in the Grid than in a LAN. Even with wide-area file systems, it is often preferable and more efficient to explicitly copy/sync data among all nodes occasionally. An explicit, simultaneous copy among many nodes allows us to build an efficient data transfer pipeline, so that data do not cross WAN links multiple times and no single node is overloaded feeding others. Based on this observation, we develop a self-optimizing file replication tool called NetSync. It automatically builds an efficient transfer pipeline (in general, a tree) among the nodes.

    See our NetSync paper for details.
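    A much-simplified version of the pipeline-building idea (a hypothetical helper, not NetSync's actual algorithm) groups nodes by subnet, sends the data across the WAN once per subnet to a single representative, and fans out over the LAN from there:

```python
from collections import defaultdict

def transfer_plan(source, nodes):
    """Build a dissemination tree in which each remote subnet receives
    the data over the WAN exactly once, via one representative, and the
    rest of that subnet is fed over the LAN. Subnets are approximated
    here by the first three IPv4 octets; NetSync instead measures
    transfer performance and adapts the tree at run time."""
    def subnet(host):
        return host.rsplit(".", 1)[0]
    groups = defaultdict(list)
    for n in nodes:
        if n != source:
            groups[subnet(n)].append(n)
    edges = []  # (sender, receiver) pairs forming the tree
    for net, members in groups.items():
        head = source if subnet(source) == net else members[0]
        if head != source:
            edges.append((source, head))  # the single WAN hop
        for m in members:
            if m != head:
                edges.append((head, m))   # intra-LAN fan-out
    return edges
```

    Because the result is a tree with one incoming edge per node, the source uploads each remote subnet's copy only once, instead of pushing one copy per node across the WAN.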