HPC-Colony Project

Adaptive System Software For Improved Resiliency and Performance

Goals

Develop infrastructure and strategies for automated parallel resource management.

  • Today, application programmers must manage these resources explicitly. We address scaling and porting issues by delegating resource management tasks to a sophisticated parallel OS.
  • "Managing resources" includes balancing CPU time, network utilization, and memory usage across the entire machine.

Develop a set of services that enhance the OS's ability to support systems with very large numbers of processors.

  • We will improve operating system awareness of the requirements of parallel applications.
  • We will enhance operating system support for parallel execution by providing coordinated scheduling and improved management services for very large machines.