Particle Physics e-Science Programme Proposal

The UK Grid for Particle Physics Collaboration

GridPP

University of Birmingham,

University of Bristol,

Brunel University,

CERN, European Organization for Nuclear Research,

University of Cambridge,

University of Durham,

University of Edinburgh,

University of Glasgow,

Imperial College of Science, Technology and Medicine,

Lancaster University,

University of Liverpool,

University of Manchester,

Oxford University,

Queen Mary, University of London,

Royal Holloway, University of London,

Rutherford Appleton Laboratory,

University of Sheffield,

University of Sussex,

University of Wales Swansea,

University College London.

Contacts

Dr. Tony Doyle – A.Doyle@physics.gla.ac.uk

Dr. Steve Lloyd – S.L.Lloyd@qmw.ac.uk

 

Abstract

This document contains the bid from the UK Particle Physics Community to PPARC for resources to develop a Grid for Particle Physics research - GridPP. This request is for the funding of a £25.9M three-year co-development programme with CERN and the EU DataGrid project. GridPP will deliver the Grid software (middleware) and hardware infrastructure needed to test a significant-scale prototype of the Grid for the LHC. The GridPP project is designed to integrate with the existing Particle Physics programme within the UK, thus enabling early deployment and full testing of Grid technology and efficient use of limited resources. The project will disseminate the GridPP deliverables in the multi-disciplinary e-science environment and will seek to build collaborations with emerging non-PPARC Grid activities both nationally and internationally.

 

Table of Contents

1. Executive Summary
2. Introduction
2.1 Importance of the Grid
2.2 Key Issues
2.3 Extent of Full Programme
2.4 Outline of this Document
3. Overall Model
3.1 Overview
3.2 GridPP
3.3 Major Deliverables of the GridPP Project
4. GridPP Programme
4.1 Programme Built from Components
4.2 Component Definition
4.2.1 Component 1: Foundation
4.2.2 Component 2: Production
4.2.3 Component 3: Middleware
4.2.4 Component 4: Exploitation
4.2.5 Component 5: Value-added Exploitation
4.3 Financial Breakdown
4.4 Deliverables against Components
5. WorkGroup Structures and Remits
5.1 The WorkGroups
5.2 Remits
5.2.1 A: Workload Management
5.2.2 B: Information Services and Data Management
5.2.3 C: Monitoring Services
5.2.4 D: Fabric Management and Mass Storage
5.2.5 E: Security
5.2.6 F: Networking
5.2.7 G: Prototype Grid
5.2.8 H: Software Support
5.2.9 I: Experiment Objectives
5.2.10 J: Dissemination
5.2.11 K: CERN
6. Resource Analysis for GridPP
6.1 Introduction
6.2 Resources Requested
6.3 Resource Requested for UK Tier Centres
6.4 Resource Requested for UK WorkGroups
6.5 Conclusions
7. DataGrid Project Overview
7.1 Description of the DataGrid Workpackages
7.1.1 Relationship of GridPP WorkGroups to DataGrid Workpackages
8. Collaborations
8.1 Dissemination
8.2 Collaboration with Astronomers
8.3 Collaboration with Computer Scientists
8.4 Collaboration with Industry
8.5 Collaboration with UKERNA
8.6 Collaboration with US Groups: GriPhyN and PPDG
8.6.1 GriPhyN
8.6.2 PPDG
9. CERN
9.1 GridPP and the CERN LHC Computing Project
9.2 Development Work
9.3 Hardware
9.4 Oversight and Accountability
10. Programme Management
10.1 The Project Management Board (PMB)
10.2 Collaboration Board (CB)
10.3 The Technical Board (TB)
10.4 The Experiments Board (EB)
10.5 The Dissemination Board (DB)
10.6 Peer Review Selection Committee (PRSC)
10.7 "Buying-out" of Positions in Grid Management
10.7.1 The Project Leader and Deputy Project Leader
10.7.2 Chair of the Technical Board
10.7.3 Chair of the Collaboration Board
10.8 DataGrid
11. Conclusions
12. References
13. Glossary and Acronyms
14. APPENDIX WorkGroups
14.1 A: Workload Management
14.1.1 Programme
14.1.2 Milestones
14.2 B: Information Services and Data Management
14.2.1 Information Model
14.2.2 Directory Service
14.2.3 SQL Database Service
14.2.4 Schema Definition
14.2.5 Interaction with Other WorkGroups
14.2.6 Data Management
14.2.7 Query Optimisation
14.2.8 Data-Mining
14.2.9 Tasks
14.3 C: Monitoring Services
14.3.1 Overview
14.3.2 Tasks
14.3.3 Deliverables
14.4 D: Fabric Management and Mass Storage
14.4.1 Introduction
14.4.2 Relationship with DataGrid
14.4.3 Tasks
14.4.4 Current Experiments
14.4.5 Collaborative Links
14.5 E: Security Development
14.5.1 Introduction
14.5.2 Relationship with DataGrid
14.5.3 Tasks
14.5.4 Collaborative Links
14.6 F: Networking Development
14.6.1 Infrastructure
14.6.2 Integration of Network Services
14.6.3 Data Transport Applications
14.6.4 Monitoring and Network Information Services
14.6.5 Relation to EU DataGrid
14.6.6 Collaborative Links
14.6.7 References and Further Details
14.7 G: Prototype Grid
14.8 H: Software Support
14.9 I: Experiment Objectives
14.10 J: Dissemination - Astronomy Links
14.11 Other WorkGroups
15. APPENDIX DataGrid
15.1 Description of the Workpackages
15.1.1 Grid Middleware
15.1.2 Grid Infrastructure
15.1.3 Applications
15.1.4 Project Management
15.2 UK Responsibilities and Deliverables
16. APPENDIX CERN Programme
16.1 Development Teams and Programme of Work
16.2 Resources
16.3 Milestones and Deliverables
17. APPENDIX Constitution of Management Boards
17.1 The Project Management Board (PMB)
17.1.1 Terms of Reference
17.1.2 Membership of the PMB
17.2 Collaboration Board (CB)
17.2.1 Terms of Reference
17.2.2 Membership of the CB
17.3 The Technical Board (TB)
17.3.1 Terms of Reference
17.3.2 Membership of the TB
17.4 The Experiments Board (EB)
17.4.1 Terms of Reference
17.4.2 Membership of the EB
17.5 The Dissemination Board (DB)
17.5.1 Terms of Reference
17.5.2 Membership of the DB
17.6 Peer Review Selection Committee (PRSC)
17.6.1 Terms of Reference
17.6.2 Membership of the PRSC
18. APPENDIX Training
19. APPENDIX Letters of Support from the US
19.1 Letter from PPDG
19.2 Letter from GriPhyN

  1. Executive Summary

  1. A three-year £40M programme is presented which will enable the computing requirements of Particle Physicists to be met by the formation of the UK Grid for Particle Physics (GridPP).
  2. A bid to PPARC is made for £25.9M to deliver the GridPP programme.
  3. The main aim is to provide a computing environment for the UK Particle Physics Community capable of meeting the challenges posed by the unprecedented data requirements of the LHC experiments.
  4. Grid technology is the framework used to develop this capability: key components will be developed as part of this proposal.
  5. The implementation of GridPP is made through five interlocking components: Foundation, Production, Middleware, Exploitation and Value-added Exploitation (described in Section 4).

  6. The majority of the proposed funding will enable new and existing staff to work on applications, Grid services and core middleware based in the UK and at CERN.
  7. The proposal builds on the strong computing traditions of the UK at CERN. The CERN working groups will make a major contribution to this research and development programme and are integrated into this proposal.
  8. The proposal is also integrated with developments from the EU DataGrid, PPDG and GriPhyN in order to ensure the development of a common set of principles, protocols and standards that can support a wide range of applications.
  9. Provision is made for facilities at CERN (Tier-0), RAL (Tier-1) and up to four Regional Centres (Tier-2).
  10. These centres will provide a focus for dissemination to the academic and commercial sector and are expected to attract funds from elsewhere such that the full programme can be realised.
  11. The process of creating and testing the computing environment for the LHC will naturally provide for the needs of the current generation of highly data-intensive Particle Physics experiments: these will provide a live test environment for GridPP research and development.
  12. A robust management structure is described with clear lines of responsibility enabling all institutes to collaborate to maximum effect.
  13. The major deliverables for each aspect of the project are defined on annual time-scales, in March 2002, 2003 and 2004.

  2. Introduction
    2.1 Importance of the Grid
    The strategic importance of this area and the priority given to it by the UK Government are underlined by the following remarks in the Science Budget Allocation announcement.

      "e-science means science increasingly done through distributed global collaborations enabled by the Internet, using very large data collection, tera-scale computing resources and high performance visualisation.

      "Many areas of leading-edge science are facing major challenges in the processing, communication, storage and visualisation of ever increasing amounts of data. Solving these problems will be a global effort, and if the UK is to stay at the forefront of many key disciplines it needs to invest in the relevant technologies and infrastructure. We need to invest in solving the problems of individual disciplines, and in the generic core technologies common to all disciplines.

      "The overall e-science programme will combine two approaches: individual Councils will solve problems in their own areas; and a cross-Council ‘core’ programme will tackle issues of common concern to all Councils. The core programme will work across all of the different Council activities, developing and brokering generic technology solutions and generic middleware. This work will be carried out in a number of organisations across the science and engineering base, which could include international facilities such as CERN.

      "The first [major application] is the data handling requirement for the Large Hadron Collider, the new CERN accelerator which will be completed in 2005. The extremely large data flow rates are orders of magnitude larger than the current state of the art. The solution being developed is likely to have major elements of technology and middleware, which can be used in other science applications, particularly in areas of data flow and large scale computation."

    2.2 Key Issues
    The pivotal role of Particle Physics in continuing to drive cutting-edge developments in this area is clearly emphasised above. This proposal describes how UK particle physicists plan to set about delivering a Grid which not only meets their extreme requirements but also addresses many of the needs of the wider community.

      Key Aim

      The key aim of this proposal is to provide a computing environment for the UK Particle Physics Community capable of meeting the challenges posed by the unprecedented requirements of the LHC experiments. It is a central assumption that these needs can best be met by harnessing internationally distributed computer power and storage capacity efficiently and transparently.

      Grid Technology

      The emerging Grid technology is a natural framework within which to develop this capability. It is therefore an important objective that this development should provide the functionality required for full exploitation of the LHC. The requirements for Particle Physics should drive the development of a generic framework, which will also have potential benefit to other sciences and the wider community. The tools and infrastructure are certain to have more general applications, some with definite commercial implications.

      Computing Environment for the LHC

      The process of creating and testing this computing environment for the LHC will naturally embrace the needs of the current generation of highly data-intensive experiments (in particular, at the SLAC b-factory, the Tevatron and at HERA). These provide well-defined and well-focussed intermediate scale problems that can be used to "field test" the performance under actual running experiment conditions. This both complements the development of prototype systems for the LHC and deepens the experience of the UK in the issues to be tackled in delivering the robustness that will be essential for the LHC and other applications. This approach also allows value to be added to, and additional benefit to be extracted from, recent JIF and JREI awards for the development of computing infrastructure for BaBar, LHC and the Tevatron.

      Collaboration

      The nature of the Grid requires the development of a common set of principles, protocols and standards that can support the wide range of applications envisaged. Collaboration with the other initiatives is therefore essential if the UK is to achieve the aim of being a leader in the global development of the Grid technology. Problems of duplication and incompatibility are also avoided through a close dialogue with other leading players. In this context, it is important that there is close collaboration from the beginning with initiatives such as the EU DataGrid, PPDG and GriPhyN, all of which already have significant funding. The US has an ambitious programme with an overall e-science budget in excess of a billion dollars, and it is important that the UK investment also reflects a strong commitment to this highly strategic technology.

      Computing at CERN

      This bid builds particularly on the strong computing traditions at CERN (in which the UK has traditionally been strongly represented). It is essential that the appropriate computing infrastructure be developed at CERN in close co-ordination with that in the UK. This is explicitly recognised in the science budget announcement. "In particular, the computing for the Large Hadron Collider at CERN will require investment both in the UK and at CERN itself alongside the main CERN LHC programme." It is a great strength of this proposal that CERN is a full participant and substantial resources (both staff and equipment) will indeed be required there to develop the central infrastructure essential to demonstrate the full functional capability of the UK GridPP.

      Astronomy Community

      The UK Particle Physics Community will be looking to benefit wherever possible from collaboration with the Astronomy Community. There are some common applications, and of course the "middleware" is also likely to be common, but the initial emphasis of the AstroGrid project is focussed on developing techniques for optimal scientific exploitation of diverse distributed data archives. This is a very exciting area, complementary to the initial focus of the GridPP. However, there is potential within Particle Physics for the creation of new analysis methods built on top of the techniques developed for analysis of astronomical data. It is also important that other aspects of Grid functionality should be explored and exercised as part of GridPP, through projects initiated from within the LHC or other experiments.

      Other Sciences, Industry and Commerce

      Although the primary focus of this proposal is the development of the computing infrastructure required to support Particle Physics, particularly the LHC, it is an important aim to connect these developments to those in other sciences, industry and commerce. The Universities and CLRC are by their very nature multi-disciplinary and several collaborative programmes are developing in which Particle Physics groups are central. Close collaboration with Computing Science groups with the skills to deliver commercially exploitable products is a feature, as are close links with other disciplines whose requirements could eventually match or exceed those of the LHC. Whilst seed money from PPARC would clearly benefit these initiatives, they will mostly be seeking their funding from the resources for more generic R&D administered by the EPSRC and, in the case of the University sector, from SRIF, JREI etc. Where infrastructure can be financed through either of these routes, there is a clear benefit of added value to PPARC investment. CLRC should also see additional resources for more generic developments in this area which should also be able to bring added value to this proposal.

      Skill Development

      The emphasis in this bid is on resources for people, rather than equipment, reflecting a key objective in creating a closely integrated community of experts skilled in the new technologies needed to deliver the GridPP. A vital element will be the cohort of new post-docs working either in the UK or through UK institutions at CERN, who will be acquiring the skills needed to develop this technology not only for Particle Physics but also for broader applications. It is inevitable and desirable that many of these highly trained personnel will be attracted away from academia by the higher levels of remuneration on offer in the commercial world, where they are expected to play a key role in the dissemination of Grid technology. The rigours of a training in Particle Physics are also seen to equip post-docs well for the job market. A recent survey by the DELPHI experiment at CERN found that all of the respondents secured employment immediately at the end of their Ph.D. training, with 43% ending up in computing, high technology companies or management. Were the same trends to be followed by those gaining expertise in Grid technology through their collaboration on this proposal, this would provide a key strategic injection of appropriately skilled personnel into the UK economy.

    2.3 Extent of Full Programme
    The full programme outlined above, including appropriate dissemination to the wider community, results in an estimated cost of £40M. This sum corresponds to a bid to PPARC of £25.9M (this proposal) plus an £11.6M total of external bids to EPSRC and SRIF, as well as existing JIF/JREI money. Funding for a significant fraction of the University based computing infrastructure should be eligible for SRIF support and within CLRC there are possible new resources which also need to be vigorously pursued. More generic areas and projects which have a major dissemination component should also be able to attract additional support from the funds administered by the EPSRC. However, apart from the facilities already supported through JIF and JREI, none of the other options for funding is guaranteed. Therefore the extent to which the needs of the full programme can be met will depend not just on the level of support from PPARC, but also on the ability to attract matching funds from these other sources. Furthermore, the success in attracting these other funds will in turn depend critically on the full scale of the support provided by PPARC.

      Letters of support from the US are included in Appendix 19.

    2.4 Outline of this Document

    In the remainder of this document, the UK Particle Physics (PP) Grid, GridPP, bid is outlined, while further details are included in the Appendices. Section 13 contains a Glossary of terms, including lists of Acronyms.

    Section 3 provides an overview of the GridPP activities.

    Section 4 focuses on the programme associated with this bid to PPARC.

    Section 5 explains the structure of the WorkGroups which have been set up. The programmes of the individual groups are explained in more detail in Appendix 14.

    Section 6 contains the financial part of this bid. This provides a top-down breakdown of the requested funds.

    Section 7 sets out the relationship between the EU DataGrid and this proposal. The DataGrid is a European-wide initiative with significant PP participation and is described in more detail in Appendix 15.

    Section 8 describes the collaborations between UK-PP and other UK groups, in particular the Astronomers and UKERNA.

    Section 9 describes the special relationship between UK-PP and CERN. This is explained in more detail in Appendix 16.

    Section 10 outlines the proposed management for the UK-PP activities. This is explained in more detail in Appendix 17.

  3. Overall Model
    3.1 Overview

The Grid for Particle Physics (GridPP) has as its objective the efficient delivery of the necessary Information and Communication Technology (ICT) infrastructure for our future experimental programme. This objective will be met by:

The implementation of GridPP will be made through the five components, shown in Figure 1 and discussed in Section 4.

Figure 1: Components of GridPP.

Each component is necessary to the construction and usefulness of GridPP. The development of GridPP is contingent on the investment in the Foundation and Production components necessary to test the GridPP environment. The funding of these components will also satisfy the LHC Computing Review requirements [1]. At the same time that the first two components are being put in place, the basic middleware component will be developed that connects the fabric into a Grid-like shared resource, in collaboration with the Computer Science community. Only with the realisation of the fourth component (exploitation), with PP applications that utilise the fabric and the middleware, will a true Grid take shape and enable the minimum system required to deliver our science programme. Full exploitation of the massive potential of the GridPP will be possible when the first four components are in place.

The components are project-oriented expressions of the layered architecture of the Grid. Classically, the five layers shown in Figure 2 are considered: fabric, connectivity, resource, collective and user/application, each with its own function.

Figure 2: Layers within the Grid. Each one of the layers will have its own Application Programming Interfaces (APIs) and Software Development Kits (SDKs).

Figure 3: Relationship between the components of GridPP and the Grid architecture layers.

The relationships between the different layers of the Grid architecture and the components of the GridPP are shown in Figure 3.

For the next generation of experiments at the LHC (ALICE, ATLAS, CMS and LHCb), the Grid will be the keystone underpinning the entire ICT structure of data storage, simulation and physics analysis. However, the Grid development is planned to be incremental, and will engage the section of our community actively involved in the analysis of data from SLAC, FNAL and HERA. Thus the existing ICT infrastructure of the experiments and collaborating UK institutes will form the initial "fabric" of the Grid, with substantial expansion required to complete the Foundation and Production components.

    3.2 GridPP
      Figure 4: Structure of the Grid for Particle Physics (GridPP).

      Figure 4 shows the structure of GridPP. The user at Tier-4 level will access local (Tier-3) resources, analysis clusters and local storage via a portal device, e.g. a workstation, laptop or hand-held device. Tier-3 centres are connected to regional Tier-2 centres, which are able to service requests, for example for Monte Carlo production. National Tier-1 centres are repositories for large quantities of data from the experimental sites (Tier-0s). The hierarchical structure is aimed at minimising bandwidth requirements, but the Grid nature allows peer-to-peer communication and direct access to all facilities (broken lines) where authorisation is approved.
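      To make the tiered structure concrete, the following is a minimal, purely illustrative Python sketch (not part of any GridPP deliverable) of hierarchical access with an authorised peer-to-peer short-cut. CERN (Tier-0) and RAL (Tier-1) are taken from the text; the Tier-2, Tier-3 and portal names are invented for the example.

      # Illustrative sketch only: toy model of the Tier-0..Tier-4 hierarchy.
      # Hypothetical names: "Tier2-North", "Institute-X", "laptop-42".
      TIERS = {
          "CERN":        {"tier": 0, "parent": None},
          "RAL":         {"tier": 1, "parent": "CERN"},
          "Tier2-North": {"tier": 2, "parent": "RAL"},
          "Institute-X": {"tier": 3, "parent": "Tier2-North"},
          "laptop-42":   {"tier": 4, "parent": "Institute-X"},
      }

      def hierarchical_path(site, target):
          """Walk up the tier hierarchy from `site` until `target` is reached."""
          path = [site]
          while path[-1] != target:
              parent = TIERS[path[-1]]["parent"]
              if parent is None:
                  raise ValueError(f"{target} is not above {site} in the hierarchy")
              path.append(parent)
          return path

      def access_path(site, target, authorised_direct=False):
          # Prefer the bandwidth-saving hierarchical route; use a direct
          # peer-to-peer connection only where authorisation is approved.
          return [site, target] if authorised_direct else hierarchical_path(site, target)

      if __name__ == "__main__":
          print(access_path("laptop-42", "RAL"))                          # via Tier-3 and Tier-2
          print(access_path("laptop-42", "RAL", authorised_direct=True))  # authorised direct access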

      The organisation of the implementation of the Grid will take two forms: "clusters" and WorkGroups (described in Section 5). There will be up to four regional clusters representing Universities and Institutes in the UK. These local groupings will collaborate to create regional centres of excellence for Grid activities and build the Tier-1 and regional Tier-2 centres (see Figure 4). The WorkGroups, whose remits closely match those within the EU DataGrid, will ensure the delivery of the components of the Grid. Although in some Workgroups there will be a strong regional interest, institutes within a cluster will collaborate with groups in other clusters. The Grid management will ensure that this "matrix" of resource allocation avoids duplication of effort within Particle Physics. Thus implementation of the Grid will conform to the "virtual organisations" paradigm through the creation of geographically intersecting WorkGroups and clusters sharing resources.

    3.3 Major Deliverables of the GridPP Project

It is proposed to use prototypes as the major deliverables of the present proposal. The outline definition of each of these prototypes is summarised in Table 1. The detailed definitions will be agreed in collaboration with CERN. The LHC Computing Review has defined the requirements and long-term goals, and the EU DataGrid project is now defining the goals for its releases, where workpackage deliverables are foreseen at three stages of the project (September 2001, 2002 and 2003). It is expected that the full GridPP prototype definitions will be agreed during 2001.

Table 1: Major Deliverables of the GridPP Project.

Deliverable | Date | Goals
Prototype I | Mar 2002 | Performance and scalability testing of components of the computing fabric (clusters, disk storage, mass storage system, system installation, system monitoring) using straightforward physics applications. Testing of the job scheduling and data replication software from the first DataGrid release.
Prototype II | Mar 2003 | Prototyping of the integrated local computing fabric, with emphasis on scaling, reliability and resilience to errors. Performance testing of LHC applications. Distributed HEP and other science application models using the second DataGrid release.
Prototype III | Mar 2004 | Full-scale testing of the LHC computing model with fabric management and Grid management software for Tier-0 and Tier-1 centres, with some Tier-2 components. This is the prototype system that will be used to define the parameters for the acquisition of the initial LHC production system. This will use the software from the final DataGrid release.

 

  4. GridPP Programme
    4.1 Programme Built from Components
    The programme for the UK Grid for Particle Physics consists of a series of interlocking "components". These components are ordered in the sense that the Foundation component, numbered 1, is required by the subsequent components. Components 1 to 4 are considered to be essential to meet the requirements of the GridPP programme, while component 5 would offer significant added-value. Each programme component covers e-Science activities within the UK as well as at CERN, and funding is assumed to be forthcoming from PPARC and also from external sources.

      The WorkGroups, described in detail in Section 5, define the overall GridPP programme and specify the remits within the programme components. Deliverables have been derived for each of the components by aggregating these from the relevant WorkGroups (see Section 4.4); indications are given as to whether these are PPARC or externally funded.

    4.2 Component Definition
      4.2.1 Component 1: Foundation
      The "Foundation Programme Component" establishes the infrastructure at CERN and within the UK and enables full geographical Grid connectivity and testing. By the third year, this component requires a total disk data storage of 50/35TB in UK/CERN, CPU farms of 16,000/13,200 SI95, and effort must be assigned to plan, commission and operate this computing hardware. The equipment is assumed to resource the Tier-1 centre and one or more Tier-2 centres. Networking between sites within the UK is provided by SuperJANET4 and local networking provision is assumed to be funded from SRIF. The Géant European network provides connectivity to CERN.

      4.2.2 Component 2: Production
      The "Production Programme Component" builds on the Foundation Component, offering a production environment for experiments to use and enabling detailed Grid stress-tests to be undertaken using a structure within the UK which meets the hardware requirements specified by the LHC Computing Review [1]. The BaBar UK computing commitments are invested in this component. Current experiments are able to develop analysis procedures using the power of the Grid, and the UK is able to play a full role in the forthcoming LHC data challenges. By the third year, this component requires a total disk data storage of 115/86TB in UK/CERN, CPU farms of 40,000/33,000 SI95, and significant effort must be assigned to design, implement and operate the substantial resources for a production-level facility.

      4.2.3 Component 3: Middleware
      Programme components 1 and 2 produce the software required to meet the demands of connectivity and stress-tests. Component 3 enables the UK and CERN to make suitable contributions towards generating the middleware required for full Grid functionality, and in the process to create generic software which will be used by other disciplines and in wider contexts. It makes it possible to optimise the network connections and allows the UK to meet fully its commitments to the DataGrid project. It also covers the collaboration required to link activities with Astronomers, other sciences, computer scientists and industry, and to effect technology transfer.

      4.2.4 Component 4: Exploitation
      With infrastructure in place from components 1 and 2, and the middleware produced in component 3, the experiments are then in a position to exploit the new technology. The experiments must develop their applications in order to move from bespoke solutions for their analysis needs to full Grid functionality. The experience of undertaking this exercise and using the Grid infrastructure in earnest will offer excellent experience for other disciplines. Component 4 is focused on providing the applications to deliver the science using the Grid infrastructure.

        Components 1 to 4 must be funded, as an absolute minimum, in order for the GridPP programme to be realised.

      4.2.5 Component 5: Value-added Exploitation

      Additional funds would enable further exploitation of the infrastructure established in components 1, 2 and 3, both at CERN and in the UK. They would enable true value-added activities for PPARC, a wider role for CERN, and greater dissemination in the UK. The application support within the experiments would be increased.

    4.3 Financial Breakdown
    Section 6 presents a full analysis of the financial breakdown. Table 2 shows the PPARC and external funds invested as a function of component.

      Table 2: Top-level financial breakdown.

      Component | Description | CERN contribution from PPARC (£M) | Cost to PPARC (£M) | Total PPARC (£M)
      1 | Foundation | 2.5 | 8.5 | 8.5
      2 | Production | 1.0 | 4.1 | 12.7
      3 | Middleware | 2.1 | 4.4 | 17.0
      4 | Exploitation | 1.5 | 4.0 | 21.0
      5 | Value-added exploitation | 2.9 | 4.9 | 25.9

      Integrated over the first 4 components (5 components), the amount required for funding CERN activities is £7M (£10M).

      It will be necessary to apportion the PPARC resources to institutes, investing in the teams that are best qualified, have the best existing resources and the best track records. Resources will need to be focussed carefully in order to obtain maximal return. The Particle Physics Community is conducting the exercise to define these allocations, but results are not ready in time for this submission. Furthermore, the optimal allocations will depend on the size of the overall award, and the precise form of the future programme.

    4.4 Deliverables against Components

    Table 3 is a compilation of the deliverables from all the WorkGroups (described in Section 5) presented as a function of component. The form of funding is indicated. The deliverables are considered in more detail in Appendix 14.

     

    Table 3: Top-level deliverables.

    Component | WG | Name | Funding
    1 | A | Planning and Management | PPARC (UKPP)
    1 | A | Installation and test job submission via scheduler | PPARC (UKDG)
    1 | A | Develop JCL/JDL | PPARC (UKPP)
    1 | B | Develop Project Plan, Coord. + Manage | External (EU)
    1 | B | Schema Repository | PPARC (UKDG)
    1 | B | Releases A | PPARC (UKDG)
    1 | C | Release for Testbed-1 | External (EU)
    1 | C | Release for Testbed-1 | PPARC (UKDG)
    1 | C | Release for Testbed-2 | External (EU)
    1 | C | Release for Testbed-3 | External (EU)
    1 | C | Evaluation Report | External (EU)
    1 | D | Develop project plan | PPARC (UKPP)
    1 | D | COTS systems development B | PPARC (UKDG)
    1 | D | Integration of existing fabric | PPARC (UKPP)
    1 | D | Fabric benchmarking/evaluation | PPARC (UKPP)
    1 | D | User Portals | PPARC (UKPP)
    1 | D | Fabric demonstrator(s) | PPARC (UKPP)
    1 | D | Evaluation and API Design | External (EU)
    1 | D | Prototype API | External (EU)
    1 | D | Further Refinement and testing of API | PPARC (UKDG)
    1 | D | Definition of Metadata | External (EU)
    1 | D | Prototype Metadata | PPARC (UKDG)
    1 | D | Metadata refinement and testing | PPARC (UKDG)
    1 | E | Gather requirements | PPARC (UKPP)
    1 | E | Survey and track technology | PPARC (UKPP)
    1 | E | Design, implement and test | PPARC (UKPP)
    1 | E | Integrate with other WG/Grids | PPARC (UKPP)
    1 | E | Management of WG | PPARC (UKPP)
    1 | E | DataGrid Security | PPARC (UKPP)
    1 | F | Net-1-A | PPARC (UKDG)
    1 | F | Net-1-B | PPARC (UKPP)
    1 | F | Net-2-A | PPARC (UKDG)
    1 | F | Net-2-B | PPARC (UKPP)
    1 | F | Net-4-A | PPARC (UKDG)
    1 | F | Net-4-B | PPARC (UKPP)
    1 | G | GRID IS | PPARC (UKPP)
    1 | G | Network ops | PPARC (UKPP)
    1 | G | Tier-1 centre ops | PPARC (UKPP)
    1 | G | Management | External (EU)
    1 | H | Deployment tools | PPARC (UKPP)
    1 | H | Deployment tools | PPARC (UKDG)
    1 | H | Globus support | PPARC (UKPP)
    1 | H | Globus support | PPARC (UKDG)
    1 | H | Testbed team | PPARC (UKPP)
    1 | H | Testbed team | PPARC (UKDG)
    1 | H | Management | PPARC (UKDG)
    1 | J | Begin foundational package A | PPARC (UKPP)
    1 | J | Begin foundational package B | External (OTHER)
    1 | K | Support prototypes | PPARC (CERN)
    1 | K | Extension of Castor for LHC capacity, performance | PPARC (CERN)
    1 | K | Support prototypes | PPARC (CERN)
    1 | K | Fabric network management, and resilience | PPARC (CERN)
    1 | K | Support fabric prototypes | PPARC (CERN)
    1 | K | High bandwidth WAN – file transfer/access performance | PPARC (CERN)
    1 | K | WAN traffic instrumentation & monitoring | PPARC (CERN)
    1 | K | Grid authentication – PKI | PPARC (CERN)
    1 | K | Authorisation infrastructure for Grid applications – PMI | PPARC (CERN)
    1 | K | Base technology for collaborative tools | PPARC (CERN)
    1 | K | Support for grid prototypes | PPARC (CERN)
    1 | K | Evaluation of emerging object relational technology | PPARC (CERN)
    2 | A | Installation and test job submission via scheduler | PPARC (UKPP)
    2 | B | Query Optimisation and Data Mining A | PPARC (UKPP)
    2 | C | Technology Evaluation | PPARC (UKDG)
    2 | C | Evaluation Report | PPARC (UKPP)
    2 | D | Implementation of Production API | PPARC (UKDG)
    2 | D | Implementation of production metadata | External (EU)
    2 | E | Production phase | PPARC (UKPP)
    2 | F | Net-2-C | PPARC (UKDG)
    2 | F | Net-2-D | PPARC (UKPP)
    2 | F | Net-2-G | PPARC (UKPP)
    2 | F | Net-3-A | PPARC (UKDG)
    2 | G | Security operations | PPARC (UKPP)
    2 | G | GRID IS | PPARC (UKPP)
    2 | G | Tier-1 centre ops | PPARC (UKPP)
    2 | H | Upper middleware/application support | PPARC (UKPP)
    2 | H | Upper middleware/application support | PPARC (UKDG)
    2 | J | Focus and engage A | PPARC (UKPP)
    2 | J | Focus and engage B | External (OTHER)
    2 | K | LAN performance | PPARC (CERN)
    2 | K | High bandwidth firewall/defences | PPARC (CERN)
    3 | A | Modify SAM | PPARC (UKPP)
    3 | A | Further testing and refinement | PPARC (UKDG)
    3 | A | Profiling HEP jobs and scheduler optimisation | PPARC (UKPP)
    3 | A | Super scheduler development | PPARC (UKPP)
    3 | B | Directory Services | External (EU)
    3 | B | Distributed SQL Development | PPARC (UKDG)
    3 | B | Data Replication | PPARC (UKDG)
    3 | B | Query Optimisation and Data Mining B | PPARC (UKPP)
    3 | B | Releases B | PPARC (UKPP)
    3 | B | Liaison | PPARC (UKPP)
    3 | C | Architecture & Design | PPARC (UKDG)
    3 | C | Technology Evaluation | PPARC (UKDG)
    3 | C | Release for Testbed-2 | PPARC (UKPP)
    3 | C | Release for Testbed-3 | PPARC (UKPP)
    3 | D | Fabric Management Model | PPARC (UKPP)
    3 | D | Establish ICT-industry leader partnerships | PPARC (UKPP)
    3 | D | COTS systems development A | PPARC (UKPP)
    3 | D | Proprietary systems development | PPARC (UKPP)
    3 | D | FM information dissemination | PPARC (UKPP)
    3 | D | Evaluation Report | PPARC (UKPP)
    3 | D | Tape Exchange evaluation & design | External (EU)
    3 | D | Design Refinement | External (EU)
    3 | D | Tape Exchange Prototype Version | PPARC (UKDG)
    3 | D | Tape Exchange Production Version | PPARC (UKDG)
    3 | E | Architecture | PPARC (UKPP)
    3 | E | Security development | PPARC (UKPP)
    3 | E | DataGrid Security development | PPARC (UKPP)
    3 | F | Net-2-E | PPARC (UKDG)
    3 | F | Net-2-F | PPARC (UKPP)
    3 | F | Net-3-B | PPARC (UKDG)
    3 | H | Globus development | PPARC (UKDG)
    3 | H | S/w development support | PPARC (UKDG)
    3 | H | Upper middleware/application support | PPARC (UKDG)
    3 | J | Begin production phase A | PPARC (UKPP)
    3 | J | Begin production phase B | External (OTHER)
    3 | J | QCDGrid – full Grid access of lattice datasets | PPARC (UKPP)
    3 | K | Scalable fabric error and performance monitoring system | PPARC (CERN)
    3 | K | Automated, scalable installation system | PPARC (CERN)
    3 | K | Automated software maintenance system | PPARC (CERN)
    3 | K | Scalable, automated (re-)configuration system | PPARC (CERN)
    3 | K | Automated, self-diagnosing and repair system | PPARC (CERN)
    3 | K | Implement grid-standard APIs, meta-data formats | PPARC (CERN)
    3 | K | Data replication and synchronisation | PPARC (CERN)
    3 | K | Performance and monitoring of wide area data transfer | PPARC (CERN)
    3 | K | Integration of LAN and Grid-level monitoring | PPARC (CERN)
    3 | K | Adaptation of databases to Grid replication and caching | PPARC (CERN)
    3 | K | Preparation of training courses, material | PPARC (CERN)
    3 | K | Adaptation of application – science A | PPARC (CERN)
    3 | K | Adaptation of application – science B | PPARC (CERN)
    4 | I | ATLAS | PPARC (UKPP)
    4 | I | CMS | PPARC (UKPP)
    4 | I | LHCb | PPARC (UKPP)
    4 | I | ALICE | PPARC (UKPP)
    4 | I | BaBar | PPARC (UKPP)
    4 | I | UKDMC | PPARC (UKPP)
    4 | I | H1 | PPARC (UKPP)
    4 | I | ZEUS | PPARC (UKPP)
    4 | I | CDF | PPARC (UKPP)
    4 | I | D0 | PPARC (UKPP)
    4 | J | Begin exploitation phase | External (OTHER)
    4 | J | Expand exploitation | External (OTHER)
    4 | K | Provision of basic physics environment for prototypes | PPARC (CERN)
    4 | K | Support of grid testbeds | PPARC (CERN)
    4 | K | Adaptation of physics core software to the grid environment | PPARC (CERN)
    4 | K | Exploitation of the grid environment by physics applications | PPARC (CERN)
    4 | K | Support for testbeds | PPARC (CERN)
    5 | I | ATLAS | PPARC (UKPP)
    5 | I | CMS | PPARC (UKPP)
    5 | I | LHCb | PPARC (UKPP)
    5 | I | BaBar | PPARC (UKPP)
    5 | I | CDF | PPARC (UKPP)
    5 | I | D0 | PPARC (UKPP)
    5 | J | Value added through Comp. Sci. A | PPARC (UKPP)
    5 | K | Lambda switching prototypes | PPARC (CERN)
    5 | K | Security monitoring in a Grid environment | PPARC (CERN)
    5 | K | Portal prototyping | PPARC (CERN)
    5 | K | Integration of & performance issues with mass storage management at different testbed sites | PPARC (CERN)
    5 | K | Support of the simulation framework | PPARC (CERN)
    5 | K | Development of the simulation framework | PPARC (CERN)
    5 | K | Adaptation to and exploitation of grid environment | PPARC (CERN)
    5 | K | Development of portal components | PPARC (CERN)
    5 | K | Development of the base framework | PPARC (CERN)
    5 | K | Middleware packaging for other sciences | PPARC (CERN)
    5 | K | Middleware support for other sciences | PPARC (CERN)
    5 | K | Bibliographic metadata | PPARC (CERN)

     

  5. WorkGroup Structures and Remits

In this section we give a brief overview of the proposed WorkGroups and their remits.

The technical work needed to deploy Grid testbed applications in the UK, and to participate in the world-wide development of middleware and services, has been broken down into several work packages. These work packages broadly overlap with those of the EU DataGrid project (see Section 7) but differ in detail, reflecting the strengths and expertise of existing UK groups and collaborations. In all cases, the general remit of each WorkGroup is:

  1. Participate in the relevant R&D, in collaboration with international partners in the EU DataGrid and with US groups.
  2. Bring the necessary knowledge and expertise into the UK Grid community.
  3. Deal with all practical issues of deployment needed for GridPP testbeds.

Full details of the tasks and deliverables of each WorkGroup are provided in Appendix 14.

    5.1 The WorkGroups

The WorkGroups are: A: Workload Management; B: Information Services and Data Management; C: Monitoring Services; D: Fabric Management and Mass Storage; E: Security; F: Networking; G: Prototype Grid; H: Software Support; I: Experiment Objectives; J: Dissemination; and K: CERN.

    5.2 Remits
      5.2.1 A: Workload Management

The Workload Management group is responsible for the software systems that schedule application processing requests amongst resources. At the lowest level, this includes native schedulers running on individual hosts. At the highest level this covers the middleware access API to distributed resources (in Globus terms, this means GRAM services). There are many other distributed processing packages which need to be integrated into the Grid, including SAM, Condor and Java RMI. The issues which this WorkGroup will address are
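As a purely illustrative aid, the sketch below shows, in Python, the kind of matchmaking step such a scheduling system performs; the site descriptions, job request and field names are all hypothetical, and the real work would build on middleware such as Globus GRAM, Condor and SAM rather than on code of this kind.

    # Illustrative sketch only (not GridPP middleware): a toy matchmaking step
    # for scheduling a job amongst distributed resources. All names are hypothetical.
    sites = [
        {"name": "farm-a", "free_cpus": 120, "si95_per_cpu": 30, "software": {"cmsim", "geant3"}},
        {"name": "farm-b", "free_cpus": 10,  "si95_per_cpu": 45, "software": {"cmsim"}},
    ]

    job = {"cpus": 8, "min_si95_per_cpu": 25, "software": {"cmsim"}}

    def matching_sites(job, sites):
        """Return the sites able to run the job, those with most free capacity first."""
        suitable = [s for s in sites
                    if s["free_cpus"] >= job["cpus"]
                    and s["si95_per_cpu"] >= job["min_si95_per_cpu"]
                    and job["software"] <= s["software"]]
        return sorted(suitable, key=lambda s: s["free_cpus"], reverse=True)

    if __name__ == "__main__":
        for site in matching_sites(job, sites):
            print("candidate site:", site["name"])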

      5.2.2 B: Information Services and Data Management

This WorkGroup is responsible for the software tools required to provide flexible, transparent and reliable access to data, regardless of where and how it is stored. The group also deals with the distributed information services required to satisfy client application requests for CPU, storage and transport. This requires a distributed and coherent resource information publication system which must provide answers to complex questions. A key point is that queries are not global, but depend explicitly upon the requesting user identifier and thus links to authentication and authorisation services are implicit. The key issues which this WorkGroup will address are:

Resource information includes:
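As a purely illustrative sketch of the point above that query results depend on the requesting user's identity, the following toy Python example filters published resource records by the requester's authorisation; the records, virtual organisations and user names are invented, and a real implementation would sit on distributed directory and database services rather than in-memory lists.

    # Illustrative sketch only: user-dependent resource-information queries.
    # All records, virtual organisations (VOs) and users are hypothetical.
    published_resources = [
        {"site": "store-1", "type": "storage", "free_gb": 4000,  "allowed_vo": {"atlas", "cms"}},
        {"site": "cpu-1",   "type": "cpu",     "free_si95": 5000, "allowed_vo": {"atlas"}},
        {"site": "cpu-2",   "type": "cpu",     "free_si95": 2000, "allowed_vo": {"cms", "babar"}},
    ]

    user_vo = {"user_a": "cms", "user_b": "atlas"}  # supplied by authentication/authorisation services

    def query(user, resource_type):
        """Return only the resources this user is authorised to see and use."""
        vo = user_vo[user]
        return [r for r in published_resources
                if r["type"] == resource_type and vo in r["allowed_vo"]]

    if __name__ == "__main__":
        print(query("user_a", "cpu"))  # cpu-2 only
        print(query("user_b", "cpu"))  # cpu-1 only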

      5.2.3 C: Monitoring Services

The Monitoring Services WorkGroup deals with all aspects of monitoring of Grid operations. This area involves interaction with almost all other WorkGroups, and covers very different levels of monitoring. Long-term static information is needed for Grid planning and evolution, whereas short-term dynamic information is used by matchmaking services to make intelligent decisions on how to satisfy a request. Monitoring includes:
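To illustrate the distinction between long-term and short-term monitoring information, the following is a small, hypothetical Python sketch of a monitoring store offering both views; it is not a design for the actual monitoring services, and the metric names are invented.

    # Illustrative sketch only: one store, two views of monitoring data.
    import time
    from collections import defaultdict

    class MonitoringStore:
        def __init__(self):
            self.archive = defaultdict(list)   # long-term record, for Grid planning

        def publish(self, site, metric, value):
            self.archive[(site, metric)].append((time.time(), value))

        def latest(self, site, metric):
            """Short-term dynamic view, as used by a matchmaking service."""
            series = self.archive[(site, metric)]
            return series[-1][1] if series else None

        def history(self, site, metric, since):
            """Long-term view, as used for capacity planning and Grid evolution."""
            return [(t, v) for t, v in self.archive[(site, metric)] if t >= since]

    if __name__ == "__main__":
        store = MonitoringStore()
        store.publish("farm-a", "load", 0.45)
        store.publish("farm-a", "load", 0.80)
        print(store.latest("farm-a", "load"))           # current value for scheduling
        print(len(store.history("farm-a", "load", 0)))  # archived samples for planning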

      5.2.4 D: Fabric Management and Mass Storage

This WorkGroup deals with the integration of heterogeneous resources into the common Grid framework. The Grid will be composed of many interwoven threads including high performance computers (HPC), specialised SMP machines, large clusters of "commodity-off-the-shelf" (COTS) systems assembled from low cost components, dedicated archival systems, smaller clusters and individual PCs/workstations that act as "portals" onto the Grid and the heterogeneous world-wide network. A large amount of experience already exists in managing these resources, and it is essential to import this into the Grid programme. The key issues of this WorkGroup are:

The group also deals with all issues which concern access to physical storage by Grid applications. There is a myriad of diverse storage systems which must be integrated into the Grid. Each offers different features and native access mechanisms. These must be integrated into the Grid such that access to data or code is uniform from the application layer. This will enable the common higher-level software for data management to function independently of the underlying hardware. The principal issues to be addressed are:
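The principle of presenting one uniform access interface to applications, whatever the underlying storage system, can be illustrated with the following minimal Python sketch; the two back-ends shown are invented stand-ins, and a real implementation would wrap the native interfaces of the actual disk and mass-storage systems.

    # Illustrative sketch only: one application-level interface, many storage back-ends.
    import shutil
    from abc import ABC, abstractmethod

    class GridStore(ABC):
        """Uniform application-level interface to heterogeneous storage systems."""
        @abstractmethod
        def get(self, logical_name: str, local_path: str) -> None:
            ...

    class DiskStore(GridStore):
        """Back-end for data already resident on local or network disk (hypothetical)."""
        def __init__(self, root: str):
            self.root = root
        def get(self, logical_name: str, local_path: str) -> None:
            # Native access for this back-end is a straightforward filesystem copy.
            shutil.copy(f"{self.root}/{logical_name}", local_path)

    class TapeStore(GridStore):
        """Stub for a mass-storage (tape) back-end (hypothetical)."""
        def get(self, logical_name: str, local_path: str) -> None:
            # A real back-end would stage the file from the mass-storage system
            # via its native interface before making it available locally.
            raise NotImplementedError("tape staging is not implemented in this sketch")

    def fetch(store: GridStore, logical_name: str, local_path: str) -> None:
        """Applications call this one routine, regardless of where the data lives."""
        store.get(logical_name, local_path)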

      5.2.5 E: Security

The Security WorkGroup covers an essential low-level component in the construction and operation of any Grid.

Security mechanisms must be managed at all levels, from certification authorities down to local sites. For interoperability between Grids, it is essential that security developments both lead and follow the emerging standards and that the work is done in close collaboration with other Grid projects. All Grid services and middleware components will require security, thus an important part of the remit of this group will be liaison with other WorkGroups. The principal issues are:
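As a simple illustration of the authorisation step, the sketch below maps an authenticated certificate subject name to a local account, in the spirit of the subject-to-account mappings used by current Grid security infrastructure; the subject names and accounts are invented, and real deployments would rely on certification authorities and the emerging Grid security standards.

    # Illustrative sketch only: toy mapping from certificate subjects to local accounts.
    # The subject names and local account names below are hypothetical.
    MAPPINGS = {
        "/C=UK/O=GridPP/OU=Physics/CN=Joan Bloggs": "pplocal01",
        "/C=UK/O=GridPP/OU=Physics/CN=Arun Smith":  "pplocal02",
    }

    def authorise(subject_dn: str) -> str:
        """Return the local account for an authenticated certificate subject,
        or refuse access if the subject is not known at this site."""
        try:
            return MAPPINGS[subject_dn]
        except KeyError:
            raise PermissionError(f"subject not authorised at this site: {subject_dn}")

    if __name__ == "__main__":
        print(authorise("/C=UK/O=GridPP/OU=Physics/CN=Joan Bloggs"))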

      5.2.6 F: Networking

The Networking WorkGroup covers all aspects of networking, ranging from fabric provision through to integration of network services into the middleware. Prototype planning makes the assumption that the core fabric provision will be via the National Research Network providers (NRNs). In the UK this means SuperJANET4, provided by UKERNA, who will collaborate as associated partners in this project (see Section 8.5). It is also essential to liaise with all appropriate administrative authorities to ensure that SuperJANET4, MANs and sites can interoperate to deliver the end-to-end services. All inter-European traffic will be carried via Géant (run by DANTE). In the early years the principal data sources will be in the US (SLAC and FNAL), necessitating that adequate transatlantic links also be provisioned. Grid network issues are:

      5.2.7 G: Prototype Grid
      We will implement a UK Grid prototype which will tie together new and existing facilities in the UK, in order to provide a single large-scale distributed computing resource for use by Grid application developers and users. This overlaps completely with the PPARC commitment to the DataGrid, under which the UK must be integrated into the phased testbed programme. It is anticipated that by the end of the GridPP project, almost all computing resources available to UK PP will be linked to the Grid Prototype, and that this project will form the starting point for the provision of PP computing in the LHC era, in terms of hardware, software and expertise.

        This activity is covered by a distinct WorkGroup to emphasise the essential nature of the prototype and its objectives of overall integration of components.

      5.2.8 H: Software Support
      The Software Support WorkGroup will provide services to enable the development, testing and deployment of both middleware and applications at all collaborating institute sites. In addition, the group will take responsibility for the specification and development of interfaces between middleware and applications, and support the Grid Prototype in both small- and large-scale test programmes.

        The rapid take-up of Grid technologies will require the provision of robust installation tools for middleware and application software. The application installation tools are likely to take the form of "kick-start" packages, which will ensure that necessary resources are available for applications to run on appropriate remote facilities without explicit user intervention. Software Support WorkGroup members will provide expertise on the installation and distributed operation of experimental software in conjunction with the experimental collaborations.
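        As a purely illustrative example of the kind of check an installation ("kick-start") package might perform, the following Python sketch verifies that the software an application expects is present before it runs, so that missing pieces can be installed without explicit user intervention; the command and path names are hypothetical examples, not a specification of the real tools.

    # Illustrative sketch only: pre-flight check of hypothetical requirements.
    import os
    import shutil

    REQUIRED_COMMANDS = ["python3", "tar"]        # tools the application expects on PATH (hypothetical)
    REQUIRED_PATHS = ["/opt/experiment-sw"]       # expected installation area (hypothetical)

    def missing_requirements():
        """Return whatever is missing at this site, so it can be installed automatically."""
        missing = [c for c in REQUIRED_COMMANDS if shutil.which(c) is None]
        missing += [p for p in REQUIRED_PATHS if not os.path.exists(p)]
        return missing

    if __name__ == "__main__":
        problems = missing_requirements()
        if problems:
            print("cannot run application; missing:", ", ".join(problems))
        else:
            print("all requirements satisfied at this site")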

      5.2.9 I: Experiment Objectives
      The Experiment Objectives WorkGroup will be responsible for ensuring that the development of GridPP is driven by the needs of the UK Particle Physics experimental collaborations.

        The LHC experiments have undertaken to meet various goals, through data challenges, with regard to the storage, management, simulation, reconstruction and analysis of data. The UK needs to play an integral part in ensuring the success of these data challenges to maintain its prominent role in the development of computing in these collaborations. Experiments will generate large amounts of Monte Carlo simulated data and this calls for substantial investment in the computing infrastructure. The programme also includes high-demand experiments already in the data-taking phase. The BaBar, D0, CDF and MINOS experiments will be crucial to Grid development as they will provide the "live" use cases which will prove successful Grid operation. In addition, both the H1 experiment and the UK Dark Matter collaboration (UKDMC) plan to take advantage of the Grid.

        It is only by focusing on the use cases provided by the experiments that true "end-to-end" demonstrations will be realised, which by definition ensure that all issues are addressed, and therefore that the Grid will be successful.

      5.2.10 J: Dissemination

This WorkGroup will ensure that there is good dissemination of developments arising from GridPP into other communities, and equally that knowledge is brought into the programme from other projects. It is also essential that the benefits of our activity are recognised by fellow academics, industry and the general public. The group will concern itself with:

This function is essential for both the Grid and e-Science in general.

      5.2.11 K: CERN

The CERN WorkGroup will make a major contribution to the R&D programme at CERN that is developing the LHC computing environment in collaboration with LHC experiments, the DataGrid project, future Tier-1 Regional Computing Centres and several national Grid projects. The WorkGroup will provide staff that will work within CERN teams responsible for various aspects of the development and prototyping programme over the next three years. It will also provide a proportion of the investment required for the prototype Tier-0 facility. By taking this leading position, the UK will provide adequate funding for the LHC Computing Project at CERN, during the period when CERN is making major investments in the construction of the accelerator. This ensures a realistic prototyping environment at CERN that can be integrated with the developing LHC Grid infrastructure in the UK.

The WorkGroup includes activities in the following areas:

More information can be found in Section 9 and Appendix 16.

  6. Resource Analysis for GridPP
    6.1 Introduction
    Section 3 introduces the overall model for the GridPP. Section 4 describes the programme, splitting it into interlocking components and showing the financial breakdown in Section 4.3. The WorkGroups are described in detail in Section 5, and define the overall GridPP programme. Throughout our planning, we have assumed funding both from PPARC and from external sources as shown in Table 4. The PPARC investment will be used as a lever to gain money from other sources.

      Table 4: Complete description of all funding for GridPP.

      Component | Description | Cost to PPARC (£M) | Total PPARC (£M) | External Funds (£M)
      1 | Foundation | 8.5 | 8.5 | 2.5
      2 | Production | 4.1 | 12.7 | 3.6
      3 | Middleware | 4.4 | 17.0 | 0.8
      4 | Exploitation | 4.0 | 21.0 | 3.5
      5 | Value-added Exploitation | 4.9 | 25.9 | 1.3

      The following sub-sections present a more complete analysis of resource usage within the GridPP programme.

      The overall allocation of resources to deliver the programme defined in Components 1 to 4 is shown in Table 5.

      Table 5: Allocation of resources for Components 1 to 4.

       

      Item | PPARC Cost over 3 years (£M)
      Hardware + software | 2.6
      UK staff | 10.7
      CERN | 7.1
      Operational Costs | 0.4
      Miscellaneous | 0.2
      Total | 21.0

      Staff working on the DataGrid project are funded for 3 years, whilst the rest are funded for 2.5 years. The assumed cost is £50k per staff year in the UK, plus an additional £4k per year to cover travel and other overheads, and 170kSF per staff year for those funded through CERN. A total of 3 staff are employed to manage the overall project (see Section 10) and these appear under the UK staff total allocation.
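      For illustration, the cost assumptions above can be turned into a small worked calculation; the staff-year figure used in the example is hypothetical and is not one of the numbers behind Tables 5 to 7.

    # Illustrative arithmetic only, based on the stated assumptions:
    # £50k per UK staff year plus £4k per year travel/overheads; 170 kSF per CERN staff year.
    UK_STAFF_YEAR_COST_K = 50 + 4      # £k per UK staff year, including overheads
    CERN_STAFF_YEAR_KSF = 170          # kSF per CERN-funded staff year

    def uk_staff_cost_millions(staff_years: float) -> float:
        return staff_years * UK_STAFF_YEAR_COST_K / 1000.0

    if __name__ == "__main__":
        example_staff_years = 25       # e.g. 10 posts funded for 2.5 years (hypothetical)
        print(f"{example_staff_years} UK staff years cost roughly £{uk_staff_cost_millions(example_staff_years):.2f}M")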

    6.2 Resources Requested
    The resources requested covering components 1-4 are given in Table 6.

      Table 6: Resources requested for components 1-4.

      Components 1-4 | PPARC Total (£M) | External Funds (£M)
      UK Staff | 10.7 | 5.9
      UK Capital | 3.2 | 4.5
      CERN Staff | 5.7 | 0.0
      CERN Capital | 1.4 | 0.0
      Total Costs | 21.0 | 10.3

       

      The resources requested, including all five GridPP components, are given in Table 7.

      Table 7: Resources requested for components 1-5.

      Components 1-5 | PPARC Total (£M) | External Funds (£M)
      UK Staff | 12.7 | 7.2
      UK Capital | 3.2 | 4.5
      CERN Staff | 8.6 | 0.0
      CERN Capital | 1.4 | 0.0
      Total Costs | 25.9 | 11.6

    6.3 Resource Requested for UK Tier Centres
    The resources requested for the Tier centres are given in Table 8 (integrating over components 1 to 4). The "Tier Centres" constitute a Tier-1 based at the Rutherford Appleton Laboratory, together with a number of Tier-2s which will be based on clusters around those centres which have already won resources from external funding agencies and displayed a keen interest in developing the GridPP infrastructure.

      Table 8: Resource requested for UK Tier centres.

      Tier Centres 1-4 | 2001/2 PPARC | 2002/3 PPARC | 2003/4 PPARC | PPARC Total | External Funds
      Staff (£M) | 0.3 | 0.8 | 1.4 | 2.5 | 2.8
      UK Capital (£M) | 1.1 | 0.9 | 1.0 | 3.0 | 4.5
      Total costs (£M) | 1.4 | 1.7 | 2.4 | 5.4 | 7.3
      Total disk (TB) | 28 | 34 | 53 | 115 | -
      Total tape (TB) | 120 | 280 | 750 | 1150 | -
      Total CPU (SI95) | 16000 | 12000 | 12000 | 40000 | -

    7. Resource Requested for UK WorkGroups
    8. The resources requested for the WorkGroups are given in Table 9 (integrating over components 1-4). These are required in order that the WorkGroups may discharge the deliverables given in Section 4.4. The DataGrid resources are shown separately and form a subset of the total resources presented. They are predominantly staff costs, and for the purposes here, the infrastructure required by the DataGrid project is assumed to be funded through the main programme. The external DataGrid funding reflects the contribution from the EU.

      Table 9: Resource requested for UK WorkGroups.

      Work Groups            | 2001/2 PPARC (£M) | 2002/3 PPARC (£M) | 2003/4 PPARC (£M) | PPARC Total (£M) | External Funds (£M)
      Project Managers       | 0.1               | 0.2               | 0.2               | 0.4              | 0.0
      Staff                  | 1.2               | 3.4               | 3.2               | 7.8              | 3.1
      UK Capital             | 0.0               | 0.1               | 0.1               | 0.2              | 0.0
      Total Costs            | 1.4               | 3.6               | 3.5               | 8.5              | 3.1
      DataGrid part of Total | 0.8               | 0.8               | 0.8               | 2.4              | 0.8

       

      The allocation of effort to sets of WorkGroups (integrating over components 1-4) is shown in Table 10.

      Table 10: Allocation of effort to sets of WorkGroups.

      Work Groups | 2001/2 PPARC FTE | 2002/3 PPARC FTE | 2003/4 PPARC FTE | PPARC Total FTE | External FTE
      A-F         | 12.1             | 24.2             | 27.1             | 63.5            | 12.0
      G-H         | 9.9              | 27.3             | 33.8             | 71.0            | 3.0
      I           | 4.5              | 21.2             | 20.5             | 46.2            | 0.0
      J           | 2.0              | 5.0              | 3.0              | 10.0            | 41.5
      K           | 12.0             | 39.5             | 33.5             | 85.0            | 0.0

    9. Conclusions

    The programme defined by the deliverables shown against components 1 to 4 (1 to 5) in Section 4.4 can be discharged with PPARC funding of £21.0M (£25.9M) matched with external funding of £10.3M (£11.6M).

  2. DataGrid Project Overview
  3. The EU DataGrid project will develop, implement and exploit a large-scale data- and CPU-oriented computational Grid. This will allow distributed data- and CPU-intensive scientific computing models, drawn from three scientific disciplines, to be demonstrated on a geographically distributed testbed. The project will develop middleware software in collaboration with some of the leading centres of competence in Grid technology, bringing in practice and experience from previous and current Grid initiatives in Europe and elsewhere. The project both complements and helps to co-ordinate, at a European level, several on-going national Grid projects. The project will centre on an international testbed based on advanced research networking infrastructure provided by another EU Research Network initiative. The three scientific disciplines (Particle Physics, Bioinformatics and Earth Sciences) will each fully exploit the project developments through the testbed and elsewhere. The four LHC experiments form the major component in the exploitation of testbed prototypes. Strong relationships are being built with the GriPhyN and PPDG projects in the US and with the Globus and Condor Grid technology teams. The project extends the state of the art in international, large-scale data-intensive Grid computing, providing a solid base of knowledge and experience for exploitation by European industry.

    The UK Particle Physics Community is fully involved in the programme of the DataGrid project under the contract with the EU signed by PPARC, as one of 6 principal partners, at the end of 2000. The UK technical coverage in the project is quite broad, though the level of UK commitment is not uniform across all areas. The UK has leadership of two of the five middleware workpackages and has critical-path responsibilities in the areas of security and information services.

    The UK’s commitment is to provide 14.8 FTE over three years commencing January 2001, distributed between middleware development and application pilot work, as discussed in Appendix 16. Through the contract the UK benefits from funding for 4.3 FTE over the project duration and, as a Principal Contractor, has strong associations with IBM-UK and the SZTAKI institute in Hungary (both of whom are Assistant Contractors to PPARC).

    1. Description of the DataGrid Workpackages
    2. Each of the workpackages (WP) starts with a user requirement-gathering phase, followed by an initial development phase before delivering early prototypes to the testbed workpackage. Delivery to the testbed of increasing levels of functionality from the development workpackage deliverables is foreseen at three stages of the project (September of each of 2001, 2002 and 2003). Each year the application workpackages take up this increased functionality, providing feedback to the development cycles as well as full exploitation through data challenges etc. The relationship between the workpackages is shown in a simplified form in Figure 5.

      Figure 5: DataGrid workpackages.

      1. Relationship of GridPP WorkGroups to DataGrid Workpackages

    All the UK commitments to the DataGrid technical workpackages and the dissemination area are absorbed within the UK WorkGroups as part of their programme. For each of the DataGrid workpackages there is a person in the UK who undertakes liaison and UK co-ordination. Table 11 shows these relationships.

    Table 11: Relationship between GridPP WorkGroups and DataGrid Workpackages.

    UK GridPP WorkGroup                         | EU DataGrid Workpackage
    A  Workload Management                      | 1  Grid Workload Management
    B  Information Services and Data Management | 2  Grid Data Management
    C  Monitoring Services                      | 3  Grid Monitoring Services
    D  Fabric Management and Mass Storage       | 4  Fabric Management
    D  Fabric Management and Mass Storage       | 5  Mass Storage Management
    E  Security Development                     | 6  Integration Testbed
    F  Networking Development                   | 7  Network Services
    G  Prototype Grid                           | 6  Integration Testbed
    H  Software Support                         | 8  High Energy Physics Applications
    H  Software Support                         | 6  Integration Testbed
    I  Experiment Objectives                    | 8  High Energy Physics Applications
    J  Dissemination                            | 11 Information Dissemination and Exploitation

  4. Collaborations
    1. Dissemination

It is essential, both for the Grid in general and for e-Science in particular, that developments in, and requirements imposed on, Grid technologies are disseminated out of the GridPP work and, equally, that ideas from other projects flow into the programme. We believe that dissemination in both directions is best served through the involvement of the code developers within the project, in collaboration with Computer Scientists and developers. It is also essential that the benefits of our activity are recognised by fellow academics, industry and the general public. For these reasons, the Dissemination WorkGroup concerns itself with:

Dissemination into the UKQCD community is a special case. The ties with the Particle Physics programme are so strong here that we include their needs, plans and requirements as part of this proposal. Contacts with the Astronomy and other PPARC-funded communities are also considered vital, but are regarded as external contacts by the WorkGroup.

Collaborations with the GenGrid and other Computing Science groupings within the Tier-2 areas fall under the remit of the Dissemination WorkGroup, as the larger part of such collaborations are not specific to the Particle Physics or PPARC programme, and will be answerable to other Research Councils and subject areas. Various areas of common work with Computer Scientists are already being explored in the areas of Grid information, resource discovery and networking.

Contacts with the NSF and other US collaborations also fall under the remit of the WorkGroup, with the caveat that the technical details fall into the appropriate technical working group. For example, there are growing links with the GriPhyN project, and we are investigating possible involvement in the Virtual DataGrid Laboratory and the National Virtual Laboratory activities, and these are discussed in the dissemination context. On the other hand, the clear mutual US/UK interest in dedicated links to the US with flexible routing for testing purposes falls within the Networking WorkGroup remit. Thus, some umbrella resources may be ascribed to the dissemination activity, while the technical resources, for example Networking, would be ascribed to that WorkGroup. The Dissemination WorkGroup is also charged with identifying dissemination aspects of UK involvement in the US experimental programme.

In parallel with the general component model of the proposed e-Science programme, the dissemination activities can be decomposed into components:

Foundation

The development of a standard presentational toolkit in order to raise awareness of the Grid activities in academia and beyond.

Focussing and Engagement

The engagement with other scientists via the identification of a small number of key applications outside of PP into which the transfer of GridPP technologies will be beneficial.

Production

The broadening of the programme to engage with many areas of academic and scientific activity. From this component and beyond, Computer Scientists are close collaborators in the activity and will participate both in the generalisation of the technologies and in the raw middleware development; the links with Computer Scientists in the Tier-2 consortia and GenGrid have particular significance in this regard.

Exploitation

The engagement with the commercial sector to package the technologies for commercial use.

Value added

The close engagement of Computer Scientists with the programme, largely through funding from external sources.

It is proposed that GridPP dissemination is organised along the same structure as our Grid activities, namely by developing regional centres of excellence in Grid technologies in each Tier-2 region. The foundation activity will be delegated to a single Tier. In the focussing and engagement phase, the activity would broaden to two Tiers. In the production phase, the activity would broaden to involve all Tiers. This organisation will exploit the close contacts with other Grid activities already established, and also the industrial contacts already made either explicitly for Grid work or through, for example, JREI projects. The Particle Physics Community is already involved with multi-institution, multi-disciplinary Grid consortia in several regions, and others are in the process of forming. There are, in addition, e-Science consortia within institutions in which the Particle Physics groups play an important role. Finally, CERN, with its pro-active Technology Transfer programme and its commitment to and record of academic dissemination, is naturally included as a partner in this proposal.

The close interrelation with other projects and disciplines inherent in our EU DataGrid activities should not be overlooked. These themselves involve significant academic and industrial partnerships. WP11 of the EU DataGrid will be exploited to pro-actively disseminate emergent Grid technologies, and its support will be sought for our national Grid dissemination as part of the agreed DataGrid programme.

The management of the overall dissemination activity will be directed by the Dissemination Board, which will involve industrial and commercial partners. Some funding would be expected from PPARC, but we note that this generic activity is a good candidate for funding from other sources such as those administered by EPSRC, and there is a strong case for PPARC to bid for these on behalf of the community. Other sources of funding, such as the EU and the Regional Development Agencies, will be approached at regional level.

The deliverables for the Dissemination WorkGroup are shown in Table 12.

Table 12: Dissemination deliverables.

Project month | Project time to deliver | Deliverables
0             | 3                       | Establish Dissemination Board with appropriate membership. Begin Foundation package.
0             | 6                       | Establish Tier-2 dissemination consortia, focus and engage.
6             | 3                       | Establish regional work plans for production phase.
9             | 3                       | Initial appointment/allocation of new staff for production phase. Value-added Component begins.
21            | 12                      | First annual review of dissemination, begin exploitation phase. QCDGrid efficient data-sharing on existing farms.
33            | 12                      | Second annual review of dissemination activity. Expand QCDGrid exploitation – full Grid access of lattice datasets.

The broad areas of contact already in place are summarised in Table 13.

Table 13: Areas of contact between Particle Physics groups and other disciplines (Physics, Chemistry, Engineering, Biomedical Science, Environmental/Earth Science, Social Science, Statistics, Computing Science, Publicity).

Group      | Physics | Chem. | Engin. | Biomed. | Envir.+Earth | Soc. Sci | Statistics | Comp. Sci. | Publicity
Birmingham | X       | X     |        | X       |              |          |            | X          |
Bristol    |         |       |        | X       |              |          |            | X          |
Brunel     | X       |       |        | X       |              |          |            | X          |
Cambridge  | X       |       | X      | X       | X            |          |            | X          |
ScotGrid   | X       | X     | X      | X       |              |          |            | X          | X
IC         | X       |       |        | X       |              |          |            | X          |
Lancaster  | X       |       |        | X       | X            | X        | X          | X          |
Liverpool  | X       |       |        | X       | X            | X        |            |            |
Manchester |         |       |        |         |              |          |            |            | X
Oxford     | X       |       |        |         | X            |          |            | X          |
Sheffield  | X       |       |        |         |              |          |            |            |
QM London  | X       | X     |        | X       |              |          |            |            |
UCL        |         |       |        |         |              |          |            |            | X
CLRC       | X       | X     |        | X       | X            | X        |            | X          | X
CERN       |         |       |        |         |              |          |            |            | X
UK-QCD     |         |       |        |         |              |          |            |            |

    1. Collaboration with Astronomers
    2. The requirements of the Astronomy and Particle Physics Communities complement each other in terms of their proposals for Grid development. The Particle Physicists require the Grid to provide a primary analysis tool, harnessing massive distributed processing power and data storage capabilities. In contrast, the astronomers are looking to build "virtual observatories", in the first instance mainly through the federation of existing databases to allow rapid interrogation of archived data from different projects. Example applications include searching for multi-waveband images of the same astronomical object (either steady state or transient) and combining data from different missions/observatories working within the same waveband but observing different phenomena. The challenges faced by AstroGrid spring from the richness of the science opportunities offered by this combined-data approach, linked with the development of appropriate data-mining tools.

      The organisers of AstroGrid are aiming at a two-stage programme, with a request for initial funding for a major consultative process to identify requirements and to investigate available tools, linked with building experience through application to a limited set of programmes. Four areas of database federation are identified: solar, optical-IR, X-ray and solar-terrestrial physics. The European "Astrophysics Virtual Observatory" is the subject of an EU Framework-V proposal, with ESA, ESO, CDS and Terapix as partners along with AstroGrid and Jodrell Bank in the UK. Another consortium, planning the European Grid of Solar Observations, is also preparing a bid for Framework-V funding. The next steps towards the final goal of establishing a world-wide Virtual Observatory with strong UK participation will then build on these projects. Just as with the LHC, the time-scale for achieving fully functional virtual observatories with all the appropriate protocols, data security, data and meta-data-mining tools etc. lies beyond three years, but significant milestones can be set to monitor progress towards these goals during this period.

      Although the requirements are complementary to those of Particle Physics, collaborative projects in specific areas are being proposed. There is close linkage in several Universities and regions (for example in Edinburgh and Liverpool) where common tools or matching fabric provide the focus for additional bids. It is probable that these will continue to be the main areas of collaboration, with the AstroGrid proposers keen to work directly with the UK Particle Physics Tier-1 Grid node and with their Particle Physics colleagues in the Universities. At a more formal level, the two disciplines share a common funding source through PPARC and will enjoy the benefit of a common oversight mechanism designed to ensure proper co-ordination of the activities in each area. It is to be hoped that, through appropriate communication via cross-membership of committees, for example, the PPARC programme will be able to minimise duplication of effort in the two communities.

    3. Collaboration with Computer Scientists
    4. Several collaborations already exist with Computer Scientists in the UK. We plan to expand these links along the lines of the Tier-2 centres for organisational purposes, making full use of the existing institutional ties and the strengths of the various institutions.

      In the North West, an embryonic structure already exists in the region that incorporates PP, Computer Science and other subjects, under the North West Consortium for e-Science. This involves Daresbury, Lancaster, Liverpool and Manchester, and discussions have already taken place about mutually supportive work concerning job scheduling/resource discovery, reflective/adaptive middleware and networking (especially quality of service and monitoring). The expertise of the Daresbury team in COTS technologies will be integral to this work.

      In Scotland (and Durham), the ScotGrid consortium already involves Computer Scientists in Glasgow and Edinburgh. It also hosts many of the active participants in the GenGrid initiative, with whom there are close working ties. There has been dialogue between the GenGrid organisers and GridPP over possible mutual areas of interest, which include resource discovery and parallel database management. There is also an overlap with a more generic activity in networking being undertaken largely within the PP community.

      The Midlands area includes RAL, which is to host a centre for e-Science. This will also include the UK Globus centre, providing direct linkage to, and support from, the developers of the Globus toolkit. RAL is clearly a centre for computing expertise that will service the GridPP activity, with many direct links to the Computer Science community, with whom it will collaborate whenever advantageous to the project.

      Finally, in the London area, Cambridge, Imperial College and UCL all have collaborative efforts with their Computer Science communities (the High Performance Computing Facility, High Throughput Computing Groups and Computer Science Departments respectively). Joint applications for funding and studentships are being made, and shared use of facilities has already occurred.

    5. Collaboration with Industry
    6. The PP community intends to build upon its existing links with industry and its many joint projects. Industrial partners are already working with PP groups in JREI joint projects, in particular in the provision of compute-engines and mass-storage facilities such as the D0 farm. These JREI facilities are part of the core infrastructure for the national GridPP, and the industrial partners have joined with us precisely to gain from our experiences.

      Other more specific partnerships are already underway, such as that between the Liverpool PP group and SearchEngine.com in developing new search technologies to convert the Internet into a genuine linguistic database; they have also joined with the Airbus Consortium and National Power in a joint project. Another partnership is with IBM, who announced a $1 billion investment in Linux and related technologies; part of this money is funding two IBM researchers who are working with RAL staff on Grid technologies. UCL, Manchester and CLRC are working with Cisco Systems in the area of network traffic engineering for the provision of QoS. In addition, both Sun and SGI are working within the existing consortia. Finally, CERN, as a partner in GridPP, works collaboratively with many industrial organisations, both in software and hardware. An example of the latter is the recent agreement with Elonex Ltd in the UK to provide PC and disk servers; discussions with Elonex about additional partnerships are ongoing.

      The PP community will take the best advantage of the opportunities of industrial liaison to enhance GridPP, and in turn will pro-actively seek to aid the transfer of technologies, rather than being a passive application driver.

    7. Collaboration with UKERNA
    8. In the UK, UKERNA provides and manages the SuperJANET4 academic and research network. UKERNA is already highly involved with e-Science activities and has made clear its wish to support Grid operations on the SJ4 backbone. In the wider sense, UKERNA has identified the need to develop the advanced services that will be demanded by world-wide distributed computing applications and fully supports the opportunity to develop these services in collaboration with the GridPP project. UKERNA will therefore collaborate as an associated partner in all networking matters, which will potentially benefit all Grid applications.

      As part of the SuperJANET4 network, a 2.5 Gbit/s national testbed infrastructure is available. The testbed is isolated from the production network and allows layer 2/3 network level experiments to be carried out that would otherwise compromise a production environment. This is a major resource which will be available (on a shared basis) for development projects and high performance application demonstrations.

      In a separate linked proposal, a collaborative project has been agreed with UKERNA and other partners to develop the core traffic management services which are required to deploy managed bandwidth and other QoS services. This proposal will be targeted at support lines for generic Grid development and industrial collaboration.

      The collaboration of UKERNA in this project will significantly aid the attainment of objectives and will ensure that the results from this project will be deployed rapidly into a production environment as appropriate, benefiting all future Grid applications.

    9. Collaboration with US Groups: GriPhyN and PPDG
    10. The GridPP project will collaborate with leading Grid projects in the USA.

      1. GriPhyN
      2. The GriPhyN project focuses on "Petascale" datagrid applications and is funded by the NSF. We have begun discussions with GriPhyN with a view to collaborating in several areas. We have identified a common interest in advanced network applications, including QoS and high performance data replication across trans-atlantic links, and work is underway to define a programme. We will also collaborate in the development of middleware tools and have identified distributed databases as one possible area. In the wider context, the involvement of GriPhyN in the iVDGL and its associations with the EU DataGrid will be very beneficial.

      3. PPDG

The PPDG focuses more specifically on Particle Physics applications and is funded by the DoE. The existing involvement which the UK has with BaBar, D0 and the LHC experiments makes collaboration with the PPDG both natural and essential. UK and US groups are already working together on the world-wide infrastructure required to handle the data which will flow from these experiments in 2001 and beyond, and the UK will be the site of a BaBar Tier-A centre. UK and US groups are also already working together on developments associated with data management (SAM) and Condor, which are of direct use to the D0 experiment at FNAL. We already have strong historical contacts with the networking sector of the PPDG and collaborate in wide-area monitoring and an Internet-2 project. This will be extended to include QoS and high-performance replication studies.

Both of these groups have provided letters of support which can be found in Appendix 19.

  1. CERN
  2. CERN has identified a significant short-fall in the resources it has available to realise its part of the computing needed for the LHC. By proposing to commit a significant fraction of our request actually at CERN, we are recognising that development of GridPP requires integration and co-ordination with the corresponding development at CERN. Furthermore, it may be expected that such a significant commitment by the UK will encourage others to make similar contributions. It is not our intention, nor can we or should we, meet the entire CERN short-fall. However, it is our intention to make the best use of the complementary strengths and expertise at CERN and in the UK.

    Further, although the provision of staff effort is the highest priority contribution to CERN, it is important that significant investment is made in prototype hardware systems with which to test the Grid developments in a realistic environment. The recent LHC Computing Review [1] stated that the construction of a realistic LHC Tier-0 prototype at CERN was an essential component of any development plan. Without a realistic Tier-0 development at CERN, it will be impossible to work effectively towards an integrated LHC Grid infrastructure in the UK. We do not propose to fund the prototype hardware investments at the same level as the staff effort, reflecting the relative importance of the latter in the development of Grid computing; this will also afford many opportunities for scientific leadership in the project by members of the UK community. Rather we propose funding at a level which better reflects the UK participation in the major LHC experiments and which will facilitate the required UK/CERN Grid developments.

    1. GridPP and the CERN LHC Computing Project

The CERN WorkGroup makes a major contribution in terms of staff and materials to the R&D required at CERN in the context of the LHC Computing Project. The LHC Computing Model, proposed by the collaborations and strongly supported by the Steering Group of the LHC Computing Review in their recently published report [1] is a new approach to Particle Physics data analysis - a world-wide distributed computing environment closely integrating the facility at CERN with large regional and national computing centres, interconnected by very fast networks and organised using Grid technology. The report also emphasises the importance of constructing a testbed, or prototype system, involving CERN and several potential regional computing centres, evolving to reach by 2004 a scale of about 50% of the final size of the facility required by one of the large experiments. A large e-Science R&D programme is required at CERN, in Particle Physics institutes and in national computing centres to

A document outlining the programme of work and resources required has been prepared by CERN [2].

CERN has serious difficulty in funding this work during the next few years which correspond to the construction phase of the LHC accelerator and detectors. The shortfall in the funding for both personnel and materials has been described in a paper presented to the CERN Council on 15 March 2001 [3].

The CERN WorkGroup involves the UK Particle Physics Community collaborating in the core activities of the LHC Computing Project at CERN by contributing personnel to work within the teams responsible for the development of the LHC computing infrastructure and by partial funding for the prototype system. The developments at CERN will be closely co-ordinated with work being undertaken by other WorkGroups in the UK and, more generally, through the EU DataGrid project and the LHC physics collaborations. The joint development of the required software and services and the deployment of significant testbeds at CERN and in the UK, combined with facilities in other countries, will enable the project to fulfil its key goal of harnessing internationally distributed computing resources to provide a computing environment that meets the challenges of LHC.

At CERN, the e-Science programme for LHC computing will be developed and deployed as a primary activity of the teams responsible for the long-term planning, development and operation of the computing services, mainly in the Information Technology (IT) Division but also in groups attached to experiments. The personnel funded from UK e-Science sources will work at CERN as members of these teams, taking part in the full range of their team’s work. This is very important to

This approach also ensures that young people taking part in the programme will obtain a broad training in at least one area of information technology, preparing them well to return to national institutes to take a leading role in the deployment of the LHC computing infrastructure, to join organisations supporting e-Science activities in other disciplines, or to move into UK industry, helping to transfer the new technology to other areas. This training aspect is a major component of the practical output of the project, of direct and tangible value to UK science and industry as well as to the GridPP project. It is also important that a number of more senior people from the UK take up technical management positions at CERN during the project, ensuring an adequate level of UK oversight and leadership and helping to facilitate co-ordination with the rest of the GridPP project. This will also ensure that the management of the UK LHC facilities, when they begin operation in 2005-6, will include staff with close contacts with, and experience of, the CERN centre. This will be a significant factor in ensuring the effective and efficient operation of the distributed data analysis environment.

We believe that the UK commitment outlined above would make a very significant contribution to meeting CERN’s short-fall and would show a strong lead which will encourage others to make similar offers.

    1. Development Work
    2. The CERN activities are described in detail in Section 16 in terms of the responsibilities of the different teams involved. The majority of the personnel will be CERN staff members, but each of these teams has both the need for, and the capacity to absorb, new staff who can be funded by the project. In order to complete the target staffing level, CERN is exploring additional funding possibilities through similar agreements with other member states. Note that the total target strength of these teams, 100 FTEs, represents rather less than 50% of the total target for IT Division in 2002, the remaining staff being involved in the provision of other services, including infrastructure and engineering support. Since the new posts are integrated in the on-going work of the division, candidates with a wide range of backgrounds and experience are required, including, as explained above, people with management experience. With the proposed staffing level the project will fund a major part of the overall LHC computing project at CERN during the three-year period, in addition to making possible a significant Grid technology transfer project.

      The major deliverables of the overall CERN project are the three prototypes or "testbeds" referred to in Table 1, providing increasingly larger and more complex environments for progressively more demanding applications. These testbeds are synchronised closely with the corresponding testbed releases of the EU DataGrid project, which forms an essential component of the overall LHC computing strategy. In order to account more directly for the UK contribution and ensure the mutual requirements of the CERN and UK components of the GridPP project are met, specific deliverables will be defined for the teams with UK funded staff. The present proposal includes examples of these deliverables, but the final deliverables should be agreed as the UK staff are identified, taking account of their experience, the interests of the university concerned, the places available within the project and the evolution of the GridPP project as a whole. There is no question that finding appropriate candidates with a suitable IT background will be difficult and so it is important to maintain full flexibility for mapping candidates to available positions.

      CERN is participating with other European scientific institutes and companies in the EU DataGrid project. CERN is responsible for overall project management, and leads the Fabric Management (WP4) and Data Management (WP2) workpackages, in which there is significant UK participation. CERN also participates in other workpackages, in particular Mass Storage Management (WP5, which is led from the UK), the Testbed (WP6) and Networking (WP7). As explained above, these activities are mainly carried out within the appropriate support teams and, in all cases, in association with UK participants.

    3. Hardware

Although the provision of staff effort is the highest priority contribution to CERN, it is important that significant investment is made in prototype hardware systems with which to test the Grid developments in a realistic environment. The recent report of the LHC Computing Review stated that the construction of a realistic LHC Tier-0 prototype at CERN is an essential component of any development plan. Without a realistic Tier-0 development at CERN, it will be impossible to work effectively towards an integrated LHC Grid infrastructure in the UK. The funding level proposed reflects the UK participation in the major LHC experiments and will allow the required UK/CERN Grid developments, namely about 16% of the total cost of the CERN testbed investments during the three-year period.

If further investment in prototyping efforts at CERN were not forthcoming, the UK contribution could be increased; however, this must be a lower priority than the development of the UK Grid itself, as it would be unrealistic to sacrifice UK infrastructure, development effort and the current programme (BaBar, CDF, D0 etc.) in order to provide computing systems for a large fraction of the CERN experimental programme.

The prototype computing system for LHC will be developed over the years 2001-04 as a single facility shared by the four LHC experiments. The system will evolve in capacity during this period to reach a size at the beginning of 2004 that is about 50% of the size of the final facility required in 2007 for one of the LHC experiments, in terms of numbers of components. This facility will be used for many different purposes including:

The last point is an essential part of the strategy for ensuring that the overall system is capable of providing production quality services – consistent quality of service over sustained periods.

Finally, CERN is committed to dissemination and technology transfer, and an element of these activities is woven into the programme. The dissemination activity will be co-ordinated with that in the UK, and CERN will support the development of generic technologies.

The evolution of the capacity of the prototype system, corresponding to the recommendations of the LHC Computing Review, is outlined in Table 14. The funding requested from the project covers investments only for processors and disk storage.

Table 14: Evolution of CERN prototype.

    1. Oversight and Accountability

The present proposal is for major funding for a large project for which funding is also provided by CERN and other member states, and which has to provide services for the LHC collaborations and the institutes that will supply computing facilities for LHC in regional centres. Without the UK funding CERN would not be able to complete the project. Oversight of the resources provided by the United Kingdom, and an appropriate level of visibility of the results of the project, will be assured by a CERN-UK Computing Review Board, reporting to the appropriate PPARC and CERN committees. The chair, membership and mandate of the board will be agreed by PPARC and CERN prior to the start of the project.

  1. Programme Management
  2. The delivery of the Particle Physics Grid (GridPP) will require the co-ordination of many people in many institutes and across disciplines other than Particle Physics. It is essential that a robust and clear management structure be established that has clear lines of responsibility while allowing the maximum flexibility.

    This is best achieved by a small executive Project Management Board (PMB) chaired by the Project Leader. To be effective, membership of this board must be kept small; hence it is impracticable to have representatives from all experiments, working groups etc. These needs are satisfied by other bodies: the Experiments Board (EB), the Technical Board (TB) and the Dissemination Board (DB), which report to the PMB. Interaction with the collaborating institutes is through the Collaboration Board (CB). A new peer review body will be established to advise PPARC on the allocation of funds for Grid positions. Figure 6 shows an organogram of this structure, while Figure 7 shows the flow of information between the various bodies.

    1. The Project Management Board (PMB)
    2. The PMB is the central part of the management structure. It must be small and coherent, while encompassing the major elements of the Grid project. It is chaired by the Project Leader, who is responsible to the PPARC e-Science Director and to the Collaboration Board for the delivery of a timely, functional and coherent Grid structure as defined in this proposal. The Project Leader is the Chief Executive of GridPP. He/she is the person ultimately responsible for its success or failure. Once the basic proposal has been approved by PPARC, he/she must establish the full details of the project, time-scales, milestones etc. He/she must then put into place the necessary technical and managerial resources to ensure success. Monitoring and reporting procedures have to be established in order to identify quickly any part of the programme which is failing and, should this happen, the Project Leader must take corrective action to rectify the situation. This is clearly a full-time role and is the key position within the entire project.

    3. Collaboration Board (CB)
    4. The Collaboration Board is the "governing body" of the Project. It consists of the Group Leaders of each of the collaborating Particle Physics institutions. Its purpose is to exercise a general supervision over all areas of the project, to react to problems with available resources, as well as to ensure that the interests and priorities of the wider Particle Physics Community are fully taken into account in the construction of the Grid.

    5. The Technical Board (TB)
    6. The Technical Board is the body at which technical developments and problems wider than a single WorkGroup or Tier Centre are discussed and solved. It is the main working forum for the Grid project, in which all aspects of the project meet and interact.

    7. The Experiments Board (EB)
    8. The Experiments Board ensures that the requirements for the major Particle Physics experiments that will use the Grid are properly taken into account in the strategy of the PMB. It serves as a forum in which the experiments discuss common approaches and problems and should catalyse harmonisation and changes in the strategies of the experiments in order to ensure a coherent Grid environment can be produced for the benefit of all.

    9. The Dissemination Board (DB)
    10. The Dissemination Board ensures that the developments and strategy of the GridPP are disseminated as widely and rapidly as possible to interested parties outside the project. It also ensures that there is good communication between cognate bodies in other research councils. It ensures that there is good communication with industrial partners and that the needs and wishes of industry are properly taken into account.

    11. Peer Review Selection Committee (PRSC)
    12. The Peer Review Selection Committee is an expert body to consider requests for funding from the collaborating institutes of the GridPP project and make recommendations to PPARC.

    13. "Buying-out" of Positions in Grid Management
    14. Finally, this section outlines those positions within the structure defined above which need to be fully or partially funded by PPARC in order to ensure the success of the project.

      1. The Project Leader and Deputy Project Leader
      2. Both of these positions should be full-time. They are the crucial executive officers of the project; it is obvious that a project of this size can be run only by high-quality people. The size and the challenging time-scale of the project is such that more than one dedicated person is required at the top of the structure. A half-time deputy will allow the Project Leader to travel on essential business in the knowledge that the management of the project will continue to be closely monitored and controlled. Thus, the Project Leader and Deputy Project Leader should be funded by PPARC at the 100% and 50% levels respectively.

      3. Chair of the Technical Board
      4. The flow of information through the TB will be immense and it will be a great challenge to ensure that it is properly controlled and disseminated. The necessary size of the TB and its rather disparate nature will require high-level management and full-time attention. It is therefore essential that the Chair of the TB be fully funded by PPARC.

      5. Chair of the Collaboration Board

      The Chair of the Collaboration Board has major "diplomatic" and advisory functions in the project. The fact that he/she is also Chair of the DB adds a very time-consuming and important task. It is vital that the successes and promise of the Grid approach be as widely disseminated as possible and the structure specified above foresees a major role for the CB chair in this endeavour. It is therefore proposed that he/she be 50% bought out by PPARC.

      These proposals would mean that 3 FTEs would be associated with the management of the project. For a project of this size and scope, this seems very modest. Nevertheless, in the interests of freeing up as many resources as possible for those who will actually produce the Grid rather than manage it, and assuming a very strong involvement and dedication from all the members of the management team specified above, we believe that this structure will produce a successful Grid for Particle Physics.

      Figure 6: Outline of the GridPP Project Management Structure. Membership of each board is shown within the appropriate box. Reporting lines are shown as arrows; members from subordinate boards sitting on the PMB are indicated over the arrows.

    15. DataGrid

    The DataGrid project forms a significant part of GridPP and its UK specific components are encompassed completely by it. The UK has specific major overall responsibilities for workpackages on mass storage and monitoring services and has "critical path" responsibilities for security and information services. In order to handle this effectively, the UK obligations to DataGrid must be integrated fully into the UK management structure. This is reflected in the Terms of Reference of the TB and PMB (see Appendix 17).

    Figure 7: Information flow and links between PPARC bodies (in blue) and others (in white). Those bodies exchanging management information are surrounded by the blue hatching, those exchanging technical information by the yellow hatching.

     

  3. Conclusions
  4. In this proposal, we have described a programme of work which will deliver a prototype Grid to the UK Particle Physics Community.

    Grid technology will fundamentally change the way that e-Science is undertaken, by integrating distributed computing resources throughout the world into a single entity. The ultimate goals of Grid technology are to provide applications with transparent access to resources regardless of location, and at the same time to manage these distributed resources in a coherent and efficient way. These goals will lead to a revolution in informatics. This proposal embodies a programme which will ensure that the UK stays at the forefront of this revolution, building a strong expertise base which will be disseminated widely to the benefit of the UK as a whole.

    GridPP will meet the e-Science requirements of current experiments as well as working towards the needs of the LHC programme. Although LHC data-taking starts in 2006, the computing programme is already well underway, with the requirement for large simulated datasets. This prototype therefore addresses not only generic long-term issues but is also firmly rooted in the real needs of today. We believe that for a successful programme it is crucial to have such a well-focused application as the driving force, and this is why the Particle Physics Community is in a position to play such a pivotal role in Grid development.

    We have presented a programme consisting of several components. The foundation of the programme is a fully functional infrastructure linked to the world-wide Grid. This will be stress-tested using live experimental data and analysis requirements. We will participate in the development of middleware, both benefiting from and contributing to the world-wide programme, and, following from this, we will enable full exploitation through the integration of experimental analysis applications.

    GridPP will form part of an international development programme. We already have major responsibilities in the EU DataGrid project and, in this context, support for CERN is integrated into our programme. We will also collaborate strongly with leading groups in the USA, including the PPDG and GriPhyN, and have already identified common areas of middleware and networking upon which to build a programme.

    We submit this proposal as a consortium of all Particle Physics groups within the UK, who will work together to deliver e-Science technology to the UK. The Grid has the potential to do for e-Science what the WWW has done for information access. The Particle Physics Community is dedicated to making this happen.

  5. References

  1. Report of the Steering Group of the LHC Computing Review, CERN/LHCC/2001-004, http://lhc-computing-review-public.web.cern.ch/lhc-computing-review-public/Public
  2. "Solving the LHC Computing Challenge", CERN-IT-DLO-2001-003.
  3. "Building the LHC Computing Environment", Hans Hoffmann, presented to the CERN Committee of Council 15 March 2001.
  4. R. Raman, M. Livny, M. Solomon, "Matchmaking: Distributed resource management for high throughput computing", Proceedings of the 7th IEEE International Symposium on High Performance Distributed Computing, July 1998.
  5. S. Czerwinski, B. Zhao, T. Hodes, A. Joseph, R. Katz, "An Architecture for a Secure Service Discovery Service", Computing Science Division, University of California, Berkeley.
  6. B. Tierney, R. Wolski, R. Aydt, V. Taylor, "A Grid monitoring service architecture", Global Grid Forum, 2000.
  7. G. Cancio, S. Fisher, T. Folkes, F. Giacomini, W. Hoscheck, "The DataGrid Architecture v.1", http://grid-atf.web.cern.ch/grid-atf/documents.html
  8. W3C Document, http://www.w3.org/XML/Schema
  9. I. Foster, C. Kesselman, S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", (to be published in Intl. J. Supercomputer Applications, 2001) and http://www.globus.org
  10. IITS Organisation, "Understanding LDAP", IBM Corporation, 1998.
  11. S. Fitzgerald, G. von Laszewski, M. Swany, "GOSv2: A Data Definition Language for Grid Information Services", Global Grid Forum, GWD-GIS-011-5.
  12. http://www.sql.org
  13. XSL Transformations (XSLT), http://www.w3.org/TR/xslt
  14. A. Samar and H. Stockinger, "Grid Data Management Pilot (GDMP): A Tool for Wide Area Replication", IASTED International Conference on Applied Informatics (AI2001).
  15. H. Stockinger et al., "Towards a Cost Model for Distributed and Replicated Data Stores" to appear in "9th Euromicro Workshop on Parallel and Distributed Processing PDP 2001", Mantova, Italy, February 7-9, 2001, IEEE Computer Society Press.

 

  1. Glossary and Acronyms
  2. API (abbr.) Application Programming Interface

    An API defines how programmers utilise a particular computer feature. APIs exist for windowing systems, file systems, database systems and networking systems.

    COTS (abbr.) Commercial Off The Shelf

    It can be software, computer hardware, board products, power systems, etc. Some definitions say it is anything that is catalogue-orderable versus custom-developed. Normally custom-developed items that are re-used with minimal change are considered non-developmental items (NDI) and not COTS.

    DANTE The company that plans, builds and manages advanced network services for the European research community.

    DataGrid EU Grid project.

    FTE (abbr.) Full-time Equivalent.

    Géant A project to provide the Next Generation of European Research Networking from DANTE. The proposal, known as Géant, advocates an evolutionary approach to the development of the network based on the existing structure, to create a shared multi gigabit core network available to all of the national research networks across Western, Central, and Eastern Europe.

    GEANT4 PP detector simulation package, developed at CERN.

    Globus The Globus Project provides a software toolkit that makes it easier to build computational Grids and Grid-based applications.

    GridPP Grid for Particle Physics

    A collaboration of UK Particle Physics Institutes working on the Grid for Particle Physics. This proposal.

    GriPhyN US-funded research project. The collaboration is a team of experimental physicists and information technology researchers who plan to implement the first Petabyte-scale computational environments for data intensive science in the 21st century.

    ICT (abbr.) Information Communication Technology.

    JIF (abbr.) Joint Infrastructure Fund – to supply infrastructure for universities.

    JREI (abbr.) Joint Research Equipment Initiative.

    Middleware Used in reference to Grid, this is the low-level software, rather than application software, that enables the fabric (computers, storage and networks) to intercommunicate and allows the sharing of these resources. It works by virtue of having a common set of Grid protocols.

    PP (abbr.) Particle Physics.

    PPDG (abbr.) Particle Physics Data Grid.

    US equivalent (DOE funded) to GridPP.

    QoS (abbr.) Quality of Service - especially used in reference to networks.

    Connection with a good quality of service will, for example, supply a guaranteed bandwidth between two points.

    SDK (abbr.) Software Development Kit.

    SI95 (abbr.) SPECint95

    SPEC is an acronym for Standard Performance Evaluation Corporation, a non-profit corporation set up by many computer and microprocessor vendors to create a standard set of benchmark tests. The most widely used set of tests, known as SPEC95, results in two sets of measurements, one for integer operations (SPECint95) and one for floating-point operations (SPECfp95). The SPEC95 benchmark tests are also called CPU95 tests.

    SRIF (abbr.) Strategic Research Infrastructure Fund

    UK based funding initiative from HEFCE.

    SuperJANET The UK high-speed, joint-academic computer network.

    SY (abbr.) Staff Year

    The work done by one person (one FTE) in one year.

  3. APPENDIX WorkGroups
    1. A: Workload Management
      1. Programme

The Particle Physics Grid will consist of a heterogeneous collection of resources located at sites across the country (and the world). Each site will have its own combination of policies, priorities, CPU and data resources, and will be connected to all the other sites by networks of varying bandwidth. Distributed scheduling, involving either function migration or data migration or both, will be central to the successful utilisation of such a Grid. When a job is submitted, the scheduler will decide where that job runs based on factors such as:

The scheduling services that we produce must be scalable to scenarios where an individual user may have jobs running at tens of sites, on thousands of CPUs, accessing hundreds of thousands of files located around the world. They must also be able to handle these scenarios in a fault-tolerant way.

The development of a common job description/control language is an important element in job submission. This language will have to be able to specify/describe:

This will be the mechanism by which information about the job is supplied to the scheduler. Existing languages include RSL (Globus) and ClassAds (Condor), and it is likely that the language eventually adopted will be derived from one or both of these. The scheduler will then use a matchmaking service (such as the Condor Matchmaker) to allocate appropriate resources to that job.
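The sketch below is a rough, hypothetical illustration in Python of the matchmaking idea described above; the attribute names are invented for this example and it does not reproduce the actual RSL, ClassAds or Condor Matchmaker implementations.

```python
# Illustrative sketch of matchmaking between a job description and resource
# adverts, in the spirit of the ClassAds approach; attribute names are
# invented for this example and do not reflect any real schema.

from dataclasses import dataclass, field

@dataclass
class Advert:
    """A set of attributes published by a job or a resource."""
    attrs: dict = field(default_factory=dict)

def matches(job: Advert, resource: Advert) -> bool:
    """A resource matches if it satisfies the job's minimum requirements."""
    return (resource.attrs.get("free_cpus", 0) >= job.attrs.get("min_cpus", 1)
            and resource.attrs.get("free_disk_gb", 0) >= job.attrs.get("min_disk_gb", 0)
            and job.attrs.get("dataset") in resource.attrs.get("datasets", []))

def rank(job: Advert, resource: Advert) -> float:
    """Prefer resources with more spare CPUs and better network bandwidth."""
    return resource.attrs.get("free_cpus", 0) + 10 * resource.attrs.get("bandwidth_gbps", 0)

def matchmake(job: Advert, resources: list) -> list:
    """Return the acceptable resources, best-ranked first."""
    return sorted((r for r in resources if matches(job, r)),
                  key=lambda r: rank(job, r), reverse=True)

if __name__ == "__main__":
    job = Advert({"min_cpus": 50, "min_disk_gb": 500, "dataset": "run2001-sim"})
    sites = [
        Advert({"name": "SiteA", "free_cpus": 200, "free_disk_gb": 2000,
                "datasets": ["run2001-sim"], "bandwidth_gbps": 2.5}),
        Advert({"name": "SiteB", "free_cpus": 40, "free_disk_gb": 5000,
                "datasets": ["run2001-sim"], "bandwidth_gbps": 10.0}),
    ]
    for site in matchmake(job, sites):
        print(site.attrs["name"])
```

A real scheduler would, of course, also fold in site policy, priority and network information, as noted above, but the same match-then-rank pattern applies.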

The UK commitments to the EU DataGrid in WP 1 are in the areas of requirement definition and testing and refinement. Whilst it is important that we fulfil our commitments to the EU DataGrid, clearly the UK should not restrict its activities in this very important area to those required by the EU.

D0 UK plan to collaborate with FermiLab in adding Grid scheduling and submission capability to SAM (Sequential data Access with Meta-data). BaBar UK have similar intentions towards their software. The UK is very well situated to link activities in this area in the EU DataGrid to those in the US Particle Physics DataGrid, and already provides the liaison person between the WP 1 of the EU DataGrid and the D0 part of the US Particle Physics DataGrid.

The UK effort will be concentrated in the following three areas:

Adapting, Testing and Refinement

This broadly aligns with Task 1.7 of the EU DataGrid, but would also include the planned activity of D0 UK, and would involve close collaboration with people working in WP8 and WP6. This work would start straight away and would be ongoing throughout the 3 years.

Development of a Common Job Submission/Description Language

This work needs to be carried out in the near future, and our involvement would help to ensure consistency between EU and US work.

Modelling and Profiling of Different Sorts of PP Jobs

The information gathered will then be used to optimise scheduling for the different kinds of jobs. Further it may be used in the construction of a super-scheduler (scheduler of schedulers) for PP. This is a longer-term project, but may well provide useful information at an early stage.

      1. Milestones

Installation of Scheduler and Submission Mechanism

It is very likely that job submission mechanisms which incorporate Condor-G will be adopted in the EU DataGrid and the US PPDG. It is essential that we deploy and test a scheduling and job submission mechanism as early as possible. The EU DataGrid will be the first to produce such a mechanism, and so it is this that will be used for the early tests. These tests will reveal failings of the existing scheduler mechanisms, which will be addressed under "Further Tests and Refinement". This milestone will be the highest priority during the first 6 months of the project.

Deliverable

Dependencies

Production of Scheduler by WP1 of the EU DataGrid. (Component 1)

Resources Required

Staff: 0.5 SY all in first 6 months.

Equipment: Access to 100 CPUs and 1TB of Disk for short (1 week-long) periods of time.

Develop JCL/JDL

The development of the JCL/JDL will be ongoing throughout the three years. This development will rely heavily on the work being carried out by DataMat as part of the EU DataGrid. We will endeavour to produce a JDL/JCL that is common to both the EU DataGrid and the US PPDG.

Deliverables

Dependencies

EU DataGrid scheduler development.

Resources Required

Staff: 0.5 SY Split equally between first two years

Equipment: Access to 100 CPUs and 1TB of disk for short (1 week-long) periods of time

Modify SAM to Allow Condor-G Submission:

SAM is part of FermiLab’s contribution to the US PPDG. It allows users to access data stored at different sites around the world. However, at the moment it only allows job submission to the LSF batch system. This means that while its data access uses a Grid approach, its job submission does not. Modifying this submission mechanism will make SAM the most complete example of how Particle Physics can utilise the Grid. The large amounts of data that D0 will (hopefully) collect will make this a very real test for the LHC. A working version will be produced within the first year of the project. The work in the second year will make this working version more robust and optimised.
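The following sketch is not SAM code; it merely illustrates the design change being proposed, namely separating the data-access layer from the job-submission back-end so that a Grid submitter (such as Condor-G) can be substituted for a local batch system. All class and method names are invented for the illustration.

```python
# Hypothetical sketch of decoupling job submission from data access, so that
# a Grid submission mechanism can replace a local batch system. None of these
# classes correspond to real SAM, LSF or Condor-G interfaces.

from abc import ABC, abstractmethod

class JobSubmitter(ABC):
    @abstractmethod
    def submit(self, executable: str, arguments: list) -> str:
        """Submit a job and return an opaque job identifier."""

class LocalBatchSubmitter(JobSubmitter):
    def submit(self, executable, arguments):
        # Stand-in for handing the job to a site-local batch system.
        return f"local-{executable}"

class GridSubmitter(JobSubmitter):
    def submit(self, executable, arguments):
        # Stand-in for handing the job to a Grid submission service,
        # which chooses the execution site via the Grid scheduler.
        return f"grid-{executable}"

def run_analysis(dataset: str, submitter: JobSubmitter) -> str:
    """Locate the dataset (data-access layer) and submit the job through
    whichever submission back-end is plugged in."""
    executable = "analyse_events"
    return submitter.submit(executable, ["--dataset", dataset])

if __name__ == "__main__":
    # Swapping the back-end requires no change to the data-access code path.
    print(run_analysis("example-sample", LocalBatchSubmitter()))
    print(run_analysis("example-sample", GridSubmitter()))
```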

Deliverables

Dependencies

The existence of a Grid scheduling mechanism.

Resources Required

Staff: 1 SY split 25%/50%/25%.

Equipment: Access to FermiLab, Lancaster and one other D0 environment.

Further Tests and Refinement

This will be a continuation of the work in Milestone 2. Problems encountered by attempts to run real PP jobs using Grid schedulers will be addressed.

Deliverables

Dependencies

The existence of a Grid scheduling mechanism.

Resources Required

Staff: 1 SY split 0%/25%/75%.

Equipment: Access to 1000 CPUs and 30TB of Disk for short (1 week-long) periods of time.

Profiling PP Jobs and Scheduler Optimisation

PP jobs are different in nature from many other applications. Even within PP, Monte Carlo production is very different from data analysis. Profiling and modelling these different jobs will allow the scheduler to make an intelligent choice as to where to run each job. It will also address questions such as "Do you move the data to the CPU power or vice versa, and on what time-scales do you make such decisions?"
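
A minimal sketch of the kind of cost model such profiles could feed is given below: for each candidate site, the estimated completion time combines queue wait, any input-data transfer and the profiled CPU requirement. All field names and figures are illustrative assumptions.

```python
"""Minimal sketch of a placement cost model fed by job profiles: given a
profiled job, estimate whether it is cheaper to move the data to the CPUs or
to run where the data already sits. All figures are illustrative."""
from dataclasses import dataclass

@dataclass
class JobProfile:
    input_gb: float        # input data volume
    cpu_hours: float       # estimated processing time on a reference node

@dataclass
class Site:
    name: str
    has_data: bool
    queue_wait_h: float    # expected queue wait
    speedup: float         # CPU speed relative to the reference node
    net_mb_s: float        # achievable transfer rate into the site

def completion_time_h(job: JobProfile, site: Site) -> float:
    transfer_h = 0.0 if site.has_data else (job.input_gb * 1024 / site.net_mb_s) / 3600
    return site.queue_wait_h + transfer_h + job.cpu_hours / site.speedup

def choose_site(job: JobProfile, sites: list[Site]) -> Site:
    return min(sites, key=lambda s: completion_time_h(job, s))

job = JobProfile(input_gb=200, cpu_hours=8)
sites = [Site("data-host", True, 6.0, 1.0, 50),
         Site("big-farm", False, 0.5, 2.0, 30)]
print(choose_site(job, sites).name)   # the faster farm wins despite the copy
```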

Deliverables

Dependencies

A working Grid scheduling mechanism.

Resources Required

Staff: 1.5 SY split 9%/33%/58%.

Equipment: Access to 100 CPUs and 1TB of disk for short (1 week-long) periods of time.

Super-Scheduler Development

The overall optimisation of resources may require the existence of a super-scheduler, that is, a scheduler that talks only to other schedulers. If the work performed for Milestones 5 and 6 shows that the existence of a super-scheduler is required then a specific instantiation will be required for Particle Physics. It is likely that this super-scheduler will be built from an existing scheduler (probably the Condor Matchmaker), and therefore its development by one person in the final year of the project is not unrealistic. This work will be carried out in collaboration with the EU DataGrid project.

Deliverables

Dependencies

The scheduling decisions will be made according to the results of the work carried out for Milestone 5.

Resources Required

Staff: 0.5 SY All in final year.

Equipment: Access to 1000 CPUs and 30TB of disk for short (1 week-long) periods of time.

Planning, Management and Co-ordination

The activity of this WorkGroup will be very strongly linked to both the EU DataGrid and the US PPDG, and so will require careful planning and management. It is also important that the direction the work will take is clearly documented.

Deliverable

Dependencies

None.

Resources Required

Staff: 1.5 SY, split equally across the three years.

Equipment: None.

    1. B: Information Services and Data Management
    2. A Grid Information Service is required which efficiently and consistently publishes and manages a distributed and hierarchical set of associations. Research is required into maintaining global consistency without sacrificing performance. A practical approach could be to ensure local consistency within a domain but to allow for an unreliable and incomplete global state. It will be necessary to define suitable formats for generic and domain dependent meta-data. The initial goal of this WorkGroup is to define and provide the core software infrastructure linking producers and consumers of Grid information and the linking of information resources via index servers. This will require the development of common protocols, APIs and schema.

      The developments of the Monitoring Services WorkGroup will rely heavily on this WorkGroup. Indeed these two are combined as one entity in the DataGrid project.

      1. Information Model
      2. Generalised large-scale datagrids consist of a multitude of differing types of services across differing administrative domains (e.g. meta-data catalogues, datastores, CPU farms, replication services). A client must be able to issue a complex service query efficiently and be given a list of services that can perform a certain function. The system must be globally scalable, efficient (limiting network load) and robust against network or service failures. It must also be secure, responding only to authenticated users, and permitting service-providers to limit who has access to their service description.

        In administratively centralised small and medium scale environments, there are existing systems of varying complexity, ranging from centrally administered configuration files to systems such as Condor ClassAds [4]. The former lack expressiveness (only basic service information can reasonably be stored), robustness (they do not respond well to service failure) and scalability (since they become impractical to manage beyond a certain point). The latter is centralised, and thus has a single point of failure and a performance bottleneck, and is not suitable for global scaling. Some work on larger-scale solutions exists [5], but these have concentrated on hierarchical models of service indices. This is problematic since a single hierarchy cannot adequately cope with the wide variety and complexity of service queries that will be made on a global-scale datagrid.

        The core information model used will be that proposed by the Global Grid Forum [6], which has been designed to be very flexible. The main interfaces are a producer interface, which is given the events to make available, and a consumer interface, which is able to register for and obtain all events or to take events one at a time. The producer must register its presence with a directory service so that consumer processes may find it, and the directory service itself must either be "well known" or be known by another directory service.

        Service discovery is a core aspect of any large-scale datagrid: before a user, or any Grid service acting for a user, can accomplish anything, it must be able to locate the Grid services it requires to do its job. A novel solution is proposed [7], which will allow the construction of a distributed "web" or graph of services and their associated descriptions. An XML-based schema [8] is the proposed candidate with which to define the service descriptions, since it permits complex descriptions to be expressed while enforcing a definable syntax for the query. A Grid service can then register itself with one or more of these indices; services must continually re-register themselves (with an appropriate period, to be determined dynamically using information from the network-monitoring Grid services) so that if a service or network fails, it is removed from the list of active services. The resulting flat topology with dynamic updating presents a scalable, robust and manageable solution.
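
The following sketch illustrates the re-registration idea in isolation: services that stop re-registering simply age out of the index. The in-memory store and string-matching query are placeholders for the XML service descriptions and Grid protocols described above.

```python
"""Sketch of the proposed flat, dynamically updated service index: services
re-register periodically and are dropped once their registration lapses.
Names and the in-memory store are illustrative."""
import time

class ServiceIndex:
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._entries: dict[str, tuple[str, float]] = {}  # name -> (description, last_seen)

    def register(self, name: str, description: str) -> None:
        # Services call this repeatedly; the period would be tuned using
        # network-monitoring information, as noted in the text.
        self._entries[name] = (description, time.time())

    def _expire(self) -> None:
        cutoff = time.time() - self.ttl
        self._entries = {n: (d, t) for n, (d, t) in self._entries.items() if t >= cutoff}

    def query(self, predicate) -> list[str]:
        # A real query would match an XML service description; here a simple
        # predicate over the description string stands in for it.
        self._expire()
        return [n for n, (d, _) in self._entries.items() if predicate(d)]

index = ServiceIndex(ttl_seconds=30)
index.register("ral-datastore", "type=datastore capacity_tb=30")
print(index.query(lambda d: "datastore" in d))
```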

      3. Directory Service
      4. Initially the priority will be given to the development of enhanced information services based on the Globus GIS model [9]. It is intended that this work will be closely co-ordinated with the Globus project. Our requirements include the development, packaging and deployment of a robust, modular structure using the LDAP v3 protocol [10]. This will require the identification and enhancement of key components such as the LDAP server. Using these components together with the Globus infrastructure, we intend to develop a flexible model allowing the plug-in of multiple LDAP backends and the flexible deployment of resource-oriented and collaboration-oriented index servers. This will facilitate the construction of multiple, possibly overlapping, virtual organisations. Security will be handled by a combination of secure sockets (SSL) and by adding GSI-based access controls to the directory servers.
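
As an indication of how such a directory might be exercised from client code, the sketch below queries an MDS-style LDAP server using the ldap3 Python library. The host name, port and base DN follow common Globus MDS conventions but should be treated as assumptions, as should the anonymous bind.

```python
"""Minimal sketch of querying a Globus GIS/MDS-style LDAP server with the
ldap3 library. The host, port, base DN and filter are assumptions following
MDS conventions; GSI/SSL access controls would apply in the deployed service."""
from ldap3 import Server, Connection, ALL

def list_resources(host: str = "gris.example.ac.uk", port: int = 2135) -> list[str]:
    server = Server(host, port=port, get_info=ALL)
    # Anonymous bind for illustration only.
    with Connection(server, auto_bind=True) as conn:
        conn.search(search_base="Mds-Vo-name=local, o=grid",
                    search_filter="(objectClass=*)",
                    attributes=["cn"])
        return [str(entry.entry_dn) for entry in conn.entries]

if __name__ == "__main__":
    print(list_resources())
```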

      5. SQL Database Service
      6. The ability to store, query and retrieve generalised meta-data is fundamental to any large-scale datagrid activity. The majority of Grid services (e.g. data indices, replica management, monitoring services) are either producers or consumers of meta-data and they need to have some common method of querying and storing their data so that they are interoperable. The generalised database service must be able to operate on any type of local or remote RDBMS. All current global systems for publishing meta-data are built on top of general purpose directory services such as LDAP and DNS. These are based on a hierarchical query model, which limits the inter-object relationships. As the volume and complexity of meta-data increases, so do the relationships between them. In a large-scale Grid environment, it will be desirable to be able to make complex relational queries of the available meta-data, which the current hierarchical directory services cannot support.

        A proposed solution [7] is to use Open Source, standard technologies to implement a general relational-database access-protocol. A 3-tier model is proposed to maximise the modularity of the solution and allow interoperability with existing and developing technologies. The required database operation will be encoded by the client application in a suitable mark-up language (XML Schema [8] is proposed). This will then be transported to the database server using a standard secure transport protocol (envisaged to be HTTPS, as used by secure world-wide web communication). At the server end, the XML request can be converted into a standard database query (e.g. SQL [12]) and the operation performed on whatever back-end database is being used on the server. In the case of a query, the result of the query can be packaged using an XML template and transported back over HTTPS to the client.

        Core database functionality (supporting insert, delete, update and query operations) will be implemented in the first year of the project. It will be tested using the MySQL database. Command-line and web-based querying will be available. In the case of web-based queries, the XML result can be converted via an XSLT [13] stylesheet on the server side directly into HTML. Additionally, a Java API for the protocol will be designed and implemented. The second stage of the project is to extend the basic functionality to allow for more complex querying using the techniques developed in the Query Optimisation project.
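
A client-side sketch of the proposed three-tier protocol is shown below: the operation is encoded in XML, sent over HTTPS and the XML result unpacked. The endpoint URL, element names and the use of raw SQL text in the request are illustrative; a deployed service would validate requests against the agreed XML Schema.

```python
"""Client-side sketch of the proposed 3-tier database protocol: the operation
is encoded in XML, POSTed over HTTPS and the XML result parsed. The endpoint
URL and element names are hypothetical."""
import urllib.request
import xml.etree.ElementTree as ET

SERVICE_URL = "https://dbservice.example.ac.uk/query"   # placeholder endpoint

def run_query(sql: str) -> list[dict]:
    # Encode the request; a deployed service would validate it against the
    # agreed XML Schema rather than accept raw SQL text.
    request_xml = ET.Element("query")
    request_xml.text = sql
    body = ET.tostring(request_xml, encoding="utf-8")

    req = urllib.request.Request(SERVICE_URL, data=body,
                                 headers={"Content-Type": "text/xml"})
    with urllib.request.urlopen(req) as resp:
        result = ET.fromstring(resp.read())

    # Assume each <row> child carries the columns as sub-elements.
    return [{col.tag: col.text for col in row} for row in result.findall("row")]

if __name__ == "__main__":
    print(run_query("SELECT name, site FROM replicas WHERE lfn='run1234/hits.db'"))
```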

      7. Schema Definition
      8. A key task for information services will be the setting up and maintenance of a schema repository. It is intended to support all schema and services required by DataGrid; however, we wish to maintain flexibility, particularly during the early stages of the project, in order to facilitate easy development and integration with other requirements. It is expected that a standard Schema Definition Language such as GOS [10], which may be readily transliterated into other formats including SQL and the RFC2256 format required by LDAP servers, will be adopted.

        In general our policy is, primarily through the DataGrid, to try to establish global standards for the key schema on which the project depends. The most likely channel to do this is via the GIS group of the Global Grid Forum.

      9. Interaction with Other WorkGroups
      10. Since the information services form a key component of the middleware, the interaction with the other WorkGroups will be critical. In general it is for the other WorkGroups to define and implement their information producers and consumers; however, we aim to recognise common requirements and, wherever possible, provide general solutions. This will be facilitated by the provision of standard APIs and the development of toolkits for standard services. For the testbed, including existing experiments, we plan to provide some support and act in a co-ordination role.

      11. Data Management
      12. Large databases are emerging as an important resource across science and commerce, and a new generation of software tools is required to provide transparent and reliable access to data, regardless of where and how it is stored.

        The approach to Grid Data Management is closely linked to on-going developments at CERN within the DataGrid project, and has as objectives the implementation and comparison of various data management approaches including caching, file replication and file migration. Such middleware is critical for the successful handling of large distributed datasets. This work will be an integral part of a general-purpose information sharing solution with unprecedented automation, ease of use, scalability, uniformity, transparency and heterogeneity. It will enable users to securely access massive amounts of data in a universal global name-space, to move and replicate data at high speed from one geographical site to another and to manage the synchronisation of remote copies. Within GridPP, we will be focusing on two areas of data management:

        Data Replication

        Where copies of files and meta-data (data describing data) need to be managed in a distributed and hierarchical cache so that a set of files (e.g. Objectivity databases) can be replicated to a set of remote sites and made available there. To this end, location-independent identifiers are mapped to location-dependent identifiers. All replicas of a given file can be looked up. Plug-in mechanisms to incorporate registration and integration of datasets into Database Management Systems will be provided.

        The work package will develop a general-purpose information sharing solution. Novel software for automated wide-area data caching and distribution will act according to dynamic usage patterns. Generic interfacing to heterogeneous mass storage management systems will enable seamless and efficient integration of distributed resources.

        Figure 8: Interaction of components in the Data Management workpackage.

        The overall interaction of the components foreseen for this work package is depicted in Figure 8. Arrows indicate "use" relationships; component A uses component B to accomplish its responsibilities. The Replica Manager manages file and meta-data copies in a distributed and hierarchical cache. It uses, and is driven by, pluggable and customisable replication policies. It further uses the Data Mover to accomplish its tasks. The Data Mover transfers files from one storage system to another. To implement its functionality, it uses the Data Accessor and the Data Locator, which maps location-independent identifiers to location-dependent identifiers. The Data Accessor is an interface encapsulating the details of the local file system and mass storage systems such as Castor, HPSS and others. Several implementations of this generic interface may exist, the so-called Storage Managers. They typically delegate requests to a particular kind of storage system. The Data Locator makes use of the generic Meta-data Manager, which is responsible for efficient publishing and management of a distributed and hierarchical set of associations, i.e. {identifier → information object} pairs. Query Optimisation and Access Pattern Management ensures that for a given query an optimal migration and replication execution plan is produced. Such plans are generated on the basis of published meta-data, including dynamic logging information. All components provide appropriate Security mechanisms that transparently span independent organisations world-wide. The granularity of access is currently limited to the file level. This WorkGroup would extend Grid access to individual objects and enable fast access to a globally distributed OO database.
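
To make the "use" relationships of Figure 8 concrete, the sketch below reduces the main components to minimal Python interfaces. The method names are illustrative and do not represent the DataGrid APIs.

```python
"""Sketch of the "use" relationships in Figure 8, with component
responsibilities reduced to minimal interfaces. Method names are illustrative."""
from abc import ABC, abstractmethod

class DataAccessor(ABC):
    """Hides the local file system or mass storage system (Castor, HPSS, ...)."""
    @abstractmethod
    def read(self, physical_name: str) -> bytes: ...
    @abstractmethod
    def write(self, physical_name: str, data: bytes) -> None: ...

class DataLocator:
    """Maps location-independent identifiers to location-dependent ones."""
    def __init__(self, catalogue: dict[str, list[str]]):
        self.catalogue = catalogue
    def locate(self, logical_name: str) -> list[str]:
        return self.catalogue.get(logical_name, [])

class DataMover:
    """Transfers a file between storage systems via their accessors."""
    def move(self, src: DataAccessor, src_pfn: str,
             dst: DataAccessor, dst_pfn: str) -> None:
        dst.write(dst_pfn, src.read(src_pfn))

class ReplicaManager:
    """Drives replication according to a plug-in policy, using the Data Mover."""
    def __init__(self, locator: DataLocator, mover: DataMover, policy):
        self.locator, self.mover, self.policy = locator, mover, policy
    def replicate(self, logical_name: str, src: DataAccessor,
                  dst: DataAccessor, dst_pfn: str) -> None:
        src_pfn = self.locator.locate(logical_name)[0]
        if self.policy(logical_name):
            self.mover.move(src, src_pfn, dst, dst_pfn)
            self.locator.catalogue[logical_name].append(dst_pfn)
```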

        Querying for a specific service involves traversing this graph to find a matching service. The project is defined in two key stages. The initial stage is to design and implement the protocols and APIs necessary to set up the basic graph topology and those necessary to publish service descriptions and match queries. This will leverage existing work as much as possible, utilising standard, modular and open source components to build the new architecture. It will use the common security and authentication infrastructure of the DataGrid. It is envisaged that the service descriptions will be stored using the new SQL Database Service. A prototype system offering this basic functionality will be available in the first year of the project. It will support command-line querying, web-based queries and a Java API.

        The second stage is to develop some "intelligent" mechanisms for searching this graph of services, so that Grid services can be efficiently discovered, limiting the load placed on the network. These will build upon the tools and protocols developed in the initial phase of the project and the methods developed in the Query Optimisation project.

      13. Query Optimisation
      14. Query Optimisation ensures that for a given query an optimal migration and replication execution plan is produced. Such plans are generated on the basis of published meta-data including dynamic logging information. The current status is that preliminary use cases are being defined and the query optimisation architecture is being refined for Particle Physics applications. The WorkGroup will optimise SQL-like query expressions.

        The main optimisation problem can be described as follows: given a set of CPU-nodes and a set of I/O-nodes in the Grid, find an optimal solution for a hybrid approach of query shipping and data replication for handling the current query.
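
The sketch below illustrates the shape of this decision with a deliberately simple cost comparison between shipping the query to the I/O node and replicating the data to a CPU node, amortising the copy over the expected number of future queries. All parameters are assumptions.

```python
"""Minimal sketch of the hybrid decision described above: ship the query to
the I/O node, or replicate the data to a CPU node and amortise the copy over
the expected number of future queries. All parameters are assumptions."""

def cost_ship_query(result_gb: float, wan_mb_s: float, io_node_cpu_h: float) -> float:
    # Only the (small) result crosses the network, but the query runs on the
    # slower, contended I/O node.
    return io_node_cpu_h + (result_gb * 1024 / wan_mb_s) / 3600

def cost_replicate(data_gb: float, wan_mb_s: float, cpu_node_cpu_h: float,
                   expected_queries: int) -> float:
    # The full dataset is copied once; the copy cost is shared by later queries.
    copy_h = (data_gb * 1024 / wan_mb_s) / 3600
    return copy_h / expected_queries + cpu_node_cpu_h

data_gb, result_gb, wan = 500, 2, 40
print("ship" if cost_ship_query(result_gb, wan, io_node_cpu_h=3.0)
      < cost_replicate(data_gb, wan, cpu_node_cpu_h=1.0, expected_queries=20)
      else "replicate")
```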

        A basic data replication service is currently implemented with the GDMP [14] software tool enabling automatic and asynchronous replication of Objectivity files over a Particle Physics DataGrid. Its modular design allows it to implement a wide variety of replication policies. It is designed to interface fully with the Grid Replica Catalogue so that all replicas of a given object or file can be located and processed as required, and provides mechanisms for fault tolerance and recovery from network or site failures. It operates using the Globus security framework, allowing local resources to retain full control over their local security policies.

        This project seeks to develop aspects of GDMP. Early key developments will be to extend the tool to arbitrary databases (it is currently limited to Objectivity), and to link in with the developing Globus Replication Manager. It is also necessary to improve the granularity of replication. The granularity is currently limited to Objectivity files - there is a need for this to be improved to the replication of single database objects as this will support fast physics analysis optimisations on limited sub-sets of the data.

        Later work is required on query optimisation to build some intelligence into the replication system. It will be extended (with input from other WorkGroups) so that new replicas are made in response to monitored user access-patterns and dynamic load balancing and scheduling information. This will involve implementing and testing "cost" models such as those proposed in [15].

      15. Data-Mining
      16. Although this is not directly part of the DataGrid definition of this package, it is appropriate to discuss it in the context of data management. Data-mining is the semi-automatic discovery of events, patterns, associations, changes, anomalies and other features of stored data using high-level tools. This activity is data-driven and, within the context of DataGrid developments, needs to be extended to handle a distributed data environment. A particular feature of data-mining in a Grid context is the ability to perform searches etc. in a parallel rather than serial format.

        The database sizes that are being reached are currently of the order of Terabytes and are expected to reach Petabytes in the LHC era. Most RDBMS use data warehousing to store data and handle high-dimensional relationships. However, RDBMS have a technical limit on the size and complexity that such relationships can have. It is possible to improve the situation using Object-Oriented Databases (ODBMS) such as Objectivity, but the extension to very large databases is still revealing problems that have to be resolved.

        Within the context of the development of high-level tools for handling database queries, database services and data replication, it also makes sense to support work on the development and incorporation of high-level tools for data-mining within a distributed environment. This work will make use of fundamental research on data-mining algorithms which is being done by Computer Scientists and will support the application of such tools in a wide variety of areas.

      17. Tasks

      1) Project Plan, Co-ordination and Management

      The project will require careful co-ordination and management in order to maintain consistency and uniformity across services. It will require co-ordination with other WorkGroups, with DataGrid and the US PPDG, and will also need to involve the Globus team.

      2) Directory Services

      The initial release of the Information Services structure will rely heavily on the GIS LDAP-based services provided by Globus. We plan to adapt these in collaboration with the Globus team to meet our short-term needs. In addition, it is likely that LDAP-based services will form an important component of the structure throughout the lifetime of the project.

      3) Distributed SQL Development

      During the first year we intend to develop and test a prototype distributed SQL-based information system. Depending on the success of this work, increasing effort will be required to develop and deploy a fully functional release. If a fully fledged distributed SQL-based service proves impractical, we anticipate adopting a hybrid approach using a combination of SQL and directory services.

      4) Schema Repository

      The establishment of a Schema Repository will be a significant task during the first year of the project. Once established it will need to be maintained.

      5) Data Replication

      We anticipate adopting the Data Replication solution proposed by the DataGrid. Effort will be required for liaison with the CERN-centred effort and support within the UK.

      6) Query Optimisation and Data-mining

      In the first instance, this will require a generalisation of the GDMP software to arbitrary databases. Later work will improve the granularity of the replication and build in intelligence.

      7) Releases

      It is intended to provide an annual release of the core software. The Basic, Interim and Final releases will each require testing and maintenance. Documentation will be provided.

      8) Liaison

      The Information and Data Management services form a key component of the project. The liaison with the other WorkGroups and existing experiments will be critical. Part of this task will include the establishment of common APIs and toolkits.

    3. C: Monitoring Services
      1. Overview
      2. Monitoring, in all its aspects, of what is going on in the Grid is critical to successful exploitation. It is tightly coupled with the Information Services WorkGroup, as it uses these services to provide monitoring information to a number of clients and to take information from a number of sources, as shown in Figure 9.

        Figure 9: Relationship between Monitoring Services and other WorkGroups.

        At the top of Figure 9 are the major providers of monitoring information and at the bottom are the major consumers. This is an over-simplified view, but it gives the overall picture.

        The combined work of the Monitoring and Information Services WorkGroups maps well into the DataGrid Monitoring Workpackage (WP3). Indeed, the technical programme of this WorkGroup is almost completely a subset of DataGrid WP3.

        The monitoring and information architecture will follow broadly the direction of the Grid Monitoring Architecture as proposed by the Performance group of the Global Grid Forum (GGF). A schematic view of the architecture is shown in Figure 10 and revolves around a producer-consumer scheme. Producers are connected to domain specific sensors (provided in other WorkGroups) via an API. The availability of information is registered in a directory service, which is used by consumers to locate relevant information. Consumer access to information is via a different API which may hide the directory lookup.

        Using this approach, it should be possible to satisfy most use-cases (as surveyed by the Performance Group of the GGF). Services for archival/retrieval, real-time application monitoring, basic information retrieval for decision making etc. can all be constructed within this framework.

        Figure 10: An architecture for Grid Monitoring.
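
A minimal sketch of the producer/consumer/directory pattern of Figure 10 is given below. The in-process registry and method names are illustrative stand-ins for the real directory service and APIs.

```python
"""Minimal sketch of the producer/consumer/directory pattern of the Grid
Monitoring Architecture. The in-process registry and method names are
illustrative stand-ins for the real directory and APIs."""
from collections import defaultdict

class Directory:
    """Consumers use the directory to locate producers of a given metric."""
    def __init__(self):
        self._producers = defaultdict(list)
    def register(self, metric: str, producer) -> None:
        self._producers[metric].append(producer)
    def lookup(self, metric: str):
        return list(self._producers[metric])

class Producer:
    """Wraps a domain-specific sensor (supplied by another WorkGroup)."""
    def __init__(self, directory: Directory, metric: str, sensor):
        self.metric, self.sensor = metric, sensor
        directory.register(metric, self)
    def latest(self):
        return self.sensor()

class Consumer:
    """Hides the directory lookup behind a simple read API."""
    def __init__(self, directory: Directory):
        self.directory = directory
    def read(self, metric: str):
        return [p.latest() for p in self.directory.lookup(metric)]

directory = Directory()
Producer(directory, "cpu.load", sensor=lambda: 0.42)   # sensor value is a placeholder
print(Consumer(directory).read("cpu.load"))
```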

      3. Tasks
      4. Requirements and Design

        A full requirements analysis will be performed to evaluate the needs of all classes of end-users (which will include all classes of producers and consumers). Interfaces to other sub-systems will be defined and needs for instrumentation of components will be identified. This will be carried out in liaison with other WorkGroups as appropriate. This WorkGroup will participate in the DataGrid Architecture Task Force and will take on board user requirements, which will be gathered through this forum. An architectural specification of the components (and their relationships) necessary to meet the objectives will be established. Boundary conditions and interfaces with other Grid components will be specified and where appropriate APIs will be defined. Standards for message formats will be set up within the project, taking into account the work also done in other standards bodies.

        Technology Evaluation

        Evaluation of existing distributed computing monitoring technologies will be carried out to understand potential uses and limitations. Tests will be made in the first demonstration environments of the project to gain experience with tools currently available. Issues studied will include functionality, scalability, robustness, resource usage and invasiveness. This will establish their role in the Grid environment, highlight missing functionality and naturally provide input into the requirements and design task.

        Infrastructure

        Software libraries supporting instrumentation APIs will be developed and gateway or interface mechanisms established to computing fabrics, networks and mass storage. Where appropriate, local monitoring tools will be developed to provide the contact point for status information and to be a routing channel for errors. Directory services will be exploited to enable location and/or access to information. Methods for short- and long-term storage of monitoring information will be developed to enable both archiving and near real-time analysis functions.

        Analysis and Presentation

        This task will cover development of software for analysis of monitoring data and tools for presentation of results. High levels of job parallelism and complex measurement sets are expected in a Grid environment. Techniques for analysing the multivariate data must be developed and effective means of visual presentation established. This task will exploit expertise already existing within the project, particularly through DataGrid colleagues in Hungary.

        Test and Refinement

        The testing and refinement of each of the software components produced by the above tasks will be accomplished by this task, which continues to the end of the project. Evaluations will be performed in terms of scalability, etc. This task will take as its input the feedback received from the Prototype Grid WorkGroup and will ensure that the lessons learned, software quality improvements and additional requirements are designed, implemented and further tested.

      5. Deliverables

The following list of deliverables is fully synchronised with those of the Monitoring WorkGroup of the DataGrid project.

    1. D: Fabric Management and Mass Storage
      1. Introduction

The fabric of the Grid will be woven from several heterogeneous threads:

The UK Grid for Particle Physics recognises the need to build on and develop the experience that exists in managing these resources and the existing work on integrating them into the Grid. Two areas critical for GridPP are dealt with within this WorkGroup: the management of large compute resources and the management of large storage resources.

GridPP is critically dependent on large scale computing given the need to analyse and simulate petascale datasets. This will ultimately affect the UK's ability to fully exploit the potential of the LHC programme. GridPP also needs active work on data storage for several reasons. PPARC has commitments to the DataGrid project workpackage 5 in this area. The UK also has a number of large data repositories (RAL, Edinburgh, Lancaster) and distributed systems (CDF, BaBar), with the prospect of more repositories (Sheffield, Liverpool) and growing requirements (Tier-1 and Tier-A regional centres). Co-ordination of the access to and management of these facilities will both reduce the overall development effort required across the UK and ensure that all facilities can be accessed by a common set of protocols in the spirit of the Grid. This will then enable the common higher-level software for data management to be used at all these sites.

The objectives of the Fabric Management WorkGroup will be:

      1. Relationship with DataGrid
      2. WorkGroup D has a programme of work that is closely linked to the DataGrid Workpackages 4 and 5 on Fabric and Mass Storage Management. In its current form, it does not envisage extending the DataGrid work to non-LHC experiments for mass storage, although for compute infrastructure it is intended that the existing fabric will be Grid-enabled.

        This is still being discussed actively and developed within the DataGrid project, for both WP4 and WP5, and the exact details may still change. The EU will provide 1 FTE for three years to the UK and PPARC has committed to a further 1.5 FTE for the same period. This only covers work at the Tier-1.

         

      3. Tasks

Below we outline the major tasks of the Fabric Management WorkGroup.

Task D1: Integration of Existing Compute Fabric

Description

Modify, enable and write APIs for BaBar, CDF, D0 and LHC compute resources to enable the publishing of resources and compute-cycle brokerage on the Grid, and to provide actual access to these resources on the Grid.

Deliverables

Dependencies and Risks

Co-ordination with existing UK production facilities.

Resources Required

Staff: 1.0 SY/year.

Equipment: Access to existing facilities.

Task D2: COTS Compute System Development

Description

Develop system software to manage large-scale (10,000+) processor farms for analysis and Monte Carlo production within the UK.

Deliverables

Dependencies and Risks

Access to large-scale CPU farms.

Resources Required

Staff: 1.0 SY/year.

Equipment: Pre-production level processor farms.

Task D3: DataGrid Common API

Description

Define and implement a common API to be used for access to mass storage systems by Grid middleware and applications.
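
As an indication of the shape such an API might take, the sketch below defines a minimal storage interface with a plain-disk implementation standing in for Castor/HPSS-style back-ends. Method names are illustrative and do not represent the DataGrid definition.

```python
"""Sketch of what a common mass-storage API might look like, with a plain
disk implementation standing in for Castor/HPSS-style back-ends. Method
names are illustrative, not the DataGrid definition."""
import shutil
from abc import ABC, abstractmethod
from pathlib import Path

class MassStorage(ABC):
    @abstractmethod
    def stage_in(self, name: str, local_path: str) -> None:
        """Copy a file from the store to local disk for processing."""
    @abstractmethod
    def stage_out(self, local_path: str, name: str) -> None:
        """Copy a local file into the store."""
    @abstractmethod
    def exists(self, name: str) -> bool: ...

class DiskStore(MassStorage):
    """Trivial back-end used for testing the interface on plain file systems."""
    def __init__(self, root: str):
        self.root = Path(root)
    def stage_in(self, name: str, local_path: str) -> None:
        shutil.copyfile(self.root / name, local_path)
    def stage_out(self, local_path: str, name: str) -> None:
        target = self.root / name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, target)
    def exists(self, name: str) -> bool:
        return (self.root / name).exists()
```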

Deliverables

Dependencies and Risks

Co-ordination with other DataGrid workpackages.

Resources Required

Staff: 1.0 SY/year.

Equipment: Significant hardware resources will be needed to meet the production requirements of a BaBar Tier-A centre and the development of a prototype Tier-1 centre for LHC. This WorkGroup will build on the basic hardware installed as part of the Implementation WorkGroup.

Task D4: DataGrid Meta-data for Mass Storage

Description

Define a set of meta-data for a mass storage resource and the files contained therein. Publish this meta-data in the appropriate DataGrid Information Service for use by Grid middleware and applications.

Deliverables

Dependencies and Risks

Relies on Information Service being provided and information publishing API to be defined. Co-ordination with other DataGrid workpackages.

Resources Required

Staff: 2.0 SY with a profile of 0.5/1.0/0.5.

Task D5: DataGrid Tape Exchange Design

Description

Develop a method for interchange of physical tapes between collaborating regional centres running different mass storage systems.

Deliverables

Dependencies and Risks

Suppliers of proprietary systems may not divulge their internal meta-data. The solution only works between systems which share a common physical tape solution.

Resources Required

Staff: 1.0 SY split between years 1 and 2.

Equipment:

        1. Component 2

Task D6: DataGrid Tape Exchange Implementation

Description

Implement the tape exchange mechanism designed in Task D5.

Deliverables

Dependencies and Risks

Resources Required

Staff: 1.5 SY in years 2 and 3.

Equipment:

      1. Current Experiments
      2. Current experiments will need support if they are to make use of the new Grid techniques and resources. In particular, the FNAL and SLAC experiments will need help to develop their Grid plans. This is not covered in the current proposal although any software developed would be available to them.

      3. Collaborative Links

Through DataGrid WP4 and WP5, links have been established with DESY, SLAC, FNAL, Jefferson Lab and the PPDG and GriPhyN projects. No formal collaborations exist.

    1. E: Security Development
      1. Introduction
      2. Computer and network security is an essential low-level component in the construction and operation of any Grid. The users of Grid applications require simple logon procedures, e.g. single sign-on, to identify themselves across the whole Grid and also on collaborating Grids. This secure authentication of users, machines or services can then be used in the authorisation of access to resources. Owners of Grid resources require simple methods and tools to control the authorisation database. This will include the assignment of individuals to one or more groups and the allocation of various roles and privileges to individuals or groups. Another ingredient of the security model will be meeting the requirements for auditing and for the discovery and tracking of attacks. Resource allocation, quotas and Grid accounting, while not within the remit of this WorkGroup, will also be based on the authorisation entities.

        The remit of this WorkGroup is to evaluate and track the emerging standards and implementations of Grid security mechanisms and, in collaboration with other middleware WorkGroups and Grids, develop, test and implement solutions for the various security requirements. This is likely to require the development of short-term solutions while we wait for the more complete and generic implementations to appear.

        Much work has already been done on Grid security both by the Globus team and others in the Global Grid Forum and the IETF. At this point, the problem of authentication is well on the way to being solved via the use of Public Key Infrastructure using X.509 certificates. However, the whole area of authorisation is much less mature and significant work will be required in this area over the coming years. For interoperability between Grids it is essential that these security developments both lead and follow the emerging standards and that the work is done in close collaboration with other Grid projects.
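
The sketch below illustrates the authentication-then-authorisation step in its simplest form: the subject DN is extracted from an X.509 certificate (using the Python 'cryptography' package) and mapped to local rights, in the spirit of the Globus grid-mapfile. The DN and mapping are invented examples, and full chain, expiry and revocation checking is omitted.

```python
"""Sketch of authentication followed by coarse authorisation using X.509
certificates: extract the subject DN and map it to local rights, in the
spirit of the Globus grid-mapfile. Requires the 'cryptography' package;
the DN and mapping are invented examples."""
from cryptography import x509

# Illustrative mapping from certificate subject DN to a local account/role.
GRID_MAP = {
    "CN=A Physicist,OU=Particle Physics,O=GridPP,C=GB": "ppuser",
}

def authorise(pem_bytes: bytes) -> str:
    # Chain, signature, expiry and revocation checks are part of full
    # authentication but are omitted from this sketch.
    cert = x509.load_pem_x509_certificate(pem_bytes)
    subject = cert.subject.rfc4514_string()
    try:
        return GRID_MAP[subject]
    except KeyError:
        raise PermissionError(f"no mapping for {subject}") from None
```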

        All Grid services and middleware components will require security, so an important part of the remit of this group will be to work closely with other WorkGroups.

        Institutes providing Grid facilities will have to be convinced that the middleware and services are running in a secure way, so that their resources will not be subject to successful attack. In addition to the technical developments, this will require careful specification of Certification Policies and Practice Statements and rigorous operation of Registration, Certification and Authorisation Authorities. While the implementation and operation of these will be the duty of the Grid Prototype WorkGroup, the Security Development WorkGroup will need to help define the policies.

      3. Relationship with DataGrid
      4. There is no work package for Security in the DataGrid project. The whole area of security is being handled by a sub-group of WP6, the testbed integration package, together with the close collaboration of the middleware and application work packages.

        WorkGroup E will work closely with the DataGrid security sub-group to ensure that common tools and methods are used.

      5. Tasks

Task E1: Gather Requirements

Description

Gather and document the initial security requirements from the middleware and applications. On-going requirements in years 2 and 3 will be collected through Tasks E4 and E8.

Deliverables

Dependencies and Risks

This will require co-ordination with other WorkGroups and Grid projects.

Resources Required

Staff: 0.2 SY during the first year.

Task E2: Survey and Track Technology

Description

Make a survey of current work and developments in GGF, IETF, Globus and elsewhere and make recommendations.

Deliverables

Dependencies and Risks

Input required from GGF, Globus and other Grid projects.

Resources Required

Staff: 0.5 SY with a profile of 0.1/0.2/0.2.

Task E3: Design, Implement and Test

Description

Design, implement and test short-term and long-term security solutions for authentication, authorisation, auditing and other security services according to the architecture specified in Task E5. Input to this task will come from Tasks E6 and E9 and the implementations identified in Task E2.

Deliverables

Dependencies and Risks

Input will be required from the Architecture task (E5). Industry and other Grid projects may not produce security implementations quickly enough. Where functionality is missing, short-term solutions will need to be developed by Tasks E6 and E9. Every effort will need to be made to ensure that the security policies are sufficient to allow and encourage trust, such that the security methods will be used.

Resources Required

Staff: 1.6 SY with a profile of 0.2/0.7/0.7.

Equipment: This task will need some dedicated hardware in the form of a number of PCs for testing new implementations (£4k).

Task E4: Integrate with Other WG/Grids

Description

Work with other WorkGroups and Grid projects to ensure common security solutions.

Deliverables

Dependencies and Risks

This will require collaboration with many different Grid projects.

Resources Required

Staff: 0.7 SY with a profile of 0.1/0.3/0.3.

Task E5: Architecture

Description

Work with other WorkGroups and Grid projects to define the architecture of the security services and their relation to the architecture of other Grid components.

Deliverables

Dependencies and Risks

This will require co-ordination with other WorkGroups and Grid projects.

Resources Required

Staff: 0.25 SY with a profile of 0.05/0.1/0.1.

Task E6: Security Development

Description

Development of security tools and services as required both for short-term and long-term use.

Deliverables

Dependencies and Risks

The main risk is that the small amount of effort available may not be sufficient to develop the tools and services required. If this turns out to be the case, we will have to work with other Grid projects to make sure that the requirements are met.

Resources Required

Staff: 0.75 SY with a profile of 0.15/0.3/0.3.

Task E7: Management of WG

Description

Project planning and management, including WG meetings, documentation etc..

Deliverables

Dependencies and Risks

None.

Resources Required

Staff: 0.25 SY with a profile of 0.05/0.1/0.1.

Task E8: DataGrid Security

Description

The DataGrid project proposal and contract did not specify any direct effort for work on security, funded either by the EU or by the project partners. The effort required for this task will be needed for participation in, and leadership of, the DataGrid security sub-group.

Deliverables

Dependencies and Risks

The main risk is that the small amount of effort available will not be sufficient. If this becomes a problem this will be reported to the DataGrid management for them to solve.

Resources Required

Staff: 0.5 SY with a profile of 0.1/0.2/0.2.

Task E9: DataGrid Security Development

Description

This task will complement Task E6. The effort will be used to work on the development of security tools and services for the DataGrid project. These will also be useful in the UK and other Grid projects.

Deliverables

Dependencies and Risks

The main risk is that the small amount of effort available will not be sufficient. If this becomes a problem this will be reported to the DataGrid management for them to solve.

Resources Required

Staff: 0.75 SY with a profile of 0.15/0.3/0.3.

      1. Collaborative Links

Collaborative links, as mentioned in Task E4, will be required with many Grid projects, either bilaterally or via the Global Grid Forum. Every effort will be made to ensure that these links are made in the most efficient way, either via the DataGrid project or other suitable groups.

    1. F: Networking Development

In the networking sector, the Grid requirements can be broken down into the following components which are listed moving from the fabric layer upward toward the middleware layer:

The remit of the Networking WorkGroup is to address each of these issues and ensure timely delivery of the components required for testbed operations, and ensure that the necessary underlying fabric services are developed in conjunction with the network providers.

In the sections below, we describe the context and high-level tasks and deliverables associated with each item. The second and third items overlap strongly with other Grid applications and therefore should be considered for core e-science support. The last two items overlap with other WorkGroups within GridPP.

      1. Infrastructure

EU DataGrid and GridPP planning makes the assumption that the core fabric provision will be via the National Research Network providers (NRNs). In the UK this means SuperJANET4 (SJ4), provided by UKERNA. UKERNA have made clear policy statements of their intention to actively support Grid needs insofar as possible. We welcome this and will work closely with them. Most UK sites are served by MANs and it is therefore also necessary to influence MAN planning and importantly ensure that SJ4, MANs and sites can inter-operate to deliver the end-to-end requirements.

All inter-European traffic will be carried via Géant (run by DANTE). We are already in discussion with DANTE through the EU DataGrid networking group.

In the early years the principal data sources will be in the US (SLAC and FNAL), necessitating adequate trans-atlantic links to be provisioned. At present, SJ4 has approximately 1 Gbit/s capacity from London to the US. It is assumed that this will be upgraded and/or alternative provision made as planning proceeds.

The approximate scale of the bandwidth I/O requirements for Tier-1 and Tier-2 centres is: 2001: 155 Mbit/s; 2002: 360 Mbit/s; 2003: 622 Mbit/s.

The fundamental networking task is to ensure that adequate raw bandwidth exists between Grid sites, and that this is available to Grid traffic either by over-provisioning or as some form of managed bandwidth. The choice of solution is assumed to be the responsibility of the providers in response to Grid requirements. The tasks under this heading are:

Task NET-1: Management, Requirements and Infrastructure Provisioning

Description

This task covers the management of all work within the networking sector, including the domestic programme, relations with national and international bodies (e.g. UKERNA, DANTE) and representation within the EU DataGrid project (PTB and WP7). Implicit within this is a detailed survey to define the capacity, service and topology requirements of GridPP in each project year, and negotiations with all relevant administrations to assess the ability to meet these requirements. Following from this will be the need to identify actions needed to ensure GridPP requirements can be met in each project year, to constitute collaborative work where appropriate, and to seek resources from other support lines.

Deliverables

Dependencies and Risks

Availability of testbed use-case documents. Agreements with administrative authorities.

Resources Required

Staff: 1 SY/year.

Equipment: Local site routing equipment, local loops to transport data from supplier end-points into MANs or sites, additional routing equipment. Detailed costs cannot be known until the survey and negotiations have taken place. An estimate of local loop rental for key sites is £300k/year.

It is assumed that additional resources will be made available through the JISC which will fund general Grid-specific SuperJANET core upgrades, additional routing equipment, additional peering arrangements and conceivably switching equipment for dedicated wavelengths.

      1. Integration of Network Services

Grid operation will require traffic management services running on top of the IP service. In the immediate future, there may be a need to provide managed bandwidth over IP between some key sites. In the longer term, more general Quality of Service (QoS) provision is probably crucial to Grid applications. This figures prominently in discussions of the EU DataGrid and US Grid networking groups, and in the planning of both UKERNA and DANTE. All network providers are aware of these requirements, but no such service has yet been demonstrated in the Grid context, in particular end-to-end across multiple administrative domains.

The GridPP Networking WorkGroup needs to work with the UK network provider (UKERNA) and international partners to develop and demonstrate these services running between applications at Grid sites.

The work can be split into two categories:

Generic e-science

The first category covers the core network services development which needs to take place in the UK. This is generic and by definition applicable to all Grid applications (i.e. not specific to GridPP). A project has been developed to address these issues involving a collaboration between UKERNA and a Computing Science group. It has been submitted separately to PPARC, in concert with this proposal, for generic e-science funding and for industrial collaborative support.

Application Specific

The second category covers all of the work needed to integrate such services into the GridPP applications. In the GridPP there will be many testbed applications ranging from the near-term experiments through to the LHC data challenges. Managed bandwidth and QoS needs to be integrated into the middleware tools set, and then the experiment specific applications need to be modified to make use of these services.

In this proposal, we only request support for the second category, i.e. for the application specific work. It is assumed that the core work is supported separately through generic and industrial support lines.

This work is essential to honour EU DataGrid commitments to understanding and providing traffic management services (VPNs).

Task NET-2: Integration of Network Services into Middleware and Applications

Description

This task covers the work needed to integrate and demonstrate end-to-end managed bandwidth and QoS services within the GridPP project. Services must first be integrated into the lower levels of middleware (e.g. the data access layers and information services). Then application specific software needs to be adapted to make use of these services (e.g. SAM, GASS). These services will be demonstrated for live Grid testbed applications in each of the project years. In the initial stages, key sites in the UK will be connected - most probably those producing large Monte Carlo datasets which must be transported to the Tier-1 and Tier-0 centres. In parallel, we will collaborate with leading Grid groups in the USA to attempt to demonstrate trans-atlantic services in the context of the US experiments. Also we will be working within the scope of the EU DataGrid project. We will seek to configure the same types of traffic management to CERN or other European sites. This is an on-going area of work which will evolve as both UK and EU DataGrid development progresses.

Deliverables

Dependencies and Risks

Technical capabilities and agreement of administrative domains. Trans-atlantic interconnection between SJDN and ESNET (or Abilene) upon which such routing development work can take place. This is assumed to be available through UKERNA. Availability of suitable infrastructure through DANTE. Assumes generic development work is funded.

Resources Required

Staff: 2 SY/year.

Equipment: Approximately £300k for additional routing equipment at UK sites, for trans-atlantic link and for links to Europe.

      1. Data Transport Applications

Grid data rates will exceed 100s of Mbit/s over long-latency routes. Such transfers will rely upon the availability of high-rate, high-volume, reliable data transport applications. The transport protocols will also have to deal with efficient replication and update to multiple sites. It is already apparent that protocols such as "standard" TCP-based FTP are unlikely to be adequate, and therefore today we do not know how to satisfy the DataGrid demands. In the longer term, applications able to respond to explicit network service availability information will need to be developed.
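
A small worked sketch of why single-stream TCP struggles on long-latency paths follows: the window needed to sustain a target rate is the bandwidth-delay product, so either large windows or many parallel streams are required. The figures used are illustrative.

```python
"""Worked sketch of the bandwidth-delay product argument against single-stream
TCP transfers on long-latency routes. The target rate, RTT and default window
size are illustrative figures."""
import math

def window_needed_mb(target_mbit_s: float, rtt_ms: float) -> float:
    # Bandwidth-delay product: bytes in flight = rate * RTT.
    return target_mbit_s * 1e6 * (rtt_ms / 1000) / 8 / 1e6

def streams_needed(target_mbit_s: float, rtt_ms: float, window_kb: float = 64) -> int:
    # Throughput per stream is limited to window / RTT.
    per_stream_mbit_s = (window_kb * 1024 * 8) / (rtt_ms / 1000) / 1e6
    return max(1, math.ceil(target_mbit_s / per_stream_mbit_s))

# e.g. 200 Mbit/s across the Atlantic at ~120 ms round-trip time
print(window_needed_mb(200, 120), "MB window for a single stream")
print(streams_needed(200, 120), "parallel streams with default 64 kB windows")
```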

Task NET-3: Data Transport Applications

Description

As per the previous task, this has both a generic and application specific element. The generic work is included in the other proposal. Here we only request support for GridPP stress-testing and integration into the GridPP middleware and applications.

Deliverables

Dependencies and Risks

Assumes generic development work is funded.

Resources Required

Staff: 0.5 SY/year.

      1. Monitoring and Network Information Services

Network monitoring is required at several levels. This is an area where the UK has significant expertise through the existing work of the PPNCG, and as a result the UK has taken on the co-ordination role in this activity for the EU DataGrid. Monitoring requirements include static information for fault detection and planning (e.g. latencies, packet loss) and dynamic throughput information for specific protocols (e.g. TCP). Work is also required to define the presentation and publication of this information within the middleware information services.
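
As an illustration of the lightweight end of this monitoring, the sketch below probes latency and packet loss towards a remote site, assuming a Linux iputils 'ping'; publication into the information services is reduced to a print statement.

```python
"""Illustrative sketch of a lightweight latency/packet-loss probe of the kind
the monitoring infrastructure would run between sites, assuming a Linux
iputils 'ping'. Publication to the information service is reduced to a print."""
import re
import subprocess

def probe(host: str, count: int = 10) -> dict:
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True).stdout
    rtts = [float(m) for m in re.findall(r"time=([\d.]+)", out)]
    loss_pct = 100.0 * (count - len(rtts)) / count
    return {"host": host,
            "loss_pct": loss_pct,
            "rtt_ms": sum(rtts) / len(rtts) if rtts else None}

if __name__ == "__main__":
    # In the deployed system this record would be published via the
    # information services rather than printed.
    print(probe("www.example.org"))
```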

This work is essential to honouring PPARC's commitments to the EU DataGrid.

Task NET-4: Network Monitoring Services

Description

Define EU DataGrid monitoring tools (by definition GridPP uses the same). Implement in phased approach starting with test sites, moving later to all participating sites. Work in conjunction with other WorkGroups to develop suitable publication mechanism to Grid information services.

Deliverables

Dependencies and Risks

None known.

Resources Required

Staff: 1 SY/year.

Equipment: 5 Linux boxes for dedicated monitoring tasks at some sites.

      1. Relation to EU DataGrid
      2. Much of this work programme overlaps with the PPARC commitment to the EU DataGrid project, most of which falls under the remit of WP7 (Networking) and the rest under WP3 (Information). Planning for the provisioning of infrastructure to and within the UK is already underway, and this has led directly to the generic traffic engineering work. The UK has committed to co-ordinating the EU-wide monitoring and the work is underway, supported by CLRC at present. Some work has been undertaken both in the UK and at CERN in the area of multiple parallel-stream FTP applications.

      3. Collaborative Links

The following collaborative projects are currently foreseen.

      1. References and Further Details

Most of the work areas described above were identified approximately one year ago by the PPNCG and other interested parties. A work-plan was produced at that time in anticipation of resources becoming available. This is still largely correct and gives a much more detailed description of many of the aspects of the tasks listed here. Some of the work has taken place at a low level based upon ad-hoc resources and this has been reported above. The work-plan is available at: http://www.hep.grid.ac.uk/ppncg/workplan.html

The EU DataGrid project Work Package 7 covers networking requirements. Details of work can be found at http://www.eu-DataGrid.org. Many of the personnel involved with this proposal are also fully active in WP7, and as indicated, some of the tasks listed above have already been committed to in order to honour EU DataGrid commitments.

    1. G: Prototype Grid

The development of the GridPP will be driven by the needs of the UK Particle Physics experimental collaborations. It is vital that Grid technologies be made available at an early stage in the project, to enable real physics applications. This will allow experience to be gained quickly, and result in the best use of the initial investment. Of equal importance, requirements and feedback from the user community must be applied as an input to the middleware development process in order to steer the project correctly.

To address these issues, we propose the rapid implementation of a UK Grid Prototype. The Prototype will tie together new and existing facilities in the UK, in order to provide a single large-scale distributed computing resource for use by Grid application developers and users. It is anticipated that by the end of the GridPP project, almost all computing resources available to UK PP will be linked to the Grid Prototype and that this project will form the starting point for the provision of PP computing in the LHC era, in terms of hardware, software and expertise.

The Grid Prototype will provide a platform for the development and testing of Grid tools, for their exploitation in data challenges (for LHC experiments) and real physics analyses. The requirements of the running experiments will provide an essential guide to the correct approach for the LHC era. The Prototype will therefore be required to support a variety of concurrent activities, with contrasting requirements and timescales. These include:

It is therefore desirable that the Prototype be a flexibly partitionable resource. The partitioning is likely to be static in the early part of the project, but eventually dynamic and automatic according to the demand of the different groups of users and the priorities set by the project management.

The Grid Prototype WorkGroup, by its very nature, overlaps with all other areas of the GridPP project. The Prototype will implement on a large scale the tools developed within the middleware WorkGroups and within other Grid projects. Technical decisions on software architecture and development will of course be taken by the relevant WorkGroup experts, and these policies will be reflected in the implementation. In particular, there will be close and constant interaction with the networking, security and fabric management teams, since these services are fundamental to the provision of any large-scale Grid system. However, the overall direction and priorities of the Grid Prototype project must be driven by the needs of physicists. The day-to-day operation and management of the Grid middleware services, such as information services, resource monitoring, distributed storage management and network performance monitoring, will be the responsibility of the Grid Prototype WorkGroup, in conjunction with local experts at Grid node sites. This will require significant dedicated manpower.

The UK is committed to participation in the EU DataGrid Testbed, and expects to be highly active in this area. As a result of the exchange of expertise and approaches between the two projects, there is likely to be extensive commonality between the solutions proposed for the DataGrid Testbed and those used in the UK Grid Prototype. The resource requirements of the DataGrid Testbed will be met through the allocation of some fraction of the UK Grid Prototype resources.

    1. H: Software Support

This WorkGroup combines the roles of both Application Support and Middleware Support. The WorkGroup will provide services to enable the development, testing and deployment of both middleware and applications. In addition, the group will take responsibility for the specification and development of interfaces between middleware and applications, and support use of the Grid Prototype in both small- and large-scale test programmes. These tasks will be undertaken in conjunction with the EU DataGrid and other Grid projects, in order to minimise duplication of effort and ensure compatibility between the tools and approaches developed.

The rapid take-up of Grid technologies will require the provision of robust installation tools for middleware and application software. The application installation tools are likely to take the form of "kickstart" packages, which will ensure that necessary resources are available for applications to run on appropriate remote facilities without explicit user intervention. Software Support WorkGroup members will provide expertise on the installation and distributed operation of experimental software in conjunction with the experimental collaborations.

Middleware tools developed in the GridPP project are likely to be based, at the outset at least, upon the Globus toolkit. As the project progresses, additional or alternative software toolkits may be used. The Software Support group will assist the middleware development projects in the UK by providing support services for both Globus and the layers of software developed on top of it. These services may include

Generic software development services, such as documentation tools, software repositories, software librarianship, group mailing lists, and so on, will also be provided where required, in conjunction with other Grid projects.

Application developers will be further supported through deployment on the Grid Prototype of a common "upper middleware" software layer, which is currently being discussed in Grid projects world-wide. This layer will provide an interface to generic Grid middleware tools in the form of a set of PP-specific services, such as conditions databases, tag database and meta-data management, and automatic parallelisation of event-sequential jobs. Effort within the Software Support WorkGroup will be dedicated to the development and testing of upper middleware tools in conjunction with the PP application developers.
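
As one illustration of what automatic parallelisation of event-sequential jobs involves at its simplest, the sketch below splits an event range into contiguous sub-ranges that could be dispatched as independent sub-jobs; the dataset name and event counts are invented.

  # Illustrative sketch: split an event-sequential job into independent
  # sub-jobs by dividing its event range.  The dataset name and event count
  # are invented.

  def split_events(first_event, n_events, n_subjobs):
      """Divide [first_event, first_event + n_events) into contiguous ranges."""
      base, extra = divmod(n_events, n_subjobs)
      ranges, start = [], first_event
      for i in range(n_subjobs):
          size = base + (1 if i < extra else 0)
          ranges.append((start, start + size))
          start += size
      return ranges

  # e.g. a 1,000,000-event job split across 8 Grid worker nodes
  for first, last in split_events(0, 1000000, 8):
      print(f"sub-job: events [{first}, {last})  dataset=example.sim.v1")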

The WorkGroup will support the scheduling, management and monitoring of large-scale data challenges and analysis efforts using Grid resources. This will be of particular importance during the era of the statically partitionable Grid Prototype. It is likely that both running and LHC experiments will undertake infrequent short-term computing exercises for which the use of a large fraction of the Grid Prototype resources will be appropriate. Examples are scalability tests during data challenges and "fast turnaround" efforts for urgent analyses. By the end of the GridPP project, it should be possible to perform such scheduling dynamically through resource prioritisation, but in the early part of the project, special scheduling may be required, according to the policies decided by the project management.

It is likely that the Grid Prototype will also be used for special smaller-scale tests. These include the EU DataGrid Testbed software release programme, which requires a smooth programme of middleware functional and scalability tests in preparation for the scheduled releases to application developers. It is anticipated that tests of the most basic Grid infrastructure such as the information and security services will often require a completely partitioned "mini Prototype" in order that the stability of the main Grid Prototype is not compromised. The planning and execution of such tests will be the joint responsibility of the Software Support WorkGroup and the relevant developers.

It is anticipated that strong links will be developed between the Software Support team and the software developers within the experimental collaborations. It is therefore desirable that some members of the WorkGroup are also participants in core software activities.

    1. I: Experiment Objectives

The UK, through PPARC, has invested heavily in the construction of the LHC experiments to answer fundamental questions about the universe around us. The LHC will address, amongst other things, the origins of mass, the difference between matter and anti-matter and the unification of the four forces. To exploit this huge investment fully, it is essential that the UK takes a leading role in the analysis of the data. To achieve this, it seems natural to establish a Grid computing environment to deal with the large resource requirements for the processing, transfer and storage of massive amounts of data across the widely dispersed collaborations of modern Particle Physics experiments. The computing needs of the four approved LHC experiments, as estimated for the LHC Computing Review [1], are shown in Table 15. The LHC will provide first physics collisions in 2006 and reach full design luminosity in 2007.

Table 15: Computing Resources planned by the four LHC Experiments in 2007.

                                    ALICE   ATLAS    CMS   LHCb   Total   CERN

Total tape storage/year (PB)          4.7    10.4   10.5    2.8    28.5   19.8

Total disk storage (PB)               1.6     1.9    5.9    1.1    10.4    2.57

Total CPU (kSI95)                    1758    1760   2907    925    7349   1944

WAN bandwidths (Mbps):
  Tier-0 – Tier-1 link, 1 expt.      1500    1500   1500    310    4810   1500
  Tier-1 – Tier-2 link                622     622    622    622

The LHC experiments have undertaken to meet various goals, through data challenges, with regard to the storage, management, simulation, reconstruction and analysis of data. The UK needs to play an integral part in ensuring the success of these challenges to maintain its prominent role in the development of computing in these collaborations. The design of the software, both experimental and generic, to exploit Grid computing needs to start now in order to evaluate fully its performance and assess whether it will meet future demands. In parallel with addressing these demands, it is also crucial that the experiments generate large amounts of Monte Carlo simulated data in order to understand and optimise the design of the detectors to maximise the physics return when data-taking begins in 2006. Both these needs call for substantial investment in computing infrastructure, both in computing hardware and in software development. CERN and the experimental collaborations have recognised this fact and are preparing computing Memoranda of Understanding that will need to be signed by participating institutes and funding authorities.

The PPARC programme also includes a large element of experiments already in the data-taking phase. The BaBar experiment at the PEP II collider at SLAC and the D0 and CDF experiments at the Tevatron accelerator, FermiLab, are at the core of the UK Particle Physics programme until the start-up of the LHC. These experiments already have substantial storage requirements and demand large CPU resources to meet not only their data reconstruction and analysis needs but also the simulation of their detectors’ behaviour. In addition, the H1 and ZEUS experiments and the UK Dark Matter Collaboration (UKDMC) also plan to take advantage of any Grid developments for their data storage and Monte Carlo needs. The requirements of these experiments are, in general terms, very similar to those of the LHC experiments. The current physics exploitation of PPARC’s investment would benefit hugely from a Grid computing environment. The current UK programme would undoubtedly be enhanced by investment in distributed computing, but it is also an ideal testing ground for the future demands of the LHC era.

The experimental requirements will drive the decisions on the test sites and the integration process, operational policies and procedures that need to be planned within the technical WorkGroups. The Grid services to be installed and supported at the different locations and the time-scales for the middleware software need to match the early goals and aims of the experiments. The integration of the first release of the middleware into LHC experiment software has to be completed to allow Monte Carlo production and reconstruction, followed by data replication and analysis of the data. Those experiments already taking data will place the emphasis on the latter two goals, i.e. data replication and analysis. It is essential that the experiments' evaluation of the first release of the Grid software be used in the development of the next release. As the project progresses, the functionality of the Grid services will have to increase in order to match the more sophisticated needs of the experiments. By the end of the project, the Grid environment will be ready to be used in a routine way, including balancing the workload of physics analysis jobs with that of major detector simulation needs. The needs and requirements of the experiments will continue to develop up to (and beyond) the start of the LHC data-taking era and, of course, the Grid should evolve to meet those needs.

All the UK experiments envisage a multi-Tier hierarchical model for their computing, with a common (UK) national Tier-1 centre based at RAL. The LHC experiments have set themselves a number of milestones (the so-called Mock Data Challenges, MDCs). These consist of generating Monte Carlo simulated data of the detector output and passing it through the whole of the experimental software chain including the analysis of data. Each of these MDCs will be more demanding than the last, not only in the amount of processing power and data volumes to be handled but also in the demand for ever more sophisticated tools to increase the exploitation potential of the Grid. These sophisticated tools will be closely integrated into the experimental core software (an upper layer of applications software), which it is hoped will have many common features across all the LHC experiments. It is crucial that the experiments perform these scalability tests to assess the capability of the Grid to meet the future performance requirements. Table 16 is based on the LHC Computing Review [1] and outlines, in very simplistic terms, the goals of the LHC experiments.

 

Table 16: The Goals of the LHC Experiments’ Data Challenges

Experiment   Size                      Date

ALICE        5% * 3 PB                 2002
             10%                       2003
             25%                       2004
             50%                       2005

ATLAS        100 GB (0.01%)            end 2001
             10 TB (1%)                2002
             100 TB (10%)              2003

CMS          50 TB (5%)                Dec 2002
             200 TB (20%)              Dec 2004

LHCb         3 × 10^6 event bursts     2000-2001
             6 × 10^6 event bursts     2002-2003
             10^7 event bursts         2004-2005

BaBar, CDF and D0 are already taking data and therefore place greater emphasis on ensuring that data storage and access are Grid-enabled. The experience gained in this area will be advantageous for the UK LHC programme. The development and implementation of optimal data transfer tools, beyond the standard tools, will be driven by the need to access or transfer data across the Atlantic. In particular, BaBar is already involved in major Monte Carlo production in the UK, distributed across institutes, with resources purchased through successful JREI and JIF bids. Bringing the management of and access to these resources onto the Grid is a medium-term aim of BaBar UK.

The UK collaborations have assessed their needs over the next three years and have developed a series of objectives. The UK resources and goals must reflect the need for the UK to play a major role in the MDCs, which includes a commitment to providing core experimental software. The objectives are not, in general, independent of the needs of the rest of each experimental collaboration. It is essential that the UK effort be integrated within the common objectives of the experiments. Each year’s deliverables outlined below depend on progress in the previous year. A prominent UK position would place the community at the forefront of the physics exploitation stage of the LHC and of the currently active experiments. It is vital that this opportunity is not lost.

Year 1 Deliverables

 

 

Year 2 Deliverables

Year 3 Deliverables

    1. J: Dissemination - Astronomy Links
    2. The plans for collaboration within PPARC with the Astronomy Community are discussed in Section 8.2. The two disciplines have largely complementary applications for the Grid but local collaborations within institutes or regions are taking shape. The two fields will nevertheless benefit from access to common middleware tools and it is important that good channels of communication are established to avoid duplication.

      Given the contrasting emphasis in the two disciplines, the main links are expected to come through cross-members within the GridPP and AstroGrid committee structures and from the common oversight provided at the highest level by PPARC itself. Both communities welcome the presence of such cross-members and it is to be expected that these, coupled with the encouragement of dialogue within institutes, will provide the main mechanism for cross-fertilisation.

    3. Other WorkGroups

Dissemination (WorkGroup J) and CERN (WorkGroup K) are discussed extensively in other parts of this document (Sections 8 and 8.1, and Appendix 16).

  1. APPENDIX DataGrid
    1. Description of the Workpackages
    2. The structure of the DataGrid work programme is as follows:

      1. Grid Middleware
      2. Each workpackage (WP) is a "mini-project" in itself.

        WP1 Grid Workload Management

        Deals with workload scheduling and has the goal of defining and implementing an architecture for distributed scheduling and resource management. This includes developing strategies for job decomposition and optimising the choice of execution location for tasks based on the availability of data, computation and network resources.
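
        A toy illustration of the kind of decision such a scheduler must take is sketched below: candidate sites are ranked by data locality, free CPU and network bandwidth. The site names, figures and scoring weights are invented and do not represent the WP1 design.

          # Illustrative only: rank candidate execution sites for a job by
          # combining data locality, free CPU and network bandwidth.  The site
          # names, figures and scoring weights are invented.

          sites = {
              "siteA": {"has_data": True,  "free_cpus": 120, "bandwidth_mbps": 622},
              "siteB": {"has_data": True,  "free_cpus": 40,  "bandwidth_mbps": 1500},
              "siteC": {"has_data": False, "free_cpus": 200, "bandwidth_mbps": 155},
          }

          def score(site, cpus_needed):
              """Prefer sites holding the data and having enough free CPU; otherwise
              fall back on network bandwidth for sites that must fetch the data."""
              s = 100.0 if site["has_data"] else site["bandwidth_mbps"] / 100.0
              s += 10.0 if site["free_cpus"] >= cpus_needed else 0.0
              return s

          best = max(sites, key=lambda name: score(sites[name], cpus_needed=50))
          print("dispatch job to:", best)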

        WP2 Grid Data Management

        Deals with the management of the data, and has objectives of implementing and comparing different distributed data management approaches including caching, file replication and file migration. Such middleware is critical for the success of heterogeneous datagrids, since they rely on efficient, uniform and transparent access methods. Issues to be tackled include: the management of a universal name-space, efficient data transfer between sites, synchronisation of remote copies, wide-area data access/caching, interfacing to mass storage management systems (see below).
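
        The sketch below illustrates, in the simplest possible terms, a replica catalogue mapping a logical file name in a universal name-space onto physical copies, with the cheapest replica chosen for access; the file names, sites and access costs are invented and the real WP2 middleware will be far richer.

          # Illustrative only: a toy replica catalogue mapping a logical file
          # name onto physical copies, with the cheapest replica chosen for a
          # reader.  File names, sites and access costs are invented.

          replica_catalogue = {
              "lfn:/grid/atlas/mdc1/run0042.root": [
                  "gridftp://site-x.example.org/store/run0042.root",
                  "gridftp://site-y.example.org/store/run0042.root",
              ],
          }

          site_cost = {"site-x.example.org": 5, "site-y.example.org": 1}

          def best_replica(lfn):
              """Return the physical replica with the lowest access cost."""
              replicas = replica_catalogue.get(lfn, [])
              if not replicas:
                  raise KeyError("no replica registered for " + lfn)
              return min(replicas, key=lambda pfn: site_cost.get(pfn.split("/")[2], 999))

          print(best_replica("lfn:/grid/atlas/mdc1/run0042.root"))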

        WP3 Grid Monitoring Services

        Provides facilities for monitoring the status of both the Grid system and the applications running on it and also encompasses information services.

        WP4 Fabric Management

        The objective is to develop new automated system management techniques that will enable the deployment of very large computing fabrics constructed from mass market components with reduced system administration and operation costs.

        WP5 Mass Storage Management

        Provides both the framework for integration of mass storage systems into Grid environments as well as a uniform means of access to stored data.

      3. Grid Infrastructure
      4. WP6 Integration Testbed

        Production quality international infrastructure is central to the success of DataGrid. It is this workpackage that will collate all of the developments from the technological workpackages (WPs 1-5) and integrate them into successive software releases. It will also gather and transmit all feedback from the end-to-end application experiments back to the developers, thus linking development, testing and user experiences.

        WP7 Network Services

        Oversees the provision to testbed and application workpackages of the necessary infrastructure to enable end-to-end application experiments to be undertaken on the forthcoming European Gigabit/s networks.

      5. Applications
      6. WP8 High Energy Physics Applications

        This brings in all four LHC collaborations, providing critical input on requirements. The experiments, whilst acknowledging that the testbed is a prototype, will still be doing serious production work with it.

        WP9 Earth Observation Science Application

        This is driven primarily by the need to handle data from the new ENVISAT satellite, due for launch in Summer 2001. It is seen as a pilot leading to significant further work.

        WP10 Biology Science Applications

        Will provide and operate end-to-end application experiments, which test and feedback their experiences through the testbed to the middleware development workpackages.

      7. Project Management

      WP11 Information Dissemination and Exploitation

      Will provide a key access point to the project for external bodies, in particular it will host an Industry and Research Forum coupled with the project. The development of a web-based project portal is also foreseen.

      WP12 Project Management

      Will ensure the active dissemination of the results of the project and its professional management.

    3. UK Responsibilities and Deliverables

    The UK has leadership of the Monitoring Services and Mass Storage Integration workpackages and thus has ultimate responsibility for delivery in these areas (even though some of the work is carried out by partners outside the UK). The UK has also undertaken commitments to participate in most of the other workpackages to varying degrees ranging from low-level (requirements input and test/deploy products) to much more active involvement (with specific responsibilities to deliver to external co-ordinating partners). In some cases, the UK commitments are not fully developed as the role has not been completely established pending more complete understanding of the architecture and design phase and/or identification of personnel to fulfil low-level commitments. In these cases, UK deliverables will be developed in the first stages of the DataGrid project.

    Table 17 shows the identified UK deliverables by workpackage and the overall effort committed, composed of UK-funded (14.8 FTE) and EU-funded parts.

     

    Table 17: DataGrid workpackage deliverables.

    Workpackage              | Specific UK Deliverables                              | UK funded (FTE) | EU funded (FTE)
    WP1 Workload             | Requirements/test/deploy                              | 0.5             |
    WP2 Data Management      | Information Services and Query Optimisation           | 1.5             |
    WP3 Monitoring Services  | Overall responsibility; Information Services R&D      | 1.8             | 3.0
    WP4 Fabric Management    | Requirements/test/deploy                              | 0.5             |
    WP5 Mass Storage         | Overall responsibility                                | 1.5             | 1.0
    WP6 Integration Testbed  | Full participation with multiple sites; Security R&D  | 3.0             |
    WP7 Network Services     | End-to-end monitoring; Managed bandwidth studies      | 2.0             |
    WP8 PP Applications      | Participate in data challenges                        | 4.0             | 0.3
    WP9 EO Applications      | (none)                                                | 0               |
    WP10 Bio Applications    | (none)                                                | 0               |
    WP11 Dissemination       | Link UK and DataGrid dissemination                    | 0               |
    WP12 Management          | Provide DataGrid-UK management                        | 0               |
    Totals                   |                                                       | 14.8            | 4.3

     

  2. APPENDIX CERN Programme
    1. Development Teams and Programme of Work
    2. As explained in Section 8, the e-Science programme for LHC computing at CERN will be developed and deployed by the teams responsible for the long-term planning, development and operation of the computing services. The personnel funded from UK e-Science sources will work at CERN as members of these teams, taking part in the full range of their team’s work. The CERN activities are described below in terms of the responsibilities of the teams. For each team the target staffing level for 2002 is given. It should be noted that whilst in many instances the descriptions of the team activities appear to overlap with those of other WorkGroups, the work will be organised in such a way that the different WorkGroups will address complementary aspects of the problem. Along with GridPP and the EU DataGrid, we will seek to exploit collaborations with the Computing Science Community.

      For each team with UK funded staff, a specific set of deliverables will be defined reflecting the contribution of the project funding. Section 16.3 presents examples of these deliverables, with resource estimates and a reference to the high-level programme component. However, as explained in Section 8, the full programme of work of the LHC Computing Project will be funded from several different sources. The final deliverables for the UK project will be agreed as the staff are identified, taking account of their experience, the interests of the university concerned and the places available within the LHC Computing Project.

      Fabric Planning and Management (staff target 11)

      The fabric includes the hardware and basic software required for the operation of the local computing fabric - the processors, disks, tape drives and robots, the equipment providing the local area and storage networks, operating system installation and maintenance, file systems, storage area networks, gateways to external networking facilities, monitoring consoles, power distribution, etc.. The fabric management team has many years of experience of providing scalability by integrating basic computational and storage building blocks with a standard networking infrastructure. This has had the advantage of allowing the exploitation of low cost components. This must now be extended to handle the very large number of components (tens of thousands) that will be needed, while minimising the operational costs. This requires the development of management and maintenance facilities with a level of automation not available today. This is needed in areas such as system installation and maintenance, system partitioning, dynamic re-configuration, monitoring, fault isolation and repair, etc.. At present, this work is being carried out in the framework of the EU DataGrid project, in which CERN leads the fabric management workpackage. In addition to the development of the management tools described above, this involves the development of the middleware for the integration of computing fabrics within the Grid environment. The team also operates current services, including the evolving LHC testbed, in addition to developing the future management systems. The basic systems administration is performed by a contractor, the CERN team being responsible for planning, service management and development.

      Mass Storage Management (staff target 7)

      The team responsible for data and mass storage provides support for file services on secondary (disk) and tertiary (magnetic tape) storage. The disk file systems have been implemented using industrial standards and products. However, the scale and performance requirements of the mass storage system are not met even for current experiments by industrial products. As a consequence, PP has had to develop special purpose mass storage management systems. CERN has recently completed the initial implementation of a new system called Castor, a simplified mass storage system that fulfils the current requirements of PP in terms of capacity, file size and number, performance and reliability, with the possibility of extensions to satisfy future requirements. Further work will be required to meet the requirements of the LHC. Different Tier-1 centres may use different mass storage systems. Within the mass storage workpackage of the DataGrid project, the team (led by the UK) must participate in the definition of a standard application program interface and implement this for the CERN mass storage system. Also, exchange formats must be defined to facilitate the replication and migration of data between these centres.
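
      By way of illustration only, the sketch below shows the general shape of a uniform mass-storage interface with a Castor-like implementation behind it; the class and method names, behaviour and paths are invented and are not the interface being defined within the DataGrid workpackage.

        from abc import ABC, abstractmethod

        class MassStore(ABC):
            """Minimal uniform interface to a tertiary storage system (illustrative only)."""

            @abstractmethod
            def stage_in(self, path: str) -> str:
                """Recall a file from tape to disk; return the path of the disk copy."""

            @abstractmethod
            def stage_out(self, local_path: str, path: str) -> None:
                """Migrate a disk file to tape under the given storage path."""

        class CastorLikeStore(MassStore):
            # All behaviour and paths below are invented for the example.
            def stage_in(self, path):
                print(f"recall {path} from tape")
                return "/scratch/stage/" + path.rsplit("/", 1)[-1]

            def stage_out(self, local_path, path):
                print(f"migrate {local_path} -> {path}")

        store: MassStore = CastorLikeStore()
        local_copy = store.stage_in("/store/atlas/run0042.raw")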

      The computing environment is not only concerned with bulk scientific data, but provides an all-embracing framework – a complete computing environment enabling an effective collaboration between widely dispersed researchers. A major component of this is the distributed data service used for the storage and management of "conventional" data (including program sources, binaries, libraries, etc. – the data maintained in "home directories" and "group directories"). The team is also responsible for developing this service, which is at present built on the Andrew File System (AFS) technology, no longer actively supported by the originator. Over the next few years, a new file service will have to be selected, deployed and integrated with the Grid, providing a range of features for supporting world-wide scientific collaboration. It is probable that this will involve collaboration with an industrial partner.

      Grid Data Management (staff target 7)

      CERN leads the DataGrid Data Management workpackage. This will provide the necessary middleware to permit the secure access of massive amounts of data in a universal global name-space, to move and replicate data at high speed from one geographical site to another, and to manage the synchronisation of remote data copies. Novel software will be developed such that strategies for automated wide-area data caching and distribution will adapt according to dynamic usage patterns. It will be necessary to develop a generic interface to the different mass storage management systems in use at different sites, in order to enable seamless and efficient integration of distributed storage resources. Several important performance and reliability issues associated with the use of tertiary storage will be addressed.

      LAN Management (staff target 6)

      Moving to a Grid-based solution for the whole computing environment inevitably places significantly greater demands on the local network infrastructure if the performance goals are to be achieved. The perceived performance will be determined more by the peak bandwidth between the various nodes than by the aggregate bandwidth. It is essential that the local infrastructure be appropriately developed and integrated with the overall wide-area Grid monitoring system.

      The campus management team is responsible for the planning and management of the internal CERN network. This is based on Ethernet technology with multiple layers of switches and routers. There are over 15,000 pieces of equipment connected to the network, requiring significant expertise and investment in network monitoring techniques. The same base technology is used for the infrastructure network (desktop and office connectivity), the networking within experimental areas, the connection of experiments to the computing centre and the computing fabrics used for physics data processing.

      Wide-area Networking (staff target 7)

      The wide-area networking team is responsible for the provision of the WAN services that are required to interconnect the Tier-1 regional centres to CERN for the LHC. This will require careful network engineering and monitoring. Work is needed to establish protocols with the required performance and reliability characteristics. There are a number of on-going projects that are tackling the issue of file transfer performance on very high-speed long distance (high latency) networks, requiring better instrumentation of TCP/IP in order to study in detail the behaviour of the protocol. It will also be important to evaluate alternative protocols. Research in this area is likely to develop rapidly with the increasing availability of multi-gigabit connections, and it is important that CERN develops the expertise to participate actively in such work and apply the results rapidly to the production networking infrastructure. The team will work with the Tier-1 institutes and the organisations providing research network infrastructure in Europe and in the world to plan the LHC data network. This is likely to include evaluation of new techniques such as wavelength switching. A firewall environment with appropriate performance, accessibility and security characteristics must be developed, operating effectively in this very high bandwidth environment. The team is also responsible for the operation of the CERN Internet Exchange Point, the major IXP in the Geneva area, which has attracted a large number of ISPs and telecom operators to install fibres and gateways at CERN. It is expected that CERN will be a point of presence on the new GEANT research network infrastructure.
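
      A simple worked example of why such instrumentation matters is given below: the TCP window needed to keep a path full is roughly the bandwidth multiplied by the round-trip time, which quickly grows to tens of megabytes on multi-gigabit, high-latency links. The link figures are illustrative, not measured values.

        # Illustrative arithmetic: the TCP window needed to keep a long-distance,
        # high-bandwidth path full is roughly bandwidth x round-trip time.
        # The link figures below are examples, not measured values.

        def window_bytes(bandwidth_mbps, rtt_ms):
            """Bandwidth-delay product in bytes."""
            return bandwidth_mbps * 1e6 / 8 * (rtt_ms / 1e3)

        for mbps, rtt in [(155, 20), (622, 20), (2500, 100)]:
            mib = window_bytes(mbps, rtt) / 2**20
            print(f"{mbps:>5} Mbps, {rtt:>3} ms RTT -> window of about {mib:.1f} MB")

      Typical default TCP window sizes fall far short of these figures, which is why detailed protocol instrumentation and tuning are needed on such links.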

      Security (staff target 4)

      Keeping pace with Internet security is a growing problem and the distributed nature of the Grid adds a major new dimension, both in terms of the requirements for uniform authentication and authorisation mechanisms and in terms of the defence mechanisms that must be developed to protect against attack in this new and complex environment. The Grid imposes another major step forward in tackling the classic computer security dilemma – facilitating access while protecting against hackers. Sites hosting Grid resources must agree coherent security policies, and adequate protection must be developed to prevent their abuse. New tools and working methods will be required to detect efficiently and track down security breaches across site boundaries. In addition, security needs to be an integral part of all Grid applications. This activity includes the design and deployment, in collaboration with other PP sites, of a Public Key Infrastructure (PKI), including operation of certificate authorities. It is also expected that Privilege Management Infrastructure (PMI) technology for passing authorisation data to services will have to be deployed prior to the start-up of the LHC. In the shorter term, ad-hoc solutions and early PMI implementations will have to be deployed. The team responsible for the overall computing and network security at CERN will carry out this activity.
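
      As a minimal illustration of the authorisation step, the sketch below maps an authenticated certificate subject to a local account, in the spirit of the simple mapping files used by current Grid toolkits; the distinguished names and accounts are invented, and a full PMI would carry far richer attribute information than this.

        # Illustrative only: map an authenticated certificate subject (DN) to a
        # local account.  The DNs and account names below are invented, and a
        # real Privilege Management Infrastructure would carry far richer
        # authorisation attributes than a flat mapping like this.

        grid_map = {
            "/C=UK/O=GridPP/OU=Physics/CN=A Physicist": "atlas001",
            "/C=CH/O=CERN/OU=IT/CN=A Developer": "dev002",
        }

        def authorise(subject_dn, mapping=grid_map):
            """Return the local account for an authenticated DN, or refuse access."""
            try:
                return mapping[subject_dn]
            except KeyError:
                raise PermissionError("no authorisation for " + subject_dn) from None

        print(authorise("/C=UK/O=GridPP/OU=Physics/CN=A Physicist"))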

      Internet Services for Inter-working (staff target 10)

      As noted above, the Grid environment involves not only the sharing of scientific data and computational resources. Another essential component is the inter-working environment to facilitate close-to-real-time collaboration between geographically separate researchers. An example is the support for two or more people working together on the same data plots, generated and re-generated in real time using the distributed Grid resources. The basis for building such intimate collaborative environments, using portals, chat rooms and other techniques, must be acquired, developed and supported for the many thousands of users participating in the LHC experiments at CERN. This is the responsibility of the Internet Services team. The base technology of collaborative tools, video-conferencing, web services, email, etc. is not specific to Particle Physics, but it is important that the best tools are acquired and adapted for PP users.

      Databases (staff target 12)

      The database group provides services that are used across all areas of the lab’s work, from handling parameters of the particle accelerators to providing solutions to the data management needs of the High Energy Physics Community. CERN has been and continues to be a pioneer in the usage of database technologies, having adopted Relational Database technology in the early 1980’s to assist in the construction of LEP, and Object Database technology in the mid 1990’s as part of investigations into data management solutions for the LHC era. Much of this work is done in close collaboration with other institutes both within PP and beyond, such as with the Astronomy Community, who share many similar problems. Database technology is widely used in virtually every discipline and CERN offers an excellent environment in which to gain experience with these systems: experience that can later be deployed equally well within the academic community (for example, to provide data management support in a regional centre for LHC computing) or in industry.

      In particular, the database team provides support for a production data management solution, based on a combination of an Object Database Management System coupled to a Mass Storage System. Activities include the development and support of application-specific tools and class libraries, such as database import/export facilities and mechanisms to handle detector calibrations. The team assists the experiments in the application of database technology and in their choice of systems for the production phase of the LHC. Despite the early promise of Object Databases, the risks associated with dependence on a single product with limited market acceptance (Objectivity/DB) are today considered unacceptably high, and alternative solutions are being evaluated. A project that investigated the cost of developing an object database system within PP concluded that this is not feasible in terms of the resources required. On the other hand, the established Relational Database vendors have been systematically adding support for Object features to their products, to the extent that these are now felt to offer a viable alternative. The team is currently participating in the evaluation of such systems, which many feel are destined to become the dominant database technology for the foreseeable future.

      Much of the new development required is concerned with adapting database technology to the scale of the LHC problem, including the integration with large-scale storage management, and with the distributed environment of the Grid. Most current work on Grids addresses only access to conventional files or a client-server model for access to remote databases. For the LHC, the problem of replicating database data across the Grid will have to be solved.
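
      The toy sketch below reconciles two replicated copies of a calibration table using per-record timestamps ("last writer wins"); the record names, values and timestamps are invented, and real database replication for the LHC will require much stronger consistency guarantees than this.

        # Illustrative only: reconcile two replicated copies of a calibration
        # table using per-record timestamps ("last writer wins").  Record names,
        # values and timestamps are invented; real LHC database replication will
        # need far stronger consistency guarantees than this toy.

        site_a = {"pixel_gain_07": (1.02, 100), "pixel_gain_08": (0.98, 120)}
        site_b = {"pixel_gain_07": (1.05, 140), "pixel_gain_09": (1.01, 130)}
        # each entry: record name -> (value, update timestamp)

        def reconcile(a, b):
            """Merge two copies, keeping the most recently updated version of each record."""
            merged = dict(a)
            for key, (value, ts) in b.items():
                if key not in merged or ts > merged[key][1]:
                    merged[key] = (value, ts)
            return merged

        print(reconcile(site_a, site_b))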

      It should be noted that databases are at the heart of any computing technology, being required for the management, configuration, allocation, control, monitoring and reporting of all kinds of resources.

      Application Environment (staff target 5)

      This includes the provision of the basic environment for physics software development – general scientific libraries, class libraries for PP applications, software development tools, documentation tools, compiler expertise, etc.. A significant amount of support activity will be necessary to ensure that a common Grid-enabled environment is available at all Grid sites.

      Simulation Framework (staff target 7)

      The development of a modern, object-oriented toolkit for the simulation of physics processes and particle interaction with matter is well advanced, organised by the GEANT4 Collaboration. This represents an important component in the strategy of providing a common, advanced computing environment for all of the collaborations, and will facilitate the exploitation of resources at all levels of the LHC computing Grid. The simulation team in CERN/IT is responsible within the collaboration for the support of the basic framework, distribution, maintenance process and for a limited number of simulation algorithms.

      Analysis Framework (staff target 8)

      The analysis team in CERN/IT is developing a modular toolkit adapted to the needs of LHC data analysis – very large datasets, an object model, appropriate statistical algorithms and displays. Using formal software engineering techniques, this toolkit will be re-usable in many applications in physics and other disciplines requiring large dataset traversal. The toolkit will have to be developed to exploit the distributed Grid resources, implementing data-parallel techniques. A portal environment will be required to mask the complexity of the Grid from the researchers and encourage inter-working.

      e-Science - Physics (staff target 6)

      This activity involves the direct assistance to experiments at the interface between the core software and the Grid and fabric environment. Initially working in the context of the physics applications workpackage of the DataGrid, this team will develop expertise in the deployment of physics applications on successive generations of the testbed. Individual members of the team will work directly with one of the LHC experiments as part of their core software effort.

      e-Science - Other Sciences (staff target 6)

      These posts will be for dissemination activity to support other sciences, enabling them to benefit from the developments and experience in Grid and associated technologies carried out within the Particle Physics Community, in concert with the dissemination activity in the UK and elsewhere. This will take the form of a team of scientists and software professionals providing advice to other sciences and some degree of practical participation in deploying Grid technology within their computing environments. It would include education activities, distribution of PP or DataGrid-developed software tools, consultancy on setting up computing infrastructure and hands-on involvement in adapting applications to the distributed environment (paralleling the activity described under "e-Science – Physics").

      Bibliographic Applications with Grid Meta-data (staff target 4)

      Meta-data will be essential if people are to be able to exploit effectively the massive amount of bibliographic information which will be available within the LHC collaborations. In the area of meta-data, the CERN Library and bibliographic data support team are working with other PP libraries on how to use meta-data to better disseminate information, especially PP information, among the PP user community. The aim is to develop meta-data schema that will support better classification of the information in the CERN Library database. Until now, we have been content to use bibliographic information in library specific classification schemes. With the arrival of XML, we now see the opportunity to describe the information itself and not just the bibliographic information. If we can develop and agree a PP-wide XML schema, then we can use this structured data to facilitate selective information dissemination to the PP community. Another area of particular interest is the dissemination of selected information about the science emerging from the LHC programme to schools and the general public.
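
      As a purely hypothetical illustration of such a record, the fragment below builds a small XML meta-data entry; the element names and content do not correspond to any agreed schema.

        # Purely hypothetical example: build a small XML meta-data record for a
        # preprint.  The element names below do not correspond to any agreed
        # PP-wide schema.

        import xml.etree.ElementTree as ET

        record = ET.Element("pp-record")
        ET.SubElement(record, "title").text = "Example LHC physics note"
        ET.SubElement(record, "experiment").text = "ATLAS"
        ET.SubElement(record, "subject").text = "Higgs searches"
        ET.SubElement(record, "report-number").text = "EXAMPLE-2001-001"

        print(ET.tostring(record, encoding="unicode"))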

    3. Resources
    4. The resources requested are: personnel to join the CERN teams described above and material investments for the prototype Tier-0 system at CERN. In making the estimates, it is assumed that UK universities employ the personnel and second them to CERN for the period of the project. The average FTE cost assumes a person with a Ph.D., aged about 30 years, on a UK university salary, and includes an allowance to cover the additional costs of living at CERN, medical insurance, travel, etc. It is assumed that the home university would charge a reduced overheads fee. CERN will not charge its own overhead costs. Alternatively, the personnel could be employed by CERN as limited-duration staff members.

      The materials resources are processors and disks for the CERN testbed. The testbed is being constructed as an evolution of the CERN computing fabric: a component-based system with low-cost dual processor PCs mounted in racks and integrated into an Ethernet network. Disk storage uses a NAS model, constructed from PC-based file servers with low-cost disks, connected to the Ethernet infrastructure. Mass storage is provided through PC-based network connected tape servers, managing tape drives mounted in cartridge handling robots. There is considerable scope for exploring alternative processor and disk storage models, but the choice of magnetic tape equipment will be more limited. The removable nature of magnetic tape drives, the economies of scale associated with large robots and the inherent lack of reliability in this electro-mechanical technology require that a standard product be carefully selected through a lengthy acquisition process. On the other hand, the standardised "commodity" nature of PCs and disk systems allows products from different manufacturers to be included in the same computing fabric, and so the request for funding is restricted to these components. The estimated costs include the PCs, disk servers and the associated racking, power and Ethernet networking equipment. The equipment may be acquired in the UK, purchased directly by CERN or acquired through collaboration with a UK manufacturer. One such company, Elonex Ltd., with whom discussions are at present under way with a view to exploring possible collaboration, has won recent open acquisitions of PC and disk servers at CERN.

      The resource request is summarised in Table 18.

      Table 18: Resources requested by CERN.

    5. Milestones and Deliverables

    The development phase of the LHC computing service at CERN will be organised around annual prototype systems, synchronised with the releases of the EU DataGrid project, with a target of providing a series of stable environments with increasing functionality essentially at the beginning of each of the calendar years 2002, 2003 and 2004. This will lead at the end of 2004 to a "pilot service", incorporating the technology elements selected for implementation of the production LHC service and providing a final proof of concept prior to the acquisition of the production facility.

    When candidates are identified to participate in the work, specific deliverables will be defined related to the work of the team, taking account of the interests and background of the candidate and the interests of the university concerned. Table 19 provides sample deliverables of this type. These deliverables would be used for detailed monitoring of the project and the utilisation of the UK funds. The column "Programme Component" refers to the high-level programme component of the UK GridPP project: 1-Foundation, 2-Production, 3-Middleware, 4-Exploitation, 5-Value-added Exploitation.

     

    Table 19: Sample deliverables of CERN Teams.

     

    No. | Deliverable | Year | FTEs | Component | Cost (k£) | FTE 2001/02 | FTE 2002/03 | FTE 2003/04
    (The last three columns give the FTEs by year; "-" denotes no effort in that year.)

    Fabrics
    1 | Scalable fabric error and performance monitoring system | 2003 | 1 | 3 | 66.7 | 0.5 | 0.5 | -
    2 | Automated, scalable installation system | 2003 | 2 | 3 | 133.3 | 0.5 | 1.5 | -
    3 | Automated software maintenance system | 2003 | 1 | 3 | 66.7 | - | 1.0 | -
    4 | Scalable, automated (re-)configuration system | 2003 | 1 | 3 | 66.7 | - | 1.0 | -
    5 | Support prototypes | 2004 | 3 | 1 | 200.0 | 0.4 | 1.0 | 1.6
    6 | Automated, self-diagnosing and repair system | 2004 | 2 | 3 | 133.3 | - | 1.0 | 1.0

    Mass Storage
    7 | Extension of Castor for LHC capacity, performance | 2004 | 5 | 1 | 333.3 | - | 2.0 | 3.0
    8 | Implement Grid-standard APIs, meta-data formats | 2003 | 2 | 3 | 133.3 | 0.4 | 1.6 | -
    9 | Support prototypes | 2004 | 3 | 1 | 200.0 | 0.2 | 1.4 | 1.4

    Grid Data Management
    10 | Data replication and synchronisation | 2004 | 3 | 3 | 200.0 | 0.5 | 1.0 | 1.5
    11 | Performance and monitoring of wide area data transfer | 2004 | 3 | 3 | 200.0 | - | 1.5 | 1.5

    LAN Management
    12 | Fabric network management, and resilience | 2004 | 2 | 1 | 133.3 | - | 1.0 | 1.0
    13 | Integration of LAN and Grid-level monitoring | 2003 | 1 | 3 | 66.7 | - | 1.0 | -
    14 | LAN performance | 2004 | 1 | 2 | 66.7 | - | 0.2 | 0.8
    15 | Support fabric prototypes | 2004 | 2 | 1 | 133.3 | 0.4 | 0.6 | 1.0

    Wide-area Networking
    16 | High bandwidth WAN - file transfer/access performance | 2003 | 3 | 1 | 200.0 | 0.8 | 2.2 | -
    17 | WAN traffic instrumentation & monitoring | 2003 | 2 | 1 | 133.3 | 0.5 | 1.5 | -
    18 | High bandwidth firewall/defences | 2004 | 2 | 2 | 133.3 | - | - | 2.0
    19 | Lambda switching prototypes | 2004 | 3 | 5 | 200.0 | - | 1.0 | 2.0

    Security
    20 | Grid authentication – PKI | 2002 | 1 | 1 | 66.7 | 1.0 | - | -
    21 | Authorisation infrastructure for Grid applications – PMI | 2003 | 2 | 1 | 133.3 | 0.5 | 1.5 | -
    22 | Security monitoring in a Grid environment | 2004 | 2 | 5 | 133.3 | - | 0.5 | 1.5

    Inter-working Internet Services
    23 | Base technology for collaborative tools | 2003 | 2 | 1 | 133.3 | 0.5 | 1.5 | -
    24 | Portal prototyping | 2004 | 3 | 5 | 200.0 | - | 1.0 | 2.0
    25 | Support for Grid prototypes | 2004 | 2 | 1 | 133.3 | - | 0.5 | 1.5

    Databases
    26 | Evaluation of emerging object relational technology | 2003 | 2 | 1 | 133.3 | 0.5 | 1.5 | -
    27 | Adaptation of databases to Grid replication and caching | 2004 | 5 | 3 | 333.3 | 0.5 | 2.0 | 2.5
    28 | Integration of & performance issues with mass storage management at different testbed sites | 2004 | 3 | 5 | 200.0 | - | 1.0 | 2.0

    Application Environment
    29 | Provision of basic physics environment for prototypes | 2003 | 2 | 4 | 133.3 | 0.5 | 1.5 | -
    30 | Support of Grid testbeds | 2004 | 5 | 4 | 333.3 | 0.3 | 2.0 | 2.7

    Simulation Framework
    31 | Support of the simulation framework | 2004 | 5 | 5 | 333.3 | 0.5 | 2.0 | 2.5
    32 | Development of the simulation framework | 2004 | 5 | 5 | 333.3 | 0.5 | 2.5 | 2.0

    Analysis Framework
    33 | Adaptation to and exploitation of Grid environment | 2004 | 5 | 5 | 333.3 | 0.5 | 2.2 | 2.3
    34 | Development of portal components | 2004 | 3 | 5 | 200.0 | - | 1.0 | 2.0
    35 | Development of the base framework | 2003 | 2 | 5 | 133.3 | 0.5 | 1.5 | -

    e-Science - Physics
    36 | Adaptation of physics core software to the Grid environment | 2003 | 6 | 4 | 400.0 | 2.0 | 4.0 | -
    37 | Exploitation of the Grid environment by physics applications | 2004 | 6 | 4 | 400.0 | - | 1.0 | 5.0
    38 | Support for testbeds | 2004 | 3 | 4 | 200.0 | 1.0 | 1.0 | 1.0

    e-Science - Other Sciences
    39 | Preparation of training courses, material | 2004 | 4 | 3 | 266.7 | 1.0 | 1.0 | 2.0
    40 | Adaptation of application – science A | 2004 | 3 | 3 | 200.0 | - | 1.0 | 2.0
    41 | Adaptation of application – science B | 2004 | 3 | 3 | 200.0 | - | 1.0 | 2.0
    42 | Middleware packaging for other sciences | 2004 | 5 | 5 | 333.3 | - | 3.0 | 2.0
    43 | Middleware support for other sciences | 2004 | 4 | 5 | 266.7 | - | 2.0 | 2.0
    44 | Bibliographic metadata | 2004 | 4 | 5 | 266.7 | 1.0 | 2.0 | 1.0

  3. APPENDIX Constitution of Management Boards
    1. The Project Management Board (PMB)
      1. Terms of Reference

  1. Ensure the delivery of a functional, stable Particle Physics Grid according to the specifications and time-scale in the proposal approved by PPARC.
  2. Supervise all aspects of the Particle Physics Grid and ensure efficient and effective use of resources to deliver it.
  3. Receive regular progress reports, through their chairs, from the Experiments, Technical and Dissemination Boards.
  4. Ensure that the Experiments, Technical and Dissemination Boards operate effectively, are open to inputs from the community and provide an efficient forum in which technical and organisational issues can be discussed and recommendations made.
  5. Liaise, through its Chair, with the Collaboration Board on matters of staff power and resources required to deliver point 1.
  6. Report, through its Chair, to the PPARC e-Science director and to the PPARC Steering Committee on progress.
  7. Report annually, through its Chair, to the strategy meeting of the Particle Physics Committee on progress in delivering the Particle Physics Grid, and to other PPARC bodies and Council as required by the PPARC Executive.
  8. Ensure effective integration of the EU DataGrid into the UK programme, taking into account the strategic directions of both projects.

      1. Membership of the PMB

The Project Leader is to be nominated by a sub-committee of the CB in consultation with the PPARC Director e-Science and the leader of the CERN Grid project. The appointment must then be ratified by the full Collaboration Board. It should be for an initial period of two years, with the possibility of re-appointment for subsequent terms. The CERN liaison person will be appointed by CERN in consultation with the Project Leader for a period of two years, with the possibility of re-appointment.

    1. Collaboration Board (CB)
      1. Terms of Reference

  1. Exercise a general supervision over all areas of the project.
  2. Take action, including in the last resort removal from post, should it become clear that anyone holding a position within the structures of the Particle Physics Grid, including the Project Leader, is failing to deliver to a sufficiently high standard.
  3. Liaise, through its Chair, with the Project Management Board on matters of staff power and resources required to deliver the Particle Physics Grid.
  4. Ensure that the community is fully aware of progress and that the concerns and priorities of the community are transmitted to the PMB.

      1. Membership of the CB

The Chair of the CB is elected by the CB for an initial term of two years, which is renewable.

    1. The Technical Board (TB)
      1. Terms of Reference

  1. Implement the strategy of the Project Management Board on the construction of the Particle Physics Grid.
  2. Report to the Project Management Board, through its Chair, on progress within each working group and Tier centre.
  3. Ensure that problems and ideas with implications beyond individual working groups are discussed and solved in the most efficient manner.
  4. Provide a forum in which all constituent parts of the project can exchange information on technical issues of mutual interest.
  5. Ensure that resources deployed at the various Tier centres are efficiently and effectively used to deliver the Particle Physics Grid with the minimum of unnecessary duplication.
  6. Ensure that technical developments elsewhere, particularly at CERN, are efficiently integrated into the project.
  7. Ensure technical integration of the EU DataGrid into the UK programme, in particular, manage timely production of deliverables.

      1. Membership of the TB

The Chair and Deputy Chair of the TB will be appointed by the Project Leader for a period of two years and will be eligible for re-appointment. Working Group Chairs will be chosen by the Chair of the TB in consultation with the Project Leader. The cross-members will be chosen by the Experiments Board for a two-year term and will not be eligible for immediate re-appointment.

    1. The Experiments Board (EB)
      1. Terms of Reference

  1. Ensure that the requirements of the experiments that need to use the Particle Physics Grid are incorporated into the PMB strategy at the earliest stage.
  2. Report back to each experiment on changes and compromises to the requirements of individual experiments which are necessary to deliver an efficient and functional Grid to all, and to provide common solutions to similar problems.
  3. Through its Chair, report to the PMB on progress and initiatives in other countries collaborating in each experiment that may affect the strategy of the PMB.
  4. Ensure that developments at CERN are fully taken into account in the plans of the experiments.

      1. Membership of the EB

The Chair and Deputy Chair are elected by the Experiments Board from within its membership. The Chair and Deputy should be from different "experiments", and it would normally be expected that these positions would rotate among the experiments. The Chair would initially be elected for a term of one year. The Deputy Chair will succeed as Chair one year after election. After the initial one-year term for the Chair, necessary to establish the phase difference, all elections would be for a single two-year term. The other members serve for periods to be individually decided by the nominating "experiments". The cross-members will be chosen by the Technical Board for a two-year term and will be eligible for immediate re-appointment.

    1. The Dissemination Board (DB)
      1. Terms of Reference

  1. Ensure that progress and technical solutions in the Particle Physics Grid project are disseminated as rapidly as possible to Grid-like projects in the Astronomy Community and to interested scientists in other research councils.
  2. Collect information on developments of interest to the Particle Physics Grid from other projects and by other scientists.
  3. Act as an interface to the requirements and interests of industry, and to make the PMB aware of possible changes in strategy required to maximise acceptance and penetration of Grid technologies in an industrial environment.
  4. Through its Chair, maintain good contacts with the strategic planning of the other research councils in Grid-like areas.

      1. Membership of the DB

The Deputy Chair is nominated by the Chair from within the membership for a term of two years, which is renewable. The cross-members are elected from within the membership of each of the other Boards for a period of two years, which is renewable, and from the other bodies according to their wishes. The membership of the committee can take final shape only once the structure associated with the Director e-Science has been set up. It is essential that this committee "reaches out" to all those with an interest in Grid computing, so that designated members from particular areas will act as enablers and conduits of information rather than passive representatives. This may involve the creation of small ad-hoc groupings in specific areas.

    1. Peer Review Selection Committee (PRSC)
      1. Terms of Reference

  1. Assess requests for e-Science funds for both staff and equipment using external referees, where appropriate.
  2. Recommend cases for funding to the PPARC Director e-Science.

      1. Membership of the PRSC

The Chair is chosen by PPARC in consultation with the Project Leader, the PPARC e-Science Director and the Chair of the CB. The Chair need not be a Particle Physicist and, depending upon the relationship with the PPARC e-Science structure, it may be appropriate to include other non-particle physicist members.

  1. APPENDIX Training
  2. For a number of years, CNAP (PPARC’s Computing and Network Advisory Panel) has run training courses in Object Oriented techniques for the UK PP Community. The main courses are a 4-day "Introduction to Object Oriented Analysis and Design" and a 3-day "Introduction to OO Programming using C++". Those who want to go further with C++ are offered a more advanced 4-day course. Courses can also be run on Java (3 days) and on C++ templates and the STL (1 day), and a 4-day course is provided on program development with Objectivity (an Object Oriented Database). The courses are all provided by an independent consultant, except that the Objectivity company provides training on their own product. At around £100/person/day, these courses are extremely cost effective and we believe it is necessary for these to be organised and funded centrally to ensure that the whole community and others can be trained together in modern computing methods relevant to PP programming, the Grid, industry and other scientific disciplines.

    A large fraction of the permanent members of the PP Community now have been through these courses and the feedback has been very encouraging. An increasing number of participants are now research students. The current CNAP training budget of just over £20k can support around 30 people per year which at the moment is insufficient to allow even all of the UK PP postgraduate students to attend. In future, this training will have to be funded from e-Science money.

    We would like to expand our training programme to all PP postgraduate students, to make the courses available to other disciplines and to offer additional Grid-specific courses, such as use of the Globus Toolkit. For this, we estimate a training budget of at least £50k/year is required.

  3. APPENDIX Letters of Support from the US
    1. Letter from PPDG

 

STANFORD UNIVERSITY

 

 

STANFORD LINEAR ACCELERATOR CENTER

P. O. Box 4349

Mail Stop 97

Stanford, California 94309

Phone: (650) 926-2467

FAX : (650) 926-3329

April 19, 2001

Dr S.L. Lloyd

Department of Physics

Queen Mary, University of London

Mile End Road

London E1 4NS

United Kingdom

 

Dear Steve,

Re: UK Consortium Proposal for a Grid for Particle Physics (GridPP)

On behalf of the Particle Physics Data Grid Collaboration (PPDG) I would like to express my strong support for the UK proposal for a Grid for Particle Physics. The goals of PPDG and GridPP are closely aligned, both at the level of Grid services to be offered and at the level of the physics experiments that will benefit.

The PPDG project brings together high-energy and nuclear physicists, computer scientists directly involved in HENP support, and computer scientists at the center of Grid middleware development in the US. The particular focus of PPDG is to drive Grid development at both the architectural and middleware levels through a coordinated series of sub-projects delivering functional Grid services to HENP experiments. The most important measure of PPDG productivity will be the increased effectiveness of distributed physics analysis for experiments such as BaBar, D0, ATLAS, and CMS. The intercontinental scope of all experiments collaborating in PPDG means that sub-projects involving UK and other European partners are a key part of our plan.

PPDG physicists and computer scientists have jointly identified major work areas in:

In all of these areas, coordinated work with partners outside PPDG will enhance the quality and usefulness of the products and greatly improve the support of particle physics.

One of our major areas of common interest is the BaBar experiment, which is a prime focus for Grid development in the next few years as the quantity of BaBar data rises much faster than "Moore’s Law". The BaBar computing model is founded on "Tier-A" centers that will be partners with SLAC in offering data analysis facilities to the whole collaboration. The Tier-A center to be situated at CLRC/RAL in the UK will be a major force in BaBar data analysis. Direct collaboration already exists in the development of middleware to enable this as an early Grid prototype.

The PPDG also has strong commitments to D0 and welcomes the GridPP groups in the UK who are already active in a program to develop D0 facilities involving SAM and Condor. Miron Livny, creator of the Condor system, is the lead PI for the PPDG computer science team. Miron has already begun collaboration with UCL on Condor-G development that is complementary to work within PPDG.

The PPDG program includes demonstration of high-performance file replication using differentiated network services where available. This closely matches the work outlined in the GridPP proposal and forms a clear basis for collaboration. Joint work has already started in this area as well, through an Internet-2 QoS project between the UK and SLAC.

The long-term goals of PPDG and GridPP are to demonstrate the use of the Grid to handle the unprecedented data rates that will be produced by the LHC program. In this respect, the very close ties between PPDG, GridPP and the EU-DataGrid will be invaluable in attaining our common goals.

The grand vision of an international Particle Physics Grid, exploiting the network and hardware fabric to the full, is a challenge for the combined resources of PPDG, GridPP, EU-DataGrid and GriPhyN. The PPDG leadership is fully committed to join with you and our colleagues in EU-DataGrid and GriPhyN in coordinating the creation of an unprecedentedly productive environment for international science.

Sincerely,

 

Richard P. Mount

Director, SLAC Computing Services
Assistant Director, SLAC Research Division

    2. Letter from GriPhyN

April 1, 2001

Dr. Steve Lloyd

Collaboration Board Chairman

Re: Consortium Proposal for a Grid for Particle Physics (GridPP)

Dear Steve,

On behalf of the GriPhyN project we would like to give our strongest possible support to the program of work proposed in the GridPP project.

As you are aware, GriPhyN is a five-year project funded by the US National Science Foundation to develop "Petascale Virtual-Data Grids" (PVDGs) for applications in which geographically distributed collaborations must apply immense computing resources to petabyte-scale data collections, all of which are themselves distributed for technical and political reasons. An integral part of our mission addresses the Tier-2 computing requirements of the LHC program, since in the US these sites will reside primarily at universities and the Tier-2 sites in total are expected to have approximately the same amount of computing resources as all of the national Tier-1 facilities.

GriPhyN is committed to developing Grid infrastructure for the four frontier experiments (ATLAS, CMS, LIGO and SDSS) that participate in our effort, and our project is actively engaged with the computing leaders of these experiments to develop software and other tools on a schedule that matches their requirements. Our main research areas include virtual data technologies, policy-driven planning and scheduling, execution management, performance monitoring, and management of resources on global scales. Ultimately, we will produce a Virtual Data Toolkit (VDT) that will distil our research effort into a suite of computational tools of direct utility to application scientists.

The Grid is by its nature a worldwide activity whereby all participants benefit from international coordination and collaboration. GriPhyN wholeheartedly welcomes this opportunity to collaborate with the UK GridPP project to further this vision. We are aware that GridPP activities also form a major and integral part of the EU-DataGrid project and it will benefit all of our activities to have this close link between groups involved centrally in the Grid in both the USA and Europe.

We have already begun discussions with a view to collaborative work in specific areas. For example:

In summary, the GriPhyN project looks forward to exploiting its close links with the UK GridPP project, and we give you our full support.

Sincerely,

Paul Avery, University of Florida

Ian Foster, University of Chicago