In the second of the series, Jerry Horgan, TSSG Infrastructure Manager, WIT, discusses the case for national research data centres.
Most HEIs have been outsourcing their productivity services to cloud providers and hence have decommissioned a large portion of their on-premise Data Centres for cost savings and efficiency gains. Whilst this is very beneficial for standard services, it does create a problem when it comes to research requirements. This drive to outsourcing rightly meets a lot of resistance when it comes to data, especially sensitive and personal data. Private Industry is not trusted to operate in good faith with this data, and indeed their scale can allow them to suppress innovation when it comes to public data-sets by leveraging their resources on them.
Research Data Centres are required to host this data, in a secure but accessible fashion, with the accompanying Research Infrastructures (RIs) necessary to generate or process that data. They need to be flexible enough for very high-density HPC (Super Computing) requirements and medium-density virtualisation (cloud) and storage needs. They need to have the potential for physical segregation, whether to provide additional security or specific environmental conditions, or to provide space for research in new paradigms such as wireless-only Data Centres. And finally, they need to be large enough to have the capacity to host multiple HPC clusters (or other high resource demanding systems) to allow for resilience, smooth service migrations and opportunistic deployments.
Our own research data centre in WIT is an example of this, where we run EU funded research projects on the operations and management of the data centre, whilst also hosting national research testbeds and RIs including ICHEC’s HPC cluster ‘Kay’. Additionally, we are seeing the need to obtain internationally recognised certification for security and privacy, especially around health informatics, to engage with larger research projects.
The issue is that research Data Centres also need to be of sufficient scale to be viable and not a millstone around the neck of the host HEI. We need to consider national-scale facilities that are publicly run for public research needs. HEIs have great connectivity via HEAnet and so can be service providers to each other by co-locating facilities with the required human expertise, technical infrastructure, operational standards and best practices.
Ireland’s investment in RIs, over the past decade in particular, has not been sufficient, and whilst some funding for systems may be increasing, the actual systems are getting comparatively weaker by international standards. For example, ICHEC’s 2008 HPC cluster, ‘Stokes’, cost ~€1.9M and was ranked 118th in the world. Funding has since increased to €4.1M and €5.4M for subsequent systems, and yet relative international performance rankings have slipped to 358th and ~815th respectively. Compare this with the £100M investment in the UK Met Office’s latest HPC cluster, which was ranked 20th in the world. This highlights our slide in international competitiveness. To follow on with an HPC example again, I was at a talk in 2016 where a speaker from Los Alamos National Laboratory described the 20 HPC clusters that they had there (Ireland has only 1) and how they operate them. He gave an example of 7 HPC clusters, where 5 were always active, 1 was down for maintenance and 1 was in testing/pre-production.
There is a lack of research Data Centre capacity in Ireland. Recently when ICHEC were changing their HPC Clusters (Fionn being replaced by Kay) they had to suspend the national service for 2 months whilst they removed Fionn and then deployed Kay back into that same space. Ideally, ICHEC should have been in a position to run both clusters side by side until Kay was fully ready to take over.
This lack of capacity is also affecting our ability to compete in Europe. We have a very low rate of leadership roles in European Research Infrastructure Consortiums (ERICs). These European infrastructures are growing, such as the €1B EuroHPC exascale joint undertaking or the €600M European Open Science Cloud (EOSC), and Ireland’s ability to participate and contribute resources to them is very limited. As ERICs are based on national participation it again makes sense to have national research facilities.
Data is growing at an exponential rate, estimated at 61% per annum, effectively doubling every 18 months. This is the age of Big-Data. Funding rounds now need to be annual to keep up with both the growth in data and the advancements in technology to maintain efficient RI deployments. Additionally, Data Centre network speeds are getting very fast in comparison with Internet speeds. 100 Gigabit networks are now common, with 400 Gigabit networks on the horizon. In our research Data Centre we can achieve up to 5.6 Terabits per server cabinet with a total network capacity of 20 Terabits per second. Due to the potentially very large data-sets that would be housed in an EOSC or research data centre, data mobility could be very costly or slow (taking weeks or months). It would therefore make sense to co-locate processing (HPC) and storage (EOSC) nodes where possible in national research Data Centres.
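The arithmetic behind these two points can be checked with a short sketch. The 61% growth rate is the figure quoted above; the 10 petabyte data-set and the 10 and 100 Gigabit per second links are hypothetical values chosen purely for illustration, and the transfer estimate assumes the link is fully and continuously used, which real transfers rarely achieve.

```python
import math

PETABYTE = 10**15   # bytes (decimal petabyte)
GIGABIT = 10**9     # bits


def doubling_time_months(annual_growth_rate: float) -> float:
    """Months needed for a quantity to double at a compound annual growth rate."""
    return 12 * math.log(2) / math.log(1 + annual_growth_rate)


def transfer_days(dataset_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move a data-set over a link, assuming 100% utilisation."""
    seconds = dataset_bytes * 8 / link_bits_per_second
    return seconds / 86_400


if __name__ == "__main__":
    # 61% per annum is the growth estimate quoted in the text.
    print(f"Doubling time: {doubling_time_months(0.61):.1f} months")

    # Hypothetical 10 PB data-set over hypothetical 10 and 100 Gbit/s links.
    for gbits in (10, 100):
        days = transfer_days(10 * PETABYTE, gbits * GIGABIT)
        print(f"10 PB over {gbits} Gbit/s: {days:.1f} days")
```

Under these assumptions, 61% annual growth gives a doubling time of about 17.5 months, in line with the 18 months quoted, and moving the hypothetical 10 PB data-set takes roughly three months at 10 Gbit/s, which is the kind of delay that co-locating processing and storage avoids.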
What we need are fit-for-purpose funding schemes, especially around long-term facilities such as research Data Centres. These facilities last so long that the initial upfront capital costs cannot cover their lifetime. They should be considered national facilities, rather than simply belonging to the host HEI, and thus should have a dedicated funding line available for researchers or facility providers to bid on. It is also very difficult to bid for large RIs when there is nowhere suitable to host them, or when the available facilities are too impractical to host them in a cost-effective manner.
The funding for RIs to date has relied on a high level of Industry support, especially around access charges to support operation and maintenance costs. If these costs cannot be met then the RI enters a degraded state and eventually becomes unusable. Additionally, this focus on Industry promotes a more applied style of research, which naturally reduces research capability in FET and ERC areas. In particular, we are witnessing that scientific research leading to major breakthroughs is closely aligned with HPC facilities, making use of novel algorithms and techniques developed in Machine Learning and Artificial Intelligence to crunch large quantities of data.
A great example is the Human Brain Project, one of whose aims is to unify all the neuroscience data that will allow us to understand the functions of neuronal networks within the brain, also known as the Brain Simulation Platform. Therefore, if Ireland is to target major EU FET and ERC projects at all levels, a national research Data Centre infrastructure is a major requirement to pave the way for researchers to embark on novel research fields and topics of the future.
This Industry-focused funding model is not the norm in Europe and in the long term will have a detrimental effect on Ireland’s fundamental research capability. The government needs to hit its investment target of 2.5% GERD of GNP by 2020. This should include the upfront capital funding of national facilities, as well as their operating costs, such as technical staffing, research community outreach for training and onboarding, and upgrade and refurbishment costs. By my calculations that is an additional €600M per annum.
With thanks to Jerry Horgan, TSSG Infrastructure Manager, WIT, for authoring this post. Opinions expressed are the author’s own.
Written by Jerry Horgan for the Royal Irish Academy, October 2019