School on Open Science Cloud

In recent years, progress towards next-generation computing platforms has shifted from stand-alone parallel infrastructures to distributed cloud platforms. Keeping up with all state-of-the-art developments in this field can be a tough and time-consuming task; however, knowing these trends is essential for scientists working in related areas. The idea of a school introducing young researchers to the scientific and technological innovations that led to the development of cloud computing was therefore proposed and put into practice. The result was the School on Open Science Cloud, held from 5 to 9 June 2017 at the University of Perugia and organized in collaboration with the National Institute for Nuclear Physics (INFN). Through lectures by experts accompanied by hands-on sessions, the school provided basic competences and training on applications for molecular and quantum dynamics research and education.

The organizing committee of the school consisted of the following researchers:

  • Antonio Laganà – Università degli Studi di Perugia
  • Daniele Spiga – INFN
  • Leonardo Pacifici – Università degli Studi di Perugia
  • Livio Fanò – Università degli Studi di Perugia
  • Noelia Faginas-Lago – Università degli Studi di Perugia
  • Mirko Mariotti – Università degli Studi di Perugia
  • Giuseppe Vitillaro – CNR-ISTM

Among the topics discussed were:

  • Introduction to Scientific Computing
  • Cloud and workflows
  • Molecular Sciences (applications)
  • Statistical Methods (applications)
  • Data Analysis and Processing
  • Overview of educational tools used at the university

The school was also one of the 10 mandatory activities for ITN researchers.

The school took place in Perugia, Italy. Perched on a hill above a valley of fields, where the River Tiber runs soft and clear, Perugia is Umbria's small but immediately admirable capital. Its centro storico (historic centre) rises in a helter-skelter of cobbled alleys, stairways and piazzas framed by splendid mansions. Today Perugia is a lively, party-loving university city, with students from around the world enjoying the nightlife and filling café terraces. The venue for the school was the Department of Physics and Geology, conveniently close to the city centre.

Day 1. Introduction to Scientific Computing. The session started with an overview of the historical background and future prospects of scientific computing, and the idea of the Open Science Cloud (OSC) was presented. The main steps in the evolution of scientific computing over the last 40 years were reviewed, from its origins until today. On one side, mainframes have evolved into High Performance Computing (HPC) systems, or supercomputers, designed to best meet the challenge of executing tightly coupled parallel jobs. On the other side, High Throughput Computing (HTC) has made the most of the possibility of executing a large number of loosely coupled tasks on different computing resources, which required the development of new architectures and computing models.

Most attention was dedicated to HTC, a field in which INFN has played a key pioneering role in Europe for 20 years. In addition, perspectives for further development, funding challenges and a European model for addressing related issues were introduced. Challenges for data analysis in the search for rare events, particularly at the Large Hadron Collider (LHC), were also discussed.

Day 2. Cloud and workflows. Generally speaking, the "Cloud" is a set of technologies that both the private and public sectors have used for quite some time to self-provision IT resources. The providers of these resources, which are used to handle data, applications, databases, etc., can also be private or public. The benefits of the Cloud as an efficient and unifying way to provision resources are what drive its success.

The sessions introduced participants to the definition of the Cloud and showed the potential and limits of cloud computing and storage for scientific endeavours in several fields. Some details of how Clouds can be implemented in practice were given, taking the popular OpenStack framework as an example. The perspectives of Clouds for science in Europe were also briefly presented, highlighting the European Open Science Cloud. Furthermore, challenges faced by the computational chemistry community were discussed, and possible ways of tackling them with the help of distributed computing were identified:

– representing simulations as workflows, enabling hardware virtualization, software containerization and access to data resources;

– optimizing data processing across heterogeneous distributed computing infrastructures (DCIs), with emphasis on the dynamic scalability of computational resources;

– managing the whole data life cycle, from primary experimental data to annotated scientific data.

Day 3. Applications I: Molecular Sciences. Molecular dynamics (MD) describes the time evolution of molecular systems using classical mechanics. Because the equations of motion of many interacting particles cannot be solved analytically, the solution offered by MD is inevitably numerical. The lectures in this session aimed to provide a thorough understanding of the concepts underlying MD and to answer the participants' questions about the technique.
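To illustrate what "numerical" means here, the following is a minimal sketch of the velocity Verlet integrator, a widely used MD time-stepping scheme, applied to a single particle in a one-dimensional harmonic potential. The potential, the parameter values and the function names are illustrative assumptions, not material taken from the lectures.

```python
"""Minimal velocity Verlet sketch (illustrative; not code from the school)."""


def harmonic_force(x: float, k: float = 1.0) -> float:
    # Force from the assumed potential V(x) = k * x**2 / 2
    return -k * x


def velocity_verlet(x, v, force, mass=1.0, dt=0.01, n_steps=1000):
    """Integrate Newton's equations of motion for one particle in 1D.

    Returns the lists of positions and velocities at every step.
    """
    xs, vs = [x], [v]
    f = force(x)
    for _ in range(n_steps):
        # Half-step velocity update, then full-step position update
        v_half = v + 0.5 * dt * f / mass
        x = x + dt * v_half
        # Recompute the force at the new position and finish the velocity update
        f = force(x)
        v = v_half + 0.5 * dt * f / mass
        xs.append(x)
        vs.append(v)
    return xs, vs


def total_energy(x, v, mass=1.0, k=1.0):
    # Kinetic plus potential energy of the harmonic oscillator
    return 0.5 * mass * v * v + 0.5 * k * x * x


if __name__ == "__main__":
    xs, vs = velocity_verlet(1.0, 0.0, harmonic_force)
    e0 = total_energy(xs[0], vs[0])
    e1 = total_energy(xs[-1], vs[-1])
    print(f"initial energy {e0:.6f}, final energy {e1:.6f}")
```

Velocity Verlet is time-reversible and symplectic, so the total energy oscillates slightly around its initial value rather than drifting, which is one reason integrators of this family are popular in MD codes.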

The following session focused on performing quantum dynamics calculations on distributed architectures. Methods for selecting the most suitable set of geometries, performing electronic-structure calculations, fitting or interpolating the resulting set of energies and running the dynamics were reviewed and organized into a workflow, which was later the subject of the hands-on session. During the practical exercises, participants acquired the skills needed to start working with distributed architectures.

Day 4. Applications II: Statistical Methods, Data Analysis and Processing. Unfortunately, statistics is not part of every scientist's knowledge base. To address this, one session examined a few typical problems encountered in physics analysis (computing averages, performing fits, extracting confidence intervals) together with correct and incorrect ways to approach them, and a few practical examples provided concrete treatments. The second talk introduced deep learning, showing applications in which deep architectures have been used successfully to design intelligent systems that learn from large-scale datasets. An overview of neural models was presented, ranging from basic feed-forward neural networks to convolutional and recurrent neural networks. Finally, it was shown how these models can be trained, discussing traditional training algorithms (e.g., backpropagation) as well as more recent technical innovations (e.g., ReLU activation functions, dropout). The lecture was complemented by a hands-on session using the open-source software library TensorFlow.
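As a small illustration of the statistics topics (computing averages and extracting confidence intervals), here is a sketch of the inverse-variance weighted mean, the standard way to combine independent measurements with Gaussian uncertainties, followed by an approximate 95% central interval. The measurement values and function names are hypothetical, and the code assumes independent, Gaussian-distributed errors.

```python
"""Inverse-variance weighted average with a Gaussian confidence interval.

Illustrative sketch only; the measurements below are made-up numbers.
"""
import math


def weighted_mean(values, sigmas):
    """Combine independent measurements with Gaussian uncertainties.

    Each value is weighted by 1/sigma**2; the uncertainty on the mean
    is 1/sqrt(sum of the weights).
    """
    if len(values) != len(sigmas) or not values:
        raise ValueError("need equal-length, non-empty inputs")
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)


def gaussian_interval(mean, sigma, z=1.96):
    # z = 1.96 gives an approximately 95% central interval for a Gaussian estimate
    return mean - z * sigma, mean + z * sigma


if __name__ == "__main__":
    values = [10.2, 9.8, 10.5]  # hypothetical measurements
    sigmas = [0.1, 0.2, 0.5]    # their assumed Gaussian uncertainties
    m, s = weighted_mean(values, sigmas)
    lo, hi = gaussian_interval(m, s)
    print(f"mean = {m:.3f} +/- {s:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

With these numbers the plain arithmetic mean is about 10.17, while the weighted mean (about 10.13) is pulled towards the most precise measurement; averaging points of very different precision with equal weights is a typical example of the kind of incorrect approach such a session warns against.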

Day 5. Educational tools. The last sessions of the school were devoted to two educational tools developed at the University of Perugia and widely used within the academic community: LibreEOL and GLOREP. The former is an innovative online assessment system based on HTML5, CSS2/3, PHP5 and Ajax. The hands-on session allowed the participants to create their own schedules, virtual exams and other exercises, and a few examples were demonstrated to show the possibilities of both systems.

Dmytro Ivashchenko
