Francesco Pace, Ph.D.

Cloud and Distributed Systems

Detail-oriented Ph.D. in Cloud and Distributed Systems with 3+ years of experience in scheduling algorithms, the systems aspects of designing and implementing large-scale distributed computing programs, and original approaches involving non-trivial machine learning algorithms for time-series forecasting.

Also 3+ years of experience as lead Teaching Assistant for two data science courses: one on cloud computing and distributed systems, the other on algorithmic machine learning.

Currently developing architecture and governance skills at Amadeus as part of the Architecture Strategy and Governance team.

Publications

Data-Driven Resource Shaping for Compute Clusters.

SoCC 2018
October 2018
California, USA
Conference Poster

Pace, Francesco; Milios, Dimitrios; Carra, Damiano; Venzano, Daniele; Michiardi, Pietro

Nowadays, data-centers are largely under-utilized because resource allocation is based on reservation mechanisms which ignore actual resource utilization. Indeed, it is common to reserve resources for peak demand, which may occur only for a small portion of the application lifetime. As a consequence, cluster resources often go under-utilized. In this work, we propose a mechanism that improves cluster utilization, thus decreasing the average turnaround time, while preventing application failures due to contention in accessing finite resources such as RAM. Our approach monitors resource utilization and employs a data-driven approach to resource demand forecasting, featuring quantification of uncertainty in the predictions. Using the demand forecast and its confidence, our mechanism modulates cluster resources assigned to running applications, and reduces the turnaround time by more than one order of magnitude while keeping application failures under control. Thus, tenants enjoy a responsive system and providers benefit from efficient cluster utilization.


Stocator: Providing High Performance and Fault Tolerance for Apache Spark over Object Storage

CCGrid 2018
May 2018
Washington DC, USA
Conference Paper

Vernik, Gil; Factor, Michael; Kolodner, Elliot; Ofer, Effi; Pace, Francesco; Michiardi, Pietro

Until now object storage has not been a first-class citizen of the Apache Hadoop ecosystem including Apache Spark. Hadoop connectors to object storage have been based on file semantics, an impedance mismatch, which leads to low performance and the need for an additional consistent storage system to achieve fault tolerance. In particular, Hadoop depends on its underlying storage system and its associated connector for fault tolerance and allowing speculative execution. However, these characteristics are obtained through file operations that are not native for object storage, and are both costly and not atomic. As a result these connectors are not efficient and more importantly they cannot help with fault tolerance for object storage. We introduce Stocator, whose novel algorithm achieves both high performance and fault tolerance by taking advantage of object storage semantics. This greatly decreases the number of operations on object storage as well as enabling a much simpler approach to dealing with the eventually consistent semantics typical of object storage. We have implemented Stocator and shared it in open source. Performance testing with Apache Spark shows that it can be 18 times faster for write intensive workloads and can perform 30 times fewer operations on object storage than the legacy Hadoop connectors, reducing costs both for the client and the object storage service provider.


Stocator: an object store aware connector for Apache Spark

SoCC 2017
September 2017
Santa Clara, USA
Conference Poster

Vernik, Gil; Factor, Michael; Kolodner, Elliot; Ofer, Effi; Pace, Francesco; Michiardi, Pietro


Stocator: A high performance object store connector for Spark

SYSTOR 2017
May 2017
Haifa, Israel
Conference Poster

Vernik, Gil; Factor, Michael; Kolodner, Elliot; Ofer, Effi; Pace, Francesco; Michiardi, Pietro


Flexible scheduling of distributed analytic applications

CCGrid 2017
May 2017
Madrid, Spain
Conference Paper

Pace, Francesco; Venzano, Daniele; Carra, Damiano; Michiardi, Pietro

This work addresses the problem of scheduling user-defined analytic applications, which we define as high-level compositions of frameworks, their components, and the logic necessary to carry out work. The key idea in our application definition is to distinguish classes of components, including rigid and elastic types: the first being required for an application to make progress, the latter contributing to reduced execution times. We show that the problem of scheduling such applications poses new challenges, which existing approaches address inefficiently. Thus, we present the design and evaluation of a novel, flexible heuristic to schedule analytic applications, that aims at high system responsiveness by allocating resources efficiently. Our algorithm is evaluated using trace-driven simulations, with large-scale real system traces: our flexible scheduler outperforms a baseline approach across a variety of metrics, including application turnaround times and resource allocation efficiency. We also present the design and evaluation of a full-fledged system, which we have called Zoe, that incorporates the ideas presented in this paper, and report concrete improvements in terms of efficiency and performance, with respect to prior generations of our system.


Too big to eat: Boosting analytics data ingestion from object stores with Scoop

ICDE 2017
April 2017
San Diego, USA
Conference Paper/Poster

Moatti, Yosef; Rom, Eran; Gracia-Tinedo, Raul; Naor, Dalit; Chen, Doron; Sampe, Josep; Sanchez-Artigas, Marc; Garcia-Lopez, Pedro; Gluszak, Filip; Deschdt, Eric; Pace, Francesco; Venzano, Daniele; Michiardi, Pietro

Extracting value from data stored in object stores, such as OpenStack Swift and Amazon S3, can be problematic in common scenarios where analytics frameworks and object stores run in physically disaggregated clusters. One of the main problems is that analytics frameworks must ingest large amounts of data from the object store prior to the actual computation; this incurs significant resource and performance overhead. To overcome this problem, we present Scoop. Scoop enables analytics frameworks to benefit from the computational resources of object stores to optimize the execution of analytics jobs. Scoop achieves this by enabling the addition of ETL-type actions to the data upload path and by offloading querying functions to the object store through a rich and extensible active object storage layer. As a proof-of-concept, Scoop enables Apache Spark SQL selections and projections to be executed close to the data in OpenStack Swift for accelerating analytics workloads of a smart energy grid company (GridPocket). Our experiments in a 63-machine cluster with real IoT data and SQL queries from GridPocket show that Scoop exhibits query execution times up to 30x faster than the traditional "ingest-then-compute" approach.


Experimental performance evaluation of cloud-based analytics-as-a-service

CLOUD 2016
June 2016
San Francisco, USA
Conference Paper

Pace, Francesco; Milanesio, Marco; Venzano, Daniele; Carra, Damiano; Michiardi, Pietro

An increasing number of Analytics-as-a-Service (AaaS) solutions have recently emerged in the landscape of cloud-based services. These services allow flexible composition of compute and storage components that create powerful data ingestion and processing pipelines. This work is a first attempt at an experimental evaluation of analytic application performance executed using a wide range of storage service configurations. We present an intuitive notion of data locality, which we use as a proxy to rank different service compositions in terms of expected performance. Through an empirical analysis, we dissect the performance achieved by analytic workloads and unveil problems due to the impedance mismatch that arises in some configurations. Our work paves the way to a better understanding of modern cloud-based analytic services and their performance, both for their end-users and their providers.


Certifications

Projects

IOStack

H2020 European Project
2015 - 2018
Research and management of Eurecom's tasks

Zoe-Analytics

Eurecom
2015 - now
Research on novel scheduling algorithms

Education

Telecom ParisTech - EURECOM

Ph.D. in Cloud and Distributed Systems
2015 - 2018

Politecnico of Turin

MSc Computer Engineering - Networking
2012 - 2014

Politecnico of Turin

BSc Computer Engineering
2009 - 2012

Hobbies