INTERNS - Tanguy Raynaud

(M.Sc. Intern — February–July 2014)

Abstract: Distributed architectures are widely used for storing and processing Big Data. Operations on Big Data first require locating the relevant data blocks and then reading them. Reading data from secondary storage to process Big Data jobs is not ideal, especially for high-performance applications, because processors cannot access data quickly when it resides on secondary storage devices. In addition, fetching data from main memory is time-consuming because of limited I/O bandwidth. Therefore, efficient algorithms alone are not enough to optimize application performance; an efficient architecture is also needed to give processors faster access to data. The need for such an architecture has long been a research issue, yet the state of the art still lacks one. This paper develops a promising architecture that caches data in main memory. It essentially turns main memory into an attraction memory (in COMA terminology, a main memory that behaves as a large cache with no fixed home location for data, so blocks migrate to the nodes that use them), which enables high-speed data access. It also enables automatic migration of data blocks and computations across the nodes of a cluster, and it provides an exchange protocol for fast transfer of data blocks between physical nodes, which speeds up job processing. The proposed architecture combines the strengths of Cache-Only Memory Architectures (COMAs) with the structural principles of Hadoop.
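The attraction-memory idea in the abstract, where data blocks move to the node that reads them instead of living at a fixed home, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the architecture developed in the internship: the class names, the single-copy migrate-on-read policy, the LRU eviction, and the in-memory `backing_store` standing in for secondary storage (e.g. HDFS on disk) are all hypothetical, and the exchange protocol and computation migration are not modeled.

```python
from collections import OrderedDict
from typing import Dict, List


class Node:
    """One cluster node whose main memory acts as an attraction memory (hypothetical model)."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity  # max number of blocks held in memory
        self.blocks: "OrderedDict[str, bytes]" = OrderedDict()  # LRU order, oldest first


class AttractionMemoryCluster:
    """Toy COMA-style cluster: a block migrates to whichever node reads it."""

    def __init__(self, nodes: List[Node], backing_store: Dict[str, bytes]) -> None:
        self.nodes = {n.name: n for n in nodes}
        self.backing_store = backing_store  # assumption: stands in for secondary storage
        self.directory: Dict[str, str] = {}  # block id -> node currently holding it
        self.stats = {"local_hits": 0, "migrations": 0, "disk_reads": 0}

    def read(self, node_name: str, block_id: str) -> bytes:
        node = self.nodes[node_name]
        if block_id in node.blocks:
            # Local hit: served from this node's main memory.
            node.blocks.move_to_end(block_id)
            self.stats["local_hits"] += 1
            return node.blocks[block_id]
        owner = self.directory.get(block_id)
        if owner is not None:
            # Remote hit: move the single copy from its current owner to the reader.
            data = self.nodes[owner].blocks.pop(block_id)
            self.stats["migrations"] += 1
        else:
            # Not in any node's memory: fall back to secondary storage.
            data = self.backing_store[block_id]
            self.stats["disk_reads"] += 1
        self._insert(node, block_id, data)
        return data

    def _insert(self, node: Node, block_id: str, data: bytes) -> None:
        if len(node.blocks) >= node.capacity:
            # Evict the least recently used block. A real COMA would relocate the
            # last copy to another node; here it can simply be re-read from the backing store.
            victim, _ = node.blocks.popitem(last=False)
            del self.directory[victim]
        node.blocks[block_id] = data
        self.directory[block_id] = node.name


if __name__ == "__main__":
    store = {f"b{i}": f"data{i}".encode() for i in range(4)}
    cluster = AttractionMemoryCluster([Node("n1", 2), Node("n2", 2)], store)
    cluster.read("n1", "b0")  # first access: read from secondary storage
    cluster.read("n1", "b0")  # repeated access: local main-memory hit
    cluster.read("n2", "b0")  # access from another node: block migrates n1 -> n2
    print(cluster.directory, cluster.stats)
```

Moving the only copy keeps the model simple; a fuller design could instead replicate read-only blocks and use a coherence protocol, which is closer to how hardware COMAs behave.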