Tanguy Raynaud
(M.Sc. Intern — February–July 2014)
CEDAR Technical Report 8:
-
Abstract: Distributed architectures are widely used for storing
and processing Big Data. Operations on Big Data must first locate
the required data blocks and then read them. Reading data from
secondary storage to process Big Data jobs is not ideal, especially
for high-performance applications, because processors cannot access
data quickly when it resides on secondary devices. In addition,
fetching data from main memory is time-consuming due to limited I/O
bandwidth. Therefore, efficient algorithms alone are not sufficient
to optimize application performance; an efficient architecture is
also needed to provide faster data access to the processors. The
need for such an architecture has long been a research issue;
however, the state of the art still lacks one. This paper develops a
promising architecture that caches data in main memory. It
essentially transforms main memory into an attraction memory, which
enables high-speed data access. It also enables automatic migration
of data blocks and computations across the nodes in the cluster.
It offers an exchange protocol for fast transfer of data
blocks between the different physical nodes and speeds up job
processing. The proposed architecture combines the power of
Cache-Only Memory Architectures (COMAs) and the structural principle
of Hadoop.
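
To make the attraction-memory idea concrete, the following minimal
sketch models a cluster in which a block migrates into the memory of
whichever node reads it, and an evicted block is relocated to another
node rather than dropped. It is an illustration under stated
assumptions, not the exchange protocol of this report: the names Node,
Cluster, directory and relocate, the LRU eviction policy, and the
central block directory are all hypothetical choices made for the
example.

```python
from collections import OrderedDict


class Node:
    """A cluster node whose main memory acts as an attraction memory.

    Hypothetical model: blocks are kept in LRU order (an assumption).
    """

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity      # maximum number of blocks held in memory
        self.blocks = OrderedDict()   # block_id -> data, least recently used first

    def put(self, block_id, data, cluster):
        """Store a block locally; if memory overflows, relocate the LRU block."""
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        cluster.directory[block_id] = self
        if len(self.blocks) > self.capacity:
            victim_id, victim_data = self.blocks.popitem(last=False)
            cluster.relocate(victim_id, victim_data, exclude=self)

    def read(self, block_id, cluster):
        """Return a block, attracting it into local memory on a miss."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # local hit: refresh LRU position
            return self.blocks[block_id]
        owner = cluster.directory[block_id]     # miss: find the current holder
        data = owner.blocks.pop(block_id)       # the block leaves its old node
        self.put(block_id, data, cluster)       # and migrates to the reader
        return data


class Cluster:
    """Tracks which node holds each block (a central directory is an assumption)."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.directory = {}   # block_id -> Node currently holding the block

    def relocate(self, block_id, data, exclude):
        """Move an evicted block to the node with the most free space."""
        candidates = [n for n in self.nodes
                      if n is not exclude and len(n.blocks) < n.capacity]
        if not candidates:
            raise MemoryError("no node has free space for the evicted block")
        target = max(candidates, key=lambda n: n.capacity - len(n.blocks))
        target.put(block_id, data, self)


# Usage: two nodes with room for two blocks each.
a, b = Node("a", 2), Node("b", 2)
cluster = Cluster([a, b])
a.put("x", b"1", cluster)
a.put("y", b"2", cluster)
b.put("z", b"3", cluster)
b.read("x", cluster)   # "x" migrates from a to b
b.read("y", cluster)   # "y" migrates to b; b's LRU block "z" is relocated to a
```

In this sketch a read never copies a block; it moves the single copy
to the reader, so repeated accesses from the same node become local
hits. A real system would also need replication, concurrency control
and a distributed directory, none of which are modelled here.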