Data partitioning plays an important role in improving application performance in distributed systems. Big data applications demand strong performance on metrics such as responsiveness, availability, scalability, and throughput. Partitioning data across multiple nodes improves an application's scalability and throughput. Although there is a rich literature on data partitioning algorithms in distributed systems for big data applications, these algorithms still need to be classified and regrouped according to their strategies. This paper presents an exhaustive classification of data partitioning algorithms based on their strategies and operational units. A survey of this kind not only gives the reader a comparative analysis of the performance of these approaches but also indicates their suitability for candidate big data applications.
Phansalkar et al. studied this question.