The rapid growth of deep learning workloads has placed unprecedented demands on scalable and efficient computational infrastructure. Cloud platforms have become the leading providers of large-scale distributed training through elastic resources, purpose-built accelerators, and managed machine learning services. This study investigates cloud-native architectures such as Kubernetes, TensorFlow on Kubernetes, and Apache Spark MLlib as a means of deploying distributed deep learning applications that address the challenges of performance, elasticity, and cost-effectiveness. It discusses the role of GPUs and newer TPUs in accelerating training, analyzes the performance of auto-scaling and orchestration policies, and outlines the trade-offs between cloud providers. The paper also identifies bottlenecks such as data transfer costs, scheduling inefficiencies, and vendor lock-in, and comments on the emerging trends of serverless ML and hybrid deployments. The results show that cloud-based solutions are essential to closing the gap between the computational requirements of deep learning and its real-world application at scale, making cloud infrastructure the foundation of future AI.
Ankita Mohapatra
Nikhil Sehgal
Mohapatra et al. studied this question.
www.synapsesocial.com/papers/68d4768331b076d99fa6efb2 — DOI: https://doi.org/10.21590/ijtmh.04.02.03