Core Vector Machines: Fast SVM Training on Very Large Data Sets

Abstract

Standard SVM training has O(m^3) time and O(m^2) space complexities, where m is the training set size. It is thus computationally infeasible on very large data sets. By observing that practical SVM implementations only approximate the optimal solution by an iterative strategy, we scale up kernel methods by exploiting such "approximateness" in this paper. We first show that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry. Then, by adopting an efficient approximate MEB algorithm, we obtain provably approximately optimal solutions with the idea of core sets. Our proposed Core Vector Machine (CVM) algorithm can be used with nonlinear kernels and has a time complexity that is linear in m and a space complexity that is independent of m. Experiments on large toy and real-world data sets demonstrate that the CVM is as accurate as existing SVM implementations, but is much faster and can handle much larger data sets than existing scale-up methods. For example, CVM with the Gaussian kernel produces superior results on the KDDCUP-99 intrusion detection data, which has about five million training patterns, in only 1.4 seconds on a 3.2GHz Pentium-4 PC.
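To make the core-set idea in the abstract concrete, here is a minimal sketch of an approximate MEB computation. It is not the authors' CVM: it works on plain Euclidean points rather than in a kernel-induced feature space, and it uses the simple farthest-point update of Badoiu and Clarkson as a stand-in for whatever approximate MEB algorithm the paper adopts. The function name `approx_meb` and the parameter `eps` are assumptions made for illustration.

```python
import math
import random


def approx_meb(points, eps=0.1):
    """Approximate minimum enclosing ball of `points` (sequences of floats).

    Illustrative sketch only (Euclidean space, no kernels). Returns
    (center, radius, core_indices), where radius is at most (1 + eps)
    times the optimal MEB radius.
    """
    center = list(points[0])          # start from an arbitrary input point
    core = {0}                        # indices of points that shaped the center
    iterations = math.ceil(1.0 / eps ** 2)
    for i in range(1, iterations + 1):
        # Find the point farthest from the current center (one pass over the data).
        far = max(range(len(points)), key=lambda j: math.dist(center, points[j]))
        core.add(far)
        # Move the center a shrinking fraction of the way toward that point.
        step = 1.0 / (i + 1)
        center = [c + step * (p - c) for c, p in zip(center, points[far])]
    # Radius that actually encloses every point from the final center.
    radius = max(math.dist(center, p) for p in points)
    return center, radius, sorted(core)


if __name__ == "__main__":
    random.seed(0)
    corners = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
    cloud = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]
    center, radius, core = approx_meb(corners + cloud, eps=0.1)
    print(center, radius, len(core))
```

In this sketch the number of iterations, and hence the size of the core set, is bounded by a quantity that depends only on `eps`, while each iteration makes a single pass over the data. That mirrors, in a simplified setting, the abstract's claim of time linear in m and space independent of m.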

Authors

Ivor W. Tsang

Agency for Science, Technology and Research

James T. Kwok

Hong Kong University of Science and Technology

Pak-Ming Cheung

Tuen Mun Hospital

Journals

Journal of Machine Learning Research

Institutions

Hong Kong University of Science and Technology

Cite this study

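Ivor W. Tsang, James T. Kwok, and Pak-Ming Cheung. Core Vector Machines: Fast SVM Training on Very Large Data Sets. Journal of Machine Learning Research.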

synapsesocial.com/papers/6a11d0a7cc504890b2563b93 — DOI: https://doi.org/10.5555/1046920.1058114

Also consider

Five closely related papers, for comparative context:

Very Large SVM Training using Core Vector Machines · 2005 · 30 citations
The Kernel-Adatron Algorithm: A Fast and Simple Learning Procedure for Support Vector Machines · 1998 · 246 citations
Computers and Intractability: A Guide to the Theory of NP-Completeness · 1979 · 44,605 citations
Breaking SVM Complexity with Cross-Training · 2004 · 62 citations
Statistical Learning Theory · 1999 · 26,957 citations
