SVM – Support Vector Machine Tutorial for Beginners

1. SVM Tutorial – Objective

In this Support Vector Machine tutorial, we are going to develop a deep understanding of what SVM is. We will also discuss the SVM algorithm for the separable and non-separable cases, linear SVM, and the advantages and disadvantages of SVM in detail.

In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.

So, let’s start SVM Tutorial.


2. SVM Introduction

SVM stands for Support Vector Machine. It is a machine learning approach used for classification and regression analysis. It is based on supervised learning models trained by learning algorithms, which analyze large amounts of data to identify patterns in them.

An SVM generates parallel partitions by constructing two parallel boundaries, one for each category of data, in a high-dimensional space, using almost all attributes. It separates the space in a single pass to generate flat, linear partitions, dividing the two categories by a clear gap that should be as wide as possible. This partitioning is done by a plane called a hyperplane.

An SVM creates hyperplanes that have the largest margin in a high-dimensional space to separate the given data into classes. The margin between the 2 classes represents the largest distance between the closest data points of those classes.

The larger the margin, the lower is the generalization error of the classifier.


After training, new data is mapped to the same space to predict which category it belongs to; the partitions learned from the training data determine its class.

Among the available classifiers, SVMs provide a great deal of flexibility.

SVMs resemble probabilistic approaches but do not consider dependencies among attributes.
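As a concrete illustration, the following minimal sketch (assuming scikit-learn is available; the data points are made up for illustration) trains an SVM classifier on two well-separated groups of points and predicts the category of new data:

```python
# Minimal SVM classification sketch (assumes scikit-learn is installed).
from sklearn.svm import SVC

# Two made-up categories of 2-D points separated by a clear gap.
X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear")
clf.fit(X, y)

# New data is mapped into the same space and assigned to a side of the hyperplane.
predictions = clf.predict([[0.5, 0.5], [3.5, 3.5]])
```

After fitting, each new point is simply assigned the class of the side of the hyperplane it falls on.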

3. SVM Algorithm

To understand the algorithm of SVM, consider two cases:

  • Separable case – Infinitely many boundaries can separate the data into two classes.
  • Non-separable case – The two classes are not separable but overlap each other.

3.1. The Separable Case

In the separable case, infinitely many boundaries are possible. The boundary that gives the largest distance to the nearest observation is called the optimal hyperplane; it ensures the fit and robustness of the model. To find the optimal hyperplane, we use the equation of the separating hyperplane:

f(x) = a.x + b

Here, a.x is the scalar product of a and x. This hyperplane must satisfy the following two conditions:


  • It should separate the two classes A and B well, so that the function f(x) = a.x + b satisfies:
    • f(x) > 0 if and only if x ∈ A
    • f(x) ≤ 0 if and only if x ∈ B
  • It should lie as far away as possible from all the observations (robustness of the model), given that the distance from an observation x to the hyperplane is |a.x + b| / ||a||.

The width of the space between the closest observations is 2/||a||. It is called the margin, and it should be as large as possible.
The hyperplane depends only on the closest points, called support points. The generalization capacity of the SVM increases as the number of support points decreases.
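A short sketch of the separable case (made-up toy data, assuming scikit-learn): a very large penalty parameter approximates the hard-margin SVM, and the fitted coefficients give the margin width 2/||a|| and the support points:

```python
import numpy as np
from sklearn.svm import SVC

# Made-up linearly separable data.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 4.0], [4.0, 3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin (separable) case.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

a = clf.coef_[0]                 # normal vector a of the hyperplane a.x + b = 0
margin = 2 / np.linalg.norm(a)   # width of the gap between the two classes
support_points = clf.support_vectors_  # closest points, which define the hyperplane
```

Only the points in `support_points` determine the hyperplane; moving any other observation (without crossing the margin) leaves the fitted boundary unchanged.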

3.2. The Non-Separable Case

If the two classes are not perfectly separable but overlap, a term measuring the classification error must be added to each of the following two conditions:

  • For every i, yi(a.xi + b) ≥ 1 (correct separation)
  • (1/2)||a||² is minimal (greatest margin)

These conditions are relaxed for each observation xi on the wrong side of the boundary by measuring the distance separating it from the boundary of the margin on the side of its class.

This distance is then normalized by dividing it by the half-margin 1/||a||, giving a term ξi, called the slack variable. An error of the model is an observation for which ξi > 1. The sum of all the ξi represents the set of classification errors. So, the previous two constraints for finding the optimal hyperplane become:

  • For every i, yi(a.xi + b) ≥ 1 – ξi
  • (1/2)||a||² + δ Σi ξi is minimal


The quantity δ is a parameter that penalizes errors and controls how strongly the model adapts to them: as δ increases, sensitivity to errors rises and the model fits the training data more tightly.
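In scikit-learn's SVC, the error-penalty parameter is called C and plays the role of δ above. A small sketch with made-up overlapping data shows that a small penalty leaves a wide margin while a large penalty narrows it:

```python
import numpy as np
from sklearn.svm import SVC

# Made-up overlapping data: one point of each class lies on the wrong side.
X = np.array([[0.0, 0.0], [1.0, 0.5], [2.6, 0.2],
              [2.4, 0.1], [4.0, 0.3], [5.0, 0.4]])
y = np.array([-1, -1, -1, 1, 1, 1])

def margin(clf):
    # Margin width 2/||a|| computed from the fitted normal vector a.
    return 2 / np.linalg.norm(clf.coef_[0])

soft = SVC(kernel="linear", C=0.01).fit(X, y)   # small penalty: tolerant of errors
hard = SVC(kernel="linear", C=100.0).fit(X, y)  # large penalty: tight fit to the data
```

With the small penalty the optimizer accepts some slack in exchange for a wider margin; with the large penalty it shrinks the margin to reduce the classification errors.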

In SVMs, the process of restructuring the data is known as transformation, and it is done with the help of a function, called the transformation function and represented by the symbol Φ. Technically, a kernel computes the scalar product of transformed data points in a higher-dimensional space.

Another way of handling the non-separable case is therefore to move to a space of high enough dimension for a linear separation to exist. We search for a nonlinear transformation from the original space to a higher-dimensional space, but choose one whose target space has a scalar product.
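This is the idea behind the kernel trick: a kernel such as the RBF kernel computes the scalar product of the transformed points Φ(x) without ever constructing Φ explicitly. A sketch on made-up concentric data, which no line in the original 2-D space can separate:

```python
import numpy as np
from sklearn.svm import SVC

# Made-up concentric rings: not linearly separable in the original 2-D space.
rng = np.random.RandomState(0)
angles = rng.uniform(0, 2 * np.pi, 40)
inner = np.c_[0.5 * np.cos(angles[:20]), 0.5 * np.sin(angles[:20])]  # class 0
outer = np.c_[2.0 * np.cos(angles[20:]), 2.0 * np.sin(angles[20:])]  # class 1
X = np.vstack([inner, outer])
y = np.array([0] * 20 + [1] * 20)

# The RBF kernel implicitly maps the data to a space where a linear
# separation exists, without computing the transformation Φ itself.
clf = SVC(kernel="rbf", gamma=1.0).fit(X, y)
accuracy = clf.score(X, y)
```

A linear kernel would fail on these rings, but in the implicit higher-dimensional space the two classes become linearly separable.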

4. Linear SVM

We can use linear SVM to find the maximum-margin hyperplane that divides a training set D of n points.
If the training data is separable, select two parallel hyperplanes that separate the data with no points between them; the distance between them is known as the margin, and the goal is to maximize it. You can calculate the distance between these 2 hyperplanes by applying simple geometry: it is exactly the quantity 2/||a||. To increase the distance, you have to reduce ||a||.


  • Primal Form – The primal form lends itself to solving the linear SVM problem directly, using standard quadratic programming techniques and software.
  • Dual Form – You can use the dual form to write the classification rule as an unconstrained system. By doing this you get the hyperplane with the greatest possible margin. In this form, the classification rule is represented as a function of the support vectors, the subset of the training data that lies on the margin.

Biased and Unbiased Hyperplanes
Data points and hyperplanes are represented in the same coordinate system. On the basis of their position in it, hyperplanes are divided into 2 types:

  • Biased hyperplanes – Hyperplanes that do not pass through the origin of the coordinate system.
  • Unbiased hyperplanes – Those that pass through the origin of the coordinate system.
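In scikit-learn's LinearSVC this distinction corresponds to the fit_intercept flag (a sketch with made-up data): with fit_intercept=True the learned hyperplane a.x + b = 0 is biased, while fit_intercept=False fixes b = 0 and forces the hyperplane through the origin:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Made-up data symmetric about the origin.
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1, 1, 0, 0])

# Biased hyperplane: an intercept b is learned alongside the normal vector a.
biased = LinearSVC(fit_intercept=True).fit(X, y)
# Unbiased hyperplane: b is fixed at 0, so the plane passes through the origin.
unbiased = LinearSVC(fit_intercept=False).fit(X, y)
```

For data like this, which is symmetric about the origin, an unbiased hyperplane already separates the classes; in general the biased form is more flexible.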

5. Advantages and Disadvantages of SVM

Let us now look at some advantages and disadvantages of SVM.

  • Advantages – SVMs can model nonlinear phenomena through the choice of an appropriate kernel. SVMs generally provide precise predictions. SVMs determine the optimal hyperplane from the nearest points (the support vectors) only, not from distant points, which enhances the robustness of the model in some cases.
  • Disadvantages – The models are opaque. Although you can approximate them with a decision tree, there is a risk of loss of precision. SVMs are very sensitive to the choice of kernel parameters, and the difficulty of choosing correct values may compel you to test many possibilities. As a result, the computation time is sometimes lengthy.

Have a look at Top Machine Learning Algorithm

So, this was all about the SVM tutorial. Hope you liked our explanation.

6. Conclusion

In conclusion, the support vector machine is one of the most popular machine learning algorithms. The maximal-margin classifier explains how an SVM actually works, and in practice it is implemented using kernels. The learning of the hyperplane in a linear SVM is done by transforming the problem using some linear algebra, which is out of the scope of this introduction to SVM.

If you have any questions about SVM or this post, ask in the comments and I will do my best to answer.


6 Responses

  1. nisha says:

    Dear sir,
    I want to know how to create a .mat file for feature extraction, e.g. a sample.mat containing label and class. I am not getting how to create it. Will you help me?

    • Neil says:

      First, create a matrix of features from the relevant dataset and then save the variable (feature matrix) using the “save” function.
      For example, assuming the name of the matrix is “feature_vector”,
      use the command: save feature_vector.mat feature_vector

  2. khasrow naeeme says:

    Sir, I want to know the existing challenges of the SVM and their existing solutions.

    • DataFlair Team says:

      Hi Khasrow,
      Thanks for connecting with DataFlair. The performance of an SVM classifier depends on the nature of the data provided. If the data is unbalanced, the classifier will suffer. Furthermore, plain SVMs cannot handle multi-label data: any data with more than two labels cannot be handled directly. They are also unable to handle very large amounts of data.

      There are various variants of SVMs, like LS-SVM (Least Squares SVM) and LibSVM implementations, that provide solutions to some of the challenges faced by SVMs. You can also reconstruct a kernelized SVM as a linear SVM to handle large data.
      Hope, it helps you!

  3. Satish says:

    What could be the possible reasons for the performance of an SVM model being inferior to an ELM model for estimation of hydraulic conductivity using soil parameters?

    • DataFlair Team says:

      Hello Satish,
      ELM is modeled after artificial neural networks. It has been shown through experimentation that an ELM model is more computationally efficient on larger datasets than an SVM classifier. While SVM can provide greater accuracy in some cases, it is expensive to deploy compared to ELM. Furthermore, ELM can be applied quickly to new data, which is not as straightforward with SVM.
