Python Machine Learning Project – Detecting Parkinson’s Disease with XGBoost


In our list of Python projects, detecting Parkinson’s disease with Python is in the 3rd position. In this Python machine learning project, we will build a model that can accurately detect the presence of Parkinson’s disease in an individual.


So, let’s start the Python machine learning project with an introduction to the terms used:

Detecting Parkinson’s Disease – Python Machine Learning Project

What is Parkinson’s Disease?

Parkinson’s disease is a progressive disorder of the central nervous system that affects movement and causes tremors and stiffness. It is commonly described in 5 stages and affects more than 1 million individuals in India. It is chronic and has no cure yet. It is a neurodegenerative disorder affecting dopamine-producing neurons in the brain.

What is XGBoost?

XGBoost is a machine learning library designed with speed and performance in mind. XGBoost stands for eXtreme Gradient Boosting and is based on decision trees: it builds an ensemble of trees using gradient boosting. In this project, we will import XGBClassifier from the xgboost library; it is XGBoost’s implementation of the scikit-learn API for classification, so it is used with fit() and predict() like any scikit-learn classifier.
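To make “based on decision trees” more concrete, here is a minimal sketch of plain first-order gradient boosting for binary classification, built from scikit-learn’s DecisionTreeRegressor. It only illustrates the boosting idea; it is not XGBoost’s actual algorithm, which additionally uses second-order gradients, regularization, and many speed optimizations. The helpers fit_boosted_trees and predict_proba_boosted and the toy data are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    """Map raw scores (log-odds) to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_boosted_trees(X, y, n_trees=50, learning_rate=0.3, max_depth=2):
    """Plain gradient boosting with log loss (hypothetical helper, not XGBoost's API)."""
    # Constant starting prediction: the log-odds of the positive class.
    p = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    base_score = np.log(p / (1 - p))
    raw = np.full(len(y), base_score)
    trees = []
    for _ in range(n_trees):
        # For log loss, the negative gradient w.r.t. the raw score is y - sigmoid(raw).
        residuals = y - sigmoid(raw)
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        # Add a shrunken copy of the new tree's output to the running scores.
        raw += learning_rate * tree.predict(X)
        trees.append(tree)
    return base_score, learning_rate, trees


def predict_proba_boosted(model, X):
    """Sum the base score and every tree's shrunken output, then squash to [0, 1]."""
    base_score, learning_rate, trees = model
    raw = np.full(X.shape[0], base_score)
    for tree in trees:
        raw += learning_rate * tree.predict(X)
    return sigmoid(raw)


# Toy data (made up for illustration): the label is 1 when x0 + x1 > 0.
rng = np.random.default_rng(0)
X_gb = rng.normal(size=(200, 3))
y_gb = (X_gb[:, 0] + X_gb[:, 1] > 0).astype(float)
gb_model = fit_boosted_trees(X_gb, y_gb)
train_acc = ((predict_proba_boosted(gb_model, X_gb) > 0.5) == y_gb).mean()
print(f"training accuracy: {train_acc:.2f}")
```

Each new tree is fit to what the current ensemble still gets wrong, and learning_rate shrinks every tree’s contribution so that no single tree dominates.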

Detecting Parkinson’s Disease with XGBoost – Objective

To build a model to accurately detect the presence of Parkinson’s disease in an individual.

Detecting Parkinson’s Disease with XGBoost – About the Python Machine Learning Project

In this Python machine learning project, using the Python libraries scikit-learn, numpy, pandas, and xgboost, we will build a model using an XGBClassifier. We’ll load the data, get the features and labels, scale the features, then split the dataset, build an XGBClassifier, and then calculate the accuracy of our model.

Dataset for Python Machine Learning Project

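You’ll need the UCI ML Parkinsons dataset for this; you can download it from the UCI Machine Learning Repository. The dataset has 24 columns and 195 records and is only 39.7 KB. Each record holds biomedical voice measurements from one voice recording.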

Prerequisites

You’ll need to install the following libraries with pip:

pip install numpy pandas scikit-learn xgboost

You’ll also need to install JupyterLab (pip install jupyterlab) and then run it from the command prompt:

C:\Users\DataFlair>jupyter lab

This will open a new JupyterLab window in your browser. Here, you will create a new console and type in your code, then press Shift+Enter to execute one or more lines at a time.

Steps for Detecting Parkinson’s Disease with XGBoost

Below are the steps for this Python machine learning project:

1. Make the necessary imports:

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
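Several of the errors and warnings readers report in the comments depend on library versions, so it can help to print what is installed. This is a small sketch using only the standard library’s importlib.metadata; installed_versions is a hypothetical helper. Note that the distribution name is scikit-learn even though the import name is sklearn.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("numpy", "pandas", "scikit-learn", "xgboost")):
    """Return {distribution name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```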


2. Now, let’s read the data into a DataFrame and get the first 5 records.

#DataFlair - Read the data
df=pd.read_csv('D:\\DataFlair\\parkinsons.data')  # adjust this path to where you saved parkinsons.data
df.head()
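A quick sanity check on the loaded DataFrame can catch a wrong path or a truncated download before training. This is a sketch with a hypothetical helper, describe_dataset; for the full file, the shape should be (195, 24) to match the dataset description above, and name should be the only non-numeric column, which is why step 3 slices it off.

```python
import pandas as pd


def describe_dataset(df):
    """Quick checks after loading: size, number of missing cells, non-numeric columns."""
    return {
        "shape": df.shape,
        "missing_cells": int(df.isna().sum().sum()),
        "non_numeric_columns": df.select_dtypes(exclude="number").columns.tolist(),
    }
```

After step 2, call print(describe_dataset(df)).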


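3. Get the features and labels from the DataFrame (dataset). The features are all the columns except ‘name’ and ‘status’, and the labels are the values in the ‘status’ column. The code below drops ‘status’ and then uses [:,1:] to drop the first column, ‘name’, which only identifies the recording.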

#DataFlair - Get the features and labels
features=df.loc[:,df.columns!='status'].values[:,1:]
labels=df.loc[:,'status'].values
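Selecting columns by position ([:,1:]) relies on ‘name’ being the first column. A sketch of a more explicit alternative, assuming the same column names, drops ‘name’ and ‘status’ by label with pandas; split_features_labels is a hypothetical helper, and the toy frame uses a few of the real column names with made-up values.

```python
import pandas as pd


def split_features_labels(df, label_col="status", id_col="name"):
    """Drop the identifier and label columns by name instead of by position."""
    features = df.drop(columns=[id_col, label_col]).to_numpy()
    labels = df[label_col].to_numpy()
    return features, labels


# Toy frame: 'status' sits between feature columns, as in the real file (values are made up).
toy = pd.DataFrame({
    "name": ["rec_a", "rec_b", "rec_c"],
    "MDVP:Fo(Hz)": [120.0, 150.0, 200.0],
    "status": [1, 1, 0],
    "PPE": [0.28, 0.33, 0.10],
})
X_toy, y_toy = split_features_labels(toy)
print(X_toy.shape, y_toy)  # (3, 2) [1 1 0]
```

On the real DataFrame, split_features_labels(df) should give the same arrays as step 3.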


4. The ‘status’ column holds the labels: 1 for Parkinson’s and 0 for healthy. Let’s count how many records have each label.

#DataFlair - Get the count of each label (0 and 1) in labels
print(labels[labels==1].shape[0], labels[labels==0].shape[0])


We have 147 ones and 48 zeros in the status column in our dataset.
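Because the classes are imbalanced, a useful reference point is the accuracy of a model that always predicts the majority label. This sketch (label_summary is a hypothetical helper) computes the counts with np.unique and the majority-class share; with 147 ones and 48 zeros, always predicting 1 would already be about 75.4% accurate, which is the figure to compare the final accuracy in the Summary against.

```python
import numpy as np


def label_summary(labels):
    """Count each label and return the accuracy of always guessing the most common one."""
    values, counts = np.unique(labels, return_counts=True)
    majority_share = counts.max() / counts.sum()
    return dict(zip(values.tolist(), counts.tolist())), majority_share


# Rebuild an array with the counts reported above (147 ones, 48 zeros).
labels_example = np.array([1] * 147 + [0] * 48)
counts, baseline = label_summary(labels_example)
print(counts, round(baseline * 100, 2))  # {0: 48, 1: 147} 75.38
```

Calling label_summary(labels) on the real labels from step 3 should give the same result.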

5. Initialize a MinMaxScaler and scale the features to the range -1 to 1 to normalize them. The MinMaxScaler transforms features by scaling each one to a given range, and its fit_transform() method fits the scaler to the data and then transforms it. We don’t need to scale the labels.

#DataFlair - Scale the features to between -1 and 1
scaler=MinMaxScaler((-1,1))
x=scaler.fit_transform(features)
y=labels
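MinMaxScaler applies a simple column-wise formula: each value x becomes (x - min) / (max - min), which lies in [0, 1], and is then stretched to the requested range, here [-1, 1]. The sketch below writes this out by hand with NumPy (min_max_scale is a hypothetical helper that assumes no column is constant) and checks it against scikit-learn on a tiny made-up array.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def min_max_scale(X, feature_range=(-1.0, 1.0)):
    """Column-wise min-max scaling by hand (assumes no constant columns)."""
    lo, hi = feature_range
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    X_std = (X - col_min) / (col_max - col_min)  # each column now spans [0, 1]
    return X_std * (hi - lo) + lo                # stretch and shift to [lo, hi]


X_demo = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]])
manual = min_max_scale(X_demo)
library = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X_demo)
print(manual)
print(np.allclose(manual, library))  # True
```

Because the transformation is increasing within each column, it keeps the order of every feature’s values, and decision-tree splits depend only on that order.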


6. Now, split the dataset into training and testing sets, keeping 20% of the data for testing. Setting random_state=7 makes the split reproducible.

#DataFlair - Split the dataset
x_train,x_test,y_train,y_test=train_test_split(x, y, test_size=0.2, random_state=7)
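One caveat about steps 5 and 6: the scaler is fit on all 195 records before the split, so the minimum and maximum it learns include the test records. A sketch of the usual leakage-free order, assuming the same MinMaxScaler settings, splits first and fits the scaler on the training rows only; it also passes stratify so both parts keep roughly the 147:48 class ratio. split_then_scale and the random stand-in data are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def split_then_scale(features, labels, test_size=0.2, random_state=7):
    """Split first, then learn the scaling only from the training rows."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=test_size, random_state=random_state,
        stratify=labels,  # keep the 1/0 ratio similar in both parts
    )
    scaler = MinMaxScaler(feature_range=(-1, 1))
    x_train = scaler.fit_transform(x_train)  # min/max come from training rows only
    x_test = scaler.transform(x_test)        # test values may fall slightly outside [-1, 1]
    return x_train, x_test, y_train, y_test


# Random stand-in data: 195 rows x 22 features, with the real dataset's class counts.
rng = np.random.default_rng(7)
features_syn = rng.normal(size=(195, 22))
labels_syn = np.array([1] * 147 + [0] * 48)
x_tr, x_te, y_tr, y_te = split_then_scale(features_syn, labels_syn)
print(x_tr.shape, x_te.shape)  # (156, 22) (39, 22)
```

With the real data, split_then_scale(features, labels) would replace steps 5 and 6. Because stratify changes which rows land in each part, the final accuracy will not necessarily match the Summary’s figure.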


7. Initialize an XGBClassifier and train the model. It classifies using eXtreme Gradient Boosting, a gradient boosting algorithm. Gradient boosting belongs to the Ensemble Learning family in ML, where many models (here, decision trees) are trained and their predictions are combined into one stronger output; in boosting, each new tree is fit to correct the errors of the trees before it.

#DataFlair - Train the model
model=XGBClassifier()
model.fit(x_train,y_train)


8. Finally, generate y_pred (predicted values for x_test) and calculate the accuracy for the model. Print it out.

# DataFlair - Calculate the accuracy
y_pred=model.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)
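With 147 ones and 48 zeros, accuracy alone does not show which kind of mistake the model makes. A sketch using scikit-learn’s confusion_matrix separates them: sensitivity is the share of Parkinson’s records (label 1) detected, and specificity is the share of healthy records (label 0) correctly identified. The evaluate helper and the six toy labels are hypothetical, not the tutorial’s actual predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix


def evaluate(y_true, y_pred):
    """Accuracy plus per-class detection rates from the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])  # rows = true label, cols = predicted
    tn, fp, fn, tp = cm.ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "sensitivity": tp / (tp + fn),  # share of Parkinson's cases (1) detected
        "specificity": tn / (tn + fp),  # share of healthy cases (0) correctly identified
        "confusion_matrix": cm,
    }


# Toy labels for illustration only.
y_true_demo = np.array([1, 1, 1, 1, 0, 0])
y_pred_demo = np.array([1, 1, 1, 0, 0, 1])
report = evaluate(y_true_demo, y_pred_demo)
print(report["sensitivity"], report["specificity"])  # 0.75 0.5
```

On the real model, call evaluate(y_test, y_pred) after step 8.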


Summary

In this Python machine learning project, we learned to detect the presence of Parkinson’s disease in individuals from biomedical voice measurements. We used an XGBClassifier for this and the sklearn library to prepare the dataset. This gives us an accuracy of 94.87% (37 of the 39 test records classified correctly), which is great considering how few lines of code this Python project needs.
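The test set here holds only 39 records, so one misclassification moves accuracy by about 2.6 points (1/39). A sketch of a steadier estimate, assuming scikit-learn’s Pipeline and stratified 5-fold cross-validation, re-fits the scaler inside each fold and averages the fold accuracies; cross_validated_accuracy is a hypothetical helper.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def cross_validated_accuracy(model, features, labels, n_splits=5, seed=7):
    """Return per-fold accuracies; the scaler is re-fit on each training fold."""
    # Chaining scaler and model means cross_val_score fits both on the training folds only.
    pipeline = make_pipeline(MinMaxScaler(feature_range=(-1, 1)), model)
    # Stratified folds keep the 1/0 ratio similar in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(pipeline, features, labels, cv=cv, scoring="accuracy")
```

With the features and labels from step 3, something like scores = cross_validated_accuracy(XGBClassifier(), features, labels) followed by print(scores.mean(), scores.std()) reports the mean accuracy and its spread across folds.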

We hope you enjoyed this Python project.


RAHUL SHARMA says:
October 31, 2019 at 8:12 pm
Is this same code available for MATLAB as well?
It would be very helpful if it were available for MATLAB.
Any idea about Alzheimer’s detection?
Ad says:
January 28, 2020 at 10:27 pm
How would I test the model?
Using the remaining 80 percent?
Please let me know ASAP.
- DataFlair says:
  November 10, 2021 at 10:16 am
  Hi Ad,
  No. Using the train_test_split() function, we split the inputs and the output into two parts containing 80% and 20% of the data. We use the 80% part, x_train and y_train, to train the model, and then the remaining 20%, x_test and y_test, to test it and measure its accuracy. I hope that makes it clear.
Eddie says:
February 17, 2021 at 11:56 pm
Is scaling the data compulsory? Without it, I got 97% accuracy.
- DataFlair says:
  November 10, 2021 at 10:18 am
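  Scaling is a good general habit, and it matters for scale-sensitive models such as k-nearest neighbors or SVMs. Tree-based models like XGBoost, however, are largely insensitive to min-max scaling, because it keeps the order of each feature’s values and tree splits depend only on that order. Also note that the test set has only 39 records, so a single changed prediction shifts accuracy by about 2.6 points.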
Raymond says:
April 2, 2021 at 3:47 pm
I always get this error. Can you help?
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
in
      1 #DataFlair - Train the model
----> 2 model=XGBClassifier()
      3 model.fit(x_train,y_train)
NameError: name 'XGBClassifier' is not defined
- DataFlair says:
  November 10, 2021 at 10:20 am
  Please check that you have imported the classifier with the statement ‘from xgboost import XGBClassifier’. If the problem persists, try reinstalling the xgboost module. Hope this solves the issue.
Joshua says:
April 4, 2021 at 1:36 am
Did you install the XGBoost package from the command prompt? Doing that may fix it.
riza abdul says:
August 8, 2021 at 1:20 pm
Hi,
Could you please let me know how to test this model and what inputs are required?
- DataFlair says:
  November 10, 2021 at 10:28 am
  Hello,
  Before training the model, we split the dataset into two parts containing 80% and 20% of the data. We use the 20% part to test the model. The inputs are the same kind used for training, i.e., all columns except the status column (and the name column, which only identifies the recording).
Aneeta says:
January 2, 2022 at 1:43 pm
I have imported the classifier, but I still don’t get the output. The only output I get is XGBClassifier(); I’m not getting any values.
RAZIYA says:
February 15, 2022 at 2:48 pm
UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, …, [num_class - 1].
warnings.warn(label_encoder_deprecation_msg, UserWarning)
I am using Python 3.10.
RAZIYA says:
February 15, 2022 at 2:51 pm
C:\Users\HP\AppData\Local\Programs\Python\Python310\lib\site-packages\xgboost\compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
I get this warning at the line ‘from xgboost import XGBClassifier’.
Abbey says:
December 22, 2022 at 10:36 am
My Python stops working and restarts the kernel as soon as I run this code:
Model=XGBClassifier()
Model.fit(x_train,y_train)
What could be wrong?
Aparna shahare says:
May 1, 2023 at 2:25 pm
Hello,
I am getting the following error while training. Could you please help me solve it?
XGBoostError: [14:21:58] C:/buildkite-agent/builds/buildkite-windows-cpu-autoscaling-group-i-08de971ced8a8cdc6-1/xgboost/xgboost-ci-windows/src/data/data.cc:455: Check failed: this->labels.Size() % this->num_row_ == 0 (156 vs. 0) : Incorrect size for labels.
Mahalakshmi R says:
March 20, 2024 at 9:40 am
The Parkinson’s dataset is not available: when we click the download link, it shows “not found”. Can you resolve this?
