ML Project – Student Dropout Risk Prediction using Gradient Boosting


Program 1

Student Dropout Risk Prediction Dataset

# Step 1: Import libraries
#Student Dropout Risk Prediction
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Step 2: Load dataset
df = pd.read_csv("D://scikit_data/dropout/student_dropout_risk_dataset.csv")
df.head()
df.isnull().sum()
df.shape
df.info()

# Step 3: Encode categorical variables
le_gender = LabelEncoder()
le_support = LabelEncoder()
le_job = LabelEncoder()

df["Gender"] = le_gender.fit_transform(df["Gender"])
df["ParentalSupport"] = le_support.fit_transform(df["ParentalSupport"])
df["PartTimeJob"] = le_job.fit_transform(df["PartTimeJob"])
df.head()
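LabelEncoder assigns integer codes in sorted (alphabetical) order, not in the natural Low < Medium < High order. The short sketch below uses a few made-up values, not rows from the dataset, to show the resulting mapping and, for comparison, scikit-learn's OrdinalEncoder with an explicit category order. Either encoding can work for a tree-based model; what matters is that inputs at prediction time are encoded with exactly the same mapping that was used in training.

```python
# Sketch: how LabelEncoder numbers a Low/Medium/High column.
# The values below are made-up examples, not rows from the dataset.
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

levels = ["Low", "Medium", "High", "Medium", "Low"]

# LabelEncoder sorts the labels alphabetically before assigning codes.
le = LabelEncoder()
label_codes = le.fit_transform(levels)
print(le.classes_.tolist())   # ['High', 'Low', 'Medium']
print(label_codes.tolist())   # [1, 2, 0, 2, 1]

# OrdinalEncoder lets you fix the order explicitly: Low=0, Medium=1, High=2.
# It expects 2-D input (one column per feature) and returns floats.
oe = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
ordinal_codes = oe.fit_transform([[v] for v in levels]).ravel()
print(ordinal_codes.tolist())  # [0.0, 1.0, 2.0, 1.0, 0.0]
```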

# Step 4: Define features and target
X = df.drop("DropoutRisk", axis=1)  # Input features (independent variables)
y = df["DropoutRisk"]  # Target (dependent variable)
y

# Step 5: Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
len(X_test)
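If dropout cases are a minority of the rows (the class balance of this dataset is not stated here), a purely random split can leave the test set with a different share of at-risk students than the training set. Passing stratify keeps the class ratio the same in both parts. The sketch uses toy labels with an assumed 80/20 split; in the program the equivalent call would be train_test_split(X, y, test_size=0.2, random_state=42, stratify=y).

```python
# Sketch with toy labels (not the real dataset):
# 80 "not at risk" (0) and 20 "at risk" (1).
import numpy as np
from sklearn.model_selection import train_test_split

X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([0] * 80 + [1] * 20)

# stratify=y_toy keeps the 80/20 class ratio in both the train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)
print(np.bincount(y_te))  # [16  4]
print(np.bincount(y_tr))  # [64 16]
```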

# Step 6: Train the model
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
model.fit(X_train, y_train)
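n_estimators and learning_rate trade off against each other: a smaller learning rate usually needs more trees. Rather than fixing the tree count at 100, GradientBoostingClassifier can hold out part of the training data and stop adding trees once the validation score stops improving. The sketch runs on synthetic data from make_classification (an assumption, since the CSV is not bundled here), and the values 500, 0.1 and 10 are illustrative, not tuned for this dataset.

```python
# Sketch on synthetic data (not the dropout dataset); 7 features to mirror
# the seven input columns used in the program.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X_syn, y_syn = make_classification(n_samples=500, n_features=7, random_state=0)

# Allow up to 500 trees, but stop once the score on an internal 10% validation
# split has not improved for 10 consecutive iterations.
es_model = GradientBoostingClassifier(
    n_estimators=500,
    learning_rate=0.1,
    max_depth=3,
    validation_fraction=0.1,
    n_iter_no_change=10,
    random_state=42,
)
es_model.fit(X_syn, y_syn)

# n_estimators_ is the number of trees actually fitted after early stopping.
print("Trees actually fitted:", es_model.n_estimators_)
```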

# Step 7: Evaluate
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("Model Accuracy:", round(acc * 100, 2), "%")
print("Confusion Matrix:\n", cm)
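Accuracy on its own can hide the errors that matter most here: at-risk students predicted as safe. The sketch unpacks a 2×2 confusion matrix built from made-up labels (1 = at risk), assuming the same 0/1 encoding of DropoutRisk that Step 8 relies on. scikit-learn's classification_report(y_test, y_pred) prints the same precision and recall figures per class in one call.

```python
# Toy labels (not model output): 1 = at risk, 0 = not at risk.
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

y_true = [0, 0, 0, 1, 1, 1, 0, 1]
y_hat = [0, 1, 0, 1, 0, 0, 0, 1]

# Rows are true classes, columns are predicted classes, both in sorted order [0, 1],
# so ravel() yields tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()
print(tn, fp, fn, tp)  # 3 1 2 2

recall = tp / (tp + fn)     # share of at-risk students the model flagged
precision = tp / (tp + fp)  # share of flagged students who were actually at risk
print(accuracy_score(y_true, y_hat))  # 0.625
print("recall:", recall, "precision:", precision)
```

Here 5 of 8 predictions are correct, yet half of the at-risk students are missed.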

# Step 8: User input for prediction
print("\nEnter student details to predict dropout risk:")

gender = input("Gender (Male/Female): ")
support = input("Parental Support (Low/Medium/High): ")
job = input("Part-Time Job? (Yes/No): ")
age = int(input("Age: "))
gpa = float(input("Current GPA (0.0–4.0): "))
attendance = float(input("Attendance Rate (0–100%): "))
study_hours = int(input("Study Hours Per Week: "))

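# Encode inputs with the same fitted encoders used in Step 3.
# LabelEncoder numbers classes alphabetically, so for Low/Medium/High the training
# codes are High=0, Low=1, Medium=2; hard-coding Low=0, Medium=1, High=2 would not match.
# transform() raises ValueError if the text does not exactly match a training category.
gender_encoded = le_gender.transform([gender.strip()])[0]
support_encoded = le_support.transform([support.strip()])[0]
job_encoded = le_job.transform([job.strip()])[0]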

# Predict
input_data = pd.DataFrame([{
    "Age": age,
    "Gender": gender_encoded,
    "StudyHoursPerWeek": study_hours,
    "AttendanceRate": attendance,
    "ParentalSupport": support_encoded,
    "PartTimeJob": job_encoded,
    "CurrentGPA": gpa
}])
input_data = input_data[X.columns]  # use the same column order as the training data

prediction = model.predict(input_data)[0]  # single label for this one student
print("\nDropout Risk Prediction:", "At Risk of Dropping Out" if prediction == 1 else "Not at Risk")
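model.predict returns a hard 0/1 label, which for two classes amounts to cutting the predicted probability at 0.5. When missing an at-risk student is costlier than a false alarm, it can help to look at the probability from predict_proba and choose a lower cut-off. The sketch trains on synthetic data, and the 0.3 threshold is a hypothetical value; for the program above the risk score would be model.predict_proba(input_data)[0][1], assuming the classes are 0 and 1.

```python
# Sketch on synthetic data (not the dropout dataset).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X_syn, y_syn = make_classification(n_samples=300, n_features=7, random_state=1)
clf = GradientBoostingClassifier(random_state=42).fit(X_syn, y_syn)

# predict_proba returns one column per class, in the order of clf.classes_.
proba = clf.predict_proba(X_syn[:5])
risk = proba[:, list(clf.classes_).index(1)]  # probability of class 1 ("at risk")

# Hypothetical cut-off: flag anyone at or above 0.3 instead of the default 0.5.
THRESHOLD = 0.3
flags = (risk >= THRESHOLD).astype(int)
for p, f in zip(risk, flags):
    print(f"risk={p:.2f} -> {'At Risk' if f else 'Not at Risk'}")
```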

# Step 9: Plot Feature Importance
import matplotlib.pyplot as plt

# Get feature importances
importances = model.feature_importances_
print(importances)
feature_names = X.columns
print(feature_names)
# Create a bar chart
plt.figure(figsize=(10, 6))
plt.barh(feature_names, importances, color='skyblue')
plt.xlabel("Importance Score")
plt.title("Feature Importance – Dropout Risk Prediction")
plt.grid(axis='x')
plt.tight_layout()
plt.show()
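The feature_importances_ plotted in Step 9 are impurity-based and computed from the training data; scikit-learn's documentation notes that they can favour features with many distinct values (such as GPA or attendance) over binary ones (such as PartTimeJob). Permutation importance, measured on held-out data, is a common cross-check. The sketch uses synthetic data with placeholder column names (feature_0 to feature_4) and also sorts the values, which makes a horizontal bar chart easier to read.

```python
# Sketch on synthetic data with placeholder column names (not the dropout dataset).
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X_arr, y_arr = make_classification(n_samples=400, n_features=5, n_informative=3, random_state=0)
X_df = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(5)])
X_tr, X_te, y_tr, y_te = train_test_split(X_df, y_arr, test_size=0.25, random_state=0)

clf = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)

# Impurity-based importances (the kind Step 9 plots), sorted ascending.
impurity = pd.Series(clf.feature_importances_, index=X_df.columns).sort_values()

# Permutation importance: mean drop in test-set accuracy when one column is shuffled.
result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
perm = pd.Series(result.importances_mean, index=X_df.columns).sort_values()

print(impurity)
print(perm)
```

With the values sorted, plt.barh(perm.index, perm.values) draws the bars from least to most important.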

DataFlair Team