ML Project – Salary Prediction Based on Skills and Experience using Gradient Boosting


Program 1

Salary Prediction Dataset
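The program reads salary_prediction_dataset.csv from a local path. If you do not have that file, a minimal sketch like the one below can generate a stand-in with the same column names the code uses. Everything about the values is an assumption: the category labels are copied from the program's input prompts, but the value ranges and the salary formula are invented for illustration. Scores obtained on this data therefore say nothing about the real dataset. The function name make_synthetic_salary_data is hypothetical.

```python
import numpy as np
import pandas as pd


def make_synthetic_salary_data(n_rows: int = 200, seed: int = 42) -> pd.DataFrame:
    """Build a HYPOTHETICAL dataset with the columns Program 1 expects.

    The salary formula is invented for illustration only; it is not the
    relationship in the real salary_prediction_dataset.csv.
    """
    rng = np.random.default_rng(seed)
    education = rng.choice(["High School", "Bachelors", "Masters", "PhD"], size=n_rows)
    role = rng.choice(["Data Analyst", "Data Scientist", "Software Engineer"], size=n_rows)
    experience = rng.uniform(0, 20, size=n_rows).round(1)
    python = rng.integers(0, 2, size=n_rows)  # 0 = No, 1 = Yes
    sql = rng.integers(0, 2, size=n_rows)
    ml = rng.integers(0, 2, size=n_rows)

    # Made-up salary bonuses per category (assumption, not real data)
    edu_bonus = pd.Series(education).map(
        {"High School": 0.0, "Bachelors": 2.0, "Masters": 4.0, "PhD": 6.0}
    ).to_numpy()
    role_bonus = pd.Series(role).map(
        {"Data Analyst": 0.0, "Data Scientist": 3.0, "Software Engineer": 2.0}
    ).to_numpy()
    salary = (3.0 + 0.8 * experience + edu_bonus + role_bonus
              + 1.5 * python + 1.0 * sql + 2.0 * ml
              + rng.normal(0, 1.0, size=n_rows))  # random noise

    return pd.DataFrame({
        "YearsExperience": experience,
        "EducationLevel": education,
        "JobRole": role,
        "SkillPython": python,
        "SkillSQL": sql,
        "SkillML": ml,
        "Salary(LPA)": salary.round(2),
    })


if __name__ == "__main__":
    df = make_synthetic_salary_data()
    print(df.head())
    # df.to_csv("salary_prediction_dataset.csv", index=False)
```

To use it with the program, uncomment the to_csv line and point pd.read_csv at the file it writes.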

# Salary Prediction Based on Skills and Experience using Gradient Boosting

# Step 1: Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score

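# Step 2: Load the dataset
df = pd.read_csv("D://scikit_data/gbm/salary_prediction_dataset.csv")

# Step 3: Separate the input features (independent variables) from the target (dependent variable)
X = df.drop("Salary(LPA)", axis=1)  # Independent variables
y = df["Salary(LPA)"]               # Dependent variable: salary in LPA (lakhs per annum)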

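# Step 4: Preprocessing for categorical features
categorical_cols = ["EducationLevel", "JobRole"]
numerical_cols = ["YearsExperience", "SkillPython", "SkillSQL", "SkillML"]  # kept as-is via remainder="passthrough"

# OneHotEncoder: converts each categorical column into binary (0/1) indicator columns
# drop="first": drops one redundant indicator per column to avoid the dummy variable trap
#   (this matters mainly for linear models; tree ensembles are not affected by it)
# remainder="passthrough": leaves the numerical columns unchanged
preprocessor = ColumnTransformer([
    ("cat", OneHotEncoder(drop="first"), categorical_cols)
], remainder="passthrough")
preprocessor  # in a Jupyter notebook, this line displays the transformer's structure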

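# Step 5: Combine preprocessing (encoding) and the Gradient Boosting model in one Pipeline,
# so exactly the same transformations are applied during training and prediction
model = Pipeline(steps=[
    ("preprocessing", preprocessor),
    ("regressor", GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42))
])
model  # in a Jupyter notebook, this line displays the pipeline's structure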

# Step 6: Split the dataset (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 7: Train the model
model.fit(X_train, y_train)

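# Step 8: Predictions and evaluation
y_pred = model.predict(X_test)
# RMSE is the square root of the mean squared error. The squared=False argument was
# deprecated in scikit-learn 1.4 and removed in 1.6, so take the root explicitly.
rmse = mean_squared_error(y_test, y_pred) ** 0.5
r2 = r2_score(y_test, y_pred)

print("Gradient Boosting Model Performance:")
print(f"RMSE: {rmse:.2f}")
print(f"R² Score: {r2:.2f}")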

# 🔹 Step 9: Accept user input and make a salary prediction
print("\nEnter details to predict your expected salary (LPA):")

# Get user input. Text answers must match the training categories exactly
# (same spelling and capitalization); otherwise OneHotEncoder raises a ValueError.
experience = float(input("Years of Experience: "))
education = input("Education Level (High School / Bachelors / Masters / PhD): ").strip()
job_role = input("Job Role (Data Analyst / Data Scientist / Software Engineer): ").strip()
python_skill = int(input("Do you know Python? (1 for Yes, 0 for No): "))
sql_skill = int(input("Do you know SQL? (1 for Yes, 0 for No): "))
ml_skill = int(input("Do you know Machine Learning? (1 for Yes, 0 for No): "))

# Create a DataFrame for the input
input_data = pd.DataFrame([{
    "YearsExperience": experience,
    "EducationLevel": education,
    "JobRole": job_role,
    "SkillPython": python_skill,
    "SkillSQL": sql_skill,
    "SkillML": ml_skill,
}])

# Predict using the trained model
predicted_salary = model.predict(input_data)[0]

print(f"\n Estimated Salary (LPA): {predicted_salary:.2f}")

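The two scores printed in Step 8 can be checked by hand. This sketch uses three hypothetical salaries and predictions (not model output) to compute RMSE as the square root of the mean squared error, and R² as 1 minus the ratio of residual to total sum of squares, then compares both with scikit-learn. Taking the square root of mean_squared_error yourself works on old and new scikit-learn versions alike.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0])  # hypothetical actual salaries (LPA)
y_pred = np.array([2.5, 5.0, 8.0])  # hypothetical predicted salaries (LPA)

errors = y_true - y_pred                         # [0.5, 0.0, -1.0]
rmse_manual = np.sqrt(np.mean(errors ** 2))      # sqrt(1.25 / 3), about 0.645
ss_res = np.sum(errors ** 2)                     # residual sum of squares = 1.25
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares = 8.0
r2_manual = 1 - ss_res / ss_tot                  # 1 - 1.25 / 8 = 0.84375

# Version-agnostic RMSE: square root of the MSE returned by scikit-learn
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))
r2_sklearn = r2_score(y_true, y_pred)

print(f"RMSE: {rmse_manual:.3f} (sklearn: {rmse_sklearn:.3f})")
print(f"R²:   {r2_manual:.5f} (sklearn: {r2_sklearn:.5f})")
```

RMSE is in the same unit as the target (LPA), so it reads as a typical prediction error in lakhs; R² compares the model against always predicting the mean salary.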

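Step 9 passes raw text straight to the model, and OneHotEncoder's default handle_unknown="error" raises a ValueError for any category it did not see during training, so "masters" in lowercase or a typo stops the program. A minimal sketch of more forgiving input handling is below. The helper names read_choice and read_flag are hypothetical, and the option lists in the usage note assume the dataset uses exactly the labels shown in the program's prompts.

```python
from typing import Callable, Sequence


def read_choice(prompt: str, options: Sequence[str],
                input_fn: Callable[[str], str] = input) -> str:
    """Re-prompt until the answer matches one of `options` (case-insensitive).

    Returns the option spelled exactly as in `options`, so it matches the
    category labels the encoder saw during training.
    """
    lookup = {opt.lower(): opt for opt in options}
    while True:
        answer = input_fn(f"{prompt} ({' / '.join(options)}): ").strip().lower()
        if answer in lookup:
            return lookup[answer]
        print(f"Please enter one of: {', '.join(options)}")


def read_flag(prompt: str, input_fn: Callable[[str], str] = input) -> int:
    """Re-prompt until the answer is 0 or 1, then return it as an int."""
    while True:
        answer = input_fn(f"{prompt} (1 for Yes, 0 for No): ").strip()
        if answer in ("0", "1"):
            return int(answer)
        print("Please enter 1 or 0.")
```

In Step 9 the education prompt could then become `education = read_choice("Education Level", ["High School", "Bachelors", "Masters", "PhD"])`, with the job role and the three skill questions handled the same way. The input_fn parameter exists only so the helpers can be tested without a keyboard.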
DataFlair Team

DataFlair Team provides high-impact content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. We make complex concepts easy to grasp, helping learners of all levels succeed in their tech careers.
