Machine Learning Algorithms (A quick revision)
Ujjwal Paliwal

Posted on Apr 14, 2025 | AIML

This quick revision covers four core algorithms:
Linear Regression
Logistic Regression
Decision Tree
Random Forest

Steps for Machine Learning Algorithms

  1. Define the Problem: Identify the task and the kind of data needed to address it (e.g., classification, regression).

  2. Collect Data: Gather relevant data for your algorithm to learn from.

  3. Preprocess Data: Clean, normalize, or encode data to prepare it for analysis.

  4. Split the Data: Divide data into training, validation, and testing datasets (a minimal end-to-end sketch follows this list).

  5. Select an Algorithm: Choose an appropriate model (e.g., decision trees, neural networks).

  6. Train the Model: Use training data to let the algorithm learn patterns.

  7. Validate the Model: Use the validation set to fine-tune and improve performance.
    Prediction and evaluation on held-out data are necessary to assess model accuracy.

  8. Test the Model: Evaluate how well the algorithm performs on new, unseen data.

  9. Deploy the Model: Implement it in a real-world environment if it meets accuracy criteria.

  10. Monitor & Update: Regularly monitor performance and update the model if needed.
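
A minimal end-to-end sketch of steps 4–8 with scikit-learn; the dataset, split size, and model choice here are illustrative assumptions, not prescribed by the post:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Steps 2-3: use a ready-made, already-clean sample dataset
X, y = load_iris(return_X_y=True)

# Step 4: split into training and testing sets (a validation set can be
# carved out of the training portion the same way)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Steps 5-6: select an algorithm and train it
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Step 8: evaluate on unseen data
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))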

Now let's look at each of these algorithms in turn.

Linear Regression

  • It predicts real values; the output variable is continuous.
  • A relationship is established between the independent variable and the dependent variable by fitting a best-fit line.
  • The relationship between x and y is assumed to be linear, and the best-fit line is known as the "regression line".

Representation:

y = mx + c

Where:

  • y = dependent variable
  • m = slope
  • x = independent variable
  • c = intercept

For multiple linear regression, this extends to:

Y = β0 + β1X1 + β2X2 + ... + βpXp

Where:

  • Y: The response variable.
  • Xj: The jth predictor variable.
  • βj: The average effect on Y of a one-unit increase in Xj, holding all other predictors fixed.
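
A minimal sketch of fitting a regression line with scikit-learn; the toy data (generated from y = 2x + 1) is an illustrative assumption:

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data that follows y = 2x + 1 exactly (illustrative)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 5, 7, 9, 11])

# Fit the best-fit (regression) line
lr = LinearRegression()
lr.fit(X, y)

print("slope m:", lr.coef_[0])         # ~2.0
print("intercept c:", lr.intercept_)   # ~1.0
print("prediction for x=6:", lr.predict([[6]])[0])  # ~13.0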

Logistic Regression

  • It is a classification algorithm used to estimate discrete output values based on a given set of independent variables.
  • Its output value is a probability between 0 and 1, which is thresholded into a yes/no decision.
  • It uses the sigmoid function to turn a linear combination of the inputs into a probability.
  • Logistic regression is used to calculate the probability of an event; for example, classifying whether tissue is benign or malignant.
  • Logistic regression assumes a binomial distribution of the dependent variable.

σ(y) = e^y / (1 + e^y)


With predictors X1, ..., Xp, the modeled probability is:

p(X) = e^(β0 + β1X1 + β2X2 + ... + βpXp) / (1 + e^(β0 + β1X1 + β2X2 + ... + βpXp))
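
A quick numeric sketch of the sigmoid mapping (the z values below are illustrative):

import numpy as np

def sigmoid(z):
    """Map any real value to a probability in (0, 1)."""
    return np.exp(z) / (1.0 + np.exp(z))

# The linear part z = β0 + β1X1 + ... is squashed into a probability
for z in (-4, 0, 4):
    print(z, "->", round(float(sigmoid(z)), 3))
# -4 -> 0.018, 0 -> 0.5, 4 -> 0.982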

[Image: Linear Regression vs. Logistic Regression]

Decision Tree

  • It is a supervised learning algorithm used mostly for classification problems.
  • Works with both categorical and continuous dependent variables.
  • In this algorithm, the population is split into two or more homogeneous sets based on splitting conditions over the input attributes.

Different splitting criteria:

  • Gini impurity
  • Information Gain
  • Chi-square
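
As a quick illustration of the first two criteria (a minimal sketch, not from the original post), Gini impurity and entropy can be computed from a node's class proportions:

import numpy as np

def gini(p):
    """Gini impurity for a vector of class proportions p."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """Entropy (information) for a vector of class proportions p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # avoid log(0)
    return -np.sum(p * np.log2(p))

# A node holding 80% class A and 20% class B
print(gini([0.8, 0.2]))     # 0.32
print(entropy([0.8, 0.2]))  # ~0.722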
Code (if needed):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative setup (not in the original post): load a sample dataset and split it
iris = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# Create a decision tree classifier
dt = DecisionTreeClassifier(criterion='entropy', random_state=42)

# Fit the model
dt.fit(X_train, Y_train)

# Predict on the test set
dt_pred = dt.predict(X_test)
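
Decision trees are easy to inspect; as a sketch, scikit-learn's export_text prints the learned splits as readable rules (assumes dt and iris from the snippet above):

from sklearn.tree import export_text

# Dump the learned if/else split rules as plain text
print(export_text(dt, feature_names=list(iris.feature_names)))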

Pros:

  • Interpretability: Easy to visualize and understand.
  • Speed: Quick to train and make predictions.

Cons:

  • Overfitting: Prone to overfitting, especially with deep trees.
  • Sensitivity to Outliers: Can be heavily influenced by outliers.

Random Forest

  • A collection of decision trees is called a "random forest".
  • To classify a new object based on its attributes:
    • Each tree in the forest provides a classification (a "vote" for a class).
    • The class with the majority of votes becomes the final prediction; the trees themselves are trained on bootstrapped samples, which is known as the bagging technique (see the toy illustration below).
  • Applicable to both classification and regression tasks.
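
A toy illustration of the majority vote; the per-tree outputs below are made up for illustration:

from collections import Counter

# Suppose five trees in the forest classify one sample as follows
tree_votes = ["cat", "dog", "cat", "cat", "dog"]

# The class with the most votes becomes the forest's prediction
final_class = Counter(tree_votes).most_common(1)[0][0]
print(final_class)  # cat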

[Image: Decision Tree vs. Random Forest]
Code (if needed):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Illustrative setup (not in the original post): load a sample dataset and split it
iris = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# Create a random forest classifier with 100 trees
rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Fit the model
rf.fit(X_train, Y_train)

# Predict on the test set
rf_pred = rf.predict(X_test)
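
To evaluate both models on the held-out split (a minimal sketch; assumes Y_test, dt_pred, and rf_pred from the snippets above):

from sklearn.metrics import accuracy_score

# Compare the single tree and the ensemble on unseen data
print("Decision tree accuracy:", accuracy_score(Y_test, dt_pred))
print("Random forest accuracy:", accuracy_score(Y_test, rf_pred))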

Pros:

  • Accuracy: Generally more accurate than a single decision tree.
  • Robustness: Less sensitive to outliers and noise.
  • Overfitting: Less prone to overfitting due to ensemble averaging.

Cons:

  • Interpretability: Difficult to interpret and visualize.
  • Computation: Requires more computational resources and time to train.

Thanks for reading ~ Jai Hanuman
