Random Forest Classifier in Layman Terms

February 13, 2019 | DATAcated Challenge

Random Forest Classifier is an ensemble algorithm: it builds a set of decision trees, each from a randomly selected subset of the training set, and then aggregates the votes of those trees to decide the final class of the test object.
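
As a rough sketch of how this looks in code, here is a minimal example using scikit-learn's RandomForestClassifier on a toy dataset. The library, dataset, and parameter values are my own assumptions for illustration; the article itself does not name any tool.

```python
# A minimal sketch, assuming scikit-learn and its built-in iris toy dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each of the 100 trees is trained on a random bootstrap sample of the training set.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# The forest aggregates the votes of its trees to pick the final class.
print(forest.predict(X_test[:5]))
print("accuracy:", forest.score(X_test, y_test))
```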

For better understanding, let us say Ram is planning to buy a car. After exhaustive car research, he is still unclear about what car he wants. He decides to take his friend Andy’s opinion. To understand Ram’s requirements, Andy gathers some data by asking him a few questions such as:

  1. How much are you willing to spend?
  2. Are you planning to buy a pre-owned car or a brand-new one?
  3. Will you use it to commute to work or for pleasure?
  4. Which do you prefer: a White Honda Civic or a Blue Genesis Coupe?
  5. Which do you prefer: a Tesla Model S or a Tesla Model 3?

This process of asking questions falls under the category of Decision Trees.
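
To make the analogy concrete, a single decision tree behaves like a chained set of such questions. The sketch below hand-codes one tree of questions; the thresholds and car suggestions are invented purely for illustration and are not learned from any data.

```python
# Illustrative only: a hand-written "tree" of questions mimicking Andy's advice.
# A real decision tree learns these splits from data instead of being hard-coded.
def andys_decision_tree(budget, wants_brand_new, commute, likes_electric):
    if budget < 25_000:
        return "pre-owned White Honda Civic" if not wants_brand_new else "new compact car"
    if likes_electric:
        return "Tesla Model 3" if commute else "Tesla Model S"
    return "Blue Genesis Coupe"

print(andys_decision_tree(budget=40_000, wants_brand_new=True,
                          commute=True, likes_electric=True))   # -> Tesla Model 3
```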

Andy suggests a car to Ram based on his answers and a few assumptions. Since Andy is Ram's friend, there is a chance that his opinion is biased. For instance, Andy assumed Ram likes electric cars because he likes both the Tesla Model S and the Tesla Model 3, whereas Ram might actually like the Model S for its design and the Model 3 because it is economical. Likewise, when given a choice between the Blue Genesis Coupe and the White Honda Civic, Ram picks the latter, and Andy assumes Ram prefers white, economical cars.

To avoid this bias and get a wider perspective, Ram seeks advice from several other friends, and his answers vary from friend to friend. For example, he chooses the Blue Genesis Coupe over the White Honda Civic because of a test drive he took the previous day on a great interstate. He answers differently each time because he is unsure, or simply because his preference shifts in the moment. By spreading slightly varied answers across many friends instead of relying on one narrow line of questioning, Ram collects a pool of advice. Such a pool of advice is what a Random Forest represents: each friend acts like a decision tree, and the final suggestion is the vote of the whole group.
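
That last step, turning a pool of advice into one suggestion, is just a majority vote. Here is a tiny sketch; the friends' suggestions are made up for illustration.

```python
from collections import Counter

# Each friend plays the role of one decision tree; each suggestion is one vote.
suggestions = ["White Honda Civic", "Tesla Model 3", "White Honda Civic",
               "Blue Genesis Coupe", "White Honda Civic"]

final_choice, votes = Counter(suggestions).most_common(1)[0]
print(f"Final suggestion: {final_choice} ({votes} of {len(suggestions)} votes)")
```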

Advantages:

  1. It is easy to measure the relative importance of each feature to the prediction (see the sketch after this list).
  2. Overfitting is largely avoided, because the final prediction is an aggregate vote over many decision trees rather than the output of a single tree.
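
As a sketch of the first point, scikit-learn's RandomForestClassifier exposes a feature_importances_ attribute, again assuming the library and toy dataset from the earlier sketch.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(data.data, data.target)

# Relative importance of each feature, averaged over all trees in the forest.
for name, importance in zip(data.feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```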

Disadvantages:

  1. A large number of trees makes the algorithm slow and less practical for real-time prediction.

By: Anirudh Varanasi

