Install Spark on macOS High Sierra
Apache Spark can be installed on macOS High Sierra in several ways, depending on your preference: use Homebrew, download a prebuilt package, or build it yourself from source. This tutorial describes building it from source. continue reading …
Feature Selection using Ranking in R
Feature selection, an important step in machine learning, can be done by ranking features by importance. Features are often irrelevant or redundant for making predictions, which can slow down learning algorithms and hurt prediction accuracy. continue reading …
Develop Spark in IntelliJ with Maven
How to set up, compile, and run a simple Spark application from scratch. This tutorial assumes you have IntelliJ, the IntelliJ Scala plugin, and Maven installed. continue reading …
UPDATE Spark and Scala in IntelliJ using Maven
This tutorial walks through updating that same environment to the current versions, which as of this writing are Spark 2.4.3 and Scala 2.11.12. continue reading …
Feature Subset Selection (FSS) using Filters in R
Filters allow feature subset selection (FSS), the process of finding the combination of attributes in the available data that produces the highest prediction accuracy. continue reading …
Machine Learning Tutorials for Beginners
Weka Part 1 – Feature Selection. Weka is an easy-to-use application for exploring machine learning, and a great place to start. It’s free, provides a large number of learning algorithms, and has a GUI that makes fine-tuning your machine learning results easy. continue reading …
FSS using Wrappers and Regularization
Wrapper methods perform FSS by using a predefined learning algorithm. These methods search through possible subsets of features and measure the accuracy of each subset against the chosen learning algorithm. continue reading …

Weka Part 2 – Predictions using kNN. K-Nearest Neighbors (kNN) is a common machine learning algorithm for predictions. The k defines how many nearby data points (neighbors) are treated as similar to a new data point. continue reading …
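As a rough illustration of the idea (not the Weka implementation), here is a minimal kNN sketch in plain Python. The function name `knn_predict`, the Euclidean distance, and the majority vote are assumptions chosen for this example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` from its k nearest training points.

    `train` is a list of (features, label) pairs; `query` is a feature tuple.
    Distance is Euclidean, which is an assumption of this sketch.
    """
    # Sort the training points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # The predicted label is the most common label among those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two small clusters labeled "a" and "b".
train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"), ((8, 8), "b"), ((9, 8), "b")]
print(knn_predict(train, (1.5, 1.5), k=3))
```

A larger k smooths the prediction over more neighbors; a smaller k makes it more sensitive to individual points.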

Regularization is a regression technique that shrinks feature coefficients toward zero to simplify the learning model and reduce overfitting, while keeping the error on new, unseen data as low as possible. continue reading …
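To make the shrinking effect concrete, here is a small sketch using ridge (L2) regression with a single feature and no intercept, where the coefficient has the closed form sum(x*y) / (sum(x*x) + lambda). The choice of ridge, the function name `ridge_coefficient`, and the toy data are assumptions for illustration only.

```python
def ridge_coefficient(xs, ys, lam):
    """Ridge estimate for a one-feature, no-intercept linear model.

    With lam = 0 this is ordinary least squares; a larger lam pulls
    the coefficient toward zero.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical toy data where y is exactly 2 * x.
xs = [1, 2, 3]
ys = [2, 4, 6]
for lam in (0, 1, 14, 100):
    print(lam, ridge_coefficient(xs, ys, lam))
```

As the penalty `lam` grows, the fitted coefficient moves steadily toward zero, which is the simplification the excerpt describes.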

Weka Part 3 – SVM. Another common machine learning algorithm is the Support Vector Machine (SVM). In the simplest terms, it splits the data into groups with a separating line, which is then used to predict which side a new data point falls on. continue reading …