As a mathematical modelling PhD student, I have always been interested in machine learning. There are many great machine learning MOOCs on the internet, and a very popular one is Andrew Ng’s machine learning course on Coursera (https://www.coursera.org/learn/machine-learning). Whilst this is a great course, it has the slight disadvantage of being taught in Octave (an open-source alternative to MATLAB). I personally prefer courses taught in Python or R, and I later came across “The Analytics Edge”, offered by MIT on the edX platform (https://www.edx.org/course/analytics-edge-mitx-15-071x-2). Having just completed this course, I can wholeheartedly recommend it to anyone willing to try their hand at a bit of machine learning.
The course runs for approximately 12 weeks. You can take it completely free of charge and earn an honours certificate, provided you obtain a pass mark of 55% (verified certificates are also available for a fee). The whole course is taught in R and is of a very high standard (as expected of an MIT course). One feature that sets this course apart from other analytics MOOCs is how hands-on it is from day 1. As with most data science concepts, one of the best ways to understand machine learning is to implement the various algorithms and explore the results. Machine learning techniques covered include linear regression, logistic regression, tree-based methods (decision trees, random forests, etc.), clustering and visualisation. One thing I will say is that this is not a core programming course in R; it mainly emphasises the use of R for data analytics and machine learning. As such, it is more of an introduction to the basics of R, with the emphasis on learning syntax rather than hard-core programming. Many very useful R packages are covered, and these can be used in day-to-day analytics tasks (see the second point below).
Towards the end of the course, we were introduced to linear and integer optimization implemented in Excel. Personally, I would have preferred to see more coverage of machine learning concepts instead.
Some nice features of the course
- It focuses on the implementation of various machine learning algorithms without going into too much detail on the underlying theory/mathematics. This is a nice way for a beginner to understand machine learning: by doing rather than getting bogged down in theory. Other courses cover more of the theory (see Andrew Ng’s machine learning course, https://www.coursera.org/learn/machine-learning), and these can be taken after this one.
- The course introduced a number of machine learning techniques that are already implemented in R packages. These include random forests, decision trees, logistic regression, linear regression, text analytics and clustering. Examples of packages used include rpart, randomForest, ROCR, caret, e1071, tm, ggplot2 and caTools, along with base R functions such as kmeans.
- I really liked the variety of datasets and problems introduced in the course. They are very interesting, diverse, stimulating and taken from the real world. Examples of datasets used include the Framingham Heart Study, crime data, stock market data, demographics, climate, imaging data (MRI), music, polling, Twitter analytics, online dating, Netflix (movie recommendations), etc. Some of the datasets are very large and can be very challenging from a computational perspective. As you can see, there’s almost certainly something for everyone.
- There’s a Kaggle competition in week 7 which takes the excitement of the course to a whole different level! The data is again taken from the real world, and you’ll have the joy of competing with about 3000 students from around the world.
- The lectures are very clear and concise. The emphasis is more on the assignments, which are relatively easy to complete using the course material. These may seem repetitive initially, but without doubt the best way to explore machine learning is to dive into the problems using the various implementations in R. You can worry about the details of the algorithms or the mathematical theory later.
- The discussion forum was a great place to learn from more experienced course participants. The best predictions usually rely on an ensemble of machine learning techniques, and knowing how to combine them can only be learned through experience.
- The amount of lecture material is very well balanced and the course is self-contained (it won’t take you totally away from other commitments). The assignment deadlines are realistic for most people taking similar courses (typically people with day jobs who study in the evenings and at weekends).
- The linear and logistic regression modules provide a reasonable grounding in the mathematical theory and assumptions behind the various implementations. Examples include the sum of squared errors (SSE), the total sum of squares (SST), the ROC curve, specificity and sensitivity, etc.
- The visualisation chapter is incredibly stimulating. I personally love data visualisation, so I particularly enjoyed this module. The main package used was ggplot2, but other packages were used for visualising maps and networks, including maps, ggmap and igraph.
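For anyone curious about what the regression metrics mentioned above actually compute, here is a minimal sketch in plain Python. The course itself does all of this in R (for example with ROCR for ROC curves), so this is not the course's code. The helper names (`sse`, `sst`, `r_squared`, `sensitivity_specificity`, `roc_points`) are hypothetical, and the numbers are made-up toy values, not course data. The SST baseline is assumed to be a model that always predicts the mean outcome, and R² = 1 − SSE/SST is shown as one common way the two combine.

```python
"""Toy sketch of regression and classification metrics (hypothetical helpers)."""


def sse(y_true, y_pred):
    """Sum of squared errors: sum of (actual - predicted)^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))


def sst(y_true, baseline):
    """Total sum of squares: squared errors of a baseline model that always
    predicts the same value (assumed here to be the mean outcome)."""
    return sum((y - baseline) ** 2 for y in y_true)


def r_squared(y_true, y_pred, baseline):
    """R^2 = 1 - SSE/SST: the fraction of the baseline error the model removes."""
    return 1 - sse(y_true, y_pred) / sst(y_true, baseline)


def sensitivity_specificity(labels, probs, threshold=0.5):
    """Classify as 1 when the predicted probability exceeds `threshold`, then
    return (sensitivity, specificity) = (TP / (TP + FN), TN / (TN + FP))."""
    tp = fn = tn = fp = 0
    for y, p in zip(labels, probs):
        predicted = 1 if p > threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(labels, probs, thresholds):
    """One (false positive rate, true positive rate) point per threshold;
    plotting these points traces out an ROC curve."""
    points = []
    for t in thresholds:
        sens, spec = sensitivity_specificity(labels, probs, t)
        points.append((1 - spec, sens))
    return points


if __name__ == "__main__":
    # Linear regression toy example (made-up numbers).
    actual = [3.0, 5.0, 7.0, 9.0]
    predicted = [2.5, 5.5, 6.5, 9.5]
    mean_outcome = sum(actual) / len(actual)
    print("SSE:", sse(actual, predicted))
    print("SST:", sst(actual, mean_outcome))
    print("R^2:", r_squared(actual, predicted, mean_outcome))

    # Logistic regression toy example: true labels and predicted probabilities.
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    probs = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.55, 0.8]
    print("Sensitivity, specificity at 0.5:", sensitivity_specificity(labels, probs, 0.5))
    print("ROC points:", roc_points(labels, probs, [0.0, 0.25, 0.5, 0.75, 1.0]))
```

Raising the threshold can only lower (or keep) sensitivity while raising (or keeping) specificity. That trade-off is exactly what an ROC curve visualises.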
Without a shadow of doubt, I would recommend this course to anyone curious about machine learning or “analytics”, although it will by no means turn you into an expert in machine learning or “data science”. If you prefer Python to R, there’s no reason not to work through the course in Python; this will probably be one of my side projects in the near future. Alongside the course, I found it really helpful to go through An Introduction to Statistical Learning by Gareth James et al. (especially the material on random forests). In my opinion, 15.071x is one of the best free MOOCs out there. It would be interesting to hear about other great machine learning courses.