M208 with the Open University

I started studying with the Open University in 2018, shortly after completing my PhD in applied mathematics. The choice was driven purely by love of the subject and curiosity to learn more pure and applied mathematics. My first degree was not in mathematics, which made me realise how much I missed the subject. I started with MST125, which was quite straightforward, and then moved on to M208, my first introduction to degree-level pure maths. Topics covered include real analysis, group theory and linear algebra.

I’ve just completed my exams and, whilst awaiting results, I thought I’d write a reflection on the course. M208 was certainly very challenging to start with but much more enjoyable towards the end, when the concepts really started to fit together. Whilst taking the course, the focus was more on getting through the books and completing TMAs (tutor marked assignments) before deadlines. Real analysis was quite new to me and required a different way of thinking about mathematics: the emphasis was on proofs, theorems and first principles. We explored the behaviour of sequences and series, set theory, continuity, differentiation, integration, Taylor series and convergence. Throughout M208, I had access to face-to-face tutorials, which was a blessing. My tutor was very passionate about the subject and I made the most of this opportunity. Whilst online tutorials are great, they don’t really have the feel of in-person tutorials.

The main struggle of M208 was the sheer volume of the material. Being a 60 credit module, there was a lot more to plough through compared to the 30 credit modules. Although group theory was new to me, I found it really fun: it didn’t feel like a hard grind and the theorems were rather nice and cohesive. Topics covered include symmetry and subgroups, permutations, cosets, group actions, homomorphisms, quotient groups and conjugacy, the counting theorem, etc. Group theory is a bit like marmite in that you either like it (or get it) or you don’t, and a lot of students who didn’t fancy abstract mathematics couldn’t wait for it to finish. Abstraction isn’t that bad, and sometimes you just want to focus on the ideas without the distraction of context. At the same time, it turns out that group theory has some surprisingly useful applications in website encryption, counting (the counting theorem) and particle physics. Linear algebra was quite fun too, and it was nice to delve a bit more into the purer aspects of it.

I found that the module tutors were one of the strong features of the course: very supportive and always available to answer questions. This is really important for topics like analysis, where you might not grasp concepts at first reading. M208 was really my first formal introduction to pure mathematics and I’ve really enjoyed it, even though it was tough at the beginning. I fell behind on the workload as the books were very chunky and the pace of the course was very swift. I had to juggle this alongside full-time work, exams, family, etc.

Going forward, I’m considering taking more pure mathematics modules. In particular, I would like to explore number theory a bit further, but it seems my best option at level 2 is MST210, which is another 60 credit module but very much applied maths. There’s some number theory coverage in M303, a level 3 module. Complex analysis (M337) looks really appealing and very appropriate, especially having completed real analysis.

Irrespective of whatever I choose to do next, this OU journey just gets more and more exciting! I’m ever so grateful for the opportunity to learn through the OU!

Studying mathematics with the Open University

I am now in the second year of an undergraduate degree in mathematics and I definitely recommend studying with the OU, especially if you need to earn qualifications alongside working. There are many benefits:

  1. Typed lecture notes: One of the best things about studying with the OU is that they actually provide you with typed notes, which really facilitates understanding of the material. For courses like mathematics, this is pure gold! A lot of maths textbooks are indigestible, especially if you are new to undergraduate mathematics. Having notes that are easy to understand means you end up learning the material better than at most brick universities. For this reason alone, I am of the opinion that the OU offers better value for money. The notes are designed to be read and understood through self-study. Most students studying with the OU are mature students who already have families and other responsibilities, so it helps not to have to spend a lot of time making your own notes or trying to understand badly written textbooks.
  2. Learning at your own pace: You can take as many or as few OU credits as your availability allows. This means the learning is tailored to your needs. You can take courses at your own pace and you can also take breaks in between courses.
  3. Tutorials: Each student is assigned a tutor and usually the tutors also organise tutorials either online or in-person. I’ve found both online and in-person tutorials very useful.
  4. Entry qualifications: The OU is not rigid on entry qualifications. They give you a chance to prove yourself and if you get along well with the material, you can carry on with the course. This is quite helpful for certain courses where there is a big jump between A levels and University (e.g. mathematics).
  5. High academic standards: The cut-off for a distinction at the OU is 85% (yes, 85%). This means you have to work very hard to get the top grades. In most brick universities, you can obtain a distinction from 70%. This is one reason why employers value OU qualifications.
  6. Network of other OU students: As an OU student, you will soon realise that the OU network is very vast. Wherever you go in the world, chances are you will find someone studying with the OU.
  7. Future of education: I’m convinced that the future of education is increasingly leaning towards the online model. More courses will be offered online and I feel the OU is already pioneering this model and was probably way ahead of its time.

My review of MIT’s “Analytics Edge” MOOC (15.071x)



As a mathematical modelling PhD student, machine learning has always been one of my areas of interest. There are many great machine learning MOOCs on the internet, and a very popular one is Andrew Ng’s machine learning course on coursera (https://www.coursera.org/learn/machine-learning). Whilst this is a great course, it has the slight disadvantage of being implemented in octave (an open-source version of matlab). I personally prefer a course implemented in either python or R, and I later came across the “analytics edge” course offered by MIT via the edX platform (https://www.edx.org/course/analytics-edge-mitx-15-071x-2). Having just completed this course, I can wholeheartedly recommend it to anyone willing to try their hand at a bit of machine learning.

The course runs for approximately 12 weeks and you can take it completely free of charge and collect an honours certificate provided you obtain a pass mark of 55% (verified certificates are also available at a fee). The whole course is implemented in R and is of a very high standard (as expected of an MIT course). One feature that sets this course apart from other analytics MOOCs out there is its very hands-on nature from day 1. As with most data science concepts, one of the best ways to understand machine learning is by implementing the various algorithms and exploring the results. Machine learning techniques covered include linear regression, logistic regression, decision trees (random forests, etc.), clustering and visualisation. One thing I will say is that this course is not a core programming course in R: it mainly emphasises the use of R for data analytics and machine learning. As such, it is more of an introduction to the basics of R, and the emphasis was more on learning syntax rather than hard-core programming. There are lots of very useful R packages covered which can be used in day-to-day analytics tasks (see point 2 below).
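To give a flavour of the style, here is a minimal sketch of the kind of R workflow the course teaches (my own example on R’s built-in iris data, not course code), fitting a classification tree with the rpart package:

```r
# Fit a simple classification tree; rpart ships with standard R distributions
library(rpart)

# Train on the built-in iris dataset
tree <- rpart(Species ~ ., data = iris, method = "class")

# Predict classes and inspect the confusion matrix
pred <- predict(tree, iris, type = "class")
print(table(actual = iris$Species, predicted = pred))
```

Most of the assignments follow this load–fit–evaluate pattern, with the modelling itself handled by a package function.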

Towards the end of the course, we were introduced to linear and integer optimization implemented in excel. Personally, I would have preferred to see more explanation of machine learning concepts.

Some nice features of the course

  1. It focuses on implementation of various machine learning algorithms without going into too much detail on the underlying theory/mathematics. This is a nice way for a beginner to understand machine learning: by doing, rather than getting bogged down in theory. There are other courses that cover more of the theory (see Andrew Ng’s machine learning course: https://www.coursera.org/learn/machine-learning), and these can be taken afterwards.
  2. The course introduced a number of machine learning techniques which have already been implemented in R packages. These include random forests, decision trees, logistic regression, linear regression, text analytics and clustering. Examples of packages used include rpart, randomForest, ROCR, caret, e1071, tm, kmeans, ggplot2, caTools, etc.
  3. I really liked the variety of data sets and problems introduced in the course. They are interesting, diverse, stimulating and taken from the real world. Examples of datasets used include the Framingham heart study, crime data, stock market data, demographics, climate, imaging data (MRI), music, polling, twitter analytics, online dating and netflix (movie recommendation). Some of the datasets are very large and can be very challenging from a computational perspective. As you can see, there’s almost certainly something for everyone.
  4. There’s a kaggle competition in week 7 which takes the excitement of the course to a whole different level! The data is again taken from the real world and you’ll have the joy of competing with about 3000 students from around the world.
  5. The lectures are very clear and concise. Emphasis is more on the assignments which are relatively easy to complete using the course material. These may seem repetitive initially but without doubt, the best way to explore machine learning is to dive into the problems using the various implementations in R. You can worry about the details of the algorithms or mathematical theory later.
  6. The discussion forum was a great place to learn from several other more experienced course participants. The best predictions usually rely on an ensemble of machine learning techniques which can only be learned through experience.
  7. The number of lectures provided is very well balanced and the course is self-contained (it won’t take you totally away from other commitments). The assignment deadlines are realistic for most people taking similar courses (typically with day jobs and free time in evenings and weekends).
  8. The linear and logistic regression modules provide a reasonable grounding in the mathematical theory/assumptions behind the various implementations. Examples include the sum of squared errors (SSE), the total sum of squares (SST), the ROC curve, specificity and sensitivity, etc.
  9. The visualisation chapter is incredibly stimulating. I personally like data visualisation and so I took the liberty of enjoying this module. The main package used was ggplot2 but there were other packages for visualising maps/networks including maps, igraph and ggmap.
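To illustrate the regression-module ideas above (my own base-R sketch on the built-in mtcars data, not course code), sensitivity and specificity can be read straight off a confusion matrix from a logistic regression:

```r
# Logistic regression: predict transmission type (am: 0/1) from weight and mpg
model <- glm(am ~ wt + mpg, data = mtcars, family = binomial)

# Classify with a 0.5 probability threshold
prob <- predict(model, type = "response")
pred <- factor(ifelse(prob > 0.5, 1, 0), levels = c(0, 1))

# Confusion matrix: rows are actual classes, columns are predictions
cm <- table(actual = factor(mtcars$am, levels = c(0, 1)), predicted = pred)
print(cm)

sensitivity <- cm["1", "1"] / sum(cm["1", ])  # true positive rate
specificity <- cm["0", "0"] / sum(cm["0", ])  # true negative rate
```

Changing the 0.5 threshold trades sensitivity against specificity, which is exactly what the ROC curve visualises.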


Without a shadow of doubt, I would recommend this course to anyone curious about machine learning or “analytics”, although it will by no means turn you into an expert in machine learning or “data science”. If you prefer python to R, there’s no reason not to try out the course in python: this will probably be one of my side projects in the near future. Alongside the course, I found it really helpful to go through the statistical learning book by Gareth James et al. (especially the random forest chapter). In my opinion, 15.071x is one of the best free MOOCs out there. It would be interesting to hear about other great machine learning courses.


A Collection of Useful R Codes for Data Manipulation

Two of the most important data science tools for wrangling and data manipulation are R and python. Of the two, I personally prefer R for data manipulation since it was written specifically for this purpose by statisticians. One downside of R is that its documentation is quite poor, so it can be helpful to make your own list of useful snippets which you can refer to as and when needed. In this article, I will be posting a collection of really useful R snippets which I’ve found very handy over the course of my PhD and data science work.

  1. Install and/or load multiple R packages at once: This is quite handy, as you will often find yourself needing to either install new packages or load previously installed ones. A good exercise would be to re-write the code below as a function which can take one or more R packages as arguments. The code below was obtained from http://diggdata.in/.
    # List of packages to be installed/loaded
    packageList <- c("ggplot2", "nlme")
    check <- packageList %in% rownames(installed.packages())
    if(any(!check)) install.packages(packageList[!check])
    lapply(packageList, library, character.only = TRUE)
  2.  Create new folders in R: It is possible to create a new folder in your current directory using R.
    # get current working directory
    getwd()
    # create "folder_1" in current working directory
    dir.create("folder_1")
  3.  Plotting in ggplot2 using the pipe operator: The pipe operator (%>%) is very handy for this, and it also allows you to manipulate the data prior to the plot (see 4 below).
    # Plot in ggplot2 using the pipe (CO2 is a built-in dataset)
    library(dplyr)    # provides %>%
    library(ggplot2)
    CO2 %>% ggplot(aes(x = conc, y = uptake, group = Plant,
                       col = Plant, shape = Type)) +
      geom_point() + geom_line()

    Uptake vs concentration

  4.  Manipulate data prior to plotting in ggplot
    # convert uptake into z scores within each group: (mean(uptake) - uptake)/sd(uptake)
    library(dplyr)
    library(ggplot2)
    CO2 %>% group_by(Plant) %>%
      mutate(mean_uptake = mean(uptake),
             z_uptake = (mean_uptake - uptake)/sd(uptake)) %>%
      ggplot(aes(x = Plant, y = z_uptake, fill = Plant)) +
      geom_boxplot()  # box plots of z scores by plant

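As one possible solution to the exercise in point 1 above (a sketch; the function name load_packages is my own):

```r
# Install any packages that are missing, then load them all
load_packages <- function(...) {
  pkgs <- c(...)
  missing <- pkgs[!pkgs %in% rownames(installed.packages())]
  if (length(missing) > 0) install.packages(missing)
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Usage: load_packages("ggplot2", "nlme")
```

The `...` argument means it accepts any number of package names, so one call replaces the three-step snippet above.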

What are employers looking for in PhD graduates?


Many a maths PhD graduate may have envisioned themselves landing their dream job almost straight after their viva. This is not an unreasonable dream. However, it isn’t always the case, and it is increasingly important to understand very clearly what skills graduates are able to bring to a role or industry. There is a need to reflect on the PhD journey with a view to highlighting achievements that may be of particular interest to a potential employer. It may seem obvious to any PhD graduate what skills they’ve accrued over the course of their research. However, not all employers really understand and appreciate the value of a PhD. It is therefore important when writing your CV that your skills are well profiled and relevant to the roles of interest. This is a type of inverse problem: you have to first understand what an employer is looking for and then find ways of demonstrating that you are the right fit for the job.

There are two broad categories of jobs that maths/natural science PhDs often end up doing:

  1. Jobs that require a PhD qualification in the subject studied: These jobs are very specialist and research focused, and hence do specify the need for a PhD qualification. As a result, most companies won’t hire until you’ve completed your viva. They need to be sure that you indeed have the skills they are after, and this is fair. Industries in this category often hire graduates who have focused on a subject matter that’s directly relevant to that industry. Examples include machine learning, mathematical modelling, fluid dynamics, financial modelling and time series analysis, optimization, PDEs, cryptography and number theory, advanced statistical modelling, etc. These jobs tend to attract higher salary packages because the employers value the skill set you’re bringing into the company. Examples of employers include the pharmaceutical industry (modelling and simulation), the finance industry (quantitative analysts and financial modellers), data science and analytics firms (Twitter, Quora, Facebook, etc.), GCHQ, Google, IBM, Microsoft, academia, etc. These jobs are likely to value relevant publications in the respective field of research.
  2. Jobs that do not directly require a PhD qualification: Most PhD graduates may be wondering: what’s the point of going for roles that do not require my PhD? Whilst these roles may not necessarily require expertise in the subject matter of your PhD, it is the skills acquired and demonstrated in the process that are the main reasons for the hire. These include problem solving, the ability to learn difficult and technical concepts very quickly, programming, analysis, modelling, independent research, and written and verbal communication. These roles are often open to non-PhD graduates, including holders of Masters and sometimes BSc degrees. However, a PhD graduate has an advantage, having had more time to develop some of these key skills. Publications may not necessarily count for these types of roles. An employer in this category is interested in what you can add to their industry, and this boils down to transferable skills, including softer skills such as communication, presentation and other business skills.

There are also roles that do not require the subject matter of a PhD but nonetheless require a PhD qualification. A good example is the insight data science fellowship program in the US. This is a sort of bootcamp that recruits PhDs into data science roles in industry. In summary, the skills you’ve gained during your PhD are just as important as the PhD itself, and these need to be reflected in your resume. You have to sell yourself, as not every employer understands what skills are required to complete a PhD.

Self study list for becoming a data scientist

Data science is an exciting and growing field! It is relatively new and I’m sure lots of graduates have questions about it, so I’m putting together a list of the best resources out there for budding data scientists (myself included).


1. An introduction to statistical learning: This is a great practical text on machine learning. I was impressed by its amazon reviews (all very positive as of 3/11/2015) and indeed very impressed by its clarity and precision (amazon link). The authors have kindly made it available as a free pdf online, but most people enjoy it so much that they end up buying the hard copy.

2. Theory and Applications for Advanced Text Mining

3. 9 free data science books

4. There are a few programs specifically designed for PhD graduates who want to become data scientists. One of them is the insight data science course. The link takes you to a page of recommended readings and preparatory material for their data science candidates. Applications for this fellowship are very competitive, and they’ve recently started taking medical doctors for their new insight healthcare data science programme. The actual course is very self-directed and is more of a crash course. I believe one of the best things about it is that it serves as a platform where employers can meet data science candidates, and they also prepare candidates for interviews.

5. Mathematics and Statistics background: It is essential to have a good grasp of certain mathematical concepts from linear algebra and multivariable calculus. Probability and distribution theory and Bayesian statistics would also be very valuable. The coursera machine learning course by Andrew Ng is very highly recommended.

6. An extensive list of resources on how to become a data scientist can be found on the quora website: how to become a data scientist. Most of these were posted by current data scientists in various industries, and so for the most part they will be up to date.

7. Programming: It’s crucial to have a good programming background, especially with high-level languages such as python and R. If you are a complete beginner in programming, I would highly recommend python. Not only does it have excellent online documentation (unlike R), its syntax is also very easy to understand, which makes it well suited to beginners. I started off learning R, but after picking up python, I noticed that my programming skills became much better and concepts like control structures and algorithms were much easier to grasp. If you want to learn more about python, I would recommend the MIT python course on edX. I have personally taken this course and would recommend it, perhaps after an introductory course on python syntax. It is highly reputed as a very thorough and difficult course which emphasises mastery of algorithms and other fundamental programming concepts. Languages for handling databases such as SQL, and big data tools such as Hadoop and Spark/Scala, are also highly desirable.

8. Harvard Data Science course: This is a free online course organised by Prof Joe Blitzstein and colleagues at Harvard University. Again, I’ve heard many good things about it and there is definitely enough to keep you busy for a month!

9. Open source data science masters: There are lots of resources here on the various data science domains (machine learning, maths/statistics, databases, visualisation)

10. Piotr Migdal, a recent physics PhD graduate and now freelance data scientist, has put together a fantastic article about his journey from a PhD student in quantum physics to becoming a data scientist. It’s such a good read and I would highly recommend it: there’s enough there to keep you busy for years.

11. MIT’s Analytics Edge: This is one of the best MOOCs out there that you can undertake absolutely free of charge. I’m currently taking this MOOC and it is just amazing to say the least. I will be writing a review of the entire course once I’ve completed it. Features include machine learning, data visualisation, integer and linear optimization and also a kaggle competition.

Getting your first Data science role

“Data science” is a relatively new field that combines knowledge of statistics, machine learning and programming to solve real-world problems using data. The field is still very new, and whilst many companies understand why they need data scientists, many don’t know what skills they should be looking for in potential hires. A lot of data science job descriptions list a long string of skills, which suggests that companies are looking for a “unicorn” with every possible combination of skills (or perhaps they don’t know what they should be looking for). In reality, most of these unicorns do not exist, and it’s often more practical to build a data science team from individuals who have strengths in different domains of data science.

Broadly speaking, data careers can be divided into data science and data engineering. Data science is more to do with analysing, visualising and deriving meaningful insights from data. Data engineering, on the other hand, is more concerned with building data pipelines to deal with large datasets. Data engineering is more closely related to software engineering, whilst data science is more suitable for candidates from a physical sciences background (including maths, physics, quantitative biology/neuroscience, computer science, engineering, etc.). In reality, this distinction is often not clear in job descriptions. Also, due to a shortage of talent, overlap between data science and engineering is quite common. See the four types of data scientists for more.



How do you land your first job as a data scientist?

Getting that first job as a data scientist can be a very important step in your journey to becoming a competent data scientist. It can be very challenging, especially if you’re coming straight from university with little “real world” experience. It must be said that a PhD qualification alone will not give you automatic entry into a data science role: you will have to demonstrate your capability as a data scientist. Having said that, there are various routes into data science, and you don’t necessarily need a PhD, but you do have to demonstrate a breadth of skills.

  1. PhD route: A common route into a data science role is through a PhD qualification in a quantitative/scientific field such as physics, mathematics, engineering, neuroscience, biology, bioinformatics, computer science, etc. There are bootcamps that specifically recruit PhD graduates to work on projects attached to companies, either alone (ASI Fellowship) or in collaboration with other graduates (Science to Data Science). Some popular bootcamps include the ASI fellowship, S2DS and insight fellowship programmes (note that the first two are based in the UK whilst the third is in the USA). For a more comprehensive list of data science bootcamps, see the following link.
  2. Masters/undergraduate route: Data scientists can be hired straight after an undergraduate or masters degree into entry-level data science roles. It’s important to demonstrate your competencies through projects and, if possible, by doing an internship with a data-driven company. If going via the masters route, I would recommend a masters in machine learning. There are a few masters in data science, but I feel these are probably too broad and non-specific.
  3. Portfolio/work experience route: This may be suitable for individuals who are already in industry and wish to move into data science roles. People in this group typically come from analyst, software engineering or business intelligence roles. The key is to develop a portfolio of data science projects which can be uploaded to github for employers to see. Other ways to show competence include contributing to data science open source projects.


So the question is: How do you land that first job?

  1. Know the basics of data science theory very well. This includes mathematics/statistics (linear algebra, calculus, numerical optimization, regression, algorithms, etc.), programming (at the very minimum python or R), machine learning algorithms, visualisation, and some familiarity with big data tools (scala, spark). If interested in data engineering, it’s crucial to get familiar with the big data tools: scala, spark, hadoop. As a general rule, it’s important to be comfortable with at least one of the data wrangling languages (R or python); think of the proverbial 10,000 hours of coding in that particular language.
  2. Demonstrate your interest by undertaking a data science project in your spare time. Find a question that you can address using online data and showcase your work on a github account. If you’re already studying for a masters or a PhD, try to demonstrate your data science interests through your projects.
  3. If possible get some relevant industry experience related to data science and to the industry of your choice. Domain knowledge is crucial in being an effective data scientist. This can be acquired through an internship, kaggle competition, hackathon or through previous work experience.
  4. Network within the data science community by attending meetups, conferences and arranging meetings with data driven companies of interest.
  5. Keep up to date with the field by reading new articles, publications and algorithms. I personally follow data elixir and the twitter accounts of prominent data scientists.


Lastly (and it goes without saying), apply to roles of interest. Some roles are not widely advertised, and this is where networking becomes important. There are several roles out there, but it’s important to scrutinize job specifications carefully to ensure they are truly data science roles. All the best, and watch out for my next article on how I made the transition to becoming a data scientist.

Career options for maths PhD graduates

The aim of this blog is to educate Maths PhD students (as well as their lecturers, career counsellors and the general public) on potential careers in industry available to them. The good news is that maths/STEM PhDs are in great demand in very attractive careers.

There’s no doubt that a vast number of career options are open to graduates with mathematical talent and education, so finding the right career can seem like a “needle in a haystack” situation; this is the main reason why this blog was created. A good example is this list of 85 job descriptions of mathematicians working in industry. I like this list because it gives concise descriptions of the various roles and, as such, great insight into the skills required to do them.

To further complicate this, I have noticed that most mathematician roles in industry rarely carry the title “mathematician”. They are often called various other names including but not limited to the following:

1. Business analyst
2. Software Engineer
3. Computer scientist
4. Research associate
5. Data scientist
6. Operations researcher
7. Hydrologist
8. Basin modeller
9. Geologist
10. Statistician
11. Actuary
12. Cryptographer
13. Quantitative analyst
14. Financial Engineer

This is hardly surprising, as applied mathematicians are so versatile and often find themselves in roles that are traditionally occupied by other science, technology and engineering graduates. Consequently, the general public do not have a true appreciation of what mathematicians really do, apart from the glaringly obvious teaching of mathematics. The job titles are often a reflection of the environment a mathematician works in rather than the background/skills required to get the job done. Not surprisingly, a quick search on google for “mathematician jobs” in industry may not yield very much. My first piece of advice for maths PhD graduates is not to dismiss any role based on its job title, but to pay particular attention to the job description before making a decision. As a maths PhD student myself, I will be sharing my own discoveries, which I hope will be of some use to other research students. I plan to update this page on a weekly basis, so please do visit again for additional information.

1. The first resource I’m going to recommend can be found on the Society for Industrial and Applied Mathematics website. This website is packed with lots of useful resources on the destinations of maths PhD students and a lot of case studies of mathematicians working in industry; definitely worth a look. You can also download their brochure and read it in your spare time. It seems to me that a mathematician is a true jack of all trades and master of all! Alongside this, I would also highly recommend the American Mathematical Society website.


2. This website is packed with lots of quality information about various career options for maths graduates. Although some of the first few links are no longer working, don’t be put off by this. I really recommend the financial mathematics section for those interested in this line of work.


3. Internships: An internship can serve as a direct entry route, allowing you to explore a company and indeed secure that dream job. Most internships for PhD students are paid, and these are usually offered in the penultimate year. For example, here is a list of companies offering PhD students internships in quantitative finance. Some companies, however, offer off-cycle internships, meaning you can apply anytime, even after completion of your PhD.

4. This is a general website from the University of Manchester containing adverts for jobs outside academia: very useful if you don’t know where to start.

List of Companies specifically recruiting PhD applicants

A PhD in mathematics is a very valuable qualification to have on a CV. However, as valuable as it is, not all companies specifically recruit PhD talent into industrial roles. This unfortunately means PhD graduates may have to compete with Masters or even undergraduate applicants (very annoying if you ask me). Hence, it is worth targeting companies with specific roles and schemes for PhD graduates (Google, McKinsey and Co, PWC, companies recruiting quantitative analysts, pharmaceutical companies, data analysis and tech companies, GCHQ, etc.).

All in all, I think it’s important for each PhD graduate to make a list of the skills they feel they can offer and target companies who are after those skills. For example, google is very keen to employ PhD graduates with strong skills in software development, machine learning and statistics.