My Machine Learning Roadmap

I started off learning mathematics before code. Mathematics has many applications, and it was applied mathematics in particular that led me to computer science and, most recently, machine learning.

1. Mathematics

To understand machine learning deeply, you need a foundation in mathematics, specifically calculus, linear algebra, probability theory, discrete mathematics and statistics.

Building models purely by calling library functions is fine at first, but it only takes you so far, because you need to understand what happens under the hood. That understanding pays off when you want to adapt pre-built models to custom needs.

For example, L1 (lasso) and L2 (ridge) regularization add different penalty terms to the loss. This changes your optimization problem, which in turn might affect algorithmic fairness and bias.
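As a rough illustration of how the penalty changes the solution, here is a minimal sketch for a single coefficient. It assumes a simplified one-dimensional objective, and the function names `soft_threshold` and `ridge_shrink` are my own, not from any library. With an L1 penalty, the minimizer of 0.5*(w - z)^2 + lam*|w| sets small values exactly to zero; with an L2 penalty, the minimizer of 0.5*(w - z)^2 + 0.5*lam*w^2 only shrinks them.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w) (L1 / lasso-style penalty)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # coefficients with |z| <= lam are set exactly to zero


def ridge_shrink(z: float, lam: float) -> float:
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2 (L2 / ridge-style penalty)."""
    return z / (1.0 + lam)


# Hypothetical unpenalized coefficients, chosen only for illustration.
unpenalized = [3.0, 0.4, -0.2, -2.5]
lam = 0.5

l1_weights = [soft_threshold(z, lam) for z in unpenalized]
l2_weights = [ridge_shrink(z, lam) for z in unpenalized]

print("L1:", l1_weights)
print("L2:", l2_weights)
```

The L1 result contains exact zeros while the L2 result does not, which is why the choice of penalty can change which features a model ends up relying on.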

Many newer machine learning methods are also rooted in partial differential equations and other areas of mathematics.

2. Computer Science Fundamentals

Once I was inspired to become a computer scientist/software engineer, I moved on to computer science and all its fundamentals.

You need a solid foundation in computer science: how data is stored and processed, and how long an algorithm takes to run. After learning these, familiarize yourself with programming languages such as Python, R and C.
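To give a feel for why the time taken by an algorithm matters, here is a small sketch that counts how many elements two search strategies inspect. The functions and the data are made up for illustration; the point is only the gap between a linear scan and a search that halves the range each step.

```python
def linear_search_steps(items, target):
    """Scan left to right and return how many elements were inspected."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Halve the search range each step and return how many elements were inspected."""
    lo, hi, steps = 0, len(sorted_items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


data = list(range(1_000_000))  # already sorted, which binary search requires
target = 999_999               # worst case for the linear scan

print("linear:", linear_search_steps(data, target))
print("binary:", binary_search_steps(data, target))
```

The linear scan inspects every element, while the binary search needs at most about twenty inspections for a million items. The same kind of gap shows up when you process large datasets.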

Python is the most widely used programming language in machine learning and data science. As a result it has a huge community and a big support base, and most machine learning libraries provide a Python interface.

Every programmer should know C, whether they work in machine learning, web development or mobile engineering. The language is essential for understanding what happens inside a computer, and that understanding helps you optimize machine learning code for speed.

3. SQL Foundation

While learning computer science fundamentals I also learned SQL, and I later came back to it as a machine learning/data science tool.

You need a good understanding of Structured Query Language (SQL) to be able to query and manipulate data stored in a relational database.
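As a small taste of what querying looks like, here is a sketch that uses Python's built-in `sqlite3` module with an in-memory database. The `experiments` table and its rows are invented for this example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiments (model TEXT, dataset TEXT, accuracy REAL)")

# Made-up rows, inserted with placeholders rather than string formatting.
conn.executemany(
    "INSERT INTO experiments VALUES (?, ?, ?)",
    [
        ("logreg", "a", 0.81),
        ("logreg", "b", 0.79),
        ("tree", "a", 0.85),
        ("tree", "b", 0.77),
    ],
)

# Aggregate: the mean accuracy of each model across datasets.
rows = conn.execute(
    "SELECT model, AVG(accuracy) AS mean_accuracy "
    "FROM experiments GROUP BY model ORDER BY model"
).fetchall()

for model, mean_accuracy in rows:
    print(model, round(mean_accuracy, 3))

conn.close()
```

The same `SELECT ... GROUP BY` pattern carries over to larger relational databases; only the connection code changes.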

4. Data Science Libraries

NumPy

NumPy is a scientific library used for numerical computation, valued for its speed.

Instead of Python lists we can use NumPy arrays, which are faster because NumPy's core is written in C. C is a compiled language, and that is what makes NumPy so fast and powerful.
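Here is a minimal sketch of the difference in style, assuming NumPy is installed. The list version loops in Python, while the array version is a single vectorized expression whose loop runs in compiled code.

```python
import numpy as np

values = list(range(1, 6))

# Pure Python: an explicit loop over a list.
squares_list = [v * v for v in values]

# NumPy: one vectorized expression over the whole array.
arr = np.array(values)
squares_arr = arr * arr

print(squares_list)
print(squares_arr.tolist())
print(arr.sum(), arr.mean())
```

On five numbers the speed difference is invisible, but on millions of values the vectorized form is typically much faster.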

Pandas

Pandas is a Python library mainly used for analysing data loaded from various file formats. It is also very fast, as it is written in Python, Cython and C, and it is widely used for data manipulation.
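A short sketch of a typical workflow, assuming pandas is installed: load CSV data, drop a row with a missing value, and summarize by group. The CSV text and column names are made up, and the data is read from a string so the example needs no file on disk.

```python
import io

import pandas as pd

# Made-up CSV content; the last row has a missing temperature.
csv_text = """city,temperature_c
Harare,24.0
Harare,26.0
Nairobi,21.0
Nairobi,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Basic cleaning: drop rows where the measurement is missing.
df = df.dropna(subset=["temperature_c"])

# Analysis: the mean temperature per city.
mean_by_city = df.groupby("city")["temperature_c"].mean()
print(mean_by_city)
```

In practice you would pass a file path to `pd.read_csv` instead of a `StringIO` object.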

Matplotlib

This is a plotting library that goes hand in hand with NumPy: we do the numerical computing with NumPy, then plot the results with Matplotlib.
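That division of labour can be sketched as follows, assuming both libraries are installed. The non-interactive `Agg` backend and the in-memory buffer are choices made only so the example runs without a display or an output file.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Numerical computing with NumPy.
x = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.sin(x)

# Plotting with Matplotlib.
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Render to PNG bytes in memory.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
print(len(png_bytes), "bytes of PNG data")
```

In a notebook or a desktop session you would usually call `plt.show()` or `fig.savefig("plot.png")` instead.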

5. Read Research Papers

As I was new to the field, I needed to familiarize myself with the jargon, where the industry is heading, its challenges, and the different methods of problem solving.

By reading papers you get to see how people train and test models, and what the standards are within the field for rigorously training and testing models, or simply for checking how well a model works.
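The most basic of those standards is to evaluate on data the model never saw during training. Here is a minimal sketch using only the standard library. The data, the one-parameter threshold "model", and all function names are invented for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train_rows):
    """Toy 'training': place the threshold midway between the class means."""
    class0 = [x for x, label in train_rows if label == 0]
    class1 = [x for x, label in train_rows if label == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2.0


def accuracy(threshold, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(int(x > threshold) == label for x, label in rows)
    return correct / len(rows)


# Made-up data: feature x in 0..39, label 1 when x >= 20.
rows = [(float(x), int(x >= 20)) for x in range(40)]

train_rows, test_rows = train_test_split(rows)
threshold = fit_threshold(train_rows)     # fitted on training data only
test_accuracy = accuracy(threshold, test_rows)  # scored on held-out data only

print(f"threshold={threshold:.2f} test accuracy={test_accuracy:.2f}")
```

Fixing the random seed makes the split reproducible, which is part of what makes a reported result checkable by others.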

Reading into fields that overlap with machine learning, such as neuroscience, psychology, economics and physics, also became a habit for me. I got to see what problems they face and how machine learning could assist.

6. Deep Learning

Now that the fundamentals were sorted, I could move on to the juicy stuff: neural networks, Convolutional Neural Networks (CNNs) for computer vision, Recurrent Neural Networks (RNNs), and Natural Language Processing (NLP).
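To show what sits at the core of a convolutional network, here is a tiny sketch of the sliding-window operation in one dimension, written in plain Python. The signal and kernel are made up, and a real CNN would learn its kernel values during training rather than have them fixed by hand.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation, the operation deep learning
    libraries usually call convolution: slide the kernel over the signal
    and take a weighted sum at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# A made-up signal with a flat bump in the middle.
signal = [0, 0, 1, 1, 1, 0, 0]

# A hand-picked kernel that responds to changes between neighbours.
edge_kernel = [-1, 1]

response = conv1d(signal, edge_kernel)
print(response)
```

The response is non-zero only where the signal rises or falls, which is the one-dimensional analogue of an edge detector in an image.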

7. MLOps

I was initially reluctant to learn this, as I wondered why a machine learning engineer would need to familiarize themselves with DevOps.

Think of MLOps as DevOps applied to machine learning: how to use tools such as MLflow to manage the entire machine learning life cycle.

8. Projects

Reimplementing models such as transformers is a great start. As you rebuild them yourself, you gain a lot of intuition about how they work and how you can tweak them to suit your needs.
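As an example of the kind of piece you end up writing, here is a sketch of scaled dot-product attention, the central operation of a transformer, in NumPy. It is a simplified, single-head version with no masking and no learned projections, and the random inputs are placeholders.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v):
    """Compute softmax(q @ k.T / sqrt(d_k)) @ v for 2-D arrays of
    shape (sequence_length, d_k). Returns the output and the weights."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Placeholder inputs: 4 tokens, each an 8-dimensional vector.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))

output, weights = scaled_dot_product_attention(q, k, v)
print(output.shape)
print(weights.sum(axis=-1))  # each row of attention weights sums to 1
```

Writing even this small piece by hand makes it clear that each output token is a weighted average of the value vectors.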

These models need datasets, which can be found on websites such as Kaggle and Google Dataset Search.

Another method that was emphasized to me was collecting raw data to learn how to clean it yourself.

9. DevOps and Big Data Analysis

Big Tech = Big Data. To work for the big technology companies, you have to know how to integrate DevOps with machine learning, with a blend of big data analysis added in.

Useful things to learn here include PySpark, Hadoop, distributed computing, containerizing applications with Docker, CI/CD with Jenkins and, of course, NoSQL databases such as Apache Cassandra.

10. Resources For Machine Learning

https://ai.google/education/

https://www.coursera.org/learn/machine-learning

https://www.w3schools.com/sql/

https://www.coursera.org/specializations/machine-learning-engineering-for-production-mlops