
21 Resources for Learning Math for Data Science

Learning or refreshing math is probably one of the biggest worries for people starting out in data science.

Let’s be honest: most people didn’t do very well in math in school, maybe not even in college. That can be intimidating, and it creates a barrier for those who want to explore data science.

A few days ago I published a post on Reddit and here on our blog called “Study Plan for Learning Data Science Over the Next 12 Months”, where I gave some quarterly recommendations and emphasized studying mathematics and statistics in the first quarter. I received many questions about exactly which materials I recommend, and this post answers those questions. But first, I want to give you some context.

Leaving aside the reasons that have led most people to hate math, the reality is that we need it in data science. For me, the biggest shortcoming of mathematics as it was taught was its apparent lack of real-world applicability: I didn’t see a reason for intermediate and advanced mathematics, such as multivariate calculus. I confess that in school and college I didn’t like those subjects for that reason, even though I always did well and scored above average (especially in statistics). I still didn’t see how I could use a derivative or a matrix in the real world. I ended up as a software engineer, and only once I entered the world of data science was I able to make the connection between mathematics, statistics, and the real world.

On the other hand, it is important to clarify that we do not need a master’s degree in pure mathematics to do data science projects. As I mentioned in previous posts, there is a big debate in the community about how much math we need to do a good job as data scientists.

We could say that data science is divided into two major fields of work: research and production.

By research, we mean research and development, which normally takes place within a large company (usually a tech company), within an organization focused on cutting-edge work (such as medical research), or within universities. This sector has very limited job offers.

  • The great advantage is the deep knowledge of algorithms and their implementations, as well as the ability to create variations of existing algorithms to improve them, or even to create new machine learning algorithms.
  • The disadvantage is the impractical nature of the work. It is very theoretical, often the only objective is to publish papers, and it is generally far from business use cases. For reference on this, I recently read this post on Reddit, which I recommend you read.

By production, we refer to the practical side of this discipline, where in your day-to-day job you’ll generally use libraries such as scikit-learn, TensorFlow, Keras, PyTorch, and others. These libraries operate like a black box: you enter data and you get an output, but you don’t know in detail what happened in between. This has its advantages and disadvantages, but it certainly makes life much easier when putting useful models into production. What I don’t recommend is using them blindly, without the minimum mathematical background to understand a little of their fundamentals. That is the objective of this post: to recommend some valuable resources so that you have the necessary foundations and don’t operate those libraries blindly.
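To make the "black box" idea concrete, here is a minimal sketch of the kind of math a simple linear regression fit performs. It is an illustration only, not how scikit-learn or any other library mentioned above is actually implemented; the function name `fit_line` and the sample data are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature).

    Hypothetical helper for illustration; not a library API.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Knowing that a fit boils down to means, variances, and covariances makes it easier to judge when a model's output can be trusted.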

So if you decide to focus on research and development, you are going to need mathematics and statistics in depth (very in-depth). If you go for the practical side, the libraries will handle most of it for you under the hood. It should be noted that most job offers are on the practical side.

With those remarks out of the way, it is time to define the specific topics needed for an initial foundation in mathematics for data science.

  • Linear Algebra: This subject gives you the fundamentals of working with data in vector and matrix form, the skills to solve systems of linear equations, and familiarity with the basic matrix decompositions and a general understanding of where they apply.

  • Calculus: Here it is important to study functions (maps), limits (of sequences and of functions of one and several variables), differentiation (from the single-variable to the multivariable case), and integration, sequentially building a foundation for basic optimization. It is also important here to study gradient descent.

  • Probability theory: Here you should learn about random variables, i.e. variables whose values are determined by a random experiment. Random variables are used as a model for the data generation processes we want to study. The properties of the data are deeply linked to the corresponding properties of the random variables, such as expected value, variance, and correlation.
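The three topics above can each be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not a recommended implementation: the helper names (`solve_2x2`, `gradient_descent`, `simulate_die`), the example system, the function being minimized, and the learning rate are all made up for this sketch.

```python
import random
import statistics


def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b using Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("matrix is singular; no unique solution")
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]


def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Minimize a one-variable function by stepping against its derivative."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


def simulate_die(n_rolls, seed=0):
    """Draw samples of a fair six-sided die, a simple discrete random variable."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n_rolls)]


# Linear algebra: solve 2x + y = 5 and x + 3y = 10.
solution = solve_2x2([[2, 1], [1, 3]], [5, 10])

# Calculus: f(x) = (x - 3)^2 has derivative f'(x) = 2(x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)

# Probability: sample statistics should approach the theoretical values
# (expected value 3.5, variance 35/12) as the number of rolls grows.
rolls = simulate_die(10_000)
sample_mean = statistics.mean(rolls)
sample_variance = statistics.pvariance(rolls)

print(solution, minimum, sample_mean, sample_variance)
```

In practice you would use a numerical library for all three tasks; the point of writing them by hand once is to see which piece of math each library call stands for.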

Note: these subjects go much deeper than what I just mentioned. This is simply a guide to the subjects and resources I recommend for approaching mathematics in the field of data science.

Now that we have a better idea of the path we should take, let’s examine the recommended resources. We will divide them into basic, intermediate, and advanced. Among the advanced ones, we’ll include resources focused on deep learning.

Basics: this first section of resources covers the mathematical basics: mathematical thinking, algebra, and how to implement math with Python.

Read the full article here:
https://www.datasource.ai/en/data-science-articles/21-resources-for-learning-math-for-data-science

  1. 2

    Nice, would also recommend checking out David Venturi's curriculum

    1. 1

      great.. I'll check it out!
