How to code math equations?

Turn math equations and formulas into Python code

Bünyamin Ergen
10 min read · Nov 10, 2022

I will explain step by step how to code a math equation.

The levels, in order:
Basic
Intermediate
Advanced
Quantum :) (Maybe)

The topics, in order:
simple math operations,
equations with one unknown,
variance, standard deviation, confidence interval, and Pearson correlation coefficient,
and last but not least, the simple linear regression equation.

CONTENTS
1. Start with Basics

2. Intermediate
2.a. Variance
2.b. Standard Deviation (Sample)
2.c. Confidence Interval
2.c.a Confidence Level Value (Z-score)

3. Simple Linear Regression
3.a. β1 (slope)
3.a.a. Pearson correlation coefficient
3.b. β0 (intercept)
3.c R-Squared (Coefficient of determination)
3.c.a RSS (Residual Sum of Squares)
3.c.b TSS (Total Sum of Squares)
3.d Adjusted R-Squared

Resources

1. Start with Basics

I’m getting straight to the point, assuming everyone knows the Python math operators.
You can refer to the first link in the Resources for more details.

x + 3 = 2

x plus 3 equals 2.

The solution is simple, right? Subtract 3 from both sides: x = 2 - 3 = -1.

simple equation
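
A minimal sketch of this first equation in Python could look like the following. The variable name x is simply my choice here, and the check at the end only confirms that the value satisfies the original equation.

```python
# Solve x + 3 = 2 by moving the 3 to the right-hand side.
x = 2 - 3
print(x)  # -1

# Substituting x back in must reproduce the right-hand side.
assert x + 3 == 2
```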

Let’s try another equation.
The following equation says that twice x plus 1 equals 5.
2x + 1 = 5
So let’s leave x alone. It looks like someone who wants to be alone. :)
So:
2x + 1 - 1 = 5 - 1
2x = 5 - 1
2x = 4
x = 4 / 2
x = 2

simple equation
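
The same isolation steps can be sketched in Python. The name right_side is a hypothetical helper variable used only to mirror the manual steps.

```python
# Solve 2x + 1 = 5 step by step.
right_side = 5 - 1       # subtract 1 from both sides: 2x = 4
x = right_side / 2       # divide both sides by 2
print(x)  # 2.0

# Substituting x back in must reproduce the right-hand side.
assert 2 * x + 1 == 5
```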

2. Intermediate

Let’s jump from boring simple operations to some more fun stuff. :)
Variance, Standard Deviation, Confidence Interval, Pearson Correlation Coefficient

2.a. Variance

Variance is a measure of dispersion,
meaning it is a measure of how far a set of numbers is spread out from their average value.
In more mathematical terms:
subtract the mean from each value,
square each of the resulting values,
sum all the squared values,
then divide the sum by the number of observations minus 1 (for the sample variance).

Variance Equation / Formula

S^2 = Σ (x_i - x̄)^2 / (n - 1)

S^2 = sample variance (S to the power of 2)
x_i = the value of one observation (x sub i)
x̄ = the mean value of all observations (x bar)
n = the number of observations
Σ = the summation symbol, i.e. the sum over all observations (the upper-case letter sigma)

Below, the mean is subtracted from each value with a for loop.
Each result is squared and all the squares are added together.
Finally, 1 is subtracted from the number of samples and the total is divided by that number.

Variance code
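
Here is a minimal sketch of the steps described above, using a for loop. The data values are made up for illustration and the variable names are my own assumptions.

```python
# Illustrative sample data (not from the article).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Subtract the mean from each value, square it, and add everything up.
squared_deviations = 0.0
for value in data:
    squared_deviations += (value - mean) ** 2

# Sample variance: divide the total by n - 1.
variance = squared_deviations / (n - 1)
print(variance)
```

If you want to double-check the result, the standard library function statistics.variance computes the same sample variance.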

2.b. Standard Deviation (Sample)

Standard deviation is a measure of the amount of variation or dispersion of a set of values.

Standard Deviation (Sample) Equation / Formula

s = sqrt( Σ (x_i - x̄)^2 / (n - 1) )

s = sample standard deviation
n = the number of observations
x_i = the observed values of a sample item (x sub i)
x̄ = the mean value of the observations (x bar)
Σ = the summation symbol, i.e. the sum over all observations (the upper-case letter sigma)

If you look closely at the equation above,
you can see that the standard deviation is actually the square root of the variance.
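
As a sketch of that relationship, the sample standard deviation can be computed by taking the square root of the sample variance. The data values are again made up for illustration.

```python
import math

# Illustrative sample data (not from the article).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Sample variance, then its square root.
variance = sum((value - mean) ** 2 for value in data) / (n - 1)
std_dev = math.sqrt(variance)
print(std_dev)
```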

2.c. Confidence Interval

Confidence interval (CI) is a range of estimates for an unknown parameter.
A confidence interval is computed at a designated confidence level;
the 95% confidence level is most common, but other levels, such as 90% or 99%, are sometimes used.
The confidence level represents the long-run proportion of corresponding CIs that contain the true value of the parameter.
For example, out of all intervals computed at the 95% level, 95% of them should contain the parameter’s true value.

In more mathematical terms:
the confidence interval is the sample mean minus/plus the z-score multiplied by the standard deviation divided by the square root of the sample size.

Confidence Interval Equation / Formula

CI = x̄ ± z * s / sqrt(n)

CI = confidence interval
x̄ = sample mean
z = confidence level value (z-score)
s = sample standard deviation
n = sample size

Confidence Interval code

Standard Deviation of x (s = sample standard deviation)

I explained how to find the standard deviation above.
But let’s go over it again.
The mean is subtracted from each value and the results are squared.
Then add them all up and divide by the number of elements minus 1.
Then calculate the square root of the resulting number.

2.c.a Confidence Level Value (Z-score)

Confidence Level Value, standard score or z score is the number of standard deviations by which the value of a raw score
(i.e., an observed value or data point) is above or below the mean value of what is being observed or measured.
Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.

Z-Score Equation / Formula

z = (x - μ) / σ

x = raw score, μ = mean, σ = standard deviation

How is the z-score calculated? First choose a confidence level. We will take 95%, but you could also take 90% or 99%; the choice is up to you.
Then let’s look at the red area in the image above. That’s our safe zone. :)
That region represents 95%, that is, 0.95.
So what is the area left over on the left and right?
If the whole area is 1, subtracting 0.95 from 1 gives the total of these two areas,
so 1 - 0.95 = 0.05.
And if we divide that by two, we get the area on the left and the area on the right.
That’s 0.05 / 2 = 0.025.
Let’s do this with the formula.

Z-score
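
The tail-area arithmetic can be sketched in a few lines. The names confidence_level, alpha and tail_area are assumptions chosen for readability; the values are rounded when printed because floating-point subtraction is not exact.

```python
confidence_level = 0.95

# Area outside the confidence region (both tails together).
alpha = 1 - confidence_level

# The distribution is symmetric, so each tail gets half of it.
tail_area = alpha / 2

print(round(alpha, 3), round(tail_area, 3))  # 0.05 0.025
```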

Let’s find the z-score.
Focus on the blue area in the image below.
If the whole area under the curve is 1 and the area on the right is 0.025,
then the blue area on the left is 1 - 0.025 = 0.975.

Z-Score Code
Z-Score Table

If we look up the number we found above (0.975) in the table and add the row and column labels on the axes (1.9 and 0.06), we get the z-score: 1.96.

Z-Score code
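
Instead of reading the table by hand, one possible sketch uses the inverse cumulative distribution function of the standard normal distribution. Using statistics.NormalDist from the Python standard library is my assumption here; the original code may have used a different tool.

```python
from statistics import NormalDist

confidence_level = 0.95
tail_area = (1 - confidence_level) / 2
area_to_the_left = 1 - tail_area          # about 0.975

# inv_cdf returns the z value that has this much area to its left.
z_score = NormalDist().inv_cdf(area_to_the_left)
print(round(z_score, 2))  # 1.96
```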

Now that we have found all the unknowns, let’s put them in the equation and implement it.
Here we apply the plus/minus expression in the equation to both sides of the distribution.
We use addition for the upper (positive) side and subtraction for the lower (negative) side.

Look at the equation again.

Confidence Interval Equation / Formula
Confidence Interval Code
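
Putting the pieces together, a confidence interval for the mean could be sketched as follows. The data values are made up, the variable names are assumptions, and the z-score comes from statistics.NormalDist rather than from a table.

```python
import math
from statistics import NormalDist

# Illustrative sample data (not from the article).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n
std_dev = math.sqrt(sum((value - mean) ** 2 for value in data) / (n - 1))

# z-score for a 95% confidence level (about 1.96).
z_score = NormalDist().inv_cdf(1 - (1 - 0.95) / 2)

# Margin of error: z * s / sqrt(n).
margin = z_score * std_dev / math.sqrt(n)

lower = mean - margin   # subtraction for the lower side
upper = mean + margin   # addition for the upper side
print(lower, upper)
```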

3. Simple Linear Regression

Simple linear regression is a method used to predict the dependent variable with the help of the independent variable
when there is a linear relationship between a single independent variable and the dependent variable.

Simple Linear Regression Equation / Formula

Y = β0 + β1 * X + ϵ

Y = dependent variable
β0 = intercept
β1 = slope
X = independent variable
ϵ = random error

Of course, since we want to find the values to be predicted, we should write the equation as follows.
The only difference, as you know, is the hat, which means a predicted value, and the letter "i", which indexes each value.
For more information, you can refer to the projection matrix (also known as the hat matrix).

Simple Linear Regression Equation / Formula

ŷ_i = β0 + β1 * x_i
Simple Linear Regression

Create a dataset with a linear relation.
We can do this with the datasets module in the sklearn package.

Now we have a data set of independent variables and a dependent variable.
Since we will implement simple linear regression, let’s take one of the independent variables and assign it as x.
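
The sklearn code itself is not reproduced here. As a dependency-free stand-in, the following sketch builds a small data set with a linear relation plus some noise using only the standard library. The slope, intercept, noise level and seed are hypothetical values chosen purely for illustration.

```python
import random

rng = random.Random(42)   # fixed seed so the example is reproducible

n_samples = 100
true_intercept = 2.0      # hypothetical value, for illustration only
true_slope = 3.0          # hypothetical value, for illustration only

# One independent variable x and a dependent variable y = 2 + 3x + noise.
x = [rng.uniform(0, 10) for _ in range(n_samples)]
y = [true_intercept + true_slope * xi + rng.gauss(0, 1) for xi in x]

print(x[:3])
print(y[:3])
```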

3.a. β1 (slope)

Let’s dive into solving the equation!
We have x and y, and we know that they are in a linear relation.
Then let’s find the other unknowns,
starting with β1, i.e. the slope.

β1 (slope) Equation

β1 = r * (Sy / Sx)

Let’s explain the above unknowns.
r = Pearson correlation coefficient
Sy = standard deviation of y
Sx = standard deviation of x

Let’s start with the simple part first and find the standard deviations of x and y.
Since I explained how to find the standard deviation above, I’m going directly to the solution.

3.a.a. Pearson correlation coefficient

Now that we have found the standard deviations of x and y, we can move on to r, i.e. Pearson correlation coefficient.

Pearson Correlation Coefficient Equation / Formula

r = Σ (x_i - x̄)(y_i - ȳ) / sqrt( Σ (x_i - x̄)^2 * Σ (y_i - ȳ)^2 )

r = correlation coefficient
xi = values of the x-variable in a sample ( x sub i )
x̄ = mean of the values of the x-variable
yi = values of the y-variable in a sample ( y sub i )
ȳ = mean of the values of the y-variable

Let’s explain the equation.
The numerator of this fraction is a vector operation.
The first Σ (upper-case sigma) is the summation symbol, that is, we will sum the results.
Take the first parenthesis in the numerator: subtract the mean of x from each x value.
Then move on to the next parenthesis: subtract the mean of y from each y value.
We now have two new arrays.
Multiply these two arrays element by element,
then sum all the products,
and we have found the numerator of the fraction.
Let’s implement the code for the numerator first to avoid confusion.
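
A minimal sketch of the numerator, using plain Python lists. The small x and y data set is made up for illustration, and names such as x_dev and y_dev are my own.

```python
# Illustrative data (not from the article).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Two new arrays: deviations of x and y from their means.
x_dev = [xi - mean_x for xi in x]
y_dev = [yi - mean_y for yi in y]

# Multiply element by element, then sum the products.
numerator = sum(dx * dy for dx, dy in zip(x_dev, y_dev))
print(numerator)  # 6.0
```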

Now let’s do the denominator part of the equation.
For the denominator, subtract the mean of x from each x value, square the resulting values, and sum them all.
Do the same for y.
Multiply the two sums by each other.
Then take the square root of the result.
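
The denominator can be sketched the same way, with the same made-up data. The names sum_sq_x and sum_sq_y are assumptions.

```python
import math

# Illustrative data (not from the article).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sum of squared deviations for x and for y.
sum_sq_x = sum((xi - mean_x) ** 2 for xi in x)
sum_sq_y = sum((yi - mean_y) ** 2 for yi in y)

# Multiply the two sums, then take the square root.
denominator = math.sqrt(sum_sq_x * sum_sq_y)
print(round(denominator, 3))  # 7.746
```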

Now that we have the numerator and the denominator of the equation, let’s solve the equation.

Pearson Correlation Coefficient Equation / Formula
Pearson Correlation Coefficient Code
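
Combining numerator and denominator gives r. This is a sketch with intermediate variables and made-up data, not necessarily the exact code of the original post.

```python
import math

# Illustrative data (not from the article).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
denominator = math.sqrt(
    sum((xi - mean_x) ** 2 for xi in x) * sum((yi - mean_y) ** 2 for yi in y)
)

# Pearson correlation coefficient.
r = numerator / denominator
print(round(r, 4))  # 0.7746
```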

If you want, you can code directly without creating a variable as follows.
Personally, I like the following process more.
I don’t like creating variables for even the slightest thing.

Pearson Correlation Coefficient Other Code
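
One way the more compact style might look is to write the whole fraction as a single expression. This is only a sketch with made-up data; the two means are still stored so they are not recomputed for every element.

```python
import math

# Illustrative data (not from the article).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)

# Numerator divided by denominator, with no intermediate variables.
r = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / math.sqrt(
    sum((xi - mean_x) ** 2 for xi in x) * sum((yi - mean_y) ** 2 for yi in y)
)
print(round(r, 4))  # 0.7746
```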

Yes, now that we’ve found all the variables, let’s find our main unknown, β1, i.e. the slope.
Check the equation again.

β1 (slope) Equation
β1 (slope) Code
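
A sketch of the slope as r multiplied by the ratio of the standard deviations. The helper names sample_std and pearson_r are hypothetical, and the data is made up for illustration.

```python
import math

def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    denominator = math.sqrt(
        sum((xi - mean_x) ** 2 for xi in x) * sum((yi - mean_y) ** 2 for yi in y)
    )
    return numerator / denominator

# Illustrative data (not from the article).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Slope: r * (Sy / Sx).
slope = pearson_r(x, y) * sample_std(y) / sample_std(x)
print(round(slope, 4))  # 0.6
```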

3.b. β0 (intercept)

Now solve for β0

β0 (intercept) Equation / Formula

β0 = ȳ - β1 * x̄
β0 (intercept) Code
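
A sketch of the intercept as the mean of y minus the slope times the mean of x. To keep the example short, the slope is computed here as the sum of cross-deviations divided by the sum of squared x deviations, which is algebraically the same as r * (Sy / Sx). The data is made up for illustration.

```python
# Illustrative data (not from the article).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Slope (equivalent to r * Sy / Sx).
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)

# Intercept: mean of y minus slope times mean of x.
intercept = mean_y - slope * mean_x
print(round(intercept, 4))  # 2.2
```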

3.c R-Squared (Coefficient of determination)

Now let’s find R-Squared, i.e. Coefficient of determination.

In statistics, the coefficient of determination, denoted R^2 or r^2 and pronounced “R squared”, is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).

It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses,
on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.

R-Squared (Coefficient of determination) Equation / Formula

R^2 = 1 - RSS / TSS

R^2 = coefficient of determination
RSS = sum of squares of residuals
TSS = total sum of squares

3.c.a RSS (Residual Sum of Squares)

Let’s go step by step and find the RSS first.

RSS (Residual Sum of Squares) Equation / Formula

RSS = Σ (y_i - f(x_i))^2

RSS = residual sum of squares
y_i = each value of the dependent variable
f(x_i) = predicted value of the dependent variable
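
A sketch of the RSS: fit the line, predict each value, and sum the squared differences. The data is made up, and the slope and intercept are computed inline so the block runs on its own.

```python
# Illustrative data (not from the article).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

# Predicted values f(x_i) = β0 + β1 * x_i.
predictions = [intercept + slope * xi for xi in x]

# Residual sum of squares: squared gaps between actual and predicted values.
rss = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
print(round(rss, 4))  # 2.4
```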

3.c.b TSS (Total Sum of Squares)

Now TSS.

TSS = Σ (y_i - ȳ)^2

TSS = total sum of squares
n = number of observations
y_i = each value in a sample ( y sub i )
ȳ = mean value of a sample ( y bar )
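
The TSS only needs y and its mean. A minimal sketch with made-up data:

```python
# Illustrative data (not from the article).
y = [2, 4, 5, 4, 5]

mean_y = sum(y) / len(y)

# Total sum of squares: squared gaps between each y value and the mean of y.
tss = sum((yi - mean_y) ** 2 for yi in y)
print(tss)  # 6.0
```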

Now that all the unknowns have been revealed,
find the R-Squared, i.e. Coefficient of determination
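
With RSS and TSS in hand, R-squared is one minus their ratio. This sketch recomputes both on made-up data so that it runs on its own.

```python
# Illustrative data (not from the article).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
tss = sum((yi - mean_y) ** 2 for yi in y)

# Coefficient of determination.
r_squared = 1 - rss / tss
print(round(r_squared, 4))  # 0.6
```

For simple linear regression this value equals the square of the Pearson correlation coefficient, which is a handy sanity check.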

3.d Adjusted R-Squared

Let’s also find the adjusted R^2 value, which corrects R^2 for the number of independent variables in the model.

Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)

R^2 = sample R-squared
n = sample size
p = number of independent variables

Since we are implementing the simple linear regression equation we naturally have one independent variable so p = 1.
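
A sketch of the adjusted R-squared as a small function. The function name adjusted_r_squared and the example inputs are assumptions for illustration.

```python
def adjusted_r_squared(r_squared, n, p=1):
    """Adjusted R-squared for n observations and p independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Example: R^2 = 0.6 from 5 observations and one independent variable.
adj_r2 = adjusted_r_squared(0.6, n=5, p=1)
print(round(adj_r2, 4))  # 0.4667
```

Note that the formula needs n to be larger than p + 1, otherwise the denominator is zero or negative.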

As you can see, we actually need two things to code an equation.
First, understand the equation: the meaning of each unknown and how to solve for it.
Second, write the equation in Python syntax.

Of course, all of this simple linear regression code should be wrapped in a function or class. But maybe next time. :)
The next step could be multiple linear regression and other algorithms.

Let’s stay up until morning and code.
Then we’ll have a good sleep and maybe solve the quantum equations in our dreams. :)

Please note that some equations are solved in more than one way, just pick one and go for it.

Note: You can access all of the above codes in a single file from the link below.

https://github.com/bunyaminergen/how_to_code_math_equations/blob/main/math_equations.py

Thank you very much.

Bunyamin Ergen


Resources

https://en.wikibooks.org/wiki/Python_Programming/Basic_Math#Order_of_Operations

https://en.wikibooks.org/wiki/Python_Programming/Math

https://en.wikipedia.org/wiki/Equation

https://en.wikipedia.org/wiki/Standard_deviation

https://en.wikipedia.org/wiki/Algebra

https://en.wikipedia.org/wiki/Linear_algebra

https://en.wikipedia.org/wiki/Algorithm

https://tr.wikipedia.org/wiki/G%C3%BCven_aral%C4%B1%C4%9F%C4%B1

https://en.wikipedia.org/wiki/Confidence_interval

https://www.hec.ca/en/cams/help/topics/The_summation_symbol.pdf
