Regression is used primarily if you wish to predict or explain numeric data

Regression | Photo by Hao Wang

This article is a continuation of a series that last covered Bayesian statistics.

Volatility is inherent within the stock market, but perhaps the rise of “meme stocks” is something that is beyond traditional comprehension. This article is not about such “stonks” — which is the ironic misspelling of stocks that encapsulates the new investing zeitgeist amongst Millennials (broadly speaking). Rather, I was pondering the fabulous rollercoaster of a ride of $GME as a timely example of something to apply linear regression to; one can apply a linear regression model to a given stock to predict its price in the future. …
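As a rough sketch of what fitting a linear regression to a price series could look like, the snippet below fits an ordinary least-squares line to price against day index. The prices are made-up illustrative numbers, not real $GME quotes, and the `fit_line` helper is a hypothetical name introduced here. A straight line is a crude model for a volatile stock, so treat this as an illustration of the mechanics only.

```python
# Ordinary least-squares fit of price against day index.
# The prices are hypothetical illustrative values, not real $GME data.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

days = [0, 1, 2, 3, 4]
prices = [10.0, 12.5, 13.5, 16.5, 17.5]  # hypothetical closing prices

slope, intercept = fit_line(days, prices)

# Extrapolate the fitted line one day past the observed data.
next_day_prediction = slope * 5 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, day-5 prediction={next_day_prediction:.2f}")
```

The fitted slope is the average price change per day under the linear assumption; extrapolating further ahead compounds whatever error that assumption carries.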


Getting Started

Hard to believe there was once a controversy over probabilistic statistics

Frequentists vs Bayesians | Photo by Lucas Benjamin

This article builds on my previous article about Bootstrap Resampling.

Introduction to Bayes Models

Bayesian models are a rich class of models that can provide attractive alternatives to Frequentist models. Arguably the most well-known feature of Bayesian statistics is Bayes' theorem (more on this later). With the recent advent of greater computational power and general acceptance, Bayesian methods are now widely used in areas ranging from medical research to natural language processing (NLP) to web search.
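For readers who want a concrete handle on Bayes' theorem, P(A|B) = P(B|A) P(A) / P(B), here is a minimal sketch using a diagnostic-test scenario. The prevalence, sensitivity, and false-positive rate are hypothetical numbers chosen for illustration, and `posterior` is a helper name introduced here.

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# All probabilities below are hypothetical, chosen only for illustration.

def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # Total probability of a positive test: true positives plus false positives.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

prior = 0.01                # assumed share of the population with the condition
likelihood = 0.95           # assumed P(positive test | condition)
false_positive_rate = 0.05  # assumed P(positive test | no condition)

p = posterior(prior, likelihood, false_positive_rate)
print(f"P(condition | positive test) = {p:.3f}")
```

With these assumed inputs the posterior stays far below the test's sensitivity, because the low prior dominates: most positive results come from the much larger healthy group.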

In the early 20th century there was a big debate about the legitimacy of what is now called Bayesian statistics, which is essentially a probabilistic way of…


Simple, straightforward, convenient.

Alternate Data Dimension | Photo by Rene Böhmer

No, not Twitter Bootstrap: this bootstrapping is a way of sampling data, and it is one of the most important tools for considering what underlies the variation in numbers and in distributions. To that end, bootstrapping works really, really well. For data scientists, machine learning engineers, and statisticians alike, it is vital to understand resampling methods.

But why use resampling? We use resampling because we only have a limited amount of data, given the limits of time and economics, to say the least. What then is resampling? Resampling is when you take a sample, and then…
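The teaser is cut off here, so the following is only a sketch assuming the standard bootstrap: draw repeatedly, with replacement, from the observed sample and recompute a statistic each time. The data values, the `bootstrap_means` helper, and the choice of a 95% percentile interval are all assumptions made for illustration.

```python
import random
import statistics

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Resample `data` with replacement and return the mean of each resample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    return [statistics.mean(rng.choices(data, k=n)) for _ in range(n_resamples)]

sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.8, 2.7, 4.9]  # hypothetical measurements

means = sorted(bootstrap_means(sample))

# Percentile interval: cut off roughly 2.5% of the sorted means at each end.
lower = means[25]
upper = means[974]
print(f"approximate 95% interval for the mean: [{lower:.2f}, {upper:.2f}]")
```

The spread of the resampled means gives a picture of how much the sample mean could vary, using nothing but the one limited sample in hand.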


The continuing adventures of regularization and the eternal quest to prevent model overfitting!

LASSO, Ridge, and Elasticnet regression | Photo by Daniele Levis Pulusi

This article is a continuation of last week’s intro to regularization with linear regression. Lettuce yonder back into the nitty-gritty of making the best data science/machine learning models possible with more advanced techniques for simplifying our models. How do we simplify our models? By removing as many features as possible. Why do I want to do this? I want to have a simpler model. Why do I want a simpler model? Well, because a simpler model generalizes better. What does generalization mean? It means that you can actually use it in the real world. …


The one where we correct overfitting

Regularization and Linear Regression | Photo by Jr Korpa

This article is a continuation of my series on linear regression, bootstrapping, and Bayesian statistics.

Previously I talked at length about linear regression, and now I am going to continue that topic. As I hinted at previously, I am going to bring up regularization. What regularization does is simplify the model. The reason you want a simpler model is that, usually, a simpler model will perform better in most, if not pretty much all, of the tests that can be run.
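To make "simplifies the model" a little more concrete, here is a minimal sketch assuming an L2 (ridge) penalty on a single-feature regression; the teaser does not say which penalty the article uses, so that choice is an assumption. The data and the `ridge_slope` helper are hypothetical. For centered data with one feature, the penalized slope has the closed form sxy / (sxx + penalty).

```python
# One-feature ridge regression: slope = sxy / (sxx + penalty).
# The intercept is left unpenalized by centering x and y first.
# Data values are hypothetical, chosen only for illustration.

def ridge_slope(xs, ys, penalty):
    """Slope of the L2-penalized least-squares line through centered data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / (sxx + penalty)

xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 6.1, 7.9, 10.2]

# penalty = 0 recovers ordinary least squares; larger penalties shrink the slope.
slopes = [ridge_slope(xs, ys, penalty) for penalty in (0, 1, 10, 100)]
print(slopes)
```

The coefficient shrinks toward zero as the penalty grows, which is the sense in which the regularized model is "simpler": it is less free to chase noise in the training data.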

N.B: It is not kosher to use training…


There were others who had forced their way to the top from the lowest rung by the aid of their bootstraps

Bootstrapping Linear Regression | Photo by Ahmad Dirini

This article builds on my Linear Regression and Bootstrap Resampling pieces.

For the literary-minded among my readers, the subtitle is a quote from ‘Ulysses’ (1922) by James Joyce! The origin of the term “bootstrap” is in literature, though not from Joyce. The usage denotes bettering oneself by one’s own efforts, and it later evolved to encompass metaphors for self-sustaining processes that proceed without external help, the context we are likely most familiar with.

For data scientists and machine learning engineers, bootstrapping in this sense is an important tool for sampling data. For this reason, it is one of…


Oh boy, homoscedasticity

Multivariate Regression | Photo by Paweł Czerwiński

This article is a continuation of my previous one on Linear Regression.

It is important to reiterate the error formulae for least-squares regression from my last article.


The Central Limit Theorem (CLT) is something that we likely learned in high school math (AP Stats for me). What I remember about it was that, because of the CLT, the magic number for sampling was n = 30. Like many sleep-deprived teens, I nodded and jotted that down in my notebook as I sat in the back of the class, struggling to read the faded projector. As an aside, I swear that this was among the last projectors in the entire school, with all my other classes having those fancy smart boards.
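For anyone who wants to see the CLT at work rather than take n = 30 on faith, here is a small simulation sketch. It assumes a skewed source distribution (exponential with mean 1), which is my choice for illustration and not something taken from the article; the helper name `sample_means` is also hypothetical.

```python
import math
import random
import statistics

def sample_means(n, n_samples=2000, seed=0):
    """Draw `n_samples` samples of size `n` from Exponential(1); return their means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        statistics.mean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(n_samples)
    ]

means = sample_means(n=30)

# Exponential(1) has mean 1 and standard deviation 1, so the CLT suggests the
# sample means cluster around 1 with spread close to 1 / sqrt(30).
observed_center = statistics.mean(means)
observed_spread = statistics.stdev(means)
expected_spread = 1 / math.sqrt(30)
print(f"center={observed_center:.3f}, spread={observed_spread:.3f}, expected spread={expected_spread:.3f}")
```

Even though individual exponential draws are heavily skewed, the means of samples of size 30 pile up around the true mean with roughly the spread the theorem predicts.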

While I…


Distributed ledger technology, or the blockchain, is a very disruptive technology that most are unfamiliar with, yet, as with the advent of the internet 20 years ago, it stands to be a very transformative and powerful tool for any business, especially early adopters. The blockchain offers businesses a secure way of confirming and sending records without fear of changes or edits being made. This means records are kept secure, voting systems become immune to tampering, transparency exists in all transactions, and smart contracts become a reality. All this while being completely scalable and customizable to the needs of the client.

What is a blockchain?


More and more states are legalizing cannabis, and yet significant challenges and obstacles remain at almost every conceivable level of government for those running a cannabis-oriented business. Competitive advantages are hard to come by, but cannabis is not separate from the rest of the business world: it can be revolutionized by Big Data. The future of cannabis lies in harnessing Big Data and being able to pipe it into a modeling solution that can forecast trends, help you understand your customer base, and let you respond to those trends before your competition does. …

James Andrew Godwin

Writer, Data Scientist and huge Physics nerd
