This article is a continuation of a series that last covered Bayesian statistics.

Volatility is inherent in the stock market, but perhaps the rise of “meme stocks” is something beyond traditional comprehension. This article is not about such “stonks” (the ironic misspelling of “stocks” that encapsulates the new investing zeitgeist among Millennials, broadly speaking). Rather, I was pondering the fabulous rollercoaster ride of $GME as a timely example of something to apply linear regression to; one can fit a linear regression model to a given stock’s history to predict its price in the future. …
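As a minimal sketch of the idea, a linear regression fit to a stock’s closing prices, with the fitted line extrapolated forward. The data here is synthetic and purely illustrative, not real $GME history:

```python
import numpy as np

# Synthetic daily closing prices -- purely illustrative, not real market data
rng = np.random.default_rng(0)
days = np.arange(60)
prices = 20 + 0.5 * days + rng.normal(0, 2, size=days.size)

# Fit price ~ a + b * day by ordinary least squares
b, a = np.polyfit(days, prices, deg=1)

# Extrapolate the fitted line one week past the observed window
forecast = a + b * 67
print(f"slope per day: {b:.2f}, 7-day-ahead forecast: {forecast:.2f}")
```

Whether a straight line extrapolates well for something as volatile as a meme stock is, of course, exactly the question the article raises.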

This article builds on my previous article about Bootstrap Resampling.

Bayesian models are a rich class of models that can provide attractive alternatives to Frequentist models. Arguably the most well-known feature of Bayesian statistics is Bayes’ theorem (more on this later). With the recent advent of greater computational power and broader acceptance, Bayesian methods are now widely used in areas ranging from medical research to natural language processing (NLP) to web search.
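Bayes’ theorem itself fits in one line: P(A|B) = P(B|A)·P(A)/P(B). A tiny worked example with made-up diagnostic-test numbers, chosen only to illustrate the mechanics:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Illustrative numbers: a test for a condition with 1% prevalence
p_disease = 0.01            # prior P(A)
p_pos_given_disease = 0.95  # sensitivity P(B|A)
p_pos_given_healthy = 0.05  # false-positive rate

# Law of total probability gives P(B), the chance of a positive test
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of the condition given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # about 0.161
```

Even with a 95%-sensitive test, the posterior is only about 16%, because the prior is so small: a classically Bayesian punchline.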

In the early 20th century there was a big debate about the legitimacy of what is now called Bayesian statistics, which is essentially a probabilistic way of…

No, not Twitter Bootstrap: this bootstrapping is a way of sampling data, and it is one of the most important tools for understanding what underlies the variation in our numbers and the shape of their distributions. To that end, bootstrapping works really, really well. For Data Scientists, Machine Learning Engineers, and statisticians alike, it is vital to understand resampling methods.

But why use resampling? We use resampling because we only have a limited amount of data, constrained by time and economics, to name just two limits. What, then, is resampling? Resampling is when you take a sample, and then…
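The core move can be sketched in a few lines: resample the one dataset you have, with replacement, and recompute the statistic of interest many times. The distribution settings here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=100, scale=15, size=50)  # our one limited sample

# Bootstrap: draw with replacement from the sample, recompute the mean each time
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(2000)
])

# The spread of the bootstrap means estimates the standard error of the mean
print(f"sample mean: {data.mean():.1f}, bootstrap SE: {boot_means.std():.2f}")
```

The point is that we never touched the (unknown) population: the variability estimate comes entirely from reshuffling the data we already had.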

Some wisdom transcends the ages!

This article provides an overview of time series analysis. Time series are an extremely common data type. A quick Google search yields many applications, including:

- **Demand forecasting:** electricity production, traffic management, inventory management
- **Medicine:** time-dependent treatment effects, EKG
- **Financial markets and economics:** seasonal unemployment, price/return series, risk analysis
- **Engineering/Science:** signal analysis, analysis of physical processes

For this article I will cover:

- Basic properties of time series
- How to perform and understand decomposition of time series
- The ARIMA model
- Forecasting
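As a small taste of the modeling covered below, here is the simplest relative of ARIMA, an AR(1) process, simulated and then fit by least squares. The coefficient and sample size are illustrative, and a real analysis would use a library such as statsmodels:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(1) process: x_t = 0.8 * x_{t-1} + noise
n, phi_true = 500, 0.8
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

# Estimate phi by regressing x_t on x_{t-1} (least squares)
phi_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])

# One-step-ahead forecast from the last observed value
forecast = phi_hat * x[-1]
print(f"estimated phi: {phi_hat:.2f}")
```

ARIMA generalizes this with more lags, differencing, and a moving-average component, which is where the article picks up.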

**References:** a selection of references you can use to go deeper into time series analysis with Python:

This article is a brief continuation of my regression series.

So far the regression examples I have been illustrating have been numeric: predicting a continuous variable. With the Galton family height dataset, we were predicting children’s height, a continuously varying quantity. Yet note how a regression line fails to fit a binary outcome, i.e., a classification example:
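That failure mode is easy to demonstrate with synthetic data: a least-squares line fit to 0/1 labels happily predicts values outside [0, 1], which cannot be read as probabilities:

```python
import numpy as np

# Synthetic classification data: the label is 1 when the feature is large
rng = np.random.default_rng(7)
x = np.linspace(-4, 4, 100)
y = (x + rng.normal(0, 1, x.size) > 0).astype(float)  # binary outcome

# A straight regression line fit directly to the 0/1 labels
slope, intercept = np.polyfit(x, y, deg=1)
preds = intercept + slope * x

# The line leaves the [0, 1] range at the extremes -- not valid probabilities
print(f"min prediction: {preds.min():.2f}, max prediction: {preds.max():.2f}")
```

This is the gap that logistic regression fills, by passing the linear predictor through a sigmoid so the output is always a valid probability.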

This article is a continuation of last week’s intro to regularization with linear regression. Lettuce wander back into the nitty-gritty of making the best data science/machine learning models possible, with more advanced techniques for simplifying our models. How do we simplify our models? By removing as many features as possible. Why do I want to do this? I want a simpler model. Why do I want a simpler model? Well, because a simpler model generalizes better. What does generalization mean? **It means that you can actually use it in the real world**. …

This article is a continuation of my series on linear regression, bootstrap resampling, and Bayesian statistics.

Previously I talked at length about linear regression, and now I am going to continue that topic. As I hinted previously, I am going to bring up regularization. What regularization does is simplify the model. And the reason you want a simpler model is that a simpler model will usually perform better on most, if not all, of the tests that can be run.
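One common concrete form of this is ridge regression, which shrinks coefficients toward zero by adding a penalty to the least-squares objective. A minimal numpy sketch on synthetic data (the feature count and penalty strength are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: only the first of five features actually matters
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + rng.normal(0, 0.5, size=100)

# Ridge solution: (X^T X + alpha * I)^-1 X^T y  -- alpha controls the shrinkage
alpha = 10.0
I = np.eye(X.shape[1])
beta_ridge = np.linalg.solve(X.T @ X + alpha * I, X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The penalized coefficients are pulled toward zero: a "simpler" model
print(np.round(beta_ridge, 2))
```

The shrunken coefficients are the mechanical expression of “a simpler model”: less capacity to chase noise, and usually better generalization.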

N.B.: It is not kosher to use training…

This article builds on my Linear Regression and Bootstrap Resampling pieces.

For the literary-minded among my readers, the subtitle is a quote from Ulysses (1922) by James Joyce! The origin of the term “bootstrap” is indeed literary, though not from Joyce. The usage denotes bettering oneself by one’s own efforts, further evolving to encompass metaphors for self-sustaining processes that proceed without external help, which is the context we are likely most familiar with.

For data scientists and machine learning engineers, this sense of bootstrapping is an important tool for sampling data. For this reason, it is one of…

This article is a continuation of my previous one on Linear Regression.

It is important to reiterate the error formulae for least-squares regression from my last article.
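For reference, the central quantity those formulae revolve around is the residual sum of squares, which least-squares regression minimizes over the intercept and slope:

$$\mathrm{RSS} = \sum_{i=1}^{n} \big(y_i - (\beta_0 + \beta_1 x_i)\big)^2$$

Everything else in the error analysis (standard errors, confidence intervals) is built on top of this quantity.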

The Central Limit Theorem (CLT). Something that we likely learned in high school math (AP Stats for me). What I remember about it was that, because of the CLT, the magic number for sampling was n = 30. Like many sleep-deprived teens, I nodded and jotted that down in my notebook as I sat in the back of the class, struggling to read the faded projector. As an aside, I swear that this was among the last projectors in the entire school; all my other classes had those fancy smart boards.
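That n = 30 folklore can be demonstrated in a few lines: take means of 30 draws from a decidedly non-normal distribution, and they already cluster in a bell shape with spread close to the theoretical σ/√n (the distribution choice here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Draw from a skewed, non-normal distribution (exponential, mean 1, std 1)
samples = rng.exponential(scale=1.0, size=(10_000, 30))

# CLT: means of n = 30 draws are approximately normally distributed
means = samples.mean(axis=1)

# Their spread should be close to sigma / sqrt(n) = 1 / sqrt(30)
print(f"std of sample means: {means.std():.3f}, "
      f"sigma/sqrt(n): {1 / np.sqrt(30):.3f}")
```

A histogram of `means` would look strikingly normal even though the underlying exponential draws are heavily skewed, which is exactly the CLT’s promise.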

While I…

Writer, Data Scientist and huge Physics nerd