The One-Definition Rule: Introducing and explaining include guards.

Photo by Bruce Warrington on Unsplash

You’ll find the preprocessor directives #ifndef and #define at the top of pretty much every C/C++ header file. However, I could not find any information about this convention in the books or tutorials I learned from, and where it did appear, there was no explanation! So what are they?

For a while, I just accepted this was part of coding standards for formatting or readability and just used it myself. It was only when I took an interest in understanding compilers that I realised the purpose and importance of header guards.

Include guards are a construct used at the top of header files…

How to set random seeds for individual classes in Python

Photo by Dominika Roseclay from Pexels

Using np.random.seed(number) has long been a best practice for creating reproducible work with NumPy. Setting the random seed means that others who run your code get the same results you did. But now, when you look at the docs for np.random.seed, the description reads:

This is a convenience, legacy function.

The best practice is to not reseed a BitGenerator, rather to recreate a new one. This method is here for legacy reasons.

So what’s changed? This post will explain the old method, issues with it and then show and explain the new best practice and the benefits.
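As a rough preview of the difference, here is a minimal sketch. The Sampler class and its draw method are hypothetical names invented for illustration; np.random.seed and np.random.default_rng are the NumPy calls being contrasted. It assumes a NumPy version that provides the Generator API (1.17 or later).

```python
import numpy as np


class Sampler:
    """Hypothetical class that owns its own random number generator."""

    def __init__(self, seed):
        # default_rng creates a new Generator; no global state is touched.
        self.rng = np.random.default_rng(seed)

    def draw(self, n):
        # Uniform draws in [0, 1) from this instance's generator only.
        return self.rng.random(n)


# Legacy style: seed the single global RandomState shared by all code.
np.random.seed(0)
legacy_first = np.random.rand(3)
np.random.seed(0)
legacy_second = np.random.rand(3)

# Newer style: each object is seeded individually.
a = Sampler(seed=1)
b = Sampler(seed=1)
c = Sampler(seed=2)

draws_a = a.draw(3)
draws_b = b.draw(3)
draws_c = c.draw(3)
```

Here legacy_first and legacy_second match because the global state was reset in between, while draws_a and draws_b match because the two instances were created with the same seed. Reseeding the global state elsewhere in a program would not change what a, b or c produce.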

Legacy Best Practice

A full end-to-end tutorial to create a dataset without web scraping!

Photo by Taryn Elliott from Pexels

Data scientists love creating models and competing to slightly improve accuracy on datasets. By now, so many data scientists have used these popular datasets that it has become difficult to learn anything new from them. It also shifts data science from a problem-solving subject towards more of an engineering one: slightly tweaking models to slightly improve scores.

To do something unique in data science, you will have to create a dataset yourself and solve a new problem! Because most of us data scientists do not know much about data engineering or web scraping, this guide will show you how…

Customise your Jekyll site to show off your portfolio.

Photo by @jesuskiteque on Unsplash.

It is becoming more and more important to have an online presence. For everyone from data scientists, artists and writers all the way to small businesses, having a portfolio website is becoming critical to success.

Jekyll is a static site generator that helps you create blogs or, as Tom Preston-Werner (the developer) would say, blog “like a hacker”. Because Jekyll powers GitHub Pages, it has become a popular solution for many blogs and portfolios.

This post will not go into how to create a Jekyll site as there are many tutorials online to do this (like this). However, this will be a short…

How to make complex neural networks without overfitting!

Photo from Startup Stock Photos.


Choosing the right architecture for your deep learning model can drastically change the results achieved. Using too few neurons can lead to the model not finding complex relationships in the data, whereas using too many neurons can lead to an overfitting effect.

With tabular data it is usually understood that not many layers are required; one or two will suffice. To help understand why this is enough, look at the Universal Approximation Theorem, which proves (in simple terms) that a neural network with one hidden layer and a finite number of neurons can approximate any continuous function on a bounded input range to any desired accuracy.

However, how do you…

How to statistically find the required sample size to make accurate and high confidence generalisations. (With examples!)

Photo by Morning Brew on Unsplash.


Finding the optimal sample size can be important in many different contexts, from collecting voting intentions in an election to assessing the quality of machinery in a company. Finding a sample which best represents a population can help reduce costs and time while also providing conclusions which can be applied to the entirety of the population.

One way to understand this comes from George Gallup, a pioneer of survey sampling techniques:

“If you have cooked a large pan of soup, you do not need to eat it all to find out if it needs more seasoning. …
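The kind of calculation this post builds towards can be sketched as follows. This is only an illustration, assuming the standard sample-size formula for estimating a proportion with an optional finite population correction; the excerpt does not say which formula the post itself uses. The function name and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence=0.95,
                         proportion=0.5, population=None):
    """Estimate how many responses are needed to estimate a proportion.

    Assumes simple random sampling. proportion=0.5 is the most
    conservative choice because it maximises p * (1 - p).
    """
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        # Finite population correction shrinks n for small populations.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)
```

For example, required_sample_size(0.05) gives 385 for a 5% margin of error at 95% confidence, and passing population=10000 reduces that to 370. Tightening the margin of error raises the requirement quickly, since it enters the formula squared.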

An introduction to Cling to help learn C++


When studying mathematics at University I was introduced to both Python and R in a statistics module. Since then I have only stuck to those two languages and only dabbled in other languages when needed.

Recently, I have wanted to improve my programming foundations and learn more about the underlying concepts that we data scientists take for granted with Python and R, while also finding some ways to improve my workflows. So I undertook the challenge of learning C++, which then led me to find Cling.


Cling is an interactive interpreter for C++ which helps give a similar…

Step by step guide to implementing an autoencoder in fastai.

Autoencoder Architecture. Image made using NN-SVG.


fastai is a deep learning library that simplifies training neural networks using modern best practices [1]. While fastai provides users with a high-level neural network API, it is designed to allow researchers and users to easily mix in low-level methods while still keeping the overall training process easy and accessible to all.

This post is going to cover how to set up an autoencoder in fastai. This will go through creating a basic autoencoder model, setting up the data in fastai, and finally putting all this together into a learner model.

Note: a basic understanding of fastai and PyTorch…

Henri Woodcock

UK-based data scientist \\ Personal website: \\ The opinions expressed are my own and not those of my employer. \\ @henriwoodcock
