Wrap your brain around one of Pandas’ most powerful tools for statistical analysis.

Photo by Pascal Müller on Unsplash

In this tutorial you will learn how to use the Pandas DataFrame .groupby() method and aggregation methods such as .mean() and .count() to quickly extract statistics from a large dataset (over 10 million rows). You will also be introduced to the Open University Learning Analytics dataset.
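Here is a minimal sketch of the split-apply-combine pattern the tutorial covers. It uses a tiny made-up table rather than the real Open University Learning Analytics data, and the column names `course`, `student_id`, and `sum_click` are placeholders chosen for illustration:

```python
import pandas as pd

# Hypothetical miniature interaction log: one row per student interaction.
# Column names and values are made up for illustration.
clicks = pd.DataFrame({
    "course": ["AAA", "AAA", "BBB", "BBB", "BBB"],
    "student_id": [1, 2, 3, 3, 4],
    "sum_click": [4, 10, 2, 6, 7],
})

# Split rows by course, then aggregate one column per group.
mean_clicks = clicks.groupby("course")["sum_click"].mean()
row_counts = clicks.groupby("course")["sum_click"].count()

# Several statistics in one pass with named aggregation.
summary = clicks.groupby("course").agg(
    mean_clicks=("sum_click", "mean"),
    n_rows=("sum_click", "count"),
    n_students=("student_id", "nunique"),
)
print(summary)
```

Chaining `.groupby(...)[column].mean()` is handy when you need a single statistic. `.agg()` with named aggregations keeps the output columns readable when you need several.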

Pandas

Pandas is the most adorable and cuddly tabular data management library for Python. Once you get the hang of it, its intuitive, object-oriented implementation and clever tricks for computational efficiency make for flexible and powerful data handling.

Pandas facilitates data mining, data processing, data cleaning, data visualization, and some basic statistical analysis on…


How I did not deploy my first SARIMA COVID-19 forecasting model using Dash, Plotly, and Heroku.

Photo by Ethan Hu on Unsplash

I did not deploy a SARIMA time series model, built with the statsmodels library, that predicts future COVID-19 infection and death rates. Using Plotly to create interactive graphs of current and predicted case and death rates, and allowing users to decide which statistics to include, which countries or states to predict, and how far out to predict, I did not make a publicly accessible and interactive predictive website. I worked hard and learned a lot in not deploying this model to a Heroku server.

Here is my story.

In this walkthrough you will learn to deploy a website to Heroku that…


Photo by JESHOOTS.COM on Unsplash

The opportunities for humans to contribute to the work of the world are changing rapidly. Businesses growing to take advantage of these opportunities need workers with new skills. Employers are hiring programmers, data scientists, web developers, and leaders, but there are not enough folks with the right skills to fill the need. This is true of many industries.

Education is expensive. Traditional teachers have to be multi-talented, highly educated, passionate, and hard-working. Teaching and assessing are done manually, at considerable expense of time and money. However, we live in a magical age where data-driven, accessible, personalized, and effective…


Callbacks allow you to adjust settings or save your model during training.

Photo by Karsten Winegeart on Unsplash

In this article, you will learn how to use the ModelCheckpoint callback in Keras to save the best version of your model during training.
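Here is a minimal sketch of `ModelCheckpoint` in Keras. It assumes TensorFlow 2.x, a toy regression dataset, and a hypothetical checkpoint path, `best_model.keras`. The key settings are `monitor="val_loss"` and `save_best_only=True`, which overwrite the saved file only when validation loss improves:

```python
import numpy as np
from tensorflow import keras

# Toy regression data (made up) so the example trains in seconds.
rng = np.random.default_rng(0)
x = rng.normal(size=(512, 8)).astype("float32")
y = (x @ rng.normal(size=(8, 1))).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Save only when validation loss improves, so the file always holds the best epoch so far.
checkpoint = keras.callbacks.ModelCheckpoint(
    filepath="best_model.keras",  # hypothetical path
    monitor="val_loss",
    mode="min",
    save_best_only=True,
    verbose=1,
)

history = model.fit(
    x, y,
    validation_split=0.2,  # hold out 20% of rows to compute val_loss
    epochs=5,
    callbacks=[checkpoint],
    verbose=0,
)

# Reload the best checkpoint instead of keeping the final-epoch weights.
best_model = keras.models.load_model("best_model.keras")
```

Because the file only changes on improvement, reloading it after training gives you the best epoch even if later epochs drifted into overfitting. Recent Keras versions expect the `.keras` extension when saving a full model this way.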

Modeling is Fun!

I love building predictive deep learning models. I love watching the training outputs, seeing the loss fall and watching for the diverging losses between training and validation sets that indicate overfitting. But sometimes a model finds a great solution…and keeps training to a solution that only works for the training set. Now, if I’m there, staring like it’s a fish tank, I can interrupt the training before too much damage is done. But who wants to…


Using spaCy pre-trained embedding vectors for transfer learning in a Keras deep learning model. Also, as a bonus, how to use TextVectorization to add a preprocessing layer to your model that tokenizes, vectorizes, and pads inputs before the embedding layer.

Photo by Alexandra on Unsplash

In this article you will learn how to use spaCy embedding vectors to create a pre-trained embedding layer for natural language processing models in Keras. This reduces training time for NLP models and transfers what larger models have learned about words and their relationships.
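Here is a hedged sketch of building a frozen embedding layer from pre-trained vectors. It assumes TensorFlow 2.x. The helper names `spacy_lookup` and `build_embedding_matrix` are hypothetical, and the demo uses made-up 4-dimensional vectors so it runs without downloading a spaCy model. In this sketch the `TextVectorization` layer is applied to the raw strings before the model, although it can also be placed inside the model as its first layer:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


def spacy_lookup(model_name="en_core_web_md"):
    """Hypothetical helper: return (lookup, dim) backed by a spaCy model's vectors.

    Assumes spaCy is installed and the model was downloaded, e.g.
    `python -m spacy download en_core_web_md`.
    """
    import spacy
    nlp = spacy.load(model_name)

    def lookup(token):
        # Return None for words without a vector so their row stays zero.
        return nlp.vocab[token].vector if nlp.vocab.has_vector(token) else None

    return lookup, nlp.vocab.vectors_length


def build_embedding_matrix(vocabulary, lookup, dim):
    """Row i holds the pre-trained vector for vocabulary[i]; unknown words stay zero."""
    matrix = np.zeros((len(vocabulary), dim), dtype="float32")
    for i, token in enumerate(vocabulary):
        vector = lookup(token)
        if vector is not None:
            matrix[i] = vector
    return matrix


texts = ["The cat sat down", "The dog ran home", "A cat ran away"]

# Lowercase, tokenize, index, and pad to length 6; index 0 is padding, 1 is [UNK].
vectorizer = keras.layers.TextVectorization(max_tokens=1000, output_sequence_length=6)
vectorizer.adapt(texts)
vocabulary = vectorizer.get_vocabulary()

# Made-up toy vectors so the sketch runs offline.
toy_vectors = {"cat": [1, 0, 0, 0], "dog": [0.9, 0.1, 0, 0], "ran": [0, 1, 0, 0]}
lookup, dim = toy_vectors.get, 4

matrix = build_embedding_matrix(vocabulary, lookup, dim)

# Build the layer, load the pre-trained matrix, and freeze it.
embedding = keras.layers.Embedding(input_dim=matrix.shape[0], output_dim=dim, trainable=False)
embedding.build((1,))
embedding.set_weights([matrix])

model = keras.Sequential([
    keras.Input(shape=(6,), dtype="int64"),
    embedding,
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(1, activation="sigmoid"),
])

token_ids = vectorizer(tf.constant(texts))  # shape (3, 6)
predictions = model(token_ids)              # shape (3, 1)
```

If a spaCy model is installed (for example `en_core_web_md`), replace the toy lookup with `lookup, dim = spacy_lookup()` to fill the matrix with spaCy's vectors. Words that spaCy has no vector for keep an all-zero row.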

Words

Words make up most of the world most of us live in. If you are reading this, I…


Getting started quickly with forecasting using the fbprophet library

Photo by Drew Beamer on Unsplash

Why Facebook Prophet?

The Challenges of Time Series Forecasting

Time series forecasting is a complex art form. Many models are very sensitive to trends, cycles (called ‘seasons’), and changing magnitudes of fluctuation; they require stationary data, which lack these features.

COVID-19 is a devastating disease that has killed hundreds of thousands of people in the US and millions around the world, and at the time of writing it continues to spread ever faster. Forecasting the rate of future infections can help hospitals and aid organizations plan and prepare for future needs.

A good example of data that is not stationary is the cumulative spread of COVID-19 in the United States during 2020. For this project I will be using…
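To make "not stationary" concrete, here is a small pandas sketch with made-up daily counts (not real COVID-19 data). A cumulative total keeps climbing. Differencing turns it back into day-to-day changes, which is the kind of transformation many classical models rely on:

```python
import numpy as np
import pandas as pd

# Made-up daily new-case counts that grow linearly: 10, 20, ..., 600.
days = pd.date_range("2020-03-01", periods=60, freq="D")
new_cases = pd.Series(np.arange(1, 61) * 10, index=days)

# The cumulative total only ever rises, so its level drifts over time:
# a textbook non-stationary series.
cumulative = new_cases.cumsum()
print(cumulative.iloc[:30].mean(), cumulative.iloc[30:].mean())

# A first difference undoes the cumulative sum and recovers the daily counts;
# a second difference removes the remaining linear trend.
daily = cumulative.diff()
change_in_daily = daily.diff()
print(change_in_daily.dropna().unique())
```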


Can we predict whether a student will pass an online course without knowing anything about who they are?

Photo by Frank Romero on Unsplash

Learning online has been a growing trend for decades. In 2018, 35% of college students took at least one course online and 17% took all of their classes remotely (NCES study). With COVID-19 a reality, online learning has exploded and become a necessary health and safety measure for more people than ever. While students will eventually return to school, the industry has had the opportunity, funding, and impetus to improve and expand. This will undoubtedly lead to a sharper rise in the importance of internet-based learning in the post-COVID future.

In my work as a teacher I leveraged…


Predictive analytics, human expertise, data mining, and empathy come together to improve graduation rates for tens of thousands of students, many the first in their families.

Photo by Element5 Digital on Unsplash

My first year of college was hard in so many ways. I had never lived away from home, my friends, family, and girlfriend were far away, and I didn’t know anyone. I was on my own for the first time and encountering some of the most difficult challenges I had yet faced. But my struggles were invisible. I didn’t reach out to campus services, and they did not know I needed them. If you…


Photo by Xavi Cabrera on Unsplash

How fun is it to explore? As data scientists, we are all about discovery and interacting with data. Folium allows you and your audience to explore data with interactive maps, and it is quick and simple to set up.

Folium is a Python library that combines Python’s amazing data wrangling libraries with the beautiful mapmaking abilities of Leaflet.js. With just a few lines of code in a Jupyter Notebook, you can produce eye-catching interactive maps to help your audience explore your data in a visual and geographical way. …


If you love Pandas and Matplotlib’s Pyplot, you’ll love GeoPandas!

Photo by Capturing the human heart. on Unsplash

Beautiful data deserves beautiful visualizations, and GeoPandas makes displaying geographical data easy for anyone who knows Pandas and Matplotlib. It’s one thing to show your readers tables of data or bar charts, but putting it on a map makes your story more tangible.

GeoPandas comes with a handy starter shapefile, which can be loaded with:

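```python
import geopandas
# Note: geopandas.datasets was deprecated in GeoPandas 0.14 and removed in 1.0.
shapefile = geopandas.datasets.get_path('naturalearth_lowres')
earth = geopandas.read_file(shapefile)
```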

Shapefiles like these are available on the web for maps of just about anywhere. Each file is a table of data with a geometry column that defines the shape and location of each region you want to plot.

For example…
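Here is a small sketch of the "table plus geometry column" idea. It uses two hand-made square regions instead of a downloaded shapefile, so it runs anywhere. The region names and populations are made up, and the `Agg` backend line is only there so the plot renders without a display:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import geopandas as gpd
from shapely.geometry import Polygon

# Two made-up square "regions" with an attribute to color by.
regions = gpd.GeoDataFrame(
    {
        "name": ["West", "East"],
        "population": [120, 340],
    },
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
)

# Each row is ordinary tabular data plus a shape. column= turns the plot
# into a choropleth, and legend=True adds a colorbar.
ax = regions.plot(column="population", legend=True, edgecolor="black", cmap="viridis")
ax.set_title("Made-up population by region")
ax.figure.savefig("regions.png")
```

With a real shapefile loaded through `geopandas.read_file`, the same `.plot(column=...)` call colors each region by any numeric column in the table.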

Josh Johnson

I'm a data scientist with a background in education. I empower learners to become the folks they want to be.
