COVID-19 Data Analysis Using Python

Learn step-by-step

  1. Importing the COVID-19 dataset and preparing it for analysis by dropping columns and aggregating rows.
  2. Choosing and calculating a good measure for the analysis.
  3. Merging two datasets and finding correlations in the data.
  4. Visualizing the analysis results using Seaborn.
  • Introduction
  • Importing the COVID-19 dataset
  • Finding a good measure
  • Importing and preparing the World Happiness Report dataset
  • Merging the two datasets and finding correlations in the data
  • Visualizing the results using Seaborn

1. Download data

```python
import pandas as pd

URL_DATASET = r'https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv'
df1 = pd.read_csv(URL_DATASET)
print(df1.head(3))  # Print the first 3 rows of the DataFrame
print(df1.tail(3))  # Print the last 3 rows of the DataFrame
```

At the time of writing the dataset ended on 2020-03-31, so the output looked like this:

```
         Date      Country  Confirmed  Recovered  Deaths
0  2020-01-22  Afghanistan          0          0       0
1  2020-01-22      Albania          0          0       0
2  2020-01-22      Algeria          0          0       0
```

```
             Date             Country  Confirmed  Recovered  Deaths
12597  2020-03-31  West Bank and Gaza        119         18       1
12598  2020-03-31              Zambia         35          0       0
12599  2020-03-31            Zimbabwe          8          0       1
```

The five columns are Date, Country, Confirmed, Recovered, and Deaths.
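
The preview shows the data is in long format: one row per country per date. A minimal sketch of parsing the Date column as real datetimes and checking the date range and number of countries. To keep it runnable offline it reads a small hypothetical in-memory CSV with the same five columns instead of the real URL; the numbers are placeholders.

```python
import io

import pandas as pd

# Hypothetical sample with the same five columns as countries-aggregated.csv;
# the numbers are placeholders, not real case counts.
SAMPLE_CSV = """Date,Country,Confirmed,Recovered,Deaths
2020-01-22,Afghanistan,0,0,0
2020-01-22,Albania,0,0,0
2020-01-23,Afghanistan,0,0,0
2020-01-23,Albania,0,0,0
"""

# parse_dates turns the Date strings into datetime64 values
df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["Date"])

# Long format: one row per (Date, Country) pair
print(df.dtypes)
print("From", df["Date"].min().date(), "to", df["Date"].max().date())
print(df["Country"].nunique(), "countries,", df["Date"].nunique(), "dates")
```

With the real dataset, the same `parse_dates=["Date"]` argument can be passed to `pd.read_csv(URL_DATASET, ...)`.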

2. Select data for India

```python
#### ----- Step 2 (Select data for India)----
df_india = df1[df1['Country'] == 'India']
print(df_india.head(3))
```
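
The expression `df1['Country'] == 'India'` builds a boolean Series with one True/False value per row, and indexing `df1` with it keeps only the True rows. A small sketch on hypothetical toy rows showing that this mask and `DataFrame.query()` select the same rows, and that the original row labels are kept (use `reset_index(drop=True)` if you want them renumbered):

```python
import pandas as pd

# Hypothetical toy rows in the same long format (placeholder numbers)
df1 = pd.DataFrame({
    "Date": ["2020-01-22", "2020-01-22", "2020-01-23", "2020-01-23"],
    "Country": ["China", "India", "China", "India"],
    "Confirmed": [10, 1, 20, 2],
})

mask = df1["Country"] == "India"    # boolean Series, one value per row
df_india = df1[mask]                # keeps only the rows where mask is True
df_india_q = df1.query("Country == 'India'")  # same selection via query()

print(mask.tolist())                # [False, True, False, True]
print(df_india.index.tolist())      # [1, 3]: original row labels are kept
print(df_india.reset_index(drop=True).index.tolist())  # [0, 1]
```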

3. Plot data

  • The line plt.rcParams["figure.figsize"] = 20, 20 sets the default size of every figure created afterwards to 20 by 20 inches. It works in any environment, but a figure that large suits a Jupyter notebook best; in another IDE you may want to remove the line or use a smaller size.
  • Note the line ax1 = plt.gca(). To draw both plots (confirmed cases and deaths) on the same graph, the second plot must be given the axes object of the first, which is what plt.gca() returns. ("gca" stands for "get current axes".)
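
Instead of changing the global `rcParams` and fetching the current axes with `plt.gca()`, one option is to create the figure and axes explicitly and pass the same `ax` to both `DataFrame.plot()` calls. A minimal sketch on hypothetical toy numbers (the figure size is arbitrary); the Agg backend is selected only so the example runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cumulative counts for three dates (placeholder values)
df_india = pd.DataFrame({
    "Date": ["2020-03-29", "2020-03-30", "2020-03-31"],
    "Confirmed": [100, 120, 150],
    "Deaths": [2, 3, 4],
})

# figsize applies to this figure only, unlike plt.rcParams["figure.figsize"]
fig, ax = plt.subplots(figsize=(10, 6))

# Passing the same ax to both calls draws both bar series on one graph
df_india.plot(kind="bar", x="Date", y="Confirmed", color="blue", ax=ax)
df_india.plot(kind="bar", x="Date", y="Deaths", color="red", ax=ax)
ax.set_ylabel("Cumulative cases")
fig.tight_layout()
```

Because `ax` is held in a variable, there is no need to rely on which axes happens to be "current".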

```python
#  Author:- Anurag Gupta # email:- 999.anuraggupta@gmail.com
# Jupyter only: remove the next line in a plain Python script
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
#### ----- Step 1 (Download data)----
URL_DATASET = r'https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv'
df1 = pd.read_csv(URL_DATASET)
# print(df1.head(3))  # Uncomment to see the DataFrame
#### ----- Step 2 (Select data for India)----
df_india = df1[df1['Country'] == 'India']
print(df_india.head(3))
#### ----- Step 3 (Plot data)----
# Increase the default figure size (20 x 20 inches)
plt.rcParams["figure.figsize"] = 20, 20  # Reduce or remove outside Jupyter
# Plot column 'Confirmed'
df_india.plot(kind='bar', x='Date', y='Confirmed', color='blue')
# Get the current axes so the next plot is drawn on the same graph
ax1 = plt.gca()
# Plot column 'Deaths' on the same axes
df_india.plot(kind='bar', x='Date', y='Deaths', color='red', ax=ax1)
plt.show()
```

Creating an animated horizontal bar graph for five countries

1. Download the data

2. Create a list of all dates
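
`Series.unique()` returns the distinct values in order of first appearance, not sorted. That is fine when the file is already ordered by date, as the preview above suggests, but a safer sketch (assuming ISO `YYYY-MM-DD` date strings) parses and sorts the dates first. The rows below are hypothetical and deliberately out of order:

```python
import pandas as pd

# Hypothetical rows deliberately stored out of date order
df = pd.DataFrame({
    "Date": ["2020-01-23", "2020-01-22", "2020-01-23", "2020-01-22"],
    "Country": ["India", "India", "China", "China"],
    "Confirmed": [2, 1, 20, 10],
})

# unique() keeps the order in which values first appear
first_seen = list(df["Date"].unique())   # ['2020-01-23', '2020-01-22']

# Parse the ISO date strings, then take the distinct dates in sorted order
df["Date"] = pd.to_datetime(df["Date"])
list_dates = df["Date"].drop_duplicates().sort_values().tolist()
print(list_dates)
```

Since `df["Date"]` then holds datetimes, a callback that filters with `df['Date'].eq(some_date)` still matches each frame value.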

3. Pick five countries and create an ax object
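
A sketch of this step on hypothetical data for a single date: `Series.isin()` keeps only the chosen countries, and a dictionary built from `list_countries` and `list_colors` gives each country a fixed color. When colors are assigned by list position instead, a color follows the bar's rank, so colors can swap between frames as the ranking changes. The dictionary (`color_of`) is a suggested alternative, not the article's code.

```python
import pandas as pd

list_countries = ["India", "China", "US", "Italy", "Spain"]
list_colors = ["black", "red", "green", "blue", "yellow"]

# Hypothetical helper mapping: country -> fixed color, independent of rank
color_of = dict(zip(list_countries, list_colors))

# Hypothetical rows for one date; "France" is not one of the chosen countries
df2 = pd.DataFrame({
    "Country": ["China", "France", "India", "Italy", "Spain", "US"],
    "Confirmed": [60, 50, 10, 40, 30, 20],
})

# isin() keeps only the chosen countries; then sort by Confirmed, largest first
df4 = (df2[df2["Country"].isin(list_countries)]
       .sort_values(by="Confirmed", ascending=False))
bar_colors = [color_of[c] for c in df4["Country"]]
print(list(zip(df4["Country"], bar_colors)))
```

Passing `color=bar_colors` to `ax.barh()` would then keep, for example, India black in every frame.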

4. Write the callback function
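
FuncAnimation calls the callback once per frame with the current frame value (here, a date). With `blit=True`, the callback must return an iterable of the artists it drew; `ax.barh()` returns a `BarContainer`, which is such an iterable of bar rectangles. A minimal sketch on hypothetical toy data, calling the callback directly for one date rather than through FuncAnimation:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data: two countries over two dates (placeholder numbers)
df = pd.DataFrame({
    "Date": ["2020-01-22", "2020-01-22", "2020-01-23", "2020-01-23"],
    "Country": ["India", "China", "India", "China"],
    "Confirmed": [1, 10, 2, 20],
})
fig, ax = plt.subplots()

def plot_bar(some_date):
    """Redraw the bars for one date and return the drawn artists."""
    day = df[df["Date"].eq(some_date)].sort_values(by="Confirmed", ascending=False)
    ax.clear()                      # remove the previous frame's bars
    ax.set_title(str(some_date))
    return ax.barh(day["Country"], day["Confirmed"], color="tab:blue")

bars = plot_bar("2020-01-23")       # BarContainer with one bar per country
print(len(bars), [bar.get_width() for bar in bars])
```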

5. Create FuncAnimation object

```python
my_anim = animation.FuncAnimation(fig=fig, func=plot_bar,
                                  frames=list_dates, blit=True,
                                  interval=20)
```
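
The main arguments are:
  • fig: the Figure object created earlier with plt.subplots().
  • func: the callback function, which FuncAnimation calls once per frame.
  • frames: the values to animate over; each one is passed in turn to func. Here it is the list of dates created earlier.
  • blit=True: only the artists returned by func are redrawn, which is why plot_bar() returns the bars.
  • interval: the delay between frames in milliseconds when the animation is shown on screen; a saved file uses the fps passed to save() instead.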
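
To see how `frames` and `func` fit together, here is a small sketch with hypothetical data that records which frame values FuncAnimation passes to the callback. It saves to a temporary GIF with Matplotlib's `PillowWriter`, which needs Pillow rather than ffmpeg. FuncAnimation may also call the callback once for the initial draw, so the first date can appear twice in the record.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.animation as animation
import matplotlib.pyplot as plt

list_dates = ["2020-01-22", "2020-01-23", "2020-01-24"]
values = {"2020-01-22": 1, "2020-01-23": 4, "2020-01-24": 9}  # hypothetical
seen = []  # frame values received by the callback, in call order

fig, ax = plt.subplots()

def plot_bar(some_date):
    seen.append(some_date)
    ax.clear()
    return ax.barh(["India"], [values[some_date]])

my_anim = animation.FuncAnimation(fig=fig, func=plot_bar, frames=list_dates,
                                  blit=False, interval=200)

# Saving iterates over every value in frames and calls plot_bar for each
out_path = os.path.join(tempfile.mkdtemp(), "demo.gif")
my_anim.save(out_path, writer=animation.PillowWriter(fps=5))
print(seen)
```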

6. Save the animation to an mp4 file

```python
# Jupyter only (interactive backend): remove the next line in a plain Python script
%matplotlib notebook
# Author:- Anurag Gupta # email:- 999.anuraggupta@gmail.com
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from time import sleep
#### ---- Step 1:- Download data
URL_DATASET = r'https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv'
df = pd.read_csv(URL_DATASET, usecols=['Date', 'Country', 'Confirmed'])
# print(df.head(3))  # Uncomment to see the output
#### ---- Step 2:- Create list of all dates
list_dates = df['Date'].unique()
# print(list_dates)  # Uncomment to see the dates
#### --- Step 3:- Pick 5 countries. Also create ax object
fig, ax = plt.subplots(figsize=(15, 8))
# We will animate for these 5 countries only
list_countries = ['India', 'China', 'US', 'Italy', 'Spain']
# Colors for the 5 horizontal bars (assigned by bar position)
list_colors = ['black', 'red', 'green', 'blue', 'yellow']
### --- Step 4:- Write the callback function
# plot_bar() is the callback function used by the FuncAnimation object
def plot_bar(some_date):
    # Keep only the rows for this date
    df2 = df[df['Date'].eq(some_date)]
    ax.clear()
    # Sort by the Confirmed column in descending order
    df3 = df2.sort_values(by='Confirmed', ascending=False)
    # Keep only the 5 chosen countries
    df4 = df3[df3['Country'].isin(list_countries)]
    # print(df4)  # Uncomment to check that the data covers only the 5 countries
    sleep(0.2)  # Slow down the animation
    # ax.barh() makes a horizontal bar plot
    return ax.barh(df4['Country'], df4['Confirmed'], color=list_colors)
###----Step 5:- Create FuncAnimation object---------
my_anim = animation.FuncAnimation(fig=fig, func=plot_bar,
                                  frames=list_dates, blit=True,
                                  interval=20)
### --- Step 6:- Save the animation to an mp4
# Where to save the mp4; replace with your own file path
path_mp4 = r'C:\Python-articles\population_covid2.mp4'
# my_anim.save(path_mp4, fps=30, extra_args=['-vcodec', 'libx264'])
# Writing mp4 requires ffmpeg to be installed and on the PATH
my_anim.save(filename=path_mp4, writer='ffmpeg',
             fps=30,
             extra_args=['-vcodec', 'libx264', '-pix_fmt', 'yuv420p'])
plt.show()
```
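
Saving with `writer='ffmpeg'` only works when the ffmpeg program is installed; otherwise `save()` fails. A small sketch that checks availability through Matplotlib's writer registry and falls back to an animated GIF written by Pillow. The helper name `pick_writer` and the GIF fallback are assumptions for illustration, not part of the original article.

```python
import matplotlib.animation as animation

def pick_writer(path_mp4):
    """Hypothetical helper: return (filename, writer name) for Animation.save()."""
    if animation.writers.is_available("ffmpeg"):
        return path_mp4, "ffmpeg"
    # Fall back to an animated GIF written with Pillow
    gif_path = path_mp4.rsplit(".", 1)[0] + ".gif"
    return gif_path, "pillow"

filename, writer = pick_writer("population_covid2.mp4")
print(filename, writer)
```

The result can be passed as `my_anim.save(filename, writer=writer, fps=30)`. Options such as `-vcodec libx264` are ffmpeg arguments, so add `extra_args` only when the chosen writer is ffmpeg.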
