Featured

Connect your Jupyter Notebook to the BigQuery

Hi,

This is Jing, a data analyst with great passion for data science and big data technology. This blog gonna to show you how to connect your Jupyter Notebook to the BigQuery by using a service account.

Screen Shot 2018-11-27 at 12.57.17

Prerequisite

Before we start, I assume you already have Jupyter Notebook installed on you computer and have a project on the google cloud platform with data saved in BigQuery.

In this blog, I created a project called “helloworld”, with a dataset called “example_dataset” in the my BigQuery. I am going to show you how to get the data from BigQuery on the google cloud from my Jupyter Notebook on my local computer, MAC to be more specific.

Let’s begin

  • Step 0: Enable BigQuery API on the Google Cloud Platform Console

Search for the BigQuery API by using the search function on the Google Cloud Platform console and enable it. You should see something like this after you enabled it:

Screen Shot 2018-11-26 at 16.52.28
  • Step 1 : Get a service account key for the BigQuery in a Json File

The json file of service account key is used to authenticate your Jupyter Notebook on your local PC which will access the BigQuery later on.

Go to GoogleCloudPlatform -> API&Services -> Credentials

Screen Shot 2018-11-26 at 11.25.49

Get a service account by Create Credentials -> Service Account key

Screen Shot 2018-11-26 at 11.26.48

Select account -> new service account, give a name to your new service account, here we name it as “bq-jupyternotebook”. To limit the permission of this service account, we choose “BigQuery Admin” as the role for this service account. We want full access of BigQuery, but only to BigQuery, not other google products in this project. As for detailed information about service account, please refer to Service Accounts in Google Cloud

Screen Shot 2018-11-26 at 17.54.50

Click on “Create”, then you get the key for this service account in a json file. Put this json file in a folder which you created for your project. In this blog, we put it in a “key” folder under a folder called “helloworld” and also renamed it as “helloworld-key.json”.

Screen Shot 2018-11-26 at 15.59.52
  • Step 2 : Install the Google BigQuery API Client Libraries for Python on your computer

The google BigQuery api client python libraries includes the functions you need to connect your Jupyter Notebook to the BigQuery.

Open you terminal -> type in the command below. More information about Google BigQuery API client library

pip install google-cloud-bigquery
  • Step 3 : Set the environment variable GOOGLE_APPLICATION_CREDENTIALS for your python script

Set the environment variable called “GOOGLE_APPLICATION_CREDENTIALS” for pointing your Notebook to your service account key which you just created in Step 1. This variable lets Google knows where to find your authentication credentials. Run this before you start your Jupyter Notebook every time.

Type in the command below in your terminal:

export GOOGLE_APPLICATION_CREDENTIALS="/Users/jinjing/Desktop/helloworld/key/helloworld-key.json"

Be careful with the path to the service account key json file, you need to specify the whole path. Look at the example:

Screen Shot 2018-11-26 at 11.37.31
  • Step 4 : Open your Jupyter Notebook, let’s get data from the BigQuery

Create a new Jupyter Notebook in your folder for your project, and look at the example code to see how it works. More examples for using Python Client for BigQuery

Screen Shot 2018-11-27 at 10.34.48

Congratulations! We are done here!

Now you should have successfully connected your Jupyter Notebook to the BigQuery and gotten some data! Enjoy your data analysis and have fun!

This is not something amazing, but hopefully will make your work a little bit easier! This is Jing, thanks for reading my blog! Feel free to commend and ask questions! I will get back to you as much as possible.