Featured

Connect your Jupyter Notebook to the BigQuery

Hi,

This is Jing, a data analyst with great passion for data science and big data technology. This blog gonna to show you how to connect your Jupyter Notebook to the BigQuery by using a service account.

Screen Shot 2018-11-27 at 12.57.17

Prerequisite

Before we start, I assume you already have Jupyter Notebook installed on you computer and have a project on the google cloud platform with data saved in BigQuery.

In this blog, I created a project called “helloworld”, with a dataset called “example_dataset” in the my BigQuery. I am going to show you how to get the data from BigQuery on the google cloud from my Jupyter Notebook on my local computer, MAC to be more specific.

Let’s begin

Step 1: Enable BigQuery API on the Google Cloud Platform Console

Search for the BigQuery API by using the search function on the Google Cloud Platform console and enable it. You should see something like this after you enabled it:

Screen Shot 2018-11-26 at 16.52.28

Step 2: Get authentication file for your Jupyter Notebook

Option 1: Create your own default application login in terminal if you already have Cloud SDK installed

Open you terminal -> type in the command below.

gcloud auth application-default login

Make sure application_default_credentials.json is created

ls ~/.config/gcloud/

Option 2: A service account key for the BigQuery in a JSON File. Recommended for scripts which will be used as a service later on. e.g. running on a virtual machine

The json file of service account key is used to authenticate your Jupyter Notebook on your local PC which will access the BigQuery later on.

Go to GoogleCloudPlatform -> API&Services -> Credentials

Screen Shot 2018-11-26 at 11.25.49

Get a service account by Create Credentials -> Service Account key

Screen Shot 2018-11-26 at 11.26.48

Select account -> new service account, give a name to your new service account, here we name it as “bq-jupyternotebook”. To limit the permission of this service account, we choose “BigQuery Admin” as the role for this service account. We want full access of BigQuery, but only to BigQuery, not other google products in this project. As for detailed information about service account, please refer to Service Accounts in Google Cloud

Screen Shot 2018-11-26 at 17.54.50

Click on “Create”, then you get the key for this service account in a json file. Put this json file in a folder which you created for your project. In this blog, we put it in a “key” folder under a folder called “helloworld” and also renamed it as “helloworld-key.json”.

Screen Shot 2018-11-26 at 15.59.52

Step 3 : Install the Google BigQuery API Client Libraries for Python on your computer

The google BigQuery api client python libraries includes the functions you need to connect your Jupyter Notebook to the BigQuery.

Open you terminal -> type in the command below. More information about Google BigQuery API client library

pip install google-cloud-bigquery

Step 4 : Set the environment variable GOOGLE_APPLICATION_CREDENTIALS for your python script

Set the environment variable called “GOOGLE_APPLICATION_CREDENTIALS” for pointing your Notebook to your service account key which you just created in Step 2. This variable lets Google knows where to find your authentication credentials. Include the following code in your Jupyer Notebook.

For the option 1 in Step 2

import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '~/.config/gcloud/application_default_credentials.json'

For the option 2 in the Step 2

import os
 os.environ['GOOGLE_APPLICATION_CREDENTIALS'] ='/Users/jinjing/Desktop/helloworld/key/helloworld-key.json'

Be careful with the path to the service account key json file, you need to specify the whole path. Look at the example:

Screen Shot 2018-11-26 at 11.37.31

Step 5 : Open your Jupyter Notebook, let’s get data from the BigQuery

Create a new Jupyter Notebook in your folder for your project, and look at the example code to see how it works. More examples for usingĀ Python Client for BigQuery For option 1, you need to specify which project you are querying for, e.g.

bigquery_client = bigquery.Client(project='prodounf-yew-217512')
Screen Shot 2018-11-27 at 10.34.48

Congratulations! We are done here!

Now you should have successfully connected your Jupyter Notebook to the BigQuery and gotten some data! Enjoy your data analysis and have fun!

This is not something amazing, but hopefully will make your work a little bit easier! This is Jing, thanks for reading my blog! Feel free to commend and ask questions! I will get back to you as much as possible.