This is Jing, a data scientist with a great passion for applying data science and big data technology in industry. Today you are going to learn Pandas with me. In this blog, I will talk about how to use Pandas to select data when you are processing data with Python.
Here is a list of commonly used pandas functions for selecting data that we will cover in this blog.
iloc[]
sample()
isin()
where()
query()
~ (the NOT operator)
get()
lookup()
Before we look at each function, we need to create an example dataset so that you can understand the functions better through examples. We also import the pandas library as pd, the name we will use in the following code. Then follow the explanation, examples and exercises for each function listed above.
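To make this concrete, here is a minimal sketch of such a setup together with a few of the functions above (the data and column names are invented for illustration):

```python
import pandas as pd

# Example dataset (invented for illustration)
df = pd.DataFrame({
    'name': ['Anna', 'Ben', 'Cara', 'Dan'],
    'city': ['Oslo', 'Bergen', 'Oslo', 'Trondheim'],
    'age': [23, 35, 29, 41],
})

# iloc[]: select by integer position (first two rows, first two columns)
first_two = df.iloc[0:2, 0:2]

# isin(): boolean mask for membership
oslo_rows = df[df['city'].isin(['Oslo'])]

# query(): filter with a string expression
over_30 = df.query('age > 30')

# sample(): draw random rows (fixed seed for reproducibility)
two_random = df.sample(n=2, random_state=0)
```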
This is Jing, a data analyst with a great passion for data science and big data technology. This blog is going to show you how to connect your Jupyter Notebook to BigQuery by using a service account.
Prerequisite
Before we start, I assume you already have Jupyter Notebook installed on your computer and have a project on the Google Cloud Platform with data saved in BigQuery.
In this blog, I created a project called “helloworld” with a dataset called “example_dataset” in my BigQuery. I am going to show you how to get the data from BigQuery on Google Cloud into my Jupyter Notebook on my local computer, a Mac to be more specific.
Let’s begin
Step 1: Enable BigQuery API on the Google Cloud Platform Console
Search for the BigQuery API using the search function on the Google Cloud Platform console and enable it. You should see something like this after enabling it:
Step 2: Get authentication file for your Jupyter Notebook
Option 1: Create your own default application login in terminal if you already have Cloud SDK installed
Open your terminal and type in the command below.
gcloud auth application-default login
Make sure application_default_credentials.json is created
ls ~/.config/gcloud/
Option 2: A service account key for BigQuery in a JSON file. Recommended for scripts which will run as a service later on, e.g. on a virtual machine.
The service account key JSON file is used to authenticate the Jupyter Notebook on your local machine that will access BigQuery later on.
Go to Google Cloud Platform -> APIs & Services -> Credentials
Create a service account via Create Credentials -> Service Account key
Select account -> New service account, and give your new service account a name; here we name it “bq-jupyternotebook”. To limit the permissions of this service account, we choose “BigQuery Admin” as its role: we want full access to BigQuery, but only to BigQuery, not to the other Google products in this project. For detailed information about service accounts, please refer to Service Accounts in Google Cloud.
Click on “Create”, and you get the key for this service account as a JSON file. Put this JSON file in a folder which you created for your project. In this blog, we put it in a “key” folder under a folder called “helloworld” and also rename it “helloworld-key.json”.
Step 3 : Install the Google BigQuery API Client Libraries for Python on your computer
The Google BigQuery API client library for Python includes the functions you need to connect your Jupyter Notebook to BigQuery.
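Assuming you use pip, the install command would look something like this (package name per the google-cloud-bigquery documentation):

```shell
pip install --upgrade google-cloud-bigquery
```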
Step 4 : Set the environment variable GOOGLE_APPLICATION_CREDENTIALS for your python script
Set the environment variable “GOOGLE_APPLICATION_CREDENTIALS” to point your notebook to the service account key which you just created in Step 2. This variable lets Google know where to find your authentication credentials. Include the following code in your Jupyter Notebook.
For option 1 in Step 2
import os
# expanduser() is needed because '~' is not expanded automatically
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.expanduser('~/.config/gcloud/application_default_credentials.json')
For option 2 in Step 2
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] ='/Users/jinjing/Desktop/helloworld/key/helloworld-key.json'
Be careful with the path to the service account key JSON file: you need to specify the whole path. Look at the example:
Step 5 : Open your Jupyter Notebook, let’s get data from the BigQuery
Create a new Jupyter Notebook in the folder for your project, and look at the example code to see how it works. You can find more examples for using the Python Client for BigQuery in its documentation. For option 1, you need to specify which project you are querying, e.g.
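A minimal sketch of such a query (hedged: the project, dataset and table names are the invented examples from this post, and it assumes the google-cloud-bigquery library from Step 3 is installed):

```python
def build_query(table, limit=10):
    # Keep the SQL in one place so it is easy to inspect
    return f'SELECT * FROM `{table}` LIMIT {limit}'

def query_example_dataset(project_id='helloworld',
                          table='helloworld.example_dataset.example_table'):
    """Run a simple query and return the result as a DataFrame."""
    # Deferred import so the helper above works even without the library
    from google.cloud import bigquery
    client = bigquery.Client(project=project_id)
    return client.query(build_query(table)).to_dataframe()

# df = query_example_dataset()  # uncomment once your credentials are set
```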
Now you should have successfully connected your Jupyter Notebook to BigQuery and gotten some data! Enjoy your data analysis and have fun!
This is not something amazing, but hopefully it will make your work a little bit easier! This is Jing, thanks for reading my blog! Feel free to comment and ask questions! I will get back to you as much as I can.
This is Jing, a data scientist with a great passion for applying data science and big data technology in industry. Today you are going to learn Pandas with me. I will talk about some common errors and good practices when you are processing data with Python.
Here is a list of things we will cover today.
Common pitfalls and errors
Key Error
Index Error
Attribute Error
Best practices and tips
Get or set values quickly
Leftover DataFrames
Set data type for the columns in the dataset
Check and know your data before calculation
Before we look at each topic, we need to create an example dataset so that you can understand the examples better. We also import the pandas library as pd, the name we will use in the following code. Then follow the explanation, examples and exercises for each item listed above.
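As a small sketch of some of these pitfalls and tips (the data is invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Anna', 'Ben'], 'age': ['23', '35']})

# KeyError: raised when you ask for a column (key) that does not exist
try:
    df['salary']
except KeyError as err:
    missing = str(err)

# Tip: set the data type of a column before doing calculations --
# here 'age' arrived as strings, so convert it first
df['age'] = df['age'].astype(int)
mean_age = df['age'].mean()

# Tip: get or set single values quickly with .at[]
first_name = df.at[0, 'name']
```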
This is Jing, a data scientist with a great passion for applying data science and big data technology in industry. Today you are going to learn Pandas with me. I will talk about how to use Pandas to group your data and work with categorical data when you are processing data with Python.
Here is a list of functions we will cover today.
DataFrame.groupby()
grouped.agg()
grouped.filter()
pandas.DataFrame.rank()
pandas.Series.cat
pandas.get_dummies()
Before we look at each function, we need to create an example dataset so that you can understand the functions better through examples. We also import the pandas library as pd, the name we will use in the following code. Then follow the explanation, examples and exercises for each function listed above.
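A minimal sketch of these functions on an invented dataset:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Oslo', 'Oslo', 'Bergen', 'Bergen'],
    'sales': [10, 20, 5, 15],
})

# groupby() + agg(): one aggregate value per group
grouped = df.groupby('city')
totals = grouped.agg({'sales': 'sum'})

# filter(): keep only groups whose total sales exceed 25
big_cities = grouped.filter(lambda g: g['sales'].sum() > 25)

# rank(): rank the sales values across the frame
df['sales_rank'] = df['sales'].rank()

# get_dummies(): one-hot encode a categorical column
dummies = pd.get_dummies(df['city'])
```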
This is Jing, a data scientist with a great passion for applying data science and big data technology in industry. Today you are going to learn Pandas with me. I will talk about how to use Pandas to work with datetime data when you are processing data with Python.
Here is a list of commonly used functions in pandas for dealing with datetime data we will cover today.
pd.Timestamp()
pd.Period()
pd.Timedelta()
pd.to_datetime()
Before we look at each function, we need to create an example dataset so that you can understand the functions better through examples. We also import the pandas library as pd, the name we will use in the following code. Then follow the explanation, examples and exercises for each function listed above.
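A minimal sketch of these four functions (the dates are invented for illustration):

```python
import pandas as pd

# pd.Timestamp(): a single point in time
ts = pd.Timestamp('2018-06-01 12:30')

# pd.Period(): a span of time (here, a month)
month = pd.Period('2018-06', freq='M')

# pd.Timedelta(): a duration, which you can add to a timestamp
later = ts + pd.Timedelta(days=3)

# pd.to_datetime(): parse strings (e.g. a column) into datetimes
dates = pd.to_datetime(pd.Series(['2018-06-01', '2018-06-04']))
gap = dates.iloc[1] - dates.iloc[0]
```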
This is Jing, a data scientist with a great passion for applying data science and big data technology in industry. Today you are going to learn Pandas with me. I will talk about how to use Pandas to work with text data when you are processing data with Python.
Here is a list of commonly used functions in pandas for processing text data we will cover today.
split()
replace()
extract()
get_dummies()
wrap()
partition()
swapcase()
capitalize()
rfind()
Before we look at each function, we need to create an example dataset so that you can understand the functions better through examples. We also import the pandas library as pd, the name we will use in the following code. Then follow the explanation, examples and exercises for each function listed above.
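A minimal sketch of these string methods on an invented Series (note that they are called through the .str accessor):

```python
import pandas as pd

s = pd.Series(['anna-oslo', 'ben-bergen'])

# split(): break each string on a separator into a list
parts = s.str.split('-')

# replace(): substitute text (a plain string here, not a regex)
spaced = s.str.replace('-', ' ', regex=False)

# extract(): pull out the first capture group of a regex
names = s.str.extract(r'^(\w+)-')[0]

# partition(): split into three parts around the first separator
left = s.str.partition('-')[0]

# capitalize() / swapcase(): change letter case
caps = names.str.capitalize()
swapped = s.str.swapcase()

# rfind(): index of the last occurrence of a substring (-1 if absent)
pos = s.str.rfind('-')
```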
This is Jing, a data scientist with a great passion for applying data science and big data technology in industry. Today you are going to learn Pandas with me. In this blog, I will talk about how to use Pandas to pivot and unpivot data when you are processing data with Python.
Here is a list of commonly used pandas functions for pivoting and unpivoting data that we will cover in this blog.
pivot_table()
stack()
unstack()
melt()
Before we look at each function, we need to create an example dataset so that you can understand the functions better through examples. We also import the pandas library as pd, the name we will use in the following code. Then follow the explanation, examples and exercises for each function listed above.
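A minimal sketch of these four functions on an invented dataset:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Oslo', 'Oslo', 'Bergen', 'Bergen'],
    'year': [2017, 2018, 2017, 2018],
    'sales': [10, 20, 5, 15],
})

# pivot_table(): rows become cities, columns become years
wide = df.pivot_table(index='city', columns='year', values='sales')

# stack(): pivot the columns of `wide` back into rows
long_again = wide.stack()

# unstack(): the inverse of stack()
wide_again = long_again.unstack()

# melt(): unpivot the original frame into id/variable/value form
melted = df.melt(id_vars=['city'], value_vars=['sales'])
```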
This is Jing, a data scientist with a great passion for applying data science and big data technology in industry. Today you are going to learn Pandas with me. In this blog, I will talk about how to use Pandas to merge and reshape data when you are processing data with Python.
Here is a list of commonly used pandas functions for merging and reshaping data that we will cover in this blog.
merge()
join()
drop()
sort_values()
concat()
append()
Before we look at each function, we need to create an example dataset so that you can understand the functions better through examples. We also import the pandas library as pd, the name we will use in the following code. Then follow the explanation, examples and exercises for each function listed above.
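A minimal sketch of some of these functions on two invented frames:

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2, 3], 'name': ['Anna', 'Ben', 'Cara']})
right = pd.DataFrame({'id': [1, 2, 4], 'city': ['Oslo', 'Bergen', 'Trondheim']})

# merge(): SQL-style join on a key column
inner = left.merge(right, on='id', how='inner')

# concat(): stack frames on top of each other
stacked = pd.concat([left, left], ignore_index=True)

# drop(): remove a column (or rows, with axis=0)
no_name = inner.drop(columns=['name'])

# sort_values(): order rows by one or more columns
ordered = inner.sort_values('name', ascending=False)
```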
This is Jing, a data analyst with a great passion for data science and big data technology. The EU General Data Protection Regulation (GDPR) is a big thing that happened this year. As a customer, I feel more secure about my personal information online thanks to the GDPR. But as a data analyst working with e-commerce, the GDPR brings some problems to my daily work. This blog is going to show you how to use the User Deletion API provided by Google Analytics to delete users’ personal information, and so stay GDPR compliant, simply from a Google Spreadsheet.
Delete Users in Bulk with the User Deletion API
Prerequisite
Before we start, I assume you have “Edit” level access to a Google Analytics view of a website. Besides, I also assume you are familiar with Google Analytics; for example, you know what client IDs, segments, event labels, custom dimensions, etc. are.
You also need to make sure your Google Cloud Platform account is activated. This project will cost you almost nothing, so don’t worry about it.
Use Case : Delete a list of Segmented Users
To make a better explanation, I continue with the use case I made up earlier in the blog Get Google Analytics data to your Jupyter Notebook by using Analytics Reporting API. Let’s imagine a scenario where, for certain reasons, some users’ phone numbers are saved in the GA event labels during their online sessions because of some tracking setups. In the earlier blog, I showed you how to find the list of segmented users. Now, I am going to show you how to delete them in one click from a spreadsheet to avoid potential GDPR issues!
Let’s begin!
Step 0 : Get the list of users’ client ID you want to delete
Case 1: You already have the list of users’ client IDs you want to delete. Then skip this step.
Step 1 : Create a Google Spread Sheet to use User Deletion API
Here is a template Google Spreadsheet I’ve already created to put your list of client IDs in. Feel free to make a copy of it to build your own on. If you are not interested in how to create this sheet, then congratulations! We are done here.
Otherwise, please follow these steps:
Create a Google Spread Sheet
Tools -> Script Editor -> Edit the project name; I named it “user_deletion_API”
Go to Resources -> Advanced Google Services -> Enable Google Analytics API
Google Analytics Reporting API
Go to Resources -> Cloud Platform project -> currently associated project
User Deletion API Project
Click on the project ID -> Google Cloud Platform -> APIs & Services -> Library
Google API Library
Search for Analytics API -> Enable it
Analytics API
Make it look nice
Rename current sheet as “Main” -> Create a new sheet “Result” -> Add a function to cell “B3”
Fill in your Google Analytics Property ID
Download a button you like from Google
Insert an image -> Assign script -> Type in the name of the function in the Google Apps Script we created earlier. It is called “myFunction” in this case.
Assign Script
Finally it looks like below:
Click on the “Start” button and the script will run automatically.
The first time you run the script, it requires you to authorize it. Just follow the instructions from Google.
If it runs successfully, the result will show up in the “Result” sheet.
Result for User Deletion
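If you would rather call the User Deletion API from Python than from Apps Script, here is a hedged sketch: the property ID and client IDs are placeholders, the resource path follows the Analytics v3 userDeletion documentation, and it assumes the google-api-python-client library plus valid credentials.

```python
def build_deletion_body(client_id, web_property_id):
    """Build the request body for a User Deletion API upsert.
    The field layout follows the analytics v3 userDeletion resource."""
    return {
        'kind': 'analytics#userDeletionRequest',
        'id': {'type': 'CLIENT_ID', 'userId': client_id},
        'webPropertyId': web_property_id,
    }

def delete_users(client_ids, web_property_id, credentials):
    # Deferred import so the helper above works without the library installed
    from googleapiclient.discovery import build
    service = build('analytics', 'v3', credentials=credentials)
    for cid in client_ids:
        body = build_deletion_body(cid, web_property_id)
        service.userDeletion().userDeletionRequest().upsert(body=body).execute()
```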
Step 2 : Put the list of users’ client ID in a Google Spread Sheet
Case 1: If you find it convenient enough to manually copy-paste the list into the Google Spreadsheet, then congratulations! You can begin using the shared template Google Spreadsheet to delete users!
Now you should be able to delete a lot of users with one click! It might take some time to set up all these things, but would you rather manually delete hundreds of users? Besides, you can easily reuse what you have done here next time!
This is not something amazing, but hopefully it will make your work a little bit easier! This is Jing, thanks for reading my blog! Feel free to comment and ask questions! I will get back to you as much as I can.
This is Jing, a data analyst with a great passion for data science and big data technology. As a data analyst, working with Excel or Google Spreadsheets is sometimes inevitable, considering the reality that a lot of your colleagues are more comfortable seeing data or results in Excel or a Google Spreadsheet. This blog is going to show you how to connect your Jupyter Notebook to a Google Spreadsheet to make your workflow more automatic.
Prerequisite
Before we start, I assume you already have Jupyter Notebook installed on your computer.
Let’s begin!
Step 0 : Create a project and Enable Google Drive API
Go to APIs & Service -> Library, find the Google Drive API and enable it.
Step 1 : Create a Service Account for your Jupyter Notebook
Go to APIs & Services -> Credentials -> Create credentials -> Service account key -> New service account; name it “Google_Sheet” -> select the role “Service Account User” for this account -> select “JSON” -> click on “Create”. Download the key file, rename it “service_account_GS.json” and put it in a folder you are comfortable with.
Step 2 : Create a Google Sheet and share it with Service Account E-mail
Open the “service_account_GS.json” you’ve just downloaded and find your “client_email”.
Share the Google Sheet you just created with this “client_email”.
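Here is a minimal upload sketch using the df2gspread library mentioned below (the spreadsheet key is a placeholder, and it assumes df2gspread and oauth2client are installed):

```python
import pandas as pd

def upload_frame(df, spreadsheet_key, worksheet='Sheet1',
                 keyfile='service_account_GS.json'):
    """Push a DataFrame to a Google Sheet shared with the service account."""
    # Deferred imports so the function can be defined without the libraries
    from oauth2client.service_account import ServiceAccountCredentials
    from df2gspread import df2gspread as d2g

    scope = ['https://spreadsheets.google.com/feeds',
             'https://www.googleapis.com/auth/drive']
    credentials = ServiceAccountCredentials.from_json_keyfile_name(keyfile, scope)
    d2g.upload(df, spreadsheet_key, worksheet,
               credentials=credentials, row_names=False)

df = pd.DataFrame({'name': ['Anna', 'Ben'], 'value': [1, 2]})
# upload_frame(df, '<your-spreadsheet-key>')  # uncomment with a real key
```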
Now you should see the Google Sheet is updated with the data you uploaded!
Congratulations! We are done here!
Now you should have successfully connected your Jupyter Notebook to a Google Sheet! Find more information about working with Google Spreadsheets at df2gspread.
This is not something amazing, but hopefully it will make your work a little bit easier! This is Jing, thanks for reading my blog! Feel free to comment and ask questions! I will get back to you as much as I can.
This is Jing, a data analyst with a great passion for data science and big data technology. This blog is going to show you how to use the Google Analytics Reporting API to get Google Analytics data into your Jupyter Notebook or save it to Excel.
Prerequisite
Before we start, I assume you already have Jupyter Notebook installed on your computer and have access to a Google Analytics view of a website. Besides, I also assume you are familiar with Google Analytics; for example, you know what client IDs, sessions, event labels, custom dimensions, etc. are.
Use Case : Get a list of segmented users’ client ID
To make a better explanation, I made up a use case which may sound silly. Let’s imagine a scenario where, for certain reasons, some users’ phone numbers are saved in the GA event labels during their online sessions because of some tracking setups. We want to find these users’ client IDs, and later on we want to delete them to avoid potential GDPR issues.
In this blog, I will show you how to find these users and get a list of their client IDs in your Jupyter Notebook. Later on, I will show you how to delete these users in GA with one click in another blog.
In this use case, your Jupyter Notebook is an application sending requests to the Google Analytics server. So your Jupyter Notebook is also a client to Google, which also has a client ID. This might be confused with the client ID in Google Analytics. Sorry for that.
Let’s begin!
Step 0 : Create a project and Enable Google Analytics Reporting API
Create a project for your application in Google Cloud Console if you do not have one. In this blog, I created a project called “user deletion”.
Go to APIs & Service -> Library, find the Google Analytics Reporting API and enable it.
Step 1 : Create the client credential for your Jupyter Notebook for requesting Google Analytics
Go to APIs & Services -> Credentials -> Create credentials -> OAuth client ID
Select Other, name your client ID “Jupyter_Notebook”, and click on Create -> OK
Then you can find your client credentials on the Credentials page. By clicking on the download button, you can download your client credential as a json file. Rename it as “client_secret.json” and put it in a folder you are comfortable with.
Step 2: Install the client library for your Jupyter Notebook
If you have done this before, skip this step. Otherwise, open your terminal and type in the command below. See Google API Client Libraries for Python for more information.
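The command would be something like this (package names per the Google API client library documentation):

```shell
pip install --upgrade google-api-python-client oauth2client
```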
Step 3: Create a segment in Google Analytics to find the target users
Go to User Explorer -> Add Segment -> Conditions, and write a regex to find the users whose event labels contain “phoneNumber =”. Then save this segment as “phoneNumber”.
In the repository, you have all the code you need, but there are some places you need to modify to make the code work for you.
Step 5: Get the segment ID and Custom Dimension ID
Go to Query Explorer, authenticate yourself, and get the segment ID and the custom dimension ID for the client ID in Google Analytics. When we type in Client ID, we see that custom dimension 42 (on our website) is set for the client ID.
Get the segment ID for the segment we just created in a similar way:
Step 6: Jupyter Notebook template for Getting data from Google Analytics
Now, open the Jupyter Notebook template and modify the code with the information you have gotten. The first time you run the template, you will need to authorise your notebook in the web browser. Just follow the instructions from Google and go back to your notebook; you will see it is done.
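A hedged sketch of the request the template sends (the view ID, custom dimension index and segment ID are the invented ones from this post; it assumes the Analytics Reporting API v4 client library):

```python
def build_report_request(view_id, segment_id, dimension='ga:dimension42'):
    """Body for reports().batchGet(); when a segment is used,
    the 'ga:segment' dimension must be included."""
    return {
        'reportRequests': [{
            'viewId': view_id,
            'dateRanges': [{'startDate': '30daysAgo', 'endDate': 'today'}],
            'dimensions': [{'name': dimension}, {'name': 'ga:segment'}],
            'metrics': [{'expression': 'ga:sessions'}],
            'segments': [{'segmentId': segment_id}],
        }]
    }

def fetch_client_ids(credentials, view_id, segment_id):
    # Deferred import so the helper above works without the library installed
    from googleapiclient.discovery import build
    analytics = build('analyticsreporting', 'v4', credentials=credentials)
    body = build_report_request(view_id, segment_id)
    response = analytics.reports().batchGet(body=body).execute()
    rows = response['reports'][0]['data'].get('rows', [])
    # The first dimension value in each row is the custom-dimension client ID
    return [row['dimensions'][0] for row in rows]
```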
Now you should have successfully gotten Google Analytics data for this use case in your Jupyter Notebook! In another blog, I will show you how to upload the data frame to a Google Sheet to avoid copy-paste issues.
This is not something amazing, but hopefully it will make your work a little bit easier! This is Jing, thanks for reading my blog! Feel free to comment and ask questions! I will get back to you as much as I can.