Loading data from CSV to BigQuery

In one of the previous posts, you learnt to load files into the Google Cloud Storage.

Now let us use the files in the Google Cloud Storage as a data source for the tables in Google BigQuery.

For this, open the BigQuery Web GUI by signing in to your Google Cloud Platform account and clicking on the BigQuery on the left side panel.

This will open the BigQuery Web Interface as below –

BigQuery_WebUI

Now click on the small dropdown arrow next to your project name.

Then click on Create new dataset. Give a name to your dataset and click on OK.

Click on the small plus symbol seen over this dataset.

A new popup is seen with heading as ‘Create and Import’. This wizard will take you through the steps of loading a CSV file from Cloud Storage. Give a name to the table (in the Table ID textbox) that you want to create.

CI_BQ

Click on Next.

Now on the next page of this wizard, you will get the option to Select the data source.  First select the format of data (CSV, JSON or AppEngine Datastore Backup). In this case, select CSV.

In the section ‘Load data from’ – select Google Cloud Storage.

Now specify the path to your file from Cloud Storage in the following format.

gs://<BucketName>/<FileName>.<extension>

In this example, my file name is FL_insurance_sample.csv and is stored in the bucket named bq_testbucket. The path will look like –

gs://bq_testbucket/FL_insurance_sample.csv

Click on next.

Now specify the schema of the table. Here you need to specify the name and type of field as a comma separated list with format as field_name_1:data_type, field_name_2:data_type, … field_name_n:data_type

The valid data types that are allowed are – “string”, “integer”, “float”, “boolean” and “timestamp”.

So in the above example my schema string will look like –

policyID:integer,statecode:string,county:string,eq_site_limit:float,hu_site_limit:float,fl_site_limit:float,fr_site_limit:float,tiv_2011:float,tiv_2012:float,eq_site_deductible:float,hu_site_deductible:float,fl_site_deductible:float,fr_site_deductible:float,point_latitude:float,point_longitude:float,line:string,construction:string,point_granularity:integer

Click on next. In the last section of the wizard you will see some advanced options. Here you may specify if your file contains a header row by specifying the value for ‘Header rows to skip’.

Click on Submit.

At this point, a job is triggered that will perform the loading of this CSV format data into Google BigQuery.

It might take some time depending on the file size, once the job is finished the table is now available to be queried and analyzed with Google BigQuery.

The table details will look as below once it is ready.

Table Details

It show some information related to the table like the size, number of rows etc. It also shows a preview data of a few rows from the table.

With that you just loaded a CSV file into Google BigQuery, ready to crunch some data out of it !!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s