Playing with the Stack Overflow dataset on Google BigQuery


Stack Overflow has good news for this Christmas season: they have recently launched their dataset on Google BigQuery.

Google BigQuery is a data warehouse, part of Google’s Cloud Platform, that uses the power of Google’s infrastructure to run very fast queries on large datasets.

“Storing and querying massive datasets can be time consuming and expensive without the right hardware and infrastructure. Google BigQuery is an enterprise data warehouse that solves this problem by enabling super-fast SQL queries using the processing power of Google’s infrastructure. Simply move your data into BigQuery and let us handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.”

Using BigQuery is as simple as logging in to Google Cloud Platform, opening the console, and starting to play with the data.

[Image: Stack Overflow dataset]

In this example, we queried the age of the youngest Stack Overflow users.
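
The exact query from the example is not reproduced in the text, but a query of that kind could look roughly like the sketch below, written for the `google-cloud-bigquery` Python client. The table name `bigquery-public-data.stackoverflow.users` and the `age` column are assumptions about the public dataset's schema, and `youngest_users_sql` and `run_query` are hypothetical helper names. Running the query needs a Google Cloud project with credentials already configured.

```python
def youngest_users_sql(limit=10):
    """Build a Standard SQL query listing the youngest users.

    Assumption: the public `users` table has an `age` column whose
    values can be cast to INT64 (rows that cannot be cast are skipped).
    """
    return (
        "SELECT id, display_name, SAFE_CAST(age AS INT64) AS user_age\n"
        "FROM `bigquery-public-data.stackoverflow.users`\n"
        "WHERE SAFE_CAST(age AS INT64) IS NOT NULL\n"
        "ORDER BY user_age ASC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql):
    """Run a query and return the rows as plain dictionaries."""
    # Imported here so the SQL builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses the default project and credentials
    rows = client.query(sql).result()  # blocks until the job finishes
    return [dict(row.items()) for row in rows]


# Usage (requires credentials):
#   for user in run_query(youngest_users_sql(5)):
#       print(user["display_name"], user["user_age"])
```

The same SQL text can also be pasted directly into the query editor of the web console.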

Read the BigQuery documentation for more information about how to query the data. You can get inspired by these curious queries.

We can see that the users table in the Stack Overflow dataset is 985 MB in size and contains more than 6 million rows. With the power of Google’s infrastructure, we queried the entire table in a ridiculously short 2.4 seconds, and in 0.8 seconds with query caching. Complex queries on bigger datasets show at a glance the incredible computational power that BigQuery offers at almost no cost (can it get any better?).
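
Figures like the ones quoted above (data scanned, elapsed time, whether the cache was used) can also be read from the query job itself. The following is a minimal sketch using the `google-cloud-bigquery` Python client; `describe_run` and `timed_query` are hypothetical helper names, and the timings you see will depend on your project and on the current size of the table.

```python
def describe_run(total_bytes, seconds, cache_hit):
    """Format job statistics as a one-line summary."""
    megabytes = total_bytes / 1_000_000
    source = "cache" if cache_hit else "full scan"
    return f"{megabytes:.0f} MB processed in {seconds:.1f} s ({source})"


def timed_query(sql, use_cache=True):
    """Run a query and summarize how much data it scanned and how long it took."""
    # Imported here so describe_run works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(use_query_cache=use_cache)
    job = client.query(sql, job_config=config)
    job.result()  # wait for the job to finish

    elapsed = (job.ended - job.started).total_seconds()
    # A cached result reports no bytes processed.
    return describe_run(job.total_bytes_processed or 0, elapsed, bool(job.cache_hit))


# Usage (requires credentials): run the same query twice; the second
# run is normally answered from the cache.
#   print(timed_query("SELECT ...", use_cache=False))
#   print(timed_query("SELECT ...", use_cache=True))
```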

You can easily upload your own data and start querying in just seconds.
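
As a rough illustration of the upload step, here is a sketch that loads a local CSV file into a table with the `google-cloud-bigquery` Python client. The project, dataset, table, and file names are placeholders, `full_table_id` and `load_csv` are hypothetical helpers, and the sketch assumes the dataset already exists and that the CSV has a header row.

```python
def full_table_id(project, dataset, table):
    """Build the fully qualified table ID in project.dataset.table form."""
    return f"{project}.{dataset}.{table}"


def load_csv(path, table_id):
    """Load a local CSV file into a BigQuery table and return its row count."""
    # Imported here so full_table_id works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumption: the first row is a header
        autodetect=True,      # let BigQuery infer the schema
    )
    with open(path, "rb") as source_file:
        job = client.load_table_from_file(source_file, table_id, job_config=config)
    job.result()  # wait for the load job to finish
    return client.get_table(table_id).num_rows


# Usage (requires credentials; all names are placeholders):
#   table = full_table_id("my-project", "my_dataset", "my_table")
#   print(load_csv("my_data.csv", table), "rows loaded")
```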

Check out the Google BigQuery pricing and quotas.

Keep in mind these useful tips from our friends at Toptal when designing your data.


 ~Happy querying