Using D3.js – part 2

 

Last week, we made a simple interactive bar chart using the JavaScript library D3.js. Today, we are going to build on that by making a simple scatterplot and adding axes. We will build it from the beginning to review every step of the process.

First, we need our data. We will be using some data compiled by Nate Silver’s 538. In his story “Do Pulitzers Help Newspapers Keep Readers,” Nate Silver uses a scatterplot to look at the relationship between the number of Pulitzer winners and finalists and newspapers’ change in circulation during the last 10 years.

We are going to pull their data and look at it in a slightly different way. Instead of looking at the change in circulation, we are just going to look at current circulation and compare it to the number of Pulitzer winners and finalists during the last 10 years. I guess our headline would be “Do the Biggest Newspapers Win the Most Pulitzers?”

We can grab the data they compiled from their GitHub site. We could actually just tell D3 to read a CSV file and point it to the raw file hosted on GitHub, but that adds a few extra steps beyond the scope of this example.

Regardless, I have pulled together the data we need. Remember, we start by creating a variable called “dataset” and filling it with our data. For the bar graph, we used an array: [value 1, value 2]. This time we will use an array of arrays (see below). In the first line, we create the variable and open the outer array. In the second line, we enter another array with two values, 20 and 2.5. Twenty is the number of Pulitzer winners and finalists this newspaper has had, and it is the value for the x-axis; 2.5 is the circulation (in millions), and it is the value for the y-axis.

var dataset = [
 [20, 2.5],
 [62, 1.9],
 [1, 1.7],
 [41, 0.7],
 [2, 0.6],
 [2, 0.5],
 [0, 0.5],
 [48, 0.5],
 [1, 0.4],
 [8, 0.4],
 [15, 0.4],
 [6, 0.4],
 [6, 0.4],
 [3, 0.4],
 [2, 0.3],
 [6, 0.3],
 [11, 0.3],
 [7, 0.3],
 [8, 0.3],
 [4, 0.3],
 [2, 0.3],
 [2, 0.2],
 [15, 0.2],
 [5, 0.2],
 [5, 0.2],
 ];
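Because each item in the dataset is now itself an array, we use a second index to reach its two numbers. The sketch below is plain JavaScript with no D3 needed, and it uses only the first three rows. The names firstRow, pulitzers and circulation are made up for illustration.

```javascript
// First three rows of the dataset: [Pulitzer winners and finalists, circulation in millions]
var dataset = [
  [20, 2.5],
  [62, 1.9],
  [1, 1.7]
];

// The outer index picks a row; the inner index picks a number within that row.
var firstRow = dataset[0];      // [20, 2.5]
var pulitzers = firstRow[0];    // 20, the x value
var circulation = firstRow[1];  // 2.5, the y value

// Walk every row, the way D3 will later hand each row to our functions as "d".
dataset.forEach(function(d) {
  console.log("Pulitzers: " + d[0] + ", circulation: " + d[1] + " million");
});
```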

Next, we need to define the size of our graphic. This is as simple as creating a variable for the width and the height.

var w = 700;
var h = 500;

Now we can create the graphic. First, we need to create the wrapper for the graphic (this is the space on the page where the graphic will be inserted).

var svg = d3.select("body")
 .append("svg")
 .attr("width", w)
 .attr("height", h);

Basically, the above code says, “Use D3 to append an ‘SVG,’ or Scalable Vector Graphic, to the body and make it ‘w’ wide and ‘h’ tall.” Now we just need to tell D3 what to put in that space.

svg.selectAll("circle")
   .data(dataset)
   .enter()
   .append("circle")
   .attr("cx", function(d) {
       return d[0];
   })
   .attr("cy", function(d) {
       return d[1];
   })
   .attr("r", 5);

This code starts much like the last example: it appends a circle for each item in our dataset. Then it sets some attributes on the circles with the “.attr” method.

The first attribute is “cx,” or the position on the x-axis. We pull the value for this attribute by using the magical “d” variable. Remember, when we pass a function to “.attr,” D3 calls that function once for each item in our dataset and hands the item to it as “d.” In our last example we used it to define the height of each bar. Now we are using it to define the position on the x-axis. The function returns d[0], the value in the zero position (the first number) of each item in our dataset.

Next we do the same thing for the “cy” attribute, except we return d[1], the second number in each item. Finally, we must define the radius of our circle. For now we will just give each circle a radius of 5px.

We can now see our amazing graphic.

WHAT?!?! What happened?

Here’s the code. What is going wrong?


Did you figure it out?

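Yep! We need to scale the values! Right now D3 uses the raw numbers as pixel positions, so every circle sits within 62 pixels of the left edge and within 2.5 pixels of the top of our 700 x 500 graphic. Remember in the bar chart, we used the following code to scale up the height.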

.style("height", function(d) {
	var barHeight = d * 5;
	return barHeight + "px";
});

We multiplied the data by 5 to make the bars tall enough for the page. This code worked, but we can also have D3 do this for us. We should have D3 do it for three reasons:

  1. It works better for axes (which we’ll get to soon enough).
  2. It’s easier. When we have a lot of data, it’s not always easy to guess what to multiply by to make the chart look nice.
  3. It’s adaptable. If we end up needing to add data to our chart, it might throw off our manual scaling. If we automate it, D3 will handle it for us.

To automate the scaling, all we need to do is create a scale function that converts our data values into pixel positions, and then change the attributes that set the x and y positions so they use it. We’ll start with the function:

 var xScale = d3.scale.linear()
   .domain([0, d3.max(
     dataset, function(d){
       return d[0]; 
     })])
   .range([0, w]);

What we’re doing here is assigning a function to the variable “xScale.” The function figures out the correct scale to fit all of our data in the space we allotted in the wrapper (i.e., 700 x 500 pixels). It does this by setting a domain and a range. The domain is the span of values in our data; the range is the span of output values, in pixels. In our example, the domain is 0 to 62 and the range is 0 to 700.

The above code calls the linear scale function built into D3. Then we assign the domain and the range. The domain is set from 0 to the maximum value in our dataset, and the range is set from 0 to the width of our space. Our data only goes up to 62, but if we all of a sudden need to add a 100 to the data, D3’s max function will find that value, and the scale will adjust properly.
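To see what the scale is doing, here is a rough stand-in written in plain JavaScript. The helpers linearScale and maxOf are hypothetical names of my own, not D3 functions. They only imitate what d3.scale.linear() and d3.max() do in this simple case: a straight-line mapping from domain to range. The sample uses four rows of our data plus one made-up newspaper with 100 Pulitzers.

```javascript
// Hypothetical stand-in for d3.scale.linear(): maps [domain[0], domain[1]] onto [range[0], range[1]].
function linearScale(domain, range) {
  return function(value) {
    var t = (value - domain[0]) / (domain[1] - domain[0]); // 0 at the domain start, 1 at the end
    return range[0] + t * (range[1] - range[0]);
  };
}

// Hypothetical stand-in for d3.max(data, accessor).
function maxOf(data, accessor) {
  return Math.max.apply(null, data.map(accessor));
}

var w = 700;
var dataset = [[20, 2.5], [62, 1.9], [1, 1.7], [0, 0.5]];

var xScale = linearScale([0, maxOf(dataset, function(d) { return d[0]; })], [0, w]);
console.log(xScale(62)); // 700, the right edge
console.log(xScale(31)); // 350, halfway across

// Add a made-up newspaper with 100 Pulitzers and rebuild the scale: the domain grows on its own.
dataset.push([100, 0.1]);
var xScale2 = linearScale([0, maxOf(dataset, function(d) { return d[0]; })], [0, w]);
console.log(xScale2(100)); // 700, the new maximum now sits at the right edge
console.log(xScale2(50));  // 350
```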

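We do the same thing for the y-axis, with one twist: the range runs from h down to 0. SVG measures y from the top of the wrapper downward, so flipping the range puts the biggest circulation numbers at the top of the chart.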

 var yScale = d3.scale.linear()
   .domain([0, d3.max(
     dataset, function(d){
       return d[1]; 
     })])
   .range([h, 0]);

Then all we need to do is update the attributes in the code that actually draws the graphic.

 .attr("cx", function(d){
    return xScale(d[0]); 
 })
 .attr("cy", function(d) {
    return yScale(d[1]);
 })

You’ll notice that this isn’t much different from last time. The only change is that instead of returning the value from our dataset directly, we run it through the scale function we created, so it is properly scaled.

We can now see our updated amazing graphic.

What is wrong now?? And why is it happening?

Here is the code.


Did you figure it out?

You’re right…our circles on the edge are getting cut off. That’s because they are falling outside of the space we set up for our wrapper. For example, one of our data points has a value of zero. That means the center of the circle is on the “0” line, which is the very edge of the wrapper, so half the circle falls outside the wrapper. The same thing happens at the other end: the largest values land exactly on the right and top edges.

We fix that by putting a little padding around the chart. We start by adding a variable to define the width of the padding. We don’t need much.

var padding = 30;

Then we change the scaling functions to incorporate the padding. Specifically, we need to change the range.

 var xScale = d3.scale.linear()
   .domain([0, d3.max(
     dataset, function(d){
       return d[0]; 
     })])
   .range([padding, w - padding]);

All that I changed was the last line of code. It was 0 and w (700). Now it is padding (30) and w - padding (700 - 30, or 670).

You do the same thing for the yScale variable; because its range is flipped, it becomes [h - padding, padding]. You can see what it ends up looking like here. Here is the current full code.
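Here is a sketch of the padded scales. It again uses a hand-rolled linearScale helper, which is hypothetical and stands in for d3.scale.linear(). It also assumes the flipped y range is [h - padding, padding]. The extreme values now land 30 pixels inside the edges, so a circle with a 5px radius fits with room to spare.

```javascript
// Hypothetical stand-in for d3.scale.linear().
function linearScale(domain, range) {
  return function(value) {
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  };
}

var w = 700;
var h = 500;
var padding = 30;
var maxPulitzers = 62;    // largest d[0] in our dataset
var maxCirculation = 2.5; // largest d[1] in our dataset

var xScale = linearScale([0, maxPulitzers], [padding, w - padding]);
// Assumption: the y range keeps its flipped order, so bigger values sit higher on the page.
var yScale = linearScale([0, maxCirculation], [h - padding, padding]);

console.log(xScale(0), xScale(62));  // 30 670
console.log(yScale(0), yScale(2.5)); // 470 30
```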

It really is starting to look nice, but we need to add some axes to this chart.

We start by creating a variable for each axis that calls D3’s axis function, like so:

 var xAxis = d3.svg.axis()
   .scale(xScale)
   .orient("bottom")
   .ticks(5);
 
 var yAxis = d3.svg.axis()
   .scale(yScale)
   .orient("left")
   .ticks(5);

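So we call the function, tell it to use the scales we already set up, orient the axes on the bottom and left, respectively, and ask for about 5 ticks (major units, in Excel parlance). The tick count is only a suggestion: D3 picks round-number tick values close to that count, so the final number of ticks can differ a little.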
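Why only “about” 5? A linear scale picks tick values that are round numbers (steps of 1, 2 or 5 times a power of ten) and then accepts whatever count those steps produce. The sketch below is modeled on the rule D3 version 3 uses for linear scales, as I understand it. The name niceTicks is hypothetical, and this is an illustration, not D3’s source. If it matches D3’s behavior, our x-axis (0 to 62) gets seven ticks and our y-axis (0 to 2.5) gets six.

```javascript
// Hypothetical illustration of how "about count ticks" becomes round-number ticks.
function niceTicks(min, max, count) {
  var span = max - min;
  // Start with the power of ten just below span / count...
  var step = Math.pow(10, Math.floor(Math.log(span / count) / Math.LN10));
  var err = count / span * step;
  // ...then widen it to 10, 5 or 2 times that to get closer to the requested count.
  if (err <= 0.15) step *= 10;
  else if (err <= 0.35) step *= 5;
  else if (err <= 0.75) step *= 2;
  var first = Math.ceil(min / step);
  var last = Math.floor(max / step);
  var ticks = [];
  for (var i = first; i <= last; i++) {
    ticks.push(i * step); // multiply instead of adding repeatedly, to avoid rounding drift
  }
  return ticks;
}

console.log(niceTicks(0, 62, 5));  // seven ticks: 0, 10, 20, 30, 40, 50, 60
console.log(niceTicks(0, 2.5, 5)); // six ticks: 0, 0.5, 1, 1.5, 2, 2.5
```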

Next, we have to actually draw the lines onto the SVG we already created.

 svg.append("g")
   .attr("transform", "translate(0," + (h - padding) + ")")
   .call(xAxis);

 svg.append("g")
   .attr("transform", "translate(" + padding + ", 0)")
   .call(yAxis);

With the above code, we are adding to our SVG. Unlike the circles (or last week’s bars) that we already drew, we are now appending a “g” element, which groups a whole set of things (e.g., lines, text). We add a “transform” attribute to move each axis where we want it: the x-axis down to the bottom padding line and the y-axis in from the left edge. Then we use the “call” function to hand each group to the axis function we already defined, which draws the lines and labels inside it. It should look like this now:

Screen Shot 2015-03-31 at 1.58.30 AM
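The transform values are ordinary string concatenation. This sketch builds the same two strings outside the browser, so you can see the numbers D3 receives. The x-axis group moves down 470 pixels (h - padding) and the y-axis group moves right 30 pixels. If the y range is [h - padding, padding], then 470 is also where a circulation of 0 sits, so the axes cross at zero.

```javascript
var h = 500;
var padding = 30;

// Built the same way as the two .attr("transform", ...) calls in the axis code.
var xAxisTransform = "translate(0," + (h - padding) + ")";
var yAxisTransform = "translate(" + padding + ", 0)";

console.log(xAxisTransform); // translate(0,470)
console.log(yAxisTransform); // translate(30, 0)
```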

 

Next, we can style the axes using CSS and a class attribute. The styling works just like styling the bars in the example from Thursday.

<style type="text/css">

  .axis path,
  .axis line {
    fill: none;
    stroke: black;
    shape-rendering: crispEdges;
  }

  .axis text {
    font-family: sans-serif;
    font-size: 11px;
  }
</style>

Then we apply it by adding another attribute to each axis group (chain it onto each svg.append("g") call):

.attr("class", "axis")

Finally, we have a good-looking graphic with axes. Anything weird?

Here is the current code.

Using D3.js

D3 (or Data-Driven Documents) is an open-source JavaScript library for making data visualizations. Pretty cool, eh?

Oh…you’re asking yourself, “what is an open-source JavaScript library?” Well, the first part, open source, means that the source code is publicly available. In many cases, open-source software can be freely downloaded, edited and re-published. For more information about open-source software, check out this annotated guide.

The second part, JavaScript library, means that it is a collection of JavaScript functions that you can reference in your code. Basically, it is a bunch of code that other people wrote so you don’t have to! All you have to do is point to the library and tell it what you want to do.

Pointing to the library is easy. You just use the <script> tag and tell the browser what script you want. Generally, you can either host the library on your server or point to the library on the creator’s server. If you point to their server, you’ll automatically get updates (depending on their naming/workflow), which is good and bad. It is good in that you are using the newest software. It is bad in that they might update something in a way that ruins your project. I personally lean toward hosting on my server.

To host on your server:

  1. Download the library from the D3 website.
  2. Upload the library to your server.
  3. Point to the library using the following code:
<script src="/d3/d3.v3.min.js" charset="utf-8"></script>

To leave it on their server:

  1. Just insert this code:
<script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>

We have successfully navigated step one. Our browser will know that it needs to use JavaScript, because of the <script> tag, and it will load the correct JavaScript library, because we told it where the library is by using the src attribute.

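Now we can move to step two: actually making a graphic using the library. To do this, we put directions between the opening and closing tags of a second <script> element (which sounds easy). The <script> tag that loads D3 with the src attribute ignores anything written between its own tags.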

The first thing we have to understand about D3 is that we are using code to create everything in the chart. This is amazing, because it is highly adaptable and lightweight. It is also a drawback, because it means there is a steep learning curve, and it can be a bit daunting at the beginning. Let’s start by looking at a chart and breaking it down into its various elements.

Medal of Honor recipient origin: Top 5 states


 

What do we need to create this graphic?

  1. Data
  2. Type of chart
  3. Axes and labels
  4. Color coding

We are going to need to explain all of that to D3.

First, let’s deal with the data. You can get data to D3 in numerous ways. For now we will enter the numbers in an array and assign it to a variable. You can also point D3 to CSV files, JSON data, and numerous other file types. I haven’t looked, but I assume you could point to a Google Spreadsheet. Regardless, here is the snippet of code we’ll use to encode our data:

 var dataset = [ 12, 15, 20, 24, 25, 18, 27, 29];

This code should make sense. We are creating a variable (var) named dataset and we are assigning our values to it.

Now we need to decide the way in which we want to display the data. For now, we will create a simple bar chart. So we need to style a bar. To do this we are going to use Cascading Style Sheets. Like a newspaper’s style guide, CSS is used to format content on a given page or across a site. So you use HTML to enter the content and then CSS to style said content. We are going to assign the style to a DIV tag. We’ll add the class “bar,” so it isn’t applied to all DIVs on our page. Here is the snippet of code:

div.bar {
  display: inline-block;
  width: 20px;
  height: 75px;
  margin-right: 2px;
  background-color: teal;
}

This will make the default bar 20px wide and teal, with a 2px margin on its right side; “display: inline-block” lets the bars sit side by side. Right now, the bar is 75px tall, but we will adjust that based on our data.

Finally, we need to tell our browser that we want D3 to use this style to draw a bunch of bars representing our data. Here is the code we’ll use to do that:

 d3.select("body").selectAll("div")
   .data(dataset)
   .enter()
   .append("div")
   .attr("class", "bar")
   .style("height", function(d) {
     var barHeight = d * 5;
     return barHeight + "px";
   });

OK…this snippet of code looks a lot more confusing. In English, this code says, “For each item in our dataset, append a div with the class bar and set the height of that bar based on the item’s value.”

One of the coolest things about D3 is the “d” variable. When we pass a function to a method like “.style,” D3 calls that function once for each value in the dataset and hands the value to it as “d.” (The name d is just a convention.) In our case, D3 pulls up each value, multiplies it by 5, and assigns the result to the height of the bar it is drawing.
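Here is the same height calculation run outside the browser, as a quick way to check the numbers. It is a plain-JavaScript sketch with no D3. The map() call plays the role of D3 handing each value to our function as “d.”

```javascript
var dataset = [12, 15, 20, 24, 25, 18, 27, 29];

// Same math as the .style("height", ...) function: multiply by 5, then add "px".
var heights = dataset.map(function(d) {
  var barHeight = d * 5;
  return barHeight + "px";
});

// heights holds one string per bar: "60px", "75px", "100px", and so on.
console.log(heights);
```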

Now we have all the building blocks for a basic bar chart. We can organize them in an HTML file as follows:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>D3 Demo: Making a bar chart with divs</title>
    <script type="text/javascript" src="/d3/d3.v3.min.js" charset="utf-8"></script>
    <style type="text/css">

      div.bar {
        display: inline-block;
        width: 20px;
        height: 75px;
        margin-right: 2px;
        background-color: teal;
      }

    </style>
  </head>
  <body>
    <script type="text/javascript">
      var dataset = [ 12, 15, 20, 24, 25, 18, 27, 29 ];

      d3.select("body").selectAll("div")
        .data(dataset)
        .enter()
        .append("div")
        .attr("class", "bar")
        .style("height", function(d) {
          var barHeight = d * 5;
          return barHeight + "px";
        });

    </script>
  </body>
</html>

If we uploaded that file, we would get the following chart:

Screen Shot 2014-04-22 at 12.48.25 PM

Maybe it isn’t the most beautiful chart, but it is all code…no JPGs, no Google Charts…just code.

ED NOTE: I am not sure how long this will take in class, so I am skipping ahead to updating the dataset. I will come back to axes and labels. 

A code-driven chart is cool, but an interactive chart is even cooler. So let’s do that.

What we’ll have to do is add an object with which the user can interact (i.e., click). Then we’ll have to add code that tells D3 to listen for a click and update the data when it hears it. For the object, we’ll just create a simple set of text using the <p> tag. Here is the code we’ll use:

<p> Conference standing </p>

Now we need to add the Event Listener and tell it to update the data. Here is the code:

d3.select("p")
  .on("click", function() {

    //New values for dataset
    dataset = [ 7, 3, 4, 2, 2, 3, 2, 1 ];

    //Update all bars
    d3.selectAll("div")
      .data(dataset)
      .style("height", function(d) {
        var barHeight = d * 5;
        return barHeight + "px";
      });
  });

Although this looks complex, we can easily walk through it. We are telling the browser to listen for clicks on our <p> tag. Once it hears a click, it executes the function. Within the function, the dataset is replaced with our new data and the bars are resized to match.

You can see the fruits of our labor here.

Pretty cool, but pretty useless. Am I right?

We can easily make this better by adding an IF command to our Event Listener. You should remember IF commands from some of our work in Excel. But basically an IF command says:

if (condition) {
     // Do this when the condition is true
} else {
     // Do this when the condition is false
}

We can start this process by giving our user two interaction options, like so:

 <p id="wins">Wins per year</p>
 <p id="conf">Conference</p>

We do the same thing as earlier (use the <p> tag), but this time we add unique IDs that we can reference later.

Then we just add the IF command to our Event Listener:

d3.selectAll("p")
  .on("click", function() {

    //See which p was clicked
    var paragraphID = d3.select(this).attr("id");

    //Decide what to do
    if (paragraphID == "wins") {
      //New values for dataset
      dataset = [ 12, 15, 20, 24, 25, 18, 27, 29 ];
      //Update all bars
      d3.selectAll("div")
        .data(dataset)
        .style("height", function(d) {
          var barHeight = d * 5;
          return barHeight + "px";
        });
    } else {
      //New values for dataset
      dataset = [ 7, 3, 4, 2, 2, 3, 2, 1 ];
      //Update all bars
      d3.selectAll("div")
        .data(dataset)
        .style("height", function(d) {
          var barHeight = d * 5;
          return barHeight + "px";
        });
    }
  });

All that we added was a choice between two datasets. If the user clicks “Wins per year,” we switch back to the original dataset; if the user clicks “Conference,” we swap in the new one.
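Both branches end with the same update code; only the dataset changes. The sketch below shows one way to tidy that up. It moves the decision into a small function that can be tested without a browser, and datasetFor and barHeight are hypothetical names of my own. In the page, the click handler could keep its paragraphID line and then update the bars once with d3.selectAll("div").data(datasetFor(paragraphID)).style("height", barHeight);

```javascript
// Hypothetical helper: pick the dataset for the paragraph the user clicked.
function datasetFor(paragraphID) {
  if (paragraphID === "wins") {
    return [12, 15, 20, 24, 25, 18, 27, 29];
  } else {
    return [7, 3, 4, 2, 2, 3, 2, 1];
  }
}

// Hypothetical helper: the height calculation both branches share.
function barHeight(d) {
  return d * 5 + "px";
}

console.log(datasetFor("wins").map(barHeight)); // first bar is "60px"
console.log(datasetFor("conf").map(barHeight)); // first bar is "35px"
```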

You can see the chart here.

Some InDesign Basics

A few people missed class on Thursday due to the weather, so I figured I would put together a quick post about InDesign. In this sort-of-tutorial, I go over everything you need to create a basic layout, like the sample I used in class. [It is important to remember InDesign is a layout tool. It does not do charting. That means we still need to build our chart in Illustrator or Excel.]

Screen Shot 2015-03-05 at 12.30.04 AM

Here are some very basic tips for using InDesign. Please shoot me an email if you are having problems, or try to Google away the problems.

Screen Shot 2015-03-06 at 9.32.51 PM

1) Basic tools – The basic layout tools should be on the left side of the screen when you open InDesign. The toolbar includes straightforward tools, such as the selection tool, the text tool, the line tool, and the rectangle tool. For basic layout, such as the example I used in class, this should be just about everything you need. Other convenient tools include the eyedropper (for copying colors or styles), the pen tool, and the zoom tool, though you might want to use the shortcut keys instead (Command and + or -).

2) Columns – One of the first things you want to do when laying out a story or page is come up with a basic grid structure. Hopefully, this was taken care of during the storyboarding process. If not, draw out your graphic now. How many columns do you need to make your layout? I needed three columns for mine – two columns of text and one column for the sidebar. I didn’t want the sidebar to be a full third of the page, so I actually made my layout four columns, using the first three columns (75%) of the page for the text and one column for the sidebar.

Screen Shot 2015-03-06 at 10.06.59 PM

Turning on the columns in InDesign is simple. Go to Layout > Margins and Columns.

Screen Shot 2015-03-06 at 10.13.03 PM

At the bottom of the dialog box, enter the number of columns you want.

Screen Shot 2015-03-06 at 10.30.45 PM

3) Columns 2: The New Batch – Besides using columns at the page level, I also needed to break the text box into two columns. First, I created the text box with the text tool in the main toolbar. Then, I filled it with filler content (since we don’t have real copy) by Control+Clicking on the text box and choosing “Fill with Placeholder Text.”

Once I have the text made, I can change the number of columns by selecting Object > Text Frame Options and entering the number of columns I need in the column box at the top of the dialog. You can also use the shortcut key (Command and B on a Mac, Control and B on Windows) to get the dialog box.

4) Place – When you want to bring in your graphic from Illustrator or Excel (or if you want to bring in a graphic like the book cover), you select File > Place…which is weird. In nearly every program the command would be Insert, not Place, but I digress. Once you place the image, you might run into two issues:

  • It might look crappy. That is because InDesign doesn’t display images at high quality. This is a way to save processing power, but if you don’t know it is supposed to do that you might just think your graphic looks bad. You can turn this off by Control+Clicking anywhere and selecting Display Performance > High Quality Display.
  • You might need to scale the object once you bring it in (i.e., change its size). When you place an image, the tool that is turned on by default is the cropping tool. To scale, hold Command+Shift and drag any corner of the image.

Screen Shot 2015-03-06 at 11.09.20 PM

5) Text Wrap – I placed one of my images (i.e., the book cover) on top of my text box. Initially, the text did not move, so some of it was covered by the image. To change this, I needed to adjust the text wrap options. Text wrap can be adjusted in a couple of ways. First, you can change the wrapping style in the toolbar across the top of your screen. The screenshot shows the icons you should look for.

Screen Shot 2015-03-06 at 11.06.01 PM

You can also adjust the text wrap options in the panel on the right side of the screen. This panel might not be displayed by default on your computer. If not, you can turn it on by selecting Window > Text Wrap.

Those are a few of the most important, general things you need to complete your assignment.

Spicing up our map

Screen Shot 2015-02-03 at 10.49.36 AM

In my first mapping post we made a simple locator map. It let the user see all the properties in Knox County that had been quarantined for meth-related activity during the last eight or so years. Although this type of map is often useful to quickly show the user the distribution of objects/events within a geographic area (e.g., distribution of crime, pizza shops, schools), it doesn’t provide a lot of context for the data. In this post, we will try adding context to this map in two different ways.

First, we will group different instances of objects/events in our map by color coding the points. Then, we will create an overlay for our map in order to provide the user with information about the neighborhoods in which meth-related incidents occur.

Color coding points

Step 1: Get data – I will be using data from the Tennessee Meth & Pharmaceutical Task Force again. I start by using the Chrome Scraper Extension to grab the data. This time I will grab data for all meth incidents in Knox County instead of just quarantined properties.

Screen Shot 2015-02-03 at 11.13.01 AM

Step 2: Clean and prep data – I’ll start by prepping the addresses (just like I did in the last example). Then I’ll create an additional column, which I will use to distinguish between quarantined and non-quarantined properties. The quarantined properties all have a “quarantine date” associated with them. Since there are only 30 quarantined properties, I’ll just go through and manually code each of them with a “1” in my new column. I will then code all the non-quarantined properties with a “0” in the same column.

Screen Shot 2015-02-03 at 11.29.09 AM
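With a longer list, coding by hand would get tedious. The rule is simple enough to automate: a row gets 1 if it has a quarantine date and 0 if it doesn’t. Here is a sketch of that rule in JavaScript. The rows and the field names (address, quarantineDate, quarantine) are made up for illustration and won’t match the scraped spreadsheet exactly.

```javascript
// Made-up sample rows; the real spreadsheet has its own columns.
var rows = [
  { address: "Address A", quarantineDate: "2010-05-01" },
  { address: "Address B", quarantineDate: "" },
  { address: "Address C", quarantineDate: "2013-11-20" }
];

// 1 = quarantined (has a quarantine date), 0 = not quarantined (blank date).
rows.forEach(function(row) {
  row.quarantine = row.quarantineDate ? 1 : 0;
});

console.log(rows.map(function(row) { return row.quarantine; })); // [ 1, 0, 1 ]
```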

 

Step 3: Import to Fusion Tables – Identical to the last example. Make sure the mapping is based on the “location” column, instead of the city or zip columns. Also, clean up any ambiguous data, or points Google has problems mapping.

Screen Shot 2015-02-03 at 11.35.44 AM

My map looks pretty good right now. The user can quickly see every meth-related incident in Knox County during the last eight or so years…but I want the user to be able to quickly distinguish between quarantined and non-quarantined sites. Enter step 4.

Step 4: Color coding – Ready to be shocked? Here’s how simple it is to color code the data points.

Screen Shot 2015-02-03 at 12.39.41 PM

  1. Click the “Change feature style” button
  2. In the “Marker icon” tab, choose the “Buckets” tab.
  3. Divide into two buckets using the “Quarantine” column.
  4. Click the “use this range” link
  5. Choose the right color for each bucket
  6. Click “save”

This created two sets of marker icons: the first for any rows with a quarantine value between “0” and “0.9999,” and the second for any rows with a value of “1” or greater. When I exit this dialog box, my map updates and looks like this:

So simple! Now I’ll move on to my second way to add context to a map.

Creating a choropleth map

OK, I now want to lay statistical/numeric data over my map based on geographic boundaries. The numeric data I am going to use is median household income per census tract.

Census Tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity that are updated by local participants prior to each decennial census as part of the Census Bureau’s Participant Statistical Areas Program.

On top of this backdrop, I am again going to place the icons representing each meth-related incident in Knox County during the last eight or so years.

Step 1: Get data – This time I am going to use data from two different sources to tell what is hopefully a deeper and more powerful story. I already have the meth data cleaned and coded how I want it. Next, I am going to need the income data. This data will come from one of the greatest websites ever created: Census Reporter. Seriously, it is amazing.

Screen-Shot-2014-04-03-at-12.48.36-PM-1024x429

Census Reporter allows me to quickly pull data from the various datasets created by the U.S. Census. Although their new interface is a bit wonky, it is still a million times better than digging through census.gov. Here is how we get the info we need:

  1. Search for “median household income.”
  2. Select the right variable.
  3. Enter the location “Knox County, TN.”
  4. Once Knox County is selected, Census Reporter will pop out one number for all of Knox County…which is not the number we want. To fix this, I need to tell Census Reporter I want to look at the census tract-level data. On the left side of the screen, there is a “Divide Knox County by…” heading. Below it is the census tract option.
  5. Once it is clicked, Census Reporter will spit out a median household income value for each census tract.
  6. From here, all I need to do is export the data as a KML file.

Screen Shot 2015-02-03 at 1.26.05 PM

Oh wait, you don’t know what a KML file is? A KML file, or Keyhole Markup Language file, is a format used to display geographic data in Google Maps. We can use a KML file to draw boundaries on a map.

Step 2: Import into Fusion Tables – Another simple step. All I have to do is create a new Fusion Table and import the KML file.

Step 3: Create the KML-based map – In Fusion Tables, create a map based on the “geometry” column. Then change the feature style so the fill color is a gradient based on the income variable.

Screen Shot 2015-02-03 at 2.04.05 PM

Once I save and close the dialog box, the map will update.
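A gradient fill works much like the linear scales from the D3 posts. Each tract’s income is placed somewhere between the lowest and highest values, and its color is blended in the same proportion. The sketch below assumes a simple two-color linear blend. I don’t know exactly how Fusion Tables computes its gradients, and the colors and income figures are hypothetical.

```javascript
// Convert "#rrggbb" to [r, g, b].
function hexToRgb(hex) {
  return [
    parseInt(hex.slice(1, 3), 16),
    parseInt(hex.slice(3, 5), 16),
    parseInt(hex.slice(5, 7), 16)
  ];
}

// Convert [r, g, b] back to "#rrggbb", rounding and zero-padding each channel.
function rgbToHex(rgb) {
  return "#" + rgb.map(function(c) {
    var s = Math.round(c).toString(16);
    return s.length === 1 ? "0" + s : s;
  }).join("");
}

// Hypothetical helper: blend from lowHex (at min) to highHex (at max) in proportion to value.
function gradientColor(value, min, max, lowHex, highHex) {
  var t = (value - min) / (max - min);
  var low = hexToRgb(lowHex);
  var high = hexToRgb(highHex);
  return rgbToHex(low.map(function(c, i) { return c + t * (high[i] - c); }));
}

// Hypothetical income range: $20,000 (lightest) to $100,000 (darkest).
console.log(gradientColor(20000, 20000, 100000, "#ffffff", "#000000"));  // #ffffff
console.log(gradientColor(60000, 20000, 100000, "#ffffff", "#000000"));  // #808080
console.log(gradientColor(100000, 20000, 100000, "#ffffff", "#000000")); // #000000
```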

Now I just need to combine the two maps. Unfortunately, the one weakness of Google Maps in Fusion Tables is that I can’t simultaneously style two layers (i.e., the census tracts and the meth incident icons). But never fear, there is a super simple workaround:

Screen Shot 2015-02-03 at 2.08.15 PM

High School Journalism workshop

I am giving a talk tomorrow to a group of high school journalists about basic video production. Here are some of the examples and notes I’ll be using:

Woodworker sees with his hands from McKenna Ewen on Vimeo.

Step 1: Find a story.

Everyone has a story; it is just a matter of finding it.

Step 2: Shoot interviews.

If you prep and set up correctly, television interviews can be fairly easy.

photo (8)

nabb from Medal of Honor Project on Vimeo.

Step 3: Shoot b-roll.

What is your story about? What did your interview subject mention? What gives your viewer context? What is compelling in the scene?

broll from Medal of Honor Project on Vimeo.

Born This Way – Form 990

Gawker published a story about Lady Gaga’s Born This Way Foundation this afternoon. The article looks at how the BTWF used its $2.1 million in net assets during 2012.

How did they report this story, you may ask?

Well, it’s simple. Non-profit organizations falling under IRS designation 501(c)(3) must file a Form 990, which lists basic information about revenue and expenditures. Form 990s are public and can be accessed via Guidestar, an organization that specializes in collecting information on non-profits. To download a Form 990: go to Guidestar, sign in (990s are free, but you must create an account), search for the organization, click the “Form 990 & Docs” tab, and download the form you want.

Below is the 2012 Form 990 for the Born This Way Foundation. Try fact checking the Gawker story. How did they do?

Additional Resources:

Simple example for teaching data journalism

Addresses in Knox County Quarantined for Meth-related Activity

The map below shows the 27 Knox County properties listed as quarantined in the Tennessee Methamphetamine Task Force’s Meth Lab Database. The database includes quarantine data dating back to 2006.

Seven Steps to Doing This!

You will be shocked at the simplicity of this project.

Step 0. Get a Google Account. The scraper we will be using will dump the information that is scraped directly to a Google Spreadsheet.

Step 1. Install the Scraper extension for Chrome. It is free and simple. After installing it, you might have to restart Chrome (I can’t remember). Once you have the Scraper extension properly installed, you should be able to right-click in Chrome and see a new option: “Scrape similar…”

Scrape

Step 2. Find our data. In this case, we have to go to the Tennessee Methamphetamine Task Force’s website. Then we choose the “Search for Meth Labs” link. We want to search for activity in Knox County, and we only want to see quarantined properties.

Scrape2

Step 3. Scrape our data using our new Chrome extension. After you run the search on the Task Force’s website, you will see the data for the 27 quarantined addresses. Select the whole row for one instance, right-click, and choose “Scrape similar…”

Scrape3

After you choose “Scrape similar…”, a window will pop up displaying…hopefully…the data we want. It worked great for me.

Scrape4

From here you just choose “Export to Google Docs…” and after a few clicks – the first time you do this you will have to give permission to the Scraper extension to write to your Google Docs – the data should populate into a Google Spreadsheet.

Scrape5

Step 4. Prep the data a bit. The data is pretty much ready to go, but we want to make sure the address is clear enough that Google Maps can find it. We just need to pull the address from the five different address columns together into one column. We can’t use the regular addition command (+), since these values are strings. Instead we need to combine them using the ampersand (&). We also need to make sure we include the proper spacing and punctuation. Here is the formula I used:

=A2&" "&B2&", "&C2&", "&D2&" "&E2

The formula might be confusing, but really it just says “I want the data from cell A2, then a space, then the data from cell B2, then a comma and a space…” and so on. Type the formula above into cell G2, see if it works, and then copy and paste the formula from G2 to all the cells in column G.
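Writing the same join in JavaScript makes the pattern easy to see. Unlike the spreadsheet, JavaScript does use + to join strings. The name fullAddress is hypothetical and the address parts are made up. The sketch assumes columns A through E hold the street number, street, city, state and ZIP code.

```javascript
// Hypothetical helper mirroring =A2&" "&B2&", "&C2&", "&D2&" "&E2
function fullAddress(a, b, c, d, e) {
  return a + " " + b + ", " + c + ", " + d + " " + e;
}

// Made-up address parts, in the order of columns A through E.
console.log(fullAddress("123", "Main St", "Knoxville", "TN", "37902"));
// 123 Main St, Knoxville, TN 37902
```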

Step 5. Import data into Fusion Tables. Although Google Spreadsheets is cool, it doesn’t have the ability to do geolocation, or finding the address on a map. Therefore, we have to use another Google product, Fusion Tables. Fusion Tables is free, fairly advanced database software. Since most people don’t use it, Google makes you turn on Fusion Tables within your Google account.

To turn it on, go to your Google Drive and choose “Connect more apps…” Then search for Fusion Tables and install it.

Scrape6

Once it’s installed, go back to your Google Drive and under “create” you should be able to create a Fusion Table. When you choose to create a Fusion Table you will get to this screen:

Scrape7

Choose “Google Spreadsheet.” The spreadsheet you created in Steps 3 and 4 should pop up. If it doesn’t, just look through your Google Drive until you find it. When you find it, click it and follow the directions. Eventually, your data should show up in Fusion Tables.

Scrape8

Step 6. Map the data. Just click the Map tab. Seriously! It’s that easy. The only problem that might arise is that Fusion Tables might not understand which column you want to map. If that happens, all you have to do is select “Address” instead of “City” in the drop-down menu.

Step 7. Post the map on your website. Choose “Publish” on the map tab. Fusion Tables will give you a warning telling you that if you want to publish the map, you need to make the table public. Do that. Then copy the embed code and paste it into your blog or website.

DONE!