
Using D3.js

D3 (or Data-Driven Documents) is an open-source JavaScript library for making data visualizations. Pretty cool, eh?

Oh…you’re asking yourself, “What is an open-source JavaScript library?” Well, the first part, open source, means that the source code is publicly available. In many cases, open-source software can be freely downloaded, edited and re-published. For more information about open-source software, check out this annotated guide.

The second part, JavaScript library, means that it is a collection of JavaScript functions that you can reference in your code. Basically, it is a bunch of code that other people wrote so you don’t have to! All you have to do is point to the library and tell it what you want to do.

Pointing to the library is easy. You just use the <script> tag and tell the browser what script you want. Generally, you can either host the library on your server or point to the library on the creator’s server. If you point to their server, you’ll automatically get updates (depending on their naming/workflow), which is good and bad. It is good in that you are always using the newest software. It is bad in that they might update something in a way that breaks your project. I personally lean toward hosting on my server.

To host on your server:

  1. Download the library from the D3 website.
  2. Upload the library to your server.
  3. Point to the library using the following code:
<script src="/d3/d3.v3.min.js" charset="utf-8"></script>

To leave it on their server:

  1. Just insert this code:
<script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>

We have successfully navigated step one. Our browser will know that it needs to use JavaScript, because of the <script> tag, and it will load the correct JavaScript library, because we told it where the library is by using the src attribute.

Now we can move to step two: actually making a graphic using the library. To do this, we just put directions between the opening and closing <script> tags (which sounds easy enough).

The first thing we have to understand about D3 is that we are using code to create everything in the chart. This is amazing, because it is highly adaptable and lightweight. It is also a drawback, because it means the learning curve is steep and things can be a bit daunting at the beginning. Let’s start by looking at a chart and breaking it down to its elements.

[Chart: Medal of Honor recipient origin, top 5 states]

What do we need to create this graphic?

  1. Data
  2. Type of chart
  3. Axes and labels
  4. Color coding

We are going to need to explain all of that to D3.

First, let’s deal with the data. You can get data into D3 in numerous ways. For now, we will enter the numbers in an array and assign it to a variable. You can also point D3 to CSV files, JSON data and numerous other file types. I haven’t looked, but I assume you could even point to a Google Spreadsheet. Regardless, here is the snippet of code we’ll use to encode our data:

var dataset = [ 12, 15, 20, 24, 25, 18, 27, 29 ];

This code should make sense. We are creating a variable (var) named dataset and assigning our values to it.
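For reference, here is a rough sketch of how you could load the same numbers from a CSV file instead. (The file name medals.csv and its value column are made up for this example; d3.csv is D3 v3's built-in CSV loader.)

d3.csv("medals.csv", function(error, rows) {
  if (error) { console.log(error); return; } // bail out if the file didn't load
  // CSV values arrive as strings, so convert them to numbers
  var dataset = rows.map(function(row) { return +row.value; });
  // ...draw the chart here, once the data has arrived
});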

Now we need to decide the way in which we want to display the data. For now, we will create a simple bar chart. So we need to style a bar. To do this we are going to use CSS (cascading style sheets), which we discussed a few weeks ago. We are going to assign the style to a DIV tag. We’ll add the class “bar,” so it isn’t applied to all DIVs on our page. Here is the snippet of code:

div.bar {
  display: inline-block;
  width: 20px;
  height: 75px;
  margin-right: 2px;
  background-color: teal;
}

This will make the default bar 20px wide, teal, and with a 2px margin. Right now, the bar is 75px tall, but we will adjust that based on our data.

Finally, we need to tell our browser that we want D3 to use this style to draw a bunch of bars representing our data. Here is the code we’ll use to do that:

d3.select("body").selectAll("div")   // select the body, then all divs in it
  .data(dataset)                     // bind our dataset to the selection
  .enter()                           // for each value without a matching div...
  .append("div")                     // ...create a new div
  .attr("class", "bar")              // give it our CSS class
  .style("height", function(d) {     // and set its height from the data
    var barHeight = d * 5;           // scale each value up by 5
    return barHeight + "px";
  });

OK…this snippet of code looks a lot more confusing. In English, this code says, “Append each item in our dataset to a div of the class bar and adjust the height of the bar based on its value.”

One of the coolest things about D3 is the built-in “d” variable, which D3 passes to your functions to cycle through all the values in a dataset. In our case, D3 pulls up each value, multiplies it by 5 and assigns that number to the height of the bar it is drawing.
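As a quick illustration (not part of our chart), D3 also hands your function each item's index as a second argument, usually named "i." If we wanted to label each bar with its value, we could do something like this:

d3.selectAll("div.bar")
  .text(function(d, i) {
    // d is the value, i is its position in the dataset
    return "Bar " + i + ": " + d;
  });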

Now we have all the building blocks for a basic bar chart. We can organize it in an HTML file as follows:

<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>D3 Demo: Making a bar chart with divs</title>
    <script type="text/javascript" src="../d3/d3.v3.min.js"></script>
    <style type="text/css">

      div.bar {
        display: inline-block;
        width: 20px;
        height: 75px;
        margin-right: 2px;
        background-color: teal;
      }

    </style>
  </head>
  <body>
    <script type="text/javascript">

      var dataset = [ 12, 15, 20, 24, 25, 18, 27, 29 ];

      d3.select("body").selectAll("div")
        .data(dataset)
        .enter()
        .append("div")
        .attr("class", "bar")
        .style("height", function(d) {
          var barHeight = d * 5;
          return barHeight + "px";
        });

    </script>
  </body>
</html>

If we uploaded that file, we would get the following chart:

[Screenshot: the resulting bar chart]

Maybe it isn’t the most beautiful chart, but it is all code…no JPGs, no Google Charts…just code.

ED NOTE: I am not sure how long this will take in class, so I am skipping ahead to updating the dataset. I will come back to axes and labels. 

A code-driven chart is cool, but an interactive chart is even cooler. So let’s do that.

What we’ll have to do is add an object with which the user can interact (i.e., click). Then we’ll have to add code that tells D3 to listen for a click and update the data when it hears one. For the object, we’ll just create a simple bit of text using the <p> tag. Here is the code we’ll use:

<p> Conference standing </p>

Now we need to add the Event Listener and tell it to update the data. Here is the code:

d3.select("p")
 .on("click", function() {

//New values for dataset

dataset = [ 7, 3, 4, 2, 2, 3, 2, 1 ];

//Update all bars
d3.selectAll("div")
  .data(dataset)
  .style("height", function(d) {
      var barHeight = d * 5;
      return barHeight + "px";
  });
});

Although this looks complex, we can easily walk through it. We are telling the browser to listen for any clicks within a <p> tag. Once it hears the click, it executes the function. Within the function, the dataset is updated with our new data and the bars are redrawn.

You can see the fruits of our labor here.

Pretty cool, but pretty useless. Am I right?

We can easily make this better by adding an IF command to our Event Listener. You should remember IF commands from some of our work in Excel. But basically an IF command says:

IF (logical statement comes back true) { 
     Do this
}
ELSE {
     Do this
}

We can start this process by giving our user two interaction options, like so:

 <p id="wins">Wins per year</p>
 <p id="conf">Conference</p>

We do the same thing as earlier (use the <p> tag), but this time we add unique IDs that we can reference later.

Then we just add the IF command to our Event Listener:

d3.selectAll("p")
 .on("click", function() {

 //See which p was clicked
 var paragraphID = d3.select(this).attr("id");
 
 //Decide what to do 
 if (paragraphID == "wins") {
   //New values for dataset
   dataset = [ 12, 15, 20, 24, 25, 18, 27, 29 ];
   //Update all bars
   d3.selectAll("div")
     .data(dataset)
     .style("height", function(d) {
        var barHeight = d * 5;
        return barHeight + "px";
     });
 } else {
   //New values for dataset
   dataset = [ 7, 3, 4, 2, 2, 3, 2, 1 ];
   //Update all bars
   d3.selectAll("div")
     .data(dataset)
     .style("height", function(d) {
        var barHeight = d * 5;
        return barHeight + "px";
      });
   }
 });

All we added was two options. If the user clicks “Wins per year,” we keep the original dataset, and if the user clicks “Conference,” we insert the new dataset.

You can see the chart here.

And you can find the code on GitHub: https://github.com/ngeidner/d3_example

Stepping up our map (cont.)

Hopefully, you came up with the idea of pulling income data into your map. As we all know, the best source for income data is the U.S. Census Bureau. In past years, you would have had to dig through the data provided by the Census, which is rough. Nowadays it is a lot easier, because of Census Reporter.

[Screenshot: Census Reporter]

 Census Reporter has made accessing Census data simple. Seriously, I mean SUPER simple. Really.

In a handful of clicks, we can get census tract-level household income data for Knox County. Even more badass, we can download it as a KML. Seriously…a KML file. Yay!

Oh wait, you don’t know what a KML file is? A KML file, or Keyhole Markup Language file, is a format used to display geographic data in Google Maps. We can use a KML file to draw boundaries on a map. In this case, we will draw the borders of the census tracts. Let’s go ahead and create a new Fusion Table with the data from Census Reporter.
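If you're curious, here is a stripped-down sketch of what the inside of a KML file looks like. (The tract name and coordinates below are made up for illustration.)

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Census Tract 9999 (made-up example)</name>
    <Polygon>
      <outerBoundaryIs>
        <LinearRing>
          <!-- longitude,latitude pairs that trace the tract boundary -->
          <coordinates>
            -83.95,35.97 -83.94,35.97 -83.94,35.96 -83.95,35.96 -83.95,35.97
          </coordinates>
        </LinearRing>
      </outerBoundaryIs>
    </Polygon>
  </Placemark>
</kml>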

Now we should have two Fusion Tables that each hold a part of our project:

  1. Census tract-level household income
  2. Location of meth-related incidents

There are multiple ways we can combine these, but as always we will use the easiest possible way. In this case, the tool of choice is the Fusion Tables Layer Wizard.

[Screenshot: Fusion Tables Layer Wizard]

Some kind soul made an automated tool for accessing the Google Maps API. We just tell the wizard where to get the info, and it spits out the corresponding HTML code. AMAZING.
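The wizard's output looks roughly like the sketch below. (The table IDs are placeholders, and the FusionTablesLayer class comes from the Google Maps JavaScript API; your generated code will differ in the details.)

var map = new google.maps.Map(document.getElementById("map-canvas"), {
  center: new google.maps.LatLng(35.96, -83.92), // Knoxville
  zoom: 10
});
// one layer per Fusion Table: the income polygons and the incident points
new google.maps.FusionTablesLayer({
  map: map,
  query: { select: "geometry", from: "INCOME_TABLE_ID" } // placeholder ID
});
new google.maps.FusionTablesLayer({
  map: map,
  query: { select: "Location", from: "INCIDENTS_TABLE_ID" } // placeholder ID
});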

 

Today’s assignment: Stepping up our map

Our meth map the other day gave the reader some information (i.e., the locations of meth arrests and quarantines), which is cool. But we can do a lot more. I will go over a couple ways to expand on our first map.

First, we might want to make our map a little more user friendly. One way to do this would be to let the user search for a specific address and find incidents near that location. This is probably not very useful in our use case (we know the city), but for someone moving to Knoxville it could be very useful.

[Screenshot: Derek Eder’s searchable map template]

To do this we can use an open-source (read: free) JavaScript library created by Derek Eder. It is ridiculously simple. On that page, he shows you how to do it in seven steps. We won’t go through that, but know it exists and is easy.

Second, we can add more context to the map. For instance, we might want to look at the relationship between poverty and methamphetamine incidents.

How could we go about doing this?

 

 

Questions for editing

Story

  1. Does the lead accurately reflect the story?
  2. Does the author follow AP style throughout?
  3. Does the author get all numbers correct?
  4. Are all calculations correct (e.g., percent change)? (See the quick refresher below.)
  5. Does the author properly contextualize the numbers?
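A quick refresher on that percent-change check, written here as a JavaScript snippet just for illustration:

// percent change = (new - old) / old * 100
// e.g., going from 20 wins to 25 wins is (25 - 20) / 20 * 100 = 25%
function percentChange(oldValue, newValue) {
  return (newValue - oldValue) / oldValue * 100;
}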

Graphic

  1. Does each graphic connect to the story?
  2. Does the author interpret the meaning of the graphic in the text?
  3. Are all numbers correct in the graphic?
  4. Did the author use the right type of chart for the information being conveyed?
  5. Is the graphic well designed? Does it pass the data-ink ratio test?

Assignment due Thursday, March 27

We will not have class Tuesday. Instead you will work on your first assignment combining data analysis, writing and data visualizations. By Thursday, you must write an article using the below data set. The article should be 300-400 words and use at least two simple charts (line, bar, scatter, etc.). As always, you will work with your partner and turn in only one article. Please email me your story by Thursday, March 27, at 11:00 a.m.

Beyond the data I give you below, you can use any additional data or sources you want. Like any news story, do not use other media outlets as a source and follow AP style.

The Story: The NCAA tournament is hitting full stride, and the Vols and Lady Vols are still in the mix. You must write a sidebar story about the effects of money on winning in Division I basketball. The story should be localized to the UT community (as if you were writing for the KnoxNews).


The Data: The main source of your story will be data from the Department of Education’s Equity in Athletics data. Under the 1994 Equity in Athletics Disclosure Act, federally funded institutions must report information about their athletics programs, including roster size, revenue and expenditures. The Equity in Athletics Data Analysis Cutting Tool is an amazingly simple tool, which allows you to download the data any way you want it: by school, by division, by year, etc. You should play around with it. Seriously…go play with it.

I’ll give you time.

Now that you’ve played with the data cutting tool, I’ll just give you a CSV to start you off. Below you will find a CSV file that contains the revenue and expenditures of every Division I basketball program for the last five years, with 2012 being the most current year.

HERE IS THE DATA

The Charts: Please create your charts in Microsoft Excel. Here are some resources if you don’t know anything about making a chart in Excel:

After you make your basic Excel charts, we want to step it up a bit. Excel charts are ugly. For reals.

Here is a fantastic guide on how you can “tufterize” your charts (after the graphic design legend Edward Tufte). With a little extra time and finesse, you can actually make great looking charts in Excel. Get at it!

Another missing plane graphic

The other day we looked at a graphic from the Washington Post about the missing plane. If you remember correctly, I didn’t like it.

Here is another piece of data journalism related to the missing plane:

[Image: a Google Map plotting data points related to the missing plane]

I like this one. It is so simple – a Google Map with a few handfuls of data points – but it gives the user a lot of new information. Fantastic.

Intro to Data Visualization

We have learned to grab data, sort data, clean data and organize data. Now we need to learn how to display data. Today, we will introduce all the technologies we will be using during the rest of the semester.

Step 1. Exploratory v. Explanatory

The first question we need to ask ourselves when making a data visualization is, “What kind of graphic am I making?” Although there are a lot of ways you can answer this question, I think the most important first step is to decide if you are making an exploratory or an explanatory graphic.

“Exploratory data visualizations are appropriate when you have a bunch of data and you’re not sure what’s in it.” – Iliinsky & Steele, p. 7

In exploratory graphics, the user is given the freedom to explore the data, but also has the responsibility of figuring out what the data means. This can be empowering, but overwhelming.

Let’s look at this example from the New York Times:

[Screenshot: an exploratory graphic from the New York Times]

Unlike an exploratory graphic, an explanatory graphic is trying to tell a story, or point the reader to specific information. Most static graphics fall into the explanatory category. If we go back to my Peyton graphic, we can see how it was designed to explain something to the reader.

[Image: my Peyton Manning graphic]

What was I trying to explain to the reader with this graphic? If you are making an explanatory graphic, you need to decide exactly what concepts or ideas you want to explain to the reader. Then choose the most appropriate graphic type for that kind of data.

Step 2. Data Type

The type of data you have will determine the type of graphic/tool you use. Today, we will briefly discuss a number of different ways to display information. Then over the next few weeks, we will go more in-depth with each tool.

Is your data exploratory and type-less?

Do you have a lot of information that falls into a number of different data types (e.g., numbers, dates, categories)? You might consider posting the data as a searchable database.

[Screenshot: a searchable database]

Does one of your columns include a date?

Maybe you should create a timeline.

[Screenshot: a timeline]

Or we could create a bar chart…like my Peyton graphic. Remember, the x-axis of that graphic is time.

Does one of your columns include a location?

Maybe you should map the data. You could use Google Maps or something like StoryMap.js.

[Screenshot: a map]

Are you just using numbers?

We probably want to use some kind of chart. Are you comparing numbers, looking at trends, looking at parts of a whole? AHHH…what are you trying to do? Once we decide that, we can pick the correct chart to use.

[Screenshot: examples of chart types]

Scraping using OutWit

The other day during class I scraped data from the Congressional Medal of Honor Society’s website using OutWit Hub. Ahead of today’s assignment, I figured I would pull together a guide on how I did it.

Step 1. We need to decide what we want and where it is.

I want some descriptive data about living Medal of Honor recipients in order to provide some context to the reporting we are doing at the Medal of Honor Project. Specifically, I want each recipient’s name, rank, date of birth, date of medal-winning action, place of action, MoH issue date and place of birth.

If I choose “Living Recipients” from the “Recipients” tab, I see this:

[Screenshot: the Living Recipients listing on cmohs.org]

If this screen had all the information I needed, I could easily use the Chrome Scraper extension to grab the data. Unfortunately, I want more information than their name, rank, organization and conflict. If I click on one of the entries, I can see that all the information I want is on each entry page.

[Screenshot: a recipient detail page]

So now we know that we want to grab a handful of data from each of the pages of the living recipients.

Step 2. Collect the addresses of all the pages we want our scraper to go to (i.e., the pages of all the living recipients).

We can do this in a number of ways. Since there are only 75 living recipients, across three pages, we could easily use the Chrome Scraper Extension to grab the addresses (see this guide if you forget how to use it).

[Screenshot: grabbing the addresses with the Chrome Scraper extension]

Since I am using this project as practice for grabbing data from the pages of all 3,463 recipients, I decided to write a scraper in Outwit to grab the addresses.

To write a scraper, I need to tell the program exactly what information I want to grab. I start this process by looking at the coding around the items I want using the “Inspect Element” function in Google Chrome.

[Screenshot: Inspect Element on the “view” link]

If I right-mouse click on the “view” link and click “Inspect Element,” I will see that this is the line of code that relates to the link:

<div class="floatElement recipientView"><a href="http://www.cmohs.org/recipient-detail/3219/baca-john-p.php">view</a></div>

This line of code is all stuff we have seen before. It is just a <div> tag with an <a> tag inside it. The <div> is used to apply a class (i.e., floatElement recipientView) and the <a> inserts the link. The class is unique to the links we want to grab, so we can use that in our scraping. We just need to tell OutWit Hub to grab the link found within any <div> tag of the recipientView class.

In Outwit, we start by loading the page we want to scrape.

[Screenshot: the Living Recipients page loaded in OutWit Hub]

Then we want to start building our scraper by choosing “Scrapers.” When we click into the scraper window, we will have to pick a name for our scraper. I chose “MoH Links.” You will also see that the CMOHS website has flipped into a code view. We will enter the directions for our scraper in the lower half of the screen, where it says “description,” “marker before” and “marker after.”

[Screenshot: the scraper editor in OutWit Hub]

We just need one bit of info, so our scraper is simple. I entered:

  • Description = Link
  • Marker before = recipientView"><a href="
  • Marker after = ">

You can then hit “Execute” and your scraper should grab the 25 addresses from the first page of living recipients. But remember, I don’t want the addresses from just the first page, but from all three pages.

To do this, I need to step back, get super meta, and create a list to make a list. If you go to the second page, it is easy to see how these pages are organized or named. Here is the address for the second page of recipients:

http://www.cmohs.org/living-recipients.php?p=2

Not shockingly, “p=2” in English means “page equals two.” A list of all the addresses is simple to derive.

http://www.cmohs.org/living-recipients.php?p=1

http://www.cmohs.org/living-recipients.php?p=2

http://www.cmohs.org/living-recipients.php?p=3

If you create this list as a simple text file (.txt), we can bring it into OutWit Hub and use our scraper on all of these pages. After I create the text file, I go to OutWit, choose “File > Open” and select the text file. Next, select “Links” from the menu on the right-hand side of the screen. It should look like this:

[Screenshot: the links loaded in OutWit Hub]

Now, select all the links using Command+A. Then right-mouse click and choose “Auto-Explore Pages > Fast Scrape > MoH Links (or whatever you named your scraper).” OutWit should pop out a table that looks something like this:

[Screenshot: the scraped table of links]

YOU JUST RAN YOUR FIRST SCRAPER!!!

Way to go!

Now just export these links. You can either right-mouse click and select “Export selection” or click “Catch” and then hit “Export.” I usually export as an Excel file. We’ll eventually have to turn this file into a text file, so we can bring it back into OutWit. For now, just export it and put it to the side.

Step 3. Create a scraper for the data we actually want.

We are going to start with “Inspect Element” again. Remember, we want to find unique identifiers related to each bit of information we want to grab. I went in and looked at each piece of information (e.g., Issue Date) and the coding around it.

[Screenshot: Inspect Element on a recipient detail page]

If you run through each bit of information we are grabbing, you start seeing a pattern in the way it is coded, along with unique identifiers for each piece. For example, the code around the “Date of Issue” looks like this:

<div><span>Date of Issue:</span> 05/14/1970</div>

And it looks like that on every page I need to scrape. So I can enter the following information into a new OutWit scraper (I called this one MoH Data) in order to grab the date:

  • Description = IssueDate
  • Marker before = Issue:</span>
  • Marker after = </

OutWit will grab the date (i.e., 05/14/1970), which is all the information between the “>” that ends the span and the “</” that closes it.
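If it helps, here is a rough JavaScript sketch of what the marker-before/marker-after idea is doing under the hood. (This is just an illustration, not OutWit’s actual code.)

// grab everything between "marker before" and "marker after"
var html = '<div><span>Date of Issue:</span> 05/14/1970</div>';
var before = 'Issue:</span>';
var after = '</';
var start = html.indexOf(before) + before.length; // just past the marker before
var end = html.indexOf(after, start);             // up to the marker after
var issueDate = html.slice(start, end).trim();    // "05/14/1970"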

Just about every piece of information we want has a label associated with it, which makes it very easy to scrape. I just went through and created a line in OutWit for each piece of data I wanted, using the label as the marker before.

The only piece of information that doesn’t have a label is the name. If you right-mouse click on it and choose “Inspect Element,” you will see that it is surrounded by an <H4> tag. If you use the Find function (Command+F), you’ll see that the name is the only item that has an <H4> tag associated with it. So we can tell OutWit to grab all information inside an <H4> tag, like so:

  • Description = Name
  • Marker before = <H4>
  • Marker after = </

[Screenshot: the finished MoH Data scraper]

Once I got my scraper done, I hit the “Execute” button to see if it worked. It did!

Step 4. Now I just need to tell OutWit where to use my new scraper.

Go back to the Excel file you created in Step 2. Copy the column of links and paste them into a new text file. Save this new text file. I called mine mohlinks2.txt.

[Screenshot: mohlinks2.txt]

Next we open up OutWit. Before we actually start scraping, we need to deal with a limitation of the free version of OutWit: you can only have one scraper assigned to a given web address. So we need to change “MoH Links” (our first scraper) so it is no longer associated with cmohs.org.

Open up “MoH Links” on the “Scrapers” page of OutWit. Below where it says “Apply If Page URL Contains” there is a box that contains “http://www.cmohs.org.” Delete the address from that box and save the new “MoH Links” scraper. Now go into the “MoH Data” scraper and enter the cmohs address in the same box, save the scraper, and then close and reopen OutWit.

[Screenshot: clearing the URL from the MoH Links scraper]

Next, open mohlinks2.txt. Select all the links (Command+A) and choose “Auto-Explore Pages > Fast Scrape > MoH Data (or whatever you named your scraper).” Slowly but surely, OutWit Hub should go to each of the 75 pages in our links text file and grab the bits of information we told it to grab. Mine worked perfectly.

All you need to do is export the data OutWit collected; then you can go into Excel to start cleaning it and pulling information from it.

Although this first one probably seemed a bit rough, over time you will get used to how information is structured on websites and how OutWit works.