Questions for editing

Story

  1. Does the lead accurately reflect the story?
  2. Does the author follow AP style throughout?
  3. Does the author get all numbers correct?
  4. Are all calculations correct (e.g., percent change – see the quick example below)?
  5. Does the author properly contextualize the numbers?
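
A quick refresher on the percent-change math from question 4, as a few lines of Python (the dollar figures are made up for illustration):

  old_value = 1_000_000  # hypothetical 2011 revenue
  new_value = 1_250_000  # hypothetical 2012 revenue
  pct_change = (new_value - old_value) / old_value * 100  # divide by the OLD value
  print(f"{pct_change:.1f}%")  # 25.0%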

Graphic

  1. Does each graphic connect to the story?
  2. Does the author interpret the meaning of the graphic in the text?
  3. Are all numbers correct in the graphic?
  4. Did the author use the right type of chart for the information being conveyed?
  5. Is the graphic well designed? Does it pass Tufte’s data-ink ratio test?

Assignment due Thursday, March 27

We will not have class Tuesday. Instead, you will work on your first assignment combining data analysis, writing and data visualizations. By Thursday, you must write an article using the data set below. The article should be 300-400 words and use at least two simple charts (line, bar, scatter, etc.). As always, you will work with your partner and turn in only one article. Please email me your story by Thursday, March 27, at 11 a.m.

Beyond the data I give you below, you can use any additional data or sources you want. As with any news story, do not use other media outlets as sources, and follow AP style.

The Story: The NCAA tournament is hitting full stride, and the Vols and Lady Vols are still in the mix. You must write a sidebar story about the effects of money on winning in Division I basketball. The story should be localized to the UT community (as if you were writing for KnoxNews).

The Data: The main source of your story will be data from the Department of Education’s Equity in Athletics data. Under the 1994 Equity in Athletics Disclosure Act, federally funded institutions must report information about their athletics programs, including roster size, revenue and expenditures. The Equity in Athletics Data Analysis Cutting Tool is an amazingly simple tool that allows you to download the data any way you want it: by school, by division, by year, etc. You should play around with it. Seriously…go play with it.

I’ll give you time.

Now that you have played with the data cutting tool, I’ll give you a CSV to start you off. Below you will find a CSV file that contains the revenue and expenditures of every Division I basketball program for the last five years, with 2012 being the most recent year.

HERE IS THE DATA
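
You will build your charts in Excel, but if you want to sanity-check your math first, here is a minimal Python sketch using pandas. The file name and column names here (basketball.csv, School, Year, Revenue) are placeholders – check the headers in the actual CSV, since I am only guessing at them:

  import pandas as pd

  df = pd.read_csv("basketball.csv")  # hypothetical file name
  ut = df[df["School"] == "University of Tennessee"].sort_values("Year")  # hypothetical column names
  first, last = ut["Revenue"].iloc[0], ut["Revenue"].iloc[-1]
  print((last - first) / first * 100)  # five-year percent change in revenue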

The Charts: Please create your charts in Microsoft Excel. Here are some resources if you don’t know anything about making a chart in Excel:

After you make your basic Excel charts, we want to step it up a bit. Excel charts are ugly. For reals.

Here is a fantastic guide on how you can “tufterize” – after the graphic design legend Edward Tufte – your charts. With a little extra time and finesse, you can actually make great looking charts in Excel. Get at it!

Another missing plane graphic

The other day we looked at a graphic from the Washington Post about the missing plane. If you remember correctly, I didn’t like it.

Here is another piece of data journalism related to the missing plane:

[Graphic: Google Map plotting data points related to the missing plane]

I like this one. It is so simple – a Google Map with a few handfuls of data points – but it gives the user a lot of new information. Fantastic.

Intro to Data Visualization

We have learned to grab data, sort data, clean data and organize data. Now we need to learn how to display data. Today, we will introduce all the technologies we will be using during the rest of the semester.

Step 1. Exploratory v. Explanatory

The first question we need to ask ourselves when making a data visualization is, “What kind of graphic am I making?” Although there are a lot of ways you can answer this question, I think the most important first step is to decide if you are making an exploratory or an explanatory graphic.

“Exploratory data visualizations are appropriate when you have a bunch of data and you’re not sure what’s in it.” – Iliinsky & Steele, p. 7

In exploratory graphics, the user is given the freedom to explore the data, but also has the responsibility of figuring out what the data means. This can be empowering, but overwhelming.

Let’s look at this example from the New York Times:

[Screenshot: New York Times exploratory graphic]

Unlike an exploratory graphic, an explanatory graphic is trying to tell a story, or point the reader to specific information. Most static graphics fall into the explanatory category. If we go back to my Peyton graphic, we can see how it was designed to explain something to the reader.

[Image: my Peyton Manning graphic]

What was I trying to explain to the reader with this graphic? If you are making an explanatory graphic, you need to decide exactly what concepts or ideas you want to explain to the reader. Then choose the most appropriate graphic type for that kind of data.

Step 2. Data Type

The type of data you have will determine the type of graphic/tool you use. Today, we will briefly discuss a number of different ways to display information. Then over the next few weeks, we will go more in-depth with each tool.

Is your data exploratory and type-less?

Do you have a lot of information that falls into a number of different data types (e.g., numbers, dates, categories)? You might consider posting the data as a searchable database.

[Screenshot: searchable database example]

Does one of your columns include a date?

Maybe you should create a timeline.

[Screenshot: timeline example]

Or we could create a bar chart…like my Peyton graphic. Remember, the x-axis of that graphic is time.

Does one of your columns include a location?

Maybe you should map the data. You could use Google Maps or something like StoryMap.js.

[Screenshot: map example]

Are you just using numbers?

We probably want to use some kind of chart. Are you comparing numbers, looking at trends, looking at parts of a whole? AHHH…what are you trying to do? Once we decide that, we can pick the correct chart to use.

[Screenshot: chart example]

Scraping using OutWit

The other day during class I scraped data from the Congressional Medal of Honor Society’s website using OutWit Hub. Ahead of today’s assignment, I figured I would pull together a guide on how I did it.

Step 1. We need to decide what we want and where it is.

I want some descriptive data about living Medal of Honor recipients in order to provide some context to the reporting we are doing at the Medal of Honor Project. Specifically, I want each recipient’s name, rank, date of birth, place of birth, date of medal-winning action, place of action and MoH issue date.

If I choose “Living Recipients” from the “Recipients” tab, I see this:

[Screenshot: the Living Recipients list on cmohs.org]

If this screen had all the information I needed, I could easily use the Chrome Scraper extension to grab the data. Unfortunately, I want more information than name, rank, organization and conflict. If I click on one of the entries, I can see that all the information I want is on each entry page.

[Screenshot: a recipient’s detail page]

So now we know that we want to grab a handful of data from each of the pages of the living recipients.

Step 2. Collect the addresses of all the pages we want our scraper to go to (i.e., the pages of all the living recipients).

We can do this in a number of ways. Since there are only 75 living recipients across three pages, we could easily use the Chrome Scraper extension to grab the addresses (see this guide if you forget how to use it).

[Screenshot: grabbing the addresses with the Chrome Scraper extension]

Since I am using this project as practice for grabbing data from the pages of all 3,463 recipients, I decided to write a scraper in OutWit to grab the addresses.

To write a scraper, I need to tell the program exactly what information I want to grab. I start this process by looking at the coding around the items I want using the “Inspect Element” function in Google Chrome.

[Screenshot: Inspect Element in Google Chrome]

If I right-mouse click on the “view” link and click “Inspect Element,” I will see that this is the line of code that relates to the link:

<div class="floatElement recipientView"><a href="http://www.cmohs.org/recipient-detail/3219/baca-john-p.php">view</a></div>

This line of code is all stuff we have seen before. This is just a <div> tag with an <a> tag inside it. The <div> is used to apply a class (i.e., floatElement recipientView) and the <a> inserts the link. The class is unique to the links we want to grab, so we can use it in our scraping. We just need to tell OutWit Hub to grab the link found within any <div> tag of the recipientView class.
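
OutWit will do that for us with markers in a moment, but to show the idea another way, here is a minimal Python sketch that grabs the same links with the requests and BeautifulSoup libraries (an alternative approach, not part of the OutWit workflow):

  import requests
  from bs4 import BeautifulSoup

  html = requests.get("http://www.cmohs.org/living-recipients.php").text
  soup = BeautifulSoup(html, "html.parser")
  # Pull the href out of the <a> inside every <div> with the recipientView class
  links = [a["href"] for a in soup.select("div.recipientView a")]
  print(links)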

In Outwit, we start by loading the page we want to scrape.

[Screenshot: the page loaded in OutWit Hub]

Then we want to start building our scraper by choosing “Scrapers.” When we click into the scraper window, we will have to pick a name for our scraper. I chose “MoH Links.” You will also see that the CMOHS website has flipped into a code view. We will enter the directions for our scraper in the lower half of the screen, where it says description, marker before and marker after.

[Screenshot: the scraper editor in OutWit Hub]

We just need one bit of info, so our scraper is simple. I entered:

  • Description = Link
  • Marker before = recipientView"><a href="
  • Marker after = ">

You can then hit “Execute” and your scraper should grab the 25 addresses from the first page of living recipients. But remember, I don’t want the addresses from just the first page, but from all three pages.

To do this, I need to step back, get super meta, and create a list to make a list. If you go to the second page, it is easy to see how these pages are organized or named. Here is the address for the second page of recipients:

http://www.cmohs.org/living-recipients.php?p=2

Not shockingly, “p=2” in English means “page equals two.” A list of all the addresses is simple to derive.

http://www.cmohs.org/living-recipients.php?p=1

http://www.cmohs.org/living-recipients.php?p=2

http://www.cmohs.org/living-recipients.php?p=3
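
Typing three addresses by hand is easy, but for something like all 3,463 recipients you would want to generate the list. Here is a minimal Python sketch (the output file name is my own choice):

  # Write one page address per line to a plain text file
  with open("pagelist.txt", "w") as f:
      for page in range(1, 4):
          f.write(f"http://www.cmohs.org/living-recipients.php?p={page}\n")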

If you create this list as a simple text file (.txt), we can bring it into OutWit Hub and use our scraper on all of these pages. After I create the text file, I go to OutWit, choose “File > Open” and select the text file. Next, select “Links” from the menu on the right-hand side of the screen. It should look like this:

[Screenshot: the links view in OutWit Hub]

Now, select all the links by using Command+A. Then right-mouse click and choose “Auto-Explore Pages > Fast Scrape > MoH Links” (or whatever you named your scraper). OutWit should pop out a table that looks sort of like this:

[Screenshot: the table of scraped links]

YOU JUST RAN YOUR FIRST SCRAPER!!!

Way to go!

Now just export these links. You can either right-mouse click and select “Export selection” or click “Catch” and then hit “Export.” I usually export as an Excel file. We’ll eventually have to turn this file into a text file, so we can bring it back into OutWit. For now, just export it and put it to the side.

Step 3. Create a scraper for the data we actually want.

We are going to start with “Inspect Element” again. Remember, we want to find unique identifiers related to each bit of information we want to grab. I went in and looked at each piece of information (e.g., Issue Date) and at the coding around it.

[Screenshot: Inspect Element on a recipient’s page]

If you run through each of the bits of information we are grabbing, you start seeing a pattern in the way the information is coded, as well as a unique identifier for each piece. For example, the code around the “Date of Issue” looks like this:

<div><span>Date of Issue:</span> 05/14/1970</div>

And it looks like that on every page I need to scrape. So I can enter the following information into a new OutWit scraper – I called this one “MoH Data” – in order to grab the date:

  • Description = IssueDate
  • Marker before = Issue:</span>
  • Marker after = </

OutWit will grab the date (i.e., 05/14/1970), which is all the information between the end of the “Issue:</span>” marker and the “</” that closes the <div>.

Just about every piece of information we want has a label associated with it, which makes it very easy to scrape. I just went through and created a line in OutWit for each piece of data I wanted, using the label as the marker before.

The only piece of information that doesn’t have a label is the name. If you right-mouse click on it and choose “Inspect Element,” you will see that it is surrounded by an <H4> tag. If you use the Find function (Command+F), you’ll see that the name is the only item on the page with an <H4> tag. So we can tell OutWit to grab all information inside the <H4> tag, like so:

  • Description = Name
  • Marker before = <H4>
  • Marker after = </
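
If you are curious what OutWit is doing with those marker pairs, here is a rough Python equivalent – grab everything between the marker-before string and the next marker-after string (a sketch of the idea, not OutWit’s actual code; the sample HTML line is pieced together from the examples above):

  def grab(html, before, after):
      # Everything between the first "before" marker and the next "after" marker
      start = html.index(before) + len(before)
      end = html.index(after, start)
      return html[start:end].strip()

  page = '<div><span>Date of Issue:</span> 05/14/1970</div><H4>John P. Baca</H4>'
  print(grab(page, 'Issue:</span>', '</'))  # 05/14/1970
  print(grab(page, '<H4>', '</'))           # John P. Baca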

[Screenshot: the finished MoH Data scraper in OutWit Hub]

Once I got my scraper done, I hit the “Execute” button to see if it worked. It did!

Step 4. Now I just need to tell OutWit where to use my new scraper.

Go back to the Excel file you created in Step 2. Copy the column of links and paste them into a new text file. Save this new text file. I called mine mohlinks2.txt.

[Screenshot: mohlinks2.txt]

Next, we open up OutWit. Before we actually start scraping, we need to deal with a limitation of the free version of OutWit: you can only have one scraper assigned to a given web address. So we need to change “MoH Links” (our first scraper) so it is not associated with cmohs.org.

Open up “MoH Links” on the “Scrapers” page of OutWit. Below where it says “Apply If Page URL Contains” there is a box that contains “http://www.cmohs.org.” Delete the address from that box and save the new “MoH Links” scraper. Now go into the “MoH Data” scraper and enter the cmohs address in the same box, save the scraper, and then close and reopen OutWit.

[Screenshot: the “Apply If Page URL Contains” box in OutWit Hub]

Next, open mohlinks2.txt. Select all the links (Command+A) and choose “Auto-Explore Pages > Fast Scrape > MoH Data” (or whatever you named your scraper). Slowly but surely, OutWit Hub should go to each of the 75 pages in our links text file and grab the bits of information we told it to grab. Mine worked perfectly.

All you need to do now is export the data OutWit collected, and then you can go into Excel to start cleaning the data and pulling information from it.

Although this first one probably seemed a bit rough, over time you will get used to how information is structured on websites and how OutWit works.