The Department of the Treasury recently launched the very cool Beta.USASpending.gov, which makes it easy to browse and query US Government spending data. In this post, we’ll give a quick example of how to use the Python requests module to query this data via the USASpending API, and then create a few plots with pandas.
Pulling the Data
As a simple test case, let’s use the API to pull all the awards from the Department of Defense, following the example given here.
To begin, here is a basic function to pull from the awards endpoint for a given agency (by CGAC code):
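A minimal sketch of such a function might look like the following. The endpoint URL, the filter field name, and the pagination keys (`results`, `page_metadata`, `has_next_page`) are assumptions about the API at the time of writing, so check them against the current API documentation before relying on them. The `post` parameter is a hypothetical addition that lets us swap in a stub and try the function without network access.

```python
# Assumed endpoint and filter field; verify both against the current API docs.
AWARDS_URL = "https://api.usaspending.gov/api/v1/awards/"
AGENCY_FIELD = "awarding_agency__toptier_agency__cgac_code"


def get_awards(cgac_code, post=None, limit=500, max_pages=100):
    """Collect award records for one agency, following pagination."""
    if post is None:
        import requests  # imported lazily so a stub can be passed in instead
        post = requests.post

    awards = []
    for page in range(1, max_pages + 1):
        payload = {
            "filters": [
                {"field": AGENCY_FIELD, "operation": "equals", "value": cgac_code}
            ],
            "limit": limit,
            "page": page,
        }
        response = post(AWARDS_URL, json=payload, timeout=30)
        response.raise_for_status()  # fail loudly on HTTP errors
        body = response.json()
        awards.extend(body.get("results", []))
        # Assumed pagination metadata: stop once no further page is reported.
        if not body.get("page_metadata", {}).get("has_next_page"):
            break
    return awards
```

With network access, a call such as `get_awards("097")` would request the DoD awards, 097 being the DoD's CGAC code. The `max_pages` cap is only a safety net against an endless loop if the pagination metadata is missing or different from what we assume here.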
To pull all the DoD data, we now just run the function using the DoD CGAC code:
This returns a list of nested JSON objects corresponding to the list of awards. In this case, the API returns 3016 records, each of which looks something like this:
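To inspect a response like this ourselves, one option is to count the records and pretty-print the first one. The sketch below uses a made-up stand-in for the response: apart from `total_obligation`, the field names and values are hypothetical and are not real award data.

```python
import json


def describe_response(awards, max_chars=600):
    """Return the record count and a pretty-printed preview of the first record."""
    if not awards:
        return 0, ""
    preview = json.dumps(awards[0], indent=2, sort_keys=True)[:max_chars]
    return len(awards), preview


# Hypothetical stand-in for the API response (not real award data).
sample_awards = [
    {"id": 1, "total_obligation": "1500.00",
     "recipient": {"recipient_name": "EXAMPLE RECIPIENT"}},
    {"id": 2, "total_obligation": "-250.00",
     "recipient": {"recipient_name": "ANOTHER EXAMPLE"}},
]

count, preview = describe_response(sample_awards)
print(count)
print(preview)
```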
Exploring the Data
Given the API response data, it’s easy to use pandas to start exploring the data. First, we dump the API response data into a data frame:
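As a rough sketch, a list of record dictionaries can be passed straight to the `DataFrame` constructor. The records below are a hypothetical stand-in for the API response (only `total_obligation` is a field named in this post). Nested objects end up as dictionaries inside a single column; `pd.json_normalize` is an optional alternative if flat columns are preferred.

```python
import pandas as pd

# Hypothetical stand-in for the API response (not real award data).
sample_awards = [
    {"id": 1, "total_obligation": "1500.00",
     "recipient": {"recipient_name": "EXAMPLE RECIPIENT"}},
    {"id": 2, "total_obligation": "-250.00",
     "recipient": {"recipient_name": "ANOTHER EXAMPLE"}},
]

# One row per award; nested objects stay as dicts in their column.
df = pd.DataFrame(sample_awards)
print(df.dtypes)

# Optional: flatten nested objects into dotted column names.
flat = pd.json_normalize(sample_awards)
print(flat.columns.tolist())
```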
This gives us a data frame with a bunch of nested structure like this:
One interesting thing we can look at straight away is the distribution of award sizes, using the total_obligation field, which the data dictionary defines as “The amount of money the government is obligated to pay for the award”.
First a few summary statistics:
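One way to get these numbers is `describe()` on the `total_obligation` column. The values below are made up for illustration. We convert to numeric first, on the assumption that the API may return amounts as strings.

```python
import pandas as pd

# Made-up amounts for illustration; the real values come from the API response.
df = pd.DataFrame({"total_obligation": ["1500.00", "-250.00", "20000.00", "750.00"]})

# Convert to numbers; anything unparseable becomes NaN and is ignored by describe().
amounts = pd.to_numeric(df["total_obligation"], errors="coerce")

summary = amounts.describe()  # count, mean, std, min, quartiles, max
print(summary)
```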
It looks like the average award is about $290k, with the largest being just over $20M. The $20M award is this one, titled “1001 PA ARMY NATIONAL GUARD FACILITIES PROGRAM”, which went to PA Military & Veterans Affairs, which makes sense.
One odd thing we notice is that some awards seem to be negative. In fact, this includes about 3% of the awards in the sample. I’m not sure exactly why this would be the case, but one hypothesis is that these are de-obligations, e.g. when money is returned to the DoD because a project came in under budget. Some of these negative awards are quite large, e.g. this one that appears to be related to funding for this study on treating severe traumatic injury and haemorrhagic shock.
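A quick way to check the share of negative awards, and to find the largest ones, is a boolean mask on the amounts. This is a sketch on made-up values, so the share printed here has nothing to do with the 3% seen in the real sample.

```python
import pandas as pd

# Made-up amounts for illustration only.
amounts = pd.Series([1500.0, -250.0, 20000.0, 750.0, -40.0], name="total_obligation")

is_negative = amounts < 0
share_negative = is_negative.mean()  # mean of a boolean mask = fraction of True values
print(f"{share_negative:.1%} of awards are negative")

# Most negative awards first.
largest_negative = amounts[is_negative].sort_values()
print(largest_negative.head())
```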
In any case, let’s remove the negative awards, and take a look at the full distribution of award sizes from the DoD (using the seaborn package; code excluded):
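The original plotting code is not shown, so the following is only a guess at what such a step could look like, using `seaborn.histplot`. The function names, the bin count, and the output file name are all hypothetical. The plotting libraries are imported inside the plotting function so that the filtering step can be run on its own.

```python
import pandas as pd


def non_negative_awards(df, column="total_obligation"):
    """Keep rows whose amount is zero or positive (NaN rows are dropped too)."""
    amounts = pd.to_numeric(df[column], errors="coerce")
    return df[amounts >= 0]


def plot_award_distribution(df, column="total_obligation",
                            path="award_distribution.png"):
    """Save a histogram of award sizes to `path` (hypothetical file name)."""
    # Imported here so the filtering above works without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(8, 4))
    sns.histplot(pd.to_numeric(df[column], errors="coerce"), bins=50, ax=ax)
    ax.set_xlabel("Total obligation (USD)")
    ax.set_ylabel("Number of awards")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
    return path


# Made-up amounts for illustration only.
sample = pd.DataFrame({"total_obligation": [1500.0, -250.0, 20000.0, 750.0, 0.0]})
kept = non_negative_awards(sample)
print(len(kept))
```

The filter keeps zero-dollar awards, since we only set out to remove the negative ones. Calling `plot_award_distribution(kept)` would then write the histogram to disk.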
We see that most of the awards are on the low end of the distribution (< $1M), though there is a long tail of much larger awards.
Conclusion
There is much more that we can do with this data. The goal of this post has been to give a basic overview of how to access the USASpending data and manipulate it using Python and pandas.