Introduction

The Department of Treasury recently launched the very cool Beta.USASpending.gov which makes it easy to browse and query for US Government spending data. In this post, we’ll give a quick example of how to use the Python requests module to query this data via the USASpending API, and then create a few plots with pandas.

Pulling the Data

As a simple test case, let’s use the API to pull all the awards from the Department of Defense, following the example given here.

To begin, here is a basic function to pull from the awards endpoint for a given agency (by CGAC code):

import requests

def get_awards(CGAC = None): 

	# initialization
	has_next_page = True
	page = 1
	output = []

	while has_next_page: 
	
		# parameters for request
		payload = {'awarding_agency__toptier_agency__cgac_code': CGAC, 'page':page}

		# run the request
		response = requests.get('https://api.usaspending.gov/api/v1/awards/', params=payload).json()

		# add response data to output
		output += response['results']

		# handle pagination
		has_next_page = response['page_metadata']['has_next_page']
		page+=1

	return output

To pull all the DoD data, we now just run the function using the DoD CGAC code:

 
data = get_awards(CGAC = '097')

This returns a list of nested JSON objects corresponding the list of awards. In this case, the API returns 3016 records, each of which looks something like this:

data[0]
{u'awarding_agency': {u'id': 1159,
  u'office_agency': None,
  u'subtier_agency': {u'abbreviation': u'',
   u'name': u'Department of the Air Force',
   u'subtier_code': u'5700'},
  u'toptier_agency': {u'abbreviation': u'DOD',
   u'cgac_code': u'097',
   u'fpds_code': u'9700',
   u'name': u'Department of Defense'}},
 u'date_signed': u'2017-02-16',
 u'date_signed__fy': 2017,
 u'description': u"TAS::57 3600::TAS 'INNOVATIONS IN STATISTICAL IMAGE ANALYSIS AND APPLICATIONS TO 3D IMAGING FOR IMPROVED SSA, DATED 20 JUN 2014'",
 u'fain': u'FA95501510286',
 u'funding_agency': None,
 u'id': 171259,
 u'period_of_performance_current_end_date': u'2018-08-14',
 u'period_of_performance_start_date': u'2015-08-15',
 u'piid': None,
 u'place_of_performance': {u'address_line1': None,
  u'address_line2': None,
  u'address_line3': None,
  u'city_name': u'ALBUQUERQUE',
  u'country_name': u'UNITED STATES',
  u'foreign_city_name': None,
  u'foreign_postal_code': None,
  u'foreign_province': None,
  u'location_country_code': u'USA',
  u'state_code': u'NM',
  u'state_name': u'NEW MEXICO',
  u'zip5': None},
 u'potential_total_value_of_award': None,
 u'recipient': {u'business_types': None,
  u'business_types_description': u'Unknown Types',
  u'legal_entity_id': 2049,
  u'location': {u'address_line1': u'1700 LOMAS BLVD NE',
   u'address_line2': None,
   u'address_line3': None,
   u'city_name': u'ALBUQUERQUE',
   u'country_name': u'UNITED STATES',
   u'foreign_city_name': None,
   u'foreign_postal_code': None,
   u'foreign_province': u'ROMA',
   u'location_country_code': u'USA',
   u'state_code': u'NM',
   u'state_name': None,
   u'zip5': None},
  u'parent_recipient_unique_id': None,
  u'recipient_name': u'UNIVERSITY OF NEW MEXICO'},
 u'subaward_count': 1,
 u'total_obligation': u'14992.00',
 u'total_outlay': None,
 u'total_subaward_amount': u'51856.00',
 u'type': u'04',
 u'type_description': u'Project Grant',
 u'uri': None}

Exploring the Data

Given the API response data, it’s easy to use pandas to start exploring the data. First, we dump the API response data into a data frame:

 
import pandas as pd

# creates a data frame with a bunch of nested structure
df = pd.DataFrame(data)

This gives us data frame with a bunch of nested structure like this:

df.head()
                                     awarding_agency date_signed  \
0  {u'office_agency': None, u'id': 1159, u'subtie...  2017-02-16   
1  {u'office_agency': None, u'id': 1141, u'subtie...  2017-01-30   
2  {u'office_agency': None, u'id': 1159, u'subtie...  2017-01-01   
3  {u'office_agency': None, u'id': 1141, u'subtie...  2017-01-31   
4  {u'office_agency': None, u'id': 1141, u'subtie...  2017-01-19   

   date_signed__fy                                        description  \
0             2017  TAS::57 3600::TAS 'INNOVATIONS IN STATISTICAL ...   
1             2017  OPTIMIZATION STUDY FOR INTEGRATION OF WIDE-BAN...   
2             2017                                       ADP SUPPLIES   
3             2017  THE RECIPIENT SHALL PERFORM A COORDINATED RESE...   
4             2017                    FLARNG - 1003 - SECURITY GUARDS   

            fain                                     funding_agency      id  \
0  FA95501510286                                               None  171259   
1  N000141612940                                               None  172783   
2           None  {u'office_agency': None, u'id': 1159, u'subtie...  174371   
3  FA86501425800                                               None  172583   
4  W911YN1621003                                               None  173326   

  period_of_performance_current_end_date period_of_performance_start_date  \
0                             2018-08-14                       2015-08-15   
1                             2018-07-31                       2017-01-30   
2                             2017-09-30                       2017-01-01   
3                             2020-05-28                       2014-05-08   
4                             2017-09-30                       2016-10-01   

            piid                               place_of_performance  \
0           None  {u'address_line2': None, u'address_line3': Non...   
1           None  {u'address_line2': None, u'address_line3': Non...   
2  FA481417FG034  {u'address_line2': None, u'address_line3': Non...   
3           None  {u'address_line2': None, u'address_line3': Non...   
4           None  {u'address_line2': None, u'address_line3': Non...   

  potential_total_value_of_award  \
0                           None   
1                           None   
2                        6940.00   
3                           None   
4                           None   

                                           recipient  subaward_count  \
0  {u'parent_recipient_unique_id': None, u'legal_...               1   
1  {u'parent_recipient_unique_id': None, u'legal_...               0   
2  {u'parent_recipient_unique_id': None, u'legal_...               0   
3  {u'parent_recipient_unique_id': None, u'legal_...               0   
4  {u'parent_recipient_unique_id': None, u'legal_...               0   

  total_obligation total_outlay total_subaward_amount type  \
0         14992.00         None              51856.00   04   
1        113079.00         None                  None   04   
2          6940.00         None                  None    C   
3        302000.00         None                  None   05   
4        297906.00         None                  None   05   

        type_description   uri  
0          Project Grant  None  
1          Project Grant  None  
2         Delivery Order  None  
3  Cooperative Agreement  None  
4  Cooperative Agreement  None  

One interesting thing we can look at straight away is the distribution of award sizes, using the total_obligation field, which the data dictionary defines as “The amount of money the government is obligated to pay for the award”.

First a few summary statistics:

df.total_obligation.describe()
count        3016.000000
mean       290584.619460
std        922759.315605
min      -2200666.000000
25%         14377.110000
50%         87368.500000
75%        240431.500000
max      20091931.000000
Name: total_obligation, dtype: float64

It looks like the average award is about ~$290k, with the largest being just over $20M. The $20M award is this one, titled “1001 PA ARMY NATIONAL GUARD FACILITIES PROGRAM” that went to PA Military & Veterans Affairs, which makes sense.

One odd thing we notice is that some awards seem to be negative. In fact, this includes about 3% of the awards in the sample. I’m not sure exactly why this would be the case, but one hypothesis is that these are de-obligations e.g. when money is returned to the DoD because a project came in under budget. Some of these negative awards are quite large–e.g. this one that appears to be related to funding for this study on treating severe traumatic injury and haemorrhagic shock.

In any case, let’s remove the negative awards, and take a look at the full distribution of award sizes from the DoD (using the seaborn package; code excluded):

center

We see that most of the awards are on the low end of the distribution (< $1M), though there is a long tail of much larger awards.

Conclusion

There is much more that we can do with this data. The goal of this post has been to give a basic overview of how to access the USASpending data and manipulate it using python and pandas.