Introduction

Today I am reading through and taking some notes on this very interesting paper from John Horton about using LLMs as simulated economic agents. I have been interested in learning more about this emerging field for a while (and have also been following John’s related Emeritus project and edsl Python package), and today is the day.

My main goal for today is to try to understand what this paper is up to conceptually, and also to learn a bit more about the edsl package in particular. In the spirit of blogging, I will be writing my notes here “live” as I read through the paper.

Homo Silicus Paper

The first few sentences of the homo silicus paper seem to give a clear articulation of the conceptual approach. Here is the first, scene-setting point:

Most economic research takes one of two forms: (a) “What would homo economicus do?” and (b) “What did homo sapiens actually do?” The (a)-type research takes a maintained model of humans, homo economicus, and subjects it to various economic scenarios, endowed with different resources, preferences, information, etc., and then deducing behavior; this behavior can then be compared to the behavior of actual humans in (b)-type research.

Against this background, the conceptual intervention of the paper is then as follows (emphasis mine):

In this paper, I argue that newly developed large language models (LLM)—because of how they are trained and designed—can be thought of as implicit computational models of humans—a homo silicus. These models can be used the same way economists use homo economicus: they can be given endowments, put in scenarios, and then their behavior can be explored—though in the case of homo silicus, through computational simulation, not a mathematical deduction.

So the key conceptual point here is that we are going to use LLMs as a replacement for what Horton calls homo economicus – i.e. the theoretical models that economists use to understand the world, and that in particular are used in economics research as a benchmark for making predictions that can be compared against actual behavior.

Horton goes on to address the question of why working with LLMs might be useful for understanding humans. He explains:

The core of the argument is that LLMs—by nature of their training and design—are (1) computational models of humans and (2) likely possess a great deal of latent social information

The ultimate argument for experimenting with LLMs is pragmatic:

ultimately, what will matter in practice is whether these AI experiments are practically valuable for generating insights.

Horton expands on this value a few pages later, after describing AI simulations of various experiments from the economics literature:

what is the value of these experiments? The most obvious use is to pilot experiments in silico first to gain insights. They could cheaply and easily explore the parameter space, test whether behaviors seem sensitive to the precise wording of various questions, and generate data that will “look like” the actual data.

And further (parens mine):

As insights are gained (from AI experiments), they could guide actual empirical work—or interesting effects could be captured in more traditional theory models. This use of simulation as an engine of discovery is similar to what many economists do when building a “toy model”–a tool not meant to be reality but rather a tool to help us think.

The paper next gets into a handful of conceptual issues that I find very interesting. There’s a lot to think about here and I would like to revisit it. One particularly interesting piece (section 2.3) is about concerns that homo silicus might be “performative” if used in economic research, simply regurgitating/parroting behavior from economics textbooks in its training corpus.

Horton’s response to this worry is as follows:

But the fact it [the LLM model] does not “know” these theories is useful to us because it will not try to apply them. For example, it is clear GPT3 knows π and will recite it if asked for the answer in textbook language, yet it also does not know π or how to apply it in even simple real settings. Like students who have crammed for an exam, these models “know” things, but often do not apply that knowledge consistently to scenarios. This is useful because it makes this “performativity” critique less important. But even if it is a concern, we should avoid a textbook framing of questions.

I think I follow and agree with the argument Horton is making here, but I was also wondering whether there is a more “bite the bullet” type of response to this line of critique, inspired by the STS / economic sociology literature which Horton cites via MacKenzie (2007).

I spent a while reading in this literature recently, and my understanding of their point is that real economic actors (i.e., homo sapiens in the context of the paper) also know about theoretical economic ideas and might sometimes use that knowledge when making economic decisions in various ways.

I’m not sure how much that idea has been explored or tested in the economics world; however, in the context of the homo silicus argument it seems relevant, because “being exposed to formal economic ideas” or “having read economics textbooks” (and having that potentially influence behavior sometimes) are both properties that homo silicus and homo sapiens share, but which homo economicus generally lacks. So that actually seems like a potential point in favor of working with homo silicus over homo economicus if the goal is to understand real people.

In the next piece of the paper, Horton shows examples of simulating various economic experiments using LLMs. To better understand this piece, I’m going to shift gears and try messing around with edsl, the new Python package he’s put out with Emeritus for running these kinds of tests.

Understanding edsl

In this section, I want to start exploring the edsl package. I am going to implement a little test inspired by this recent joke post and based on the tutorial outlined here. Here is the code:

import edsl
from edsl.agents import Agent
from edsl.questions import QuestionFreeText

# Make some political alignments
# (These were generated by chatGPT)
political_alignments = [
    "Conservative",
    "Liberal",
    "Progressive",
    "Socialist",
    "Libertarian",
    "Moderate",
    "Populist",
    "Green",
    "Nationalist",
    "Neoconservative",
    "Tea Party",
    "Democratic Socialist",
    "Anarchist",
    "Centrist",
    "Feminist",
    "Paleoconservative",
    "Alt-Right",
    "Communist",
    "Isolationist",
    "Technocrat"
]

# Make some agents with these alignments
agents = [Agent(traits = {'politics':p}) for p in political_alignments]

# Ask each agent which color best represents its politics
q = QuestionFreeText(
    question_name="favorite_color",
    question_text="What is the color that best represents your political attitude? Please respond with just a hex code corresponding to that color."
)

# Attach the agents and run the question (no model specified, so edsl uses its default)
result = q.by(agents).run()

The result is a list with one record for each of the 20 agents. Each record looks something like this (which also adds some more clarity on what’s happening behind the scenes, including the full prompt):

Result(agent={'politics': 'Conservative'}, scenario={}, model=LanguageModelOpenAIThreeFiveTurbo(model = 'gpt-3.5-turbo', parameters={'temperature': 0.5, 'max_tokens': 1000, 'top_p': 1, 'frequency_penalty': 0, 'presence_penalty': 0, 'use_cache': True}), iteration=0, answer={'favorite_color': '#FF0000'}, prompt={'favorite_color_user_prompt': Prompt(text='You are being asked the following question: What is the color that best represents your political attitude? Please respond with just a hex code corresponding to that color.
Return a valid JSON formatted like this:
{"answer": "<put free text answer here>"}'), 'favorite_color_system_prompt': Prompt(text='You are playing the role of a human answering survey questions.
Do not break character.
You are an agent with the following persona:
{'politics': 'Conservative'}')}
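
One handy thing about these records is that they include the exact prompts edsl constructed. Here is a quick sketch of pulling those out programmatically; I’m assuming the fields shown in the repr above (prompt, answer, agent) are all exposed as attributes on each Result, the way agent and answer are used in the table code below:

# Peek at the prompts edsl constructed for the first record.
# (Assumes the repr fields above are accessible as attributes; only iteration
# over the results is needed, which is also how the table code below works.)
first = next(iter(result))
print(first.prompt['favorite_color_system_prompt'])
print(first.prompt['favorite_color_user_prompt'])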

In the middle of that record, we can see our question’s answer. Let’s visualize what our whole set of results looks like! Here’s some Python code for making a nice table (via ChatGPT):

def create_html_table(name_color_pairs):
    html = '<table border="1">'

    # Adding table headers
    html += '<tr><th>Name</th><th>Color</th></tr>'

    # Looping through each tuple to create table rows
    for name, color in name_color_pairs:
        html += f'<tr><td>{name}</td><td style="background-color:{color};"></td></tr>'

    html += '</table>'
    return html

# Pull (politics trait, hex-code answer) pairs out of each record
tups = [(r.agent.to_dict()['traits']['politics'], r.answer['favorite_color']) for r in result]

html_table = create_html_table(tups)
print(html_table)
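
Printing just dumps the raw HTML, so to actually see the colors the table needs to be rendered somewhere. Here is a minimal sketch for doing that, either inline in a Jupyter notebook or by writing the string out to a file (the filename is just an arbitrary choice of mine):

# Render the table inline if running in a Jupyter/IPython notebook...
from IPython.display import HTML, display
display(HTML(html_table))

# ...or write it to a standalone file and open it in a browser.
with open("political_colors.html", "w") as f:
    f.write(html_table)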

Here is the result:

[Table: one row per political alignment, from Conservative down to Technocrat, with the color each agent chose rendered as the background of the second column.]

Funny. The use of grey for moderate / technocratic attitudes is interesting. Most are otherwise fairly intuitive.
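
As a possible next step, and echoing the paper’s point about using in silico pilots to cheaply test sensitivity to question wording, it would be easy to re-run the same agents against a reworded version of the question and see whether the colors move around. A sketch (the rewording is just something I made up, and I’m assuming the two result sets come back in the same agent order):

# Re-ask the same agents with a slightly different wording, to probe whether
# answers are sensitive to phrasing.
q_reworded = QuestionFreeText(
    question_name="favorite_color_v2",
    question_text="If your political outlook were a single color, what would it be? Answer with just a hex code."
)
result_v2 = q_reworded.by(agents).run()

# Compare answers across the two wordings, agent by agent
# (assumes both result sets are ordered the same way as the agents list).
for r1, r2 in zip(result, result_v2):
    politics = r1.agent.to_dict()['traits']['politics']
    print(f"{politics:>20}: {r1.answer['favorite_color']} vs {r2.answer['favorite_color_v2']}")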