Finding Senators with Gap Terms on Wikidata using SPARQL
A few days ago, I went in search of United States senators who had pulled a Grover Cleaveland and served multiple non-consecutive terms. I couldn’t think of any examples, but it’s happened a lot of times including recently with senator Dan Coats (R-IN) who was a senator from 1989 to 1999 and from 2011 to 2017.
This post isn’t really about senators and non-consecutive terms, but is a quick walk-through on how ask Wikidata questions using SPARQL. As a bonus though, I found a senator who was succeeded twice by the same man and a senator who represented three states.
How to ask Wikipedia a question
Let’s start with my question
What senators served non-consecutive terms?
To start, go to the Wikipedia page for a current senator like Elizabeth Warren. On the sidebar, follow the link to “Wikidata item.” As you scroll through that page, you’ll notice a section titled “positions held” with “United States Senator.”
The page you’re currently on is for the “Elizabeth Warren” item (Q595844) and lists properties that relate to her. Each property is related to the original item in the form of a statement containing a subject - property - value like Elizabeth Warren (Q595844) - Position Held (P39) - United States Senator (Q13217683). This “connection” itself can have properties as well, like “start time” (P580) which we can use in a filter later.
That’s basically how the data is structured. In the next section, we’ll use SPARQL to actually start asking questions with this data model in mind.
Building the Query
Let’s start with a simple question and build it up until it answers our original question.
List the United States Senators
I’ve written that question deliberately in an “imperative” mood to draw a distinction between programming languages like python and languages like SPARQL, which we’ll use to satisfy our query.
In python, you might try to answer a question like this with code like
for person in people:
if person.occupation == "United States Senator":
print(person)
SPARQL is in a different family of programming languages called “query languages” which is related to the family of logical programming languages. Instead of writing instructions to find information, you write situations with variables and let the program figure out what values the variables should take.
You would instead write “find me all items with a property of “position held” with the value “United States Senator.”
SELECT ?senator
WHERE {
?senator
wdt:P39 # "Position Held"
wd:Q13217683. # "United States Senator"
}
?senator
is an unspecified variable (the particular ?name
doesn’t matter)
and the database engine tries to “satisfy” the query by listing all values of
?senator
for which the query is accurate.
The ?senator
values aren’t particularly readable, so we’ll paste in a little
boilerplate so we can use ?senatorLabel
in future queries which is a nice
representation of the corresponding ?senator
item.
SELECT ?senator ?senatorLabel
WHERE {
?senator
wdt:P39 # "Position Held"
wd:Q13217683. # "United States Senator"
}
Who are the current United States Senators?
As a simple modification of the previous query, we’ll ask Wikidata to find senators whose property “Position Held” with value “United States Senator” has no end date.
In the previous query, we were able to ask about a statement directly, but
because we’re asking about qualifiers of the statement, we’ll have to break
down the query to build it back up.
Instead of writing a query like ?subject wdt:?property wd:?value
we’ll write
one statement that assigns the “Position Held” property to its own variable
using the p:
prefix and write another a new statement about the value of of
the “Position Held” using the ps:
prefix.
SELECT ?senator ?senatorLabel ?end_time
WHERE {
?senator p:P39 ?term.
?term ps:P39 wd:Q13217683.
?term pq:P580 ?end_time.
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
}
}
I’ve gone ahead and added a third statement to assign the “end time”
[P80][wikidata end_time] to its own variable by extracting it as a qualifier
with the pq:
prefix, but the results that “satisfy” the query always
have end dates. So we’ll add an OPTIONAL
block and put the pq:
statement in
that block.
SELECT ?senator ?senatorLabel ?end_time
WHERE {
?senator p:P39 ?term.
?term ps:P39 wd:Q13217683.
OPTIONAL {
?term pq:P580 ?end_time.
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
}
}
We could filter for only the ?senator
values without an “end time” here, but
that wont be relevant in the final query so I’d recommend doing that in excel,
python, or your other tool of choice.
This “intermediate” query might have seemed like a distraction, but we’ve actually gotten all of the data we need to answer the original question in excel or python. In the first page of results, you may already see some senators showing up twice. That’s because each line corresponds to one way of assigning the variables in the query that satisfies the requirements.
Before we move on, let’s take a careful look at the data.
Checking the data
Recognize this guy?
senator | senatorLabel | end_time |
---|---|---|
Q249898 | David Palmer |
That’s actually Dennis Haysbert’s character from 24. On the item page on Wikidata, you’ll see that he is an “instance of” (P31) “fictional human” (Q15632617). Back on Elizabeth Warren’s Wikidata page, you’ll see she’s an instance of “human” (Q5) so before we go any farther, let’s limit ourselves to only real humans.
SELECT ?senatorLabel ?start_time ?end_time ?stateLabel WHERE {
?senator wdt:P31 wd:Q5.
?senator p:P39 ?term.
?term ps:P39 wd:Q13217683.
OPTIONAL {
?term pq:P580 ?start_time.
?term pq:P582 ?end_time.
?term pq:P768 ?state.
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en".
}
}
senatorLabel | start_time | end_time | stateLabel |
---|---|---|---|
Brock Adams | 1987-01-03T00:00:00Z | 1993-01-03T00:00:00Z | Washington |
David C. Broderick | 1857-03-04T00:00:00Z | 1859-09-16T00:00:00Z | California |
David Settle Reid | 1854-12-06T00:00:00Z | 1859-03-04T00:00:00Z | North Carolina |
Arlen Specter | 1981-01-03T00:00:00Z | 2011-01-03T00:00:00Z | Pennsylvania |
Allen Joseph Ellender | 1937-01-03T00:00:00Z | 1972-01-27T00:00:00Z | Louisiana |
download 2014 rows or run query |
I stumbled on Wikidata a few months ago while searching for data about the Massachusetts legislature. Besides some gotchas in the data like “fictional humans” creeping in, the data in that case wasn’t good enough to use: most candidates–even incumbents–didn’t have Wikidata items and the data that did exist was incomplete or implemented differently for some candidates.
It’s likely that this data is also incomplete or sometimes inaccurate. Nonetheless, it’s a really powerful way to start answering my question and if you find inaccurate data, you can submit a correction easily through the Wikidata interface.
Senators with Non-Consecutive Terms
We’re finally here. We have a list of senators with separate entries for each term they’ve served and we have the start and end dates of the terms are included to help catch duplicate data. This part is pretty straightforward in excel or python (see addendum) so here’s you’re reward:
senator | term start | term end | notes |
---|---|---|---|
Alben W. Barkley | 1927-03-04 | 1949-01-19 | |
1955-01-03 | 1956-04-30 | ||
Angus Cameron | 1875-03-04 | 1881-03-03 | |
1881-03-14 | 1885-03-03 | ||
Dan Coats | 1989-01-03 | 1999-01-03 | |
download 57 rows |
Bonus #1: The senator who was succeeded twice by the same man
Among the plenty of cases when that happened, I stumbled found a case where two senators served four alternating terms in 17 years.
In 1869, Matthew Hale Carpenter (R-WI) began his first term in the United States Senate. Although he campaigned for re-election in 1875, he ultimately lost to Angus Cameron (D-WI). After just one term, Cameron decided to call it a quits and Carpenter was reelected by the state legislature (pre-17th Amendment).
You’re probably guessing where this is going, but Carpenter was already in poor health when he took office in 1881 and died just months into his disconnected, second term. Cameron was fortunately still around and was quickly re-elected and began his own disconnected, second term in March 1881.
https://en.wikipedia.org/wiki/Matthew_H._Carpenter
Bonus #2: James Shields (D-IL/MN/MO)
After almost dueling Lincoln and then almost becoming Governor of Oregon Territory, James Shields was appointed Senator of Illinois, resigned because he hadn’t been a citizen long enough, and then was elected again before the end of the year.
After defeat in 1855, Shields moved to the territory of Minnesota and was elected senator when Minnesota became a state in 1858. Shields drew the short straw (literally) and his term ended in 1859 so the other senator from Minnesota could serve until 1863.
In 1879, after moving to California and back to the South and fighting for the Union in the Civil War, Shields was eventually elected to replace a Missouri senator who had died near the end of his term. Shields finally called it a quits, presumably satisfied his record of serving as senator from three states would never be matched.
I’ve left out a lot of details and would definitely recommend reading his Wikipedia article https://en.wikipedia.org/wiki/James_Shields_(politician,_born_1806).
Addendum: Getting duplicates from python
Here’s how I extracted the duplicates using python.
import pandas
df = pandas.read_csv("query.csv", parse_dates=['start_time', 'end_time'])
df['start_time'] = pandas.to_datetime(df['start_time'].dt.date)
df['end_time'] = pandas.to_datetime(df['end_time'].dt.date)
# Data comes from multiple sources and includes a lot of exact duplicates
df.drop_duplicates(inplace=True)
multi_terms = df.duplicates('senatorLabel', keep=False)
df[multi_terms].set_index(['senatorLabel', 'start_time']).sort_index()
The data at this point can be scanned through for anomalies or used however.