R on Big Muddy

Some insight into writing a book using Quarto

Mon, 16 Mar 2026 20:48:00 -0400

Prof. Kieran Healy (Sociology, Yale University) shares some nice insight into the process of writing a book in Quarto using R in this post. The output screenshots he shares look beautiful, and the idea of deploying the same content as a clean PDF and a responsive website is awesome. A full draft of the book, Data Visualization: A Practical Introduction (Second Edition), is available as a website here.

I have grown increasingly tired of writing in any format other than a plain text file I can easily version control and move around, so the idea of writing a book in Quarto is appealing to me (as long as it has enough technical content to justify the format).

Using Claude Code for cross-package statistical audits

Sun, 15 Mar 2026 22:49:00 -0400

Economist Scott Cunningham shared an important example of why we should always report the statistical package and version used in our analyses. He used Claude Code to produce six versions of the exact same analysis using six different packages in R, Python, and Stata. In a difference-in-differences analysis of the effect of mental health hospital closures on homicide, using the standard Callaway and Sant’Anna estimator (for DiD with multiple time periods), he got very different results for some model specifications.

Since the specifications and the data were identical between packages, he discovered the divergences occurred due to how the packages handled problems with propensity score weights. Packages were not necessarily transparent about issues with these weights. If you were not running multiple analyses and comparing results across packages, or else carefully checking propensity score diagnostics, you might never have realized how precarious your results were.
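To make "checking propensity score diagnostics" concrete, here is a minimal sketch of the kind of check that could surface the problem. It is not what any of the six packages does internally. The function name, the `eps` threshold, and the returned fields are all hypothetical, and the odds weights p / (1 - p) are assumed only as a common inverse-probability form for reweighting comparison units.

```python
def weight_diagnostics(pscores, treated, eps=0.01):
    """Summarise inverse-probability weights for comparison units.

    pscores : estimated P(treated | covariates), one per unit
    treated : 0/1 treatment indicators, same length as pscores
    eps     : hypothetical threshold; scores outside [eps, 1 - eps]
              are flagged as overlap problems
    """
    if len(pscores) != len(treated):
        raise ValueError("pscores and treated must have the same length")

    # Units whose estimated score is too close to 0 or 1.
    flagged = [i for i, p in enumerate(pscores) if p < eps or p > 1 - eps]

    # Odds weights for comparison units (assumed form, see lead-in).
    # Units with p == 1 are skipped here to avoid dividing by zero;
    # they are still reported in `flagged`.
    weights = [p / (1 - p) for p, d in zip(pscores, treated) if d == 0 and p < 1]

    total = sum(weights)
    squares = sum(w * w for w in weights)
    # Kish effective sample size: small values mean that a handful of
    # comparison units carry almost all of the weight.
    ess = total * total / squares if squares > 0 else 0.0

    return {
        "n_controls": sum(1 for d in treated if d == 0),
        "flagged": flagged,
        "max_weight": max(weights, default=0.0),
        "effective_sample_size": ess,
    }


if __name__ == "__main__":
    # Made-up scores: one comparison unit has a score of 0.995.
    report = weight_diagnostics([0.2, 0.2, 0.995, 0.6], [0, 0, 0, 1])
    print(report)
```

In the made-up example, three comparison units collapse to an effective sample size of roughly one, which is the sort of warning a package might handle silently by trimming, erroring, or carrying on.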

Prof. Cunningham closes with the following advice:

The fifth point, and the broader point, is that this kind of cross-package, cross-language audit is exactly what Claude Code should be used for. Why? Because this is a task that is time-intensive, high-value, and brutally easy to get wrong. But just one mismatched diagnostic across languages invalidates the entire comparison, even something as simple as sample size values differing across specifications, would flag it. This is both easy and not easy — but it is not the work humans should be doing by hand given how easy it would be to even get that much wrong.

geoBoundaries: An open database of political administrative boundaries

Fri, 13 Mar 2026 17:05:00 -0400

Today I discovered geoBoundaries, a CC BY 4.0-licensed database of political administrative boundaries covering the entire world. It is notable for its high level of detail, going from ADM0 (country), ADM1 (states/provinces), ADM2 (counties/departments or municipalities), to ADM3 (municipalities or sub-municipalities) for many countries. My go-to source for world map files is Natural Earth, which is limited to ADM0 and ADM1 but is in the public domain. Natural Earth also includes some physical geography like water and bathymetry, while geoBoundaries is focused solely on political administrative boundaries. Both datasets have to deal with disputed boundaries, which are an endless source of tension in the Natural Earth GitHub repository.

An R package for retrieving data from geoBoundaries, geobounds, was released in February. A similar package for Natural Earth, rnaturalearth, has long been maintained by rOpenSci.

An end-to-end AI pipeline for policy evaluation papers

Thu, 12 Feb 2026 19:11:00 -0500

Prof. David Yanagizawa-Drott from the Social Catalyst Lab at the University of Zurich has launched Project APE (Autonomous Policy Evaluation), an end-to-end AI pipeline to generate policy evaluation papers. The vast majority of policies around the world are never rigorously evaluated, so it would certainly be useful if we were able to do so in an automated fashion.

Claude Code is the heart of the project, but other models are used to review the outputs and provide journal-style referee reports. All the coding is done in R (though Python is called in some scripts). Currently, Gemini 3 Flash serves as the judge, comparing the generated papers against published research in top economics journals:

Blind comparison: An LLM judge compares two papers without knowing which is AI-generated

Position swapping: Each pair is judged twice with paper order swapped to control for bias

TrueSkill ratings: Papers accumulate skill ratings that update after each match

The project’s home page lists the AI’s current “win rate” at 3.5% in head-to-head matchups against human-written papers.
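The judging protocol described above can be sketched in a few lines. This is an illustration under stated assumptions, not Project APE's code: `judge_pair`, `elo_update`, and the toy judges are hypothetical names, and a plain Elo update stands in for TrueSkill to keep the example free of dependencies.

```python
def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b) after one match.

    score_a is paper A's result in [0, 1]: 1 = win, 0.5 = split, 0 = loss.
    Elo is used here as a simple stand-in for TrueSkill.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


def judge_pair(judge, paper_a, paper_b):
    """Judge a pair twice with the order swapped.

    `judge(first, second)` is a hypothetical callable that returns 0 if it
    prefers the first-listed paper and 1 if it prefers the second.
    Returns paper A's share of the two verdicts (0.0, 0.5, or 1.0).
    """
    verdict_ab = judge(paper_a, paper_b)  # A is listed first
    verdict_ba = judge(paper_b, paper_a)  # A is listed second
    wins_a = (verdict_ab == 0) + (verdict_ba == 1)
    return wins_a / 2.0


def position_biased_judge(first, second):
    """Toy judge that always prefers whichever paper is listed first."""
    return 0


def length_judge(first, second):
    """Toy judge that prefers the longer text, regardless of position."""
    return 0 if len(first) >= len(second) else 1


if __name__ == "__main__":
    a, b = "a longer placeholder manuscript", "a short one"
    # A purely position-driven judge splits its verdicts, so ratings stay put.
    print(elo_update(1500.0, 1500.0, judge_pair(position_biased_judge, a, b)))
    # A judge with a real preference moves the ratings.
    print(elo_update(1500.0, 1500.0, judge_pair(length_judge, a, b)))
```

Swapping the order means that a judge driven only by position hands each paper one win, so the pair nets out to a draw and the ratings do not change.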

Prof. Yanagizawa-Drott says “Currently it requires at a minimum some initial human input for each paper,” although he does not specify exactly what. If we look at the initialization.json file found in each paper’s directory, we see the following questions with user-provided inputs:

Policy domain: What policy area interests you?

Method: Which identification method?

Data era: Modern or historical data?

API keys: Did you configure data API keys?

External review: Include external model reviews?

Risk appetite: Exploration vs exploitation?

Other preferences: Any other preferences or constraints?

The code, reviews, manuscript, and even the results of the initial idea generation process are all available on GitHub. Their immediate goal is to generate a sample of 1,000 papers and run human evaluations on them (at time of posting, there are 264 papers in the GitHub repository).