Big Muddy

Links + notes: tech & life

An end-to-end AI pipeline for policy evaluation papers · ↗ ape.socialcatalystlab.org


Prof. David Yanagizawa-Drott from the Social Catalyst Lab at the University of Zurich has launched Project APE (Autonomous Policy Evaluation), an end-to-end AI pipeline to generate policy evaluation papers. The vast majority of policies around the world are never rigorously evaluated, so it would certainly be useful if we were able to do so in an automated fashion.

Claude Code is the heart of the project, but other models are used to review the outputs and provide journal-style referee reports. All the coding is done in R (though Python is called in some scripts). Currently, Gemini 3 Flash acts as the judge, comparing the generated papers against published research in top economics journals:

  • Blind comparison: An LLM judge compares two papers without knowing which is AI-generated
  • Position swapping: Each pair is judged twice with paper order swapped to control for bias
  • TrueSkill ratings: Papers accumulate skill ratings that update after each match
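A minimal Python sketch of the judging scheme above (not Project APE's actual code). The `judge` callable is hypothetical and stands in for the LLM call. The rating step uses a plain Elo update as a simpler stand-in for TrueSkill, which also tracks uncertainty in each rating.

```python
from typing import Callable, Tuple


def judge_pair(judge: Callable[[str, str], str], paper_x: str, paper_y: str) -> float:
    """Return paper_x's score in [0, 1], averaged over both presentation orders.

    `judge(first, second)` is a hypothetical callable that returns "first"
    or "second" for whichever paper it prefers.
    """
    # Round 1: paper_x is shown first.
    x_wins_round1 = judge(paper_x, paper_y) == "first"
    # Round 2: order swapped, so paper_x is now shown second.
    x_wins_round2 = judge(paper_y, paper_x) == "second"
    return (x_wins_round1 + x_wins_round2) / 2


def elo_update(rating_x: float, rating_y: float, score_x: float,
               k: float = 32.0) -> Tuple[float, float]:
    """Elo update (an assumption here; a stand-in for TrueSkill)."""
    expected_x = 1 / (1 + 10 ** ((rating_y - rating_x) / 400))
    delta = k * (score_x - expected_x)
    return rating_x + delta, rating_y - delta


def always_first(first: str, second: str) -> str:
    """Toy judge with pure position bias: always prefers the first paper."""
    return "first"


def prefers_longer(first: str, second: str) -> str:
    """Toy judge that prefers the longer text, regardless of position."""
    return "first" if len(first) >= len(second) else "second"


# A purely position-biased judge ends up with a tie after swapping.
biased_score = judge_pair(always_first, "paper A", "paper B")
# A judge with a real preference gives the same winner in both orders.
consistent_score = judge_pair(prefers_longer, "long paper", "short")
```

Swapping the order means a judge that only ever favours the first slot produces a tie (score 0.5) and moves neither rating, while a consistent preference scores 1.0 and shifts both.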

Read more ⟶

There is only one statistical test · ↗ allendowney.substack.com


A classic article by computer scientist Allen Downey on why there is only one statistical test: compute a test statistic from your observed data, simulate a null hypothesis, and finally compute/approximate a p-value by calculating the fraction of test statistics from the simulated data exceeding the test statistic from your observed data.

[Diagram illustrating a single hypothesis-testing workflow: observed data are converted into a test statistic (effect δ*); a null model H0 generates many simulated datasets to form the distribution of δ under H0; the p-value is the tail area of that distribution beyond δ*.]

Downey suggests using general simulation methods over the canon of rigid, inflexible tests invented when computation was difficult and expensive.
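As a rough illustration of that recipe (my sketch, not code from Downey's article), here is a permutation test in plain Python. The test statistic (absolute difference in group means), the null model (group labels are exchangeable), and the function names are all assumptions chosen for the example.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def abs_mean_diff(group_a, group_b):
    """Test statistic: absolute difference in group means."""
    return abs(mean(group_a) - mean(group_b))


def permutation_p_value(group_a, group_b, n_sim=10_000, seed=0):
    """Approximate a p-value by simulating the null hypothesis.

    Null model: group labels are arbitrary, so shuffling the pooled
    data and re-splitting it yields datasets consistent with H0.
    """
    rng = random.Random(seed)
    observed = abs_mean_diff(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)
        simulated = abs_mean_diff(pooled[:n_a], pooled[n_a:])
        if simulated >= observed:
            at_least_as_extreme += 1
    # Fraction of simulated statistics at least as large as the observed one.
    return at_least_as_extreme / n_sim


# Made-up measurements for two groups.
p = permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4])
```

Changing the question only means swapping out `abs_mean_diff` or the shuffling step; the surrounding loop stays the same, which is the point of the "one test" framing.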

Hat tip to Ryan Briggs on Twitter.

The case for sharing clinical trial data · ↗ www.clinicaltrialsabundance.blog


Saloni Dattani of the excellent Works in Progress magazine (and formerly of Our World in Data) launched a new Substack today called The Clinical Trials Abundance blog. The first post is on the case for sharing clinical trial data. We have been gradually moving toward mandatory reporting of clinical trial results (though enforcement is another question), but sharing data would be one step further. Even though clinical trials rely on the trust (and often money) of the public, it can be very difficult to gain access to the raw results, even if journal article authors claim they are “available upon request”. A norm of clinical trial data sharing would not only increase the confidence in published results but also aid future drug development, reduce expensive redundancy, and improve meta-analyses (which are often forced to rely on heterogeneous summary measures).

Why a Canadian news site just launched an AI publishing tool · ↗ thehub.ca


It’s no secret that Canadian journalism (like journalism everywhere) is in trouble. Newsrooms face a steady stream of layoffs despite a couple hundred million Canadian dollars of direct and indirect government subsidies every year. The vast majority of outlets eligible for these subsidies take advantage of them, and combined they can subsidize half of a journalist’s salary. News organizations are desperate to diversify their revenue streams.

The Hub is a right-leaning publication launched in 2021 with a focus on policy and politics. Notably, the outlet declines or donates its subsidies, citing a valid concern that the scale of such subsidies threatens the perceived trustworthiness and independence of the media.

In late January 2026, The Hub launched NewsBox, an AI-powered publishing tool. NewsBox aims to make it easier for creators to transform their content (written, audio, or video) into other formats, such as speeches, essays, or talking points, while maintaining the author’s distinct voice. You can see examples of the tool’s output on new articles in The Hub, each of which is accompanied by an AI-generated summary and list of quotes at the top of the page. There is also a “Hub AI” chatbot in the sidebar of every article.

Read more ⟶

A handful of composers created most classic RPG soundtracks


I’ve always been a big fan of soundtracks, and video game soundtracks are no exception. Buying games on GOG.com usually nets you the soundtracks as well, so recently I’ve been enjoying a lot of classic RPG music. What struck me was how few composers were responsible for creating the ambiance of so many beloved classics. Look at how many series are covered by just the following six composers:

  • Inon Zur (Icewind Dale II, Dragon Age: Origins, Dragon Age II, Fallout series starting with Fallout 3 plus Fallout Tactics, co-composer for Baldur’s Gate II: Throne of Bhaal and Pathfinder: Kingmaker, additional music for Neverwinter Nights)
  • Jeremy Soule (Neverwinter Nights, Icewind Dale, The Elder Scrolls series starting with Morrowind, Star Wars: Knights of the Old Republic)
  • Justin E. Bell (Pillars of Eternity series, Tyranny, The Outer Worlds)
  • Mark Morgan (Fallout, Fallout 2, Planescape: Torment, Torment: Tides of Numenera, Wasteland 2, Wasteland 3)
  • Kirill Pokrovsky (Divinity series up through Divinity: Original Sin)
  • Borislav Slavov (Divinity: Original Sin II, Baldur’s Gate 3)

Of the above, I highly recommend the truly excellent Divine Divinity soundtrack (terrible title, great music!), as well as Baldur’s Gate 3, particularly the vocal songs like “Down by the River”, “I Want to Live”, and “The Power”.

Read more ⟶

How do you regain access to your computer if you lose your memory? · ↗ news.ycombinator.com


This morning I read an interesting Hacker News discussion about how to regain access to your computer if you lose your memory. As always, it starts with figuring out your threat model and responding accordingly.

Anthropic's statistical analysis skill doesn't get statistical significance quite right · ↗ github.com


Anthropic’s new statistical analysis skill demonstrates a common misunderstanding of statistical significance:

Statistical significance means the difference is unlikely due to chance.

But this phrasing isn’t quite right. The p-value in Null Hypothesis Significance Testing is not about the probability the results are “due to chance”; it is the probability—under the null hypothesis and the model assumptions—of observing results at least as extreme as the ones we obtained. In other words, the p-value summarizes how compatible the data are with the null, given our modelling choices. What it does not tell you is the probability that the null hypothesis is true.

Statistician Andrew Gelman gave a good definition for statistical significance in a 2015 blog post:

A mathematical technique to measure the strength of evidence from a single study. Statistical significance is conventionally declared when the p-value is less than 0.05. The p-value is the probability of seeing a result as strong as observed or greater, under the null hypothesis (which is commonly the hypothesis that there is no effect). Thus, the smaller the p-value, the less consistent are the data with the null hypothesis under this measure.
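To make the definition concrete, here is a small worked example in Python (my own illustration, not taken from the skill or from Gelman's post). It assumes a fair-coin null hypothesis and computes the exact probability of a result at least as far from 50% heads as the one observed.

```python
from math import comb


def binomial_two_sided_p(heads: int, flips: int) -> float:
    """Exact two-sided p-value under the null hypothesis of a fair coin."""
    expected = flips / 2
    observed_deviation = abs(heads - expected)
    # Count the equally likely sequences whose number of heads is at least
    # as far from the expected value as the observed count.
    extreme_sequences = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_deviation
    )
    return extreme_sequences / 2 ** flips


# 60 heads in 100 flips: p is about 0.057.
p_value = binomial_two_sided_p(60, 100)
```

The result says that if the coin were fair, a split at least this lopsided would turn up in roughly 5.7% of repetitions. It does not say there is a 5.7% chance that the coin is fair.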

Read more ⟶

The CIA World Factbook has been memory holed · ↗ simonwillison.net


Another staple of my childhood is gone, this time the CIA’s World Factbook. I have fond memories of consulting the World Factbook for school projects in my elementary school computer lab. But as of yesterday, the entire publication, along with all of its archives, has been suddenly and unceremoniously wiped from the agency’s website. At least archives of the website are still available on the Internet Archive, with complete zip files up to 2020 and Wayback Machine snapshots thereafter.

Guinea worm one step closer to eradication · ↗ www.cartercenter.org


Only 10 cases of guinea worm were reported in 2025, down from an estimated 3.5 million cases per year when the elimination campaign began four decades ago. The disease is an ancient one, believed by some to be the “fiery serpents” that beset the ancient Israelites in The Book of Numbers. It is treated by carefully wrapping the parasite around a small stick as it painfully emerges over the course of weeks. This may be the inspiration for the Staff of Asclepius (⚕), the predominant symbol of medicine showing a serpent wrapped around a rod.

When I was studying mathematical modelling of infectious diseases at the University of Ottawa in the mid 2010s, the question was whether Jimmy Carter would outlive the guinea worm. Tragically, he did not, but his life’s work helped to prevent an estimated 100 million cases of the disabling disease and made him a hero in global health.

Read more ⟶

msgvault: A personal email archive and search system to watch · ↗ wesmckinney.com


Here’s a new project to watch if you are interested in taking control of your email: msgvault. The tool provides a local, searchable version of all of your Gmail messages and attachments, backed by SQLite and DuckDB.

The author, Wes McKinney, says he may add support for other email services in the future, as well as WhatsApp, iMessage, and SMS. I’ll probably look into it for myself once the project matures a little. However, because it stores everything in a single giant database file, it won’t fit into my standard backup strategy of versioned, incremental backups. Still, it could be a nice step forward in regaining control over my email archives.

Hat tip to j4mie on Hacker News.

The Divergent Association Task, a measure for creativity · ↗ www.pnas.org


The Divergent Association Task is a short, simple test introduced in 2021 claiming to measure creativity. Taking only a minute and a half, it asks participants to “generate 10 nouns that are as different from each other as possible in all meanings and uses of the words”.

Although the instructions say to “avoid specialized vocabulary (e.g., no technical terms)”, I imagine you might score higher if you’ve just finished cramming wordlists for the GRE. Researchers have used this test to compare human and AI creativity (though the use of GPT-4 in this article with a January 2026 publication date speaks to the incompatibility of AI research with traditional publication timelines).

A/B testing for advertising is not randomized · ↗ flovv.github.io


Florian Teschner writes about a recent paper from Bögershausen, Oertzen, & Bock arguing that online ad platforms like Facebook and Google misrepresent the meaning of “A/B testing” for ad campaigns. In A/B testing, we might assume the platform is randomly assigning users to see ad A or ad B, in an attempt to get a clean causal interpretation about which ad is more likely to drive a click (or whatever outcome you’re tracking).

But according to the paper, this is usually not what is happening. Instead, the platform optimizes delivery for each ad independently, steering each one toward the users most likely to click it. In other words, the two ads may be shown to different groups of users, and differences in click-through rates may be attributable to who is seeing the ad, as opposed to the overall appeal of the ad. Ad platforms convert A/B tests from simple randomized experiments into murky observational comparisons. For example, an ad may appear to do better because it happened to be shown disproportionately to a group with a high click-through rate, not because it presents a more compelling overall message. Advertisers get the warm glow of “experimentally backed” marketing without the assurances of randomization.
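A toy calculation in Python shows how this can happen (my sketch, with made-up numbers, not data from the paper). It assumes two hypothetical user groups with different baseline click rates and two ads that are equally appealing to any given user.

```python
# Hypothetical click probabilities per user group. They are the same for
# ad A and ad B, so neither ad is truly "better".
CLICK_RATE = {"frequent_clickers": 0.08, "rare_clickers": 0.02}


def expected_ctr(audience_share):
    """Expected click-through rate given each group's share of impressions."""
    return sum(share * CLICK_RATE[group] for group, share in audience_share.items())


# Randomized experiment: both ads are shown to the same 50/50 audience mix.
even_mix = {"frequent_clickers": 0.5, "rare_clickers": 0.5}
randomized_a = expected_ctr(even_mix)
randomized_b = expected_ctr(even_mix)

# Optimized delivery (assumed shares): the platform happens to steer ad A
# toward frequent clickers and ad B toward rare clickers.
optimized_a = expected_ctr({"frequent_clickers": 0.8, "rare_clickers": 0.2})
optimized_b = expected_ctr({"frequent_clickers": 0.3, "rare_clickers": 0.7})

print(f"Randomized: A={randomized_a:.3f}, B={randomized_b:.3f}")
print(f"Optimized:  A={optimized_a:.3f}, B={optimized_b:.3f}")
```

Under randomization the two ads tie, as they should. Under the assumed delivery shares, ad A shows a click-through rate of about 6.8% against 3.8% for ad B, even though the ads are identical in appeal. The whole gap comes from who saw each ad.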

Total electoral wipeout · ↗ en.wikipedia.org


The 2002 Turkish general election is the canonical example of total electoral wipeout. Every party holding seats in the previous legislature was completely wiped out. Of the two parties that won seats in the 2002 election, the one that formed government didn’t even exist at the time of the previous election (current president Erdoğan’s AK Party, formed in 2001). Of note, it wasn’t a complete changing of the guard: one of the three independent members from the 1999 parliament won his seat again in 2002 (Mehmet Ağar), though it seems he took over as leader of one of the wiped-out parties shortly after the election.

Hat tip to kynakwado2 on Twitter.

Twyman's law · ↗ en.wikipedia.org


From Wikipedia:

Twyman’s law states that “Any figure that looks interesting or different is usually wrong”

A bit different from that oft-quoted line attributed to Isaac Asimov:

The most exciting phrase in science is not ‘Eureka!’ but ‘that’s funny’

But Twyman’s law is much truer in my experience. Surprising results are usually a signal that something is screwy with my data, my assumptions, or my pipeline.

Hat tip to DJ Rich on Twitter.

Remember that a lot of numbers are fake · ↗ davidoks.blog


David Oks wrote an essay reminding us that in many countries, even the most basic statistic—the population—is often shockingly uncertain or even outright fabricated. It’s a good reminder that many of the numbers we rely on for international comparisons, like crime rates and economic indices, are similarly troubled by incompatible definitions, uneven measurement, and varying degrees of manipulation. Ask Google what the population of Afghanistan is, and it will happily show you an annual timeline of population since 1960, but the tidiness of the chart belies the murkiness of the estimate.

One of the drawbacks of easily accessible international datasets from organizations like the World Bank and Our World in Data is that they paper over the huge differences among the underlying source datasets. Ultimately, you end up with one number from each country and the implication that they are all pointing to a single construct. This makes it far too easy to draw confident comparisons between countries that simply aren’t measuring the same thing. Without being forced to assemble these datasets yourself, it’s difficult to appreciate how messy it is to measure “the same thing” across different places (or even to measure the same thing over time within one place).

Read more ⟶

Welcome to Big Muddy


Hi, I’m Jean-Paul R. Soucy, a data scientist working in healthcare in Montreal, Canada. Welcome to Big Muddy, my spin on a Simon Willison-style links-and-notes blog. Here I collect and share things I’m learning across technology, science, politics, and whatever else catches my interest. You’ll find interesting links, brief write-ups, quick experiments, and the occasional deep dive.