Link on Big Muddy

Adjusting for recalled past vote in political polling

Sun, 12 Apr 2026 22:55:00 -0400

The founder of Abacus Data, a Canadian polling firm, dropped kind of an interesting URL yesterday: abacus-weighting.com. It’s an advertisement in the form of a case study on why Abacus weights their political polls on past vote. It fits perfectly with the theme of yesterday’s post on how pollsters get different results from the same data (the answer is they weight the raw data differently).

If you follow Nate Silver (or American political polling in general), you probably know that pollsters undercounted Trump support in all three elections where he was on the ballot. What I learned from this post is that support for the Conservative Party of Canada has been underestimated in their firm’s polling data in every polling wave for every election since 2011:

In every single wave, across every single election cycle, Conservative voters are underrepresented in our demographically weighted sample relative to their actual share of the vote. Not in most waves. Not in some elections. In every case we can observe.

Weighting for recalled past vote improves the estimate in every case, sometimes dramatically so:

In every election, past vote weighting moved our Conservative estimates upward and our Liberal estimates downward — consistently in the direction of the actual result. The 2021 election shows the most dramatic correction: a 7-point improvement in our Conservative estimate.
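
To make the mechanics concrete, here is a minimal Python sketch of past-vote weighting on a toy sample. Every number below is invented for illustration (nothing here comes from the Abacus case study), and real pollsters typically weight on past vote jointly with demographics rather than on its own.

```python
from collections import Counter

# Hypothetical toy sample of 100 respondents: (recalled past vote, current intention).
RESPONDENTS = (
    [("CON", "CON")] * 25 + [("CON", "LIB")] * 5
    + [("LIB", "LIB")] * 40 + [("LIB", "CON")] * 5
    + [("OTH", "OTH")] * 20 + [("OTH", "CON")] * 5
)

# Hypothetical vote shares at the previous election (the weighting targets).
ACTUAL_PAST = {"CON": 0.35, "LIB": 0.35, "OTH": 0.30}


def past_vote_weights(respondents, targets):
    """Weight for each past-vote group = target share / share observed in the sample."""
    n = len(respondents)
    counts = Counter(past for past, _ in respondents)
    return {group: targets[group] / (count / n) for group, count in counts.items()}


def weighted_share(respondents, weights, party):
    """Weighted share of respondents who currently intend to vote for `party`."""
    total = sum(weights[past] for past, _ in respondents)
    hits = sum(weights[past] for past, current in respondents if current == party)
    return hits / total


unweighted = {group: 1.0 for group in ACTUAL_PAST}
weights = past_vote_weights(RESPONDENTS, ACTUAL_PAST)

raw_con = weighted_share(RESPONDENTS, unweighted, "CON")
adjusted_con = weighted_share(RESPONDENTS, weights, "CON")
print(f"Conservative share, unweighted:          {raw_con:.1%}")
print(f"Conservative share, past-vote weighted:  {adjusted_con:.1%}")
```

In this toy sample, past Conservative voters make up 30% of respondents against a 35% target, so upweighting them moves the Conservative estimate from 35.0% to about 39.1%.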

How do pollsters get different results from the same data?

Sat, 11 Apr 2026 22:36:00 -0400

Nate Silver linked to this throwback article from 2016 in The New York Times in his recent article on fake AI polls, which I also wrote about a few days ago. The article, entitled “We Gave Four Good Pollsters the Same Raw Data. They Had Four Different Results.” is a good reminder that modern polling diverges very far from the theoretical ideal of a simple random sample. Even after deciding on a methodology to sample participants and collecting the data, a lot of work goes into interpreting raw poll responses to give us top-line polling numbers. Every pollster needs to figure out how to weight the responses they get, since poll response rates are abysmal and variable across different demographic groups. As in the example given in this piece, these choices can result in large differences in those top-line numbers: from +4 Clinton to +1 Trump, all from the same raw data!

For an interesting follow-up: “Polling is becoming more of an art than a science”, also on Nate Silver’s Substack.

Scientists invent a fake disease, AI picks it up, other scientists cite it

Fri, 10 Apr 2026 18:27:00 -0400

A somewhat disturbing bit of reporting from Nature tells the story of bixonimania, a fake eye disease invented by Swedish medical researcher Almira Osmanovic Thunström and her team. She seeded the idea for the fake disease in a series of ridiculous, joke-filled blog posts and preprints in mid-2024.

Because AI can be overly credulous with its sourcing (how often do Google’s AI answers confidently cite random Reddit posts for the bulk of an answer?), the disease got picked up as an “emerging term” by the leading chatbots. The preprints even got cited a handful of times in real publications, which is further evidence that scientists don’t read the papers they cite (I guess the modern equivalent of copying citations from other papers is having AI dredge the literature for you).

I can see AI agents being exploited by those pushing dubious medical diagnoses to flood the Internet and preprint servers with articles aimed at convincing LLMs of the validity of their positions. That is, if the agents aren’t too busy spinning up websites to defame those who incur their wrath.

A data point against the idea that AI will freeze/homogenize culture

Thu, 09 Apr 2026 07:00:00 -0400

Here’s an interesting figure and accompanying passage from this 2023 preprint entitled “Machine Culture”:

The innovations generated by AlphaGo and AlphaGo Zero soon entered human culture, as shown by research comparing human gameplay before and after the algorithms’ introduction. The decision quality, as measured by an open-source variant of AlphaGo Zero, showed very little improvement in human gameplay from 1950 to 2016, followed by a sudden improvement after the introduction of AlphaGo in March 2016. However, this improvement wasn’t solely due to humans adopting strategies developed by AlphaGo. It also reflected an unexpected shift, wherein humans started developing moves that were qualitatively distinct both from previous human moves and from the novel moves introduced by AlphaGo. In summary, AlphaGo served as an early, quantifiable exemplar of machine culture, generating novel cultural variations through genuine, nonhuman innovation. This was followed by a major transition into an even broader range of traits as the result of humans building on the previous discoveries made by machines. As the methods underpinning AlphaGo have been generalized to other games and extended to scientific problems, we anticipate a continued infusion of machine-generated discoveries across diverse domains of human culture.

AI makes it easier to generate fake papers, too

Wed, 08 Apr 2026 20:09:00 -0400

Here’s a fun project from Tyler Vigen, creator of the famous Spurious Correlations page (which has been cited as a cautionary tale in many a science class). Using his database of real but spurious correlations (created by calculating the Pearson correlation coefficient r between a very large number of variables and picking out the hits), he used AI to create amusing fake manuscripts expounding on these statistical flukes as if they were real research questions.

These papers were generated in January 2024, and as previously discussed on this blog, the pipeline for end-to-end paper generation has come a long way in two years. I have no doubt Tyler could make these papers sound much more convincing using today’s models, though of course his goal here is to make you laugh (and think), not to trick you. But I have no doubt there will be many scholars adopting this data dredging strategy to generate “real” papers, contributing to a deluge of papers flooding the academic publishing system.
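
The dredging procedure described above is easy to sketch in Python. This is not Tyler Vigen’s code or data: the 200 “variables” below are hypothetical series of pure random noise, and the variable names and seed are arbitrary. The point is only that correlating enough pairs guarantees some impressive-looking hits.

```python
import math
import random
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def dredge(series, top=3):
    """Correlate every pair of series and return the `top` strongest by |r|."""
    results = [
        (pearson_r(series[a], series[b]), a, b)
        for a, b in combinations(sorted(series), 2)
    ]
    return sorted(results, key=lambda item: abs(item[0]), reverse=True)[:top]


rng = random.Random(2024)  # fixed seed so the run is repeatable
# 200 hypothetical "variables", each just 10 yearly values of pure noise.
noise = {f"var_{i:03d}": [rng.gauss(0, 1) for _ in range(10)] for i in range(200)}

for r, a, b in dredge(noise):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

With 19,900 pairs to choose from, some will correlate strongly by chance alone, even though every series is independent noise.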

What is a public opinion poll without the public?

Tue, 07 Apr 2026 18:27:00 -0400

A few days ago, two professors (Leif Weatherby and Benjamin Recht) published an opinion piece in the New York Times calling attention to Axios publishing a story on maternal health using invented polling results:

A recent Axios story on maternal health policy referred to “findings” that a majority of people trusted their doctors and nurses. On the surface, there’s nothing unusual about that. What wasn’t originally mentioned, however, was that these findings were made up.

Clicking through the links revealed (as did a subsequent editor’s note and clarification by Axios) that the public opinion poll was a computer simulation run by the artificial intelligence start-up Aaru. No people were involved in the creation of these opinions.

The piece goes on to argue that this so-called “silicon sampling” is seductive because good public opinion polling is expensive, hard to do, and still prone to bias. But this shortcut magnifies the problem of bias rather than solving it.

I’ve read a little bit about this strategy of using LLM-generated survey participants in the context of social science research in a series of posts (mostly from Prof. Jessica Hullman) over on Andrew Gelman’s blog:

Validating language models as study participants: How it’s being done, why it fails, and what works instead (2025-12-19)
Survey Statistics: Thomas Lumley writes about Interviewing your Laptop (2025-08-26)
When does it make sense to talk about LLMs having beliefs? (2025-08-15)
Better and worse ways to mix human and LLM responses in behavioral research (but you still have to figure what you’re measuring) (2025-06-12)
LLMs as behavioral study participants (2025-05-29)

Silicon sampling seems moderately interesting from a research perspective, but I can’t help but agree with the New York Times opinion piece authors that this will be ruinous for the already waning trust in public opinion polling. If you didn’t bother to ask the public, then why should the public care what you “find”? I think there is probably a lot of utility in using LLM samples to aid in designing and validating surveys, though.

Social media is a freak show

Mon, 06 Apr 2026 13:31:00 -0400

I quite enjoyed Nate Silver’s recent Substack post “Social media has become a freak show” (curiously, the title element of the page is “Social media is turning into a freak show”—I think the transformation has already occurred).

Nate Silver is still a Twitter power user, and yet even he acknowledges the increasing uselessness of Twitter for driving traffic to his newsletter or even just providing a forum for thoughtful engagement. I myself abandoned the platform a few years ago, having seen the direction it was heading under Elon Musk. My impression is that the utility of Twitter in most domains is asymptotically approaching zero, with a handful of exceptions (I will occasionally lurk for AI news, as the discussion is still robust, if polluted with a ton of low-quality bot or bot-like replies).

The rest of the social media ecosystem isn’t much better. Bluesky has declining engagement, probably because it has replicated Twitter’s old schoolyard dynamics on steroids. Facebook hasn’t been relevant for years, and I have no idea what it’s even for anymore if not connecting with your friends (I haven’t had an account in many years). Instagram might still be fun, though I have no idea because I’ve never used it. But it’s certainly not a place where “the discourse” happens.

How effective are Amber alerts?

Sun, 05 Apr 2026 08:15:00 -0400

A few weeks ago, I experienced a situation familiar to many Canadians, described in this article from Jonathan Jarry of McGill University’s Office for Science and Society:

On Sunday, March 22nd of this year, a large swath of the population in Quebec was woken up at 4:25 as cell phones lit up and screamed. An Amber alert had been broadcast. Less than four hours later, the two missing children were thankfully found, unharmed, and the alert was cancelled.

Thankfully, my iPhone respects silent mode and only vibrated forcefully, but apparently not all phone brands respect this setting. Unlike in the United States, Amber alerts to cell phones in Canada cannot be disabled.

The statistics regarding child abductions and Amber alerts discussed in this article are equal parts comforting and disconcerting. For example, most children who are the subject of an Amber alert are recovered unharmed:

a study published a decade ago and looking at 448 Amber alerts in the U.S. revealed that over 95% of the children had been recovered alive and nearly 90% recovered alive and without physical harm, sexual abuse, or withholding of needed medical care during the abduction. Even when Amber alerts don’t trigger a helpful tip, the child is usually found.

Other research from the United States indicates the Amber alert plays a part in the recovery about 25% of the time. However, they may be issued too late to prevent the worst outcomes:

How to avoid cognitive surrender to AI

Sun, 29 Mar 2026 22:59:00 -0400

I am sharing a thoughtful article today from Alex Panetta’s A.I. For You on avoiding over-reliance on AI: “cognitive debt”, “epistemic debt”, or “cognitive surrender”.

A particularly interesting nugget regarding the “Your Brain on ChatGPT” article from the MIT Media Lab (yes, that MIT Media Lab):

The paper is even written to get LLMs to read it carefully. The paper carries instructions telling LLMs which section to read first, which appears to be a clever way to force relevant context atop the context window, as LLMs tend to best remember the beginning and end of conversations — not the middle.

Colorado advances ban on algorithmic price and wage discrimination

Fri, 27 Mar 2026 17:48:00 -0400

The Colorado House voted today to ban the use of personal data to algorithmically set the price of a product or determine a wage. The legislation will now advance to the Colorado Senate for consideration. The summary of the bill, HB26-1210, reads:

Surveillance data is defined in the bill as data that is obtained through observation, inference, or surveillance of consumers or workers and that is related to personal characteristics, behaviors, or biometrics of an individual or group. The bill prohibits discrimination against a consumer or worker through the use of automated decision systems used to engage in:

Individualized price setting based on surveillance data regarding a consumer; or

Individualized wage setting based on surveillance data regarding a worker.

Obviously, the bill enumerates exceptions to the above rules, as it is not intended to ban, for example, charging a customer more to deliver an item a longer distance nor to prohibit schemes like discounts for students or seniors. One of the challenges of writing laws like this is to ensure they are written narrowly enough to target dystopian hyper-individualized pricing based on tracking of Internet and phone activity rather than normal business practices like pricing insurance policies according to demographic risk factors.

Colorado is one of at least a dozen American states considering similar bans. I don’t believe any of these proposed broad-based bans have been signed into law yet. I wrote about algorithmic price discrimination (surveillance pricing) last week in the context of proposed legislation in the Canadian province of Manitoba.

How SARS-CoV-2 variants get named on GitHub

Thu, 26 Mar 2026 07:00:00 -0400

Bioinformatics has long been an unusually collaborative and transparent field, with genomes, protein structures, and other complex biological data habitually deposited into open databases during the course of research. The situation was no different at the outset of the COVID-19 pandemic, when a small group of scientists developed the Pango nomenclature for classifying variants of the SARS-CoV-2 virus. Outside of a handful of Greek-letter “variants of concern” names assigned by the World Health Organization, the Pango nomenclature is the standard for tracking the evolution of the SARS-CoV-2 virus. You may recall names such as B.1.1.7 (Alpha or the UK variant), B.1.351 (Beta or the South African variant), and P.1 (Gamma or the Brazilian variant). You can see a complete list of active SARS-CoV-2 lineages using the Pango nomenclature here.

By August 2020, the work of defining new lineages of SARS-CoV-2 had moved to GitHub, where the scientific process could happen in a transparent and collaborative way. New lineages are defined through proposals submitted as GitHub issues. In May 2023, a second GitHub repository was opened to move discussions of smaller or less clear lineages out of the main repository. These discussions can be promoted to the main repository, as this issue tracking LP.8.1 sub-lineages was in May 2025.

The work of defining new lineages of SARS-CoV-2 continues to this day on the GitHub repository, as the virus continues to mutate and evolve. And bioinformatics continues to be a shining beacon for open science for the rest of us to learn from.

Prediction markets are coming to Canada

Wed, 25 Mar 2026 21:00:00 -0400

(Archive link to this story)

Wealthsimple is a fintech company at the forefront of a lot of innovation in Canada’s personal finance sector since the company’s founding in 2014. Notably, Wealthsimple was the first broker in Canada to offer zero-commission trades, back in 2019. In 2020, they started offering the ability to trade crypto. In 2025, they launched zero-commission options trading. This year, the company received regulatory approval to bring prediction trading to Canada.

Unlike in other parts of the world, prediction markets have not flourished in Canada and have been considered basically illegal since a 2017 ruling from Canada’s federal securities regulator. Wealthsimple has been able to get around this ruling by only offering contracts on a narrow set of questions:

Despite a 2017 ruling that largely banned these kinds of short-term, yes-or-no contracts, certain regulated firms that are CIRO members are able to offer certain types of “event contracts,” […] The approval for Ontario-based Wealthsimple permits it only to offer contracts tied to economic indicators, financial markets and climate trends, the company confirmed – not sports or elections, which are among the most popular uses of prediction markets in the United States.

Wealthsimple has driven innovation in the Canadian personal finance sector; however, their new product offerings over the last few years seem to be speedrunning the Robinhood trajectory toward high-risk, high-volatility trading and away from their traditional niche of broad, diversified funds/ETFs for ordinary people to set-and-forget. This pivot can be understood as part of a broader trend toward the casinofication of everything, which took off with crypto and the legalization of online sports betting.

Will AI help Canadian police counter a tsunami of fraud?

Tue, 24 Mar 2026 21:48:00 -0400

Zak Vescera, writing for the Investigative Journalism Foundation, observes that fraud cases reported to Canadian police have more than doubled between 2013 and 2024:

At the same time, the number of cases cleared by Canadian police has fallen. In 2013, the ratio between reported cases and cleared cases was about 3:1; by 2024, this ratio was over 9.5:1.

The vast majority of fraud cases go unsolved. This is unsurprising given that many are perpetrated over the Internet by individuals overseas and involve methods of sending money that are difficult to recover, such as crypto, gift cards, and physical transfers of cash.

In response, the National Cybercrime Coordination Centre (NC3) of the RCMP, Canada’s national police service, has built a case management system and data portal it hopes will eventually be adopted by all Canadian police forces. According to the article, this system is aimed at improving coordination, data sharing, and analysis. The platform will also host a set of AI tools, though the RCMP is vague on the details and on which tools are currently implemented. The article gives a few examples: OCR allowing victims to scan gift cards used in fraud rather than typing numbers manually, a tool to classify reports to help police target their investigative resources, and a report generator to simplify data sharing when investigations go international.

Vandalism of OpenStreetMap

Mon, 23 Mar 2026 17:26:00 -0400

OpenStreetMap (OSM) is an open, community-driven map database powering countless apps and services and used by organizations including Amazon, Apple, Microsoft, Uber, Mapbox, and Wikimedia. In short, it is foundational infrastructure for the web. For regions with active communities (particularly in Europe), OSM is often noted for the superiority of its data on features such as cycling routes, hiking trails, and footpaths.

The Wikipedia article for OpenStreetMap documents several instances of data vandalism, which OSM is vulnerable to as a crowdsourced project. Three incidents stood out:

In 2012, Google fired two “rogue contractors” for vandalizing the OSM database, intentionally adding false data such as reversing the direction of one-way streets.
In 2018, a vandal made several viciously antisemitic edits to place names around New York City. While quickly reverted at the source, these changes nonetheless propagated into downstream applications pulling data from Mapbox, such as Zillow, Snapchat, Citibike, and Wikipedia.
Users of the mobile game Pokémon GO regularly vandalize the OSM database underlying the game to gain a gameplay advantage, although the authors of the research article on this subject note this vandalism tends to be transitory rather than sustained.

Side note: I was amused to note how strong Google’s regional results bias is for “OSM”—the entire first page is taken up by results related to the Orchestre symphonique de Montréal.

Properly the work of federal public health agencies

Sun, 22 Mar 2026 23:38:00 -0400

One of the reasons I started this blog was to have a place to put down posts and articles that have lodged themselves in my brain. The wind-down announcement of the COVID Tracking Project, a volunteer-led COVID-19 data tracking collaboration, is one such article.

But the work itself—compiling, cleaning, standardizing, and making sense of COVID-19 data from 56 individual states and territories—is properly the work of federal public health agencies. Not only because these efforts are a governmental responsibility—which they are—but because federal teams have access to far more comprehensive data than we do, and can mandate compliance with at least some standards and requirements.

After one year of work, the COVID Tracking Project decided to quit collecting data on COVID-19 in the United States, because they recognized that the work of collecting a comparable, national-level dataset was the responsibility of federal government agencies.

As someone who co-led the COVID-19 Canada Open Data Working Group, which curated COVID-19 data for Canada until the end of 2023, I think about this article a lot. It’s a good read, and it speaks to how essential open data was to filling in the gaps in the national and international understanding of the COVID-19 pandemic.

For map nerds only: An atlas of world history

Sat, 21 Mar 2026 22:39:00 -0400

I am sharing today TimeMap.org: an atlas of regions, rulers, people, and battles throughout history. Thoroughly enjoyable to swipe through, especially for connoisseurs of the map game genre.

Hat tip to agilek on Hacker News.

Fight club at the bird feeder

Fri, 20 Mar 2026 07:00:00 -0400

Alternate title: Blue Jay brutally feeder mogs Tufted Titmouse

From the Cornell Lab of Ornithology, a pretty neat article about dominance hierarchies at the bird feeder using over 7,600 observations collected by citizen scientists contributing to Project FeederWatch. Essentially, bird watchers reported instances when one bird species successfully displaced another at the bird feeder, and the researchers used this network of comparisons to build a dominance hierarchy. By using information contained within the network, you can even compare birds that are rarely observed together. Not all dominance patterns are linear, however, as the article reports:

A separate analysis uncovered some dominance triangles in which three birds had one-to-one relationships independent of each other, like a game of birdy rock-paper-scissors. For example, the House Finch dominates the Purple Finch, and the Purple Finch dominates the Dark-eyed Junco, but the junco dominates House Finch.
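
A rock-paper-scissors triangle like the one quoted above is just a three-node cycle in a directed graph, which is simple to detect. The sketch below is a toy illustration in Python, not the method used in the paper, and it assumes each pair of species has a single clear winner.

```python
from itertools import combinations

# Directed "X displaces Y" pairs. The three finch/junco pairs come from the
# passage quoted above; the Blue Jay pair is from the post's alternate title.
DOMINATES = {
    ("House Finch", "Purple Finch"),
    ("Purple Finch", "Dark-eyed Junco"),
    ("Dark-eyed Junco", "House Finch"),
    ("Blue Jay", "Tufted Titmouse"),
}


def dominance_triangles(edges):
    """Return each trio of species whose pairwise dominance forms a cycle."""
    species = sorted({name for pair in edges for name in pair})
    triangles = []
    for a, b, c in combinations(species, 3):
        clockwise = {(a, b), (b, c), (c, a)} <= edges
        counterclockwise = {(a, c), (c, b), (b, a)} <= edges
        if clockwise or counterclockwise:
            triangles.append((a, b, c))
    return triangles


print(dominance_triangles(DOMINATES))
```

Only the finch/junco trio is reported, since the Blue Jay and Tufted Titmouse pair has no third species closing a loop.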

The full paper is here: Fighting over food unites the birds of North America in a continental dominance hierarchy.

This work is reminiscent of network meta-analysis, in which three or more interventions (e.g., drugs) are compared using both direct and indirect evidence. For example, if there are studies comparing drug A versus drug B and drug B versus drug C, we can infer the comparison between drug A and drug C, even if no study has ever directly compared them.
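
The drug A versus drug C inference can be sketched with the simplest form of adjusted indirect comparison, which is one building block of network meta-analysis rather than the full method. The effect sizes below are hypothetical, and the sketch assumes the two estimates are on an additive scale (such as log odds ratios) and come from independent studies.

```python
import math


def indirect_comparison(d_ab, se_ab, d_bc, se_bc):
    """Estimate A vs C from A vs B and B vs C on an additive scale.

    Effects add along the path A -> B -> C, and because the two estimates
    come from independent studies, their variances add as well.
    """
    d_ac = d_ab + d_bc
    se_ac = math.sqrt(se_ab**2 + se_bc**2)
    return d_ac, se_ac


# Hypothetical log odds ratios (negative favours the first-named drug).
d_ac, se_ac = indirect_comparison(d_ab=-0.5, se_ab=0.20, d_bc=-0.3, se_bc=0.15)
low, high = d_ac - 1.96 * se_ac, d_ac + 1.96 * se_ac
print(f"A vs C: {d_ac:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Note that the indirect estimate (standard error 0.25 here) is less precise than either direct comparison it is built from, which is the price of never having run the A versus C study.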

Make buses faster and more reliable by having fewer stops

Thu, 19 Mar 2026 07:30:00 -0400

This fascinating article by Nithin Vejendla in Works in Progress makes the case that bus networks would benefit from bus stop balancing: having fewer stops spaced further apart. This is especially true in the United States where stops tend to be only 700–800 feet (roughly 210–240 metres) apart. While having many bus stops theoretically improves access to the transit network, it also means that buses are slower (more time is spent accelerating, decelerating, and loading/unloading passengers) and less frequent, which reduces where you can actually go in a fixed amount of time, as well as increasing the variability in the time it takes to get there.

The biggest problem holding back public transit in North America is that it is unreliable, and bus stop balancing is a rare policy solution that offers improved service without having to spend more. With fewer stops, the same number of buses can complete the same route faster and with greater frequency. This stops a single missed or delayed bus from ruining your plans or forcing you to build in extra time.
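
A back-of-the-envelope model in Python shows why fewer stops translate into both faster trips and more frequent service with the same fleet. All parameters are assumptions made for illustration and are not taken from the article: a 10 km route, a 30 km/h cruising speed, 30 seconds lost per stop, six buses, and a bus that serves every stop.

```python
def round_trip_minutes(route_km, stop_spacing_m, cruise_kmh=30.0, seconds_lost_per_stop=30.0):
    """Round-trip time when the bus serves every stop (a pessimistic assumption)."""
    stops_one_way = route_km * 1000 / stop_spacing_m
    moving_min = route_km / cruise_kmh * 60
    stopped_min = stops_one_way * seconds_lost_per_stop / 60
    return 2 * (moving_min + stopped_min)


def headway_minutes(round_trip_min, buses):
    """Average gap between buses when `buses` vehicles circulate evenly."""
    return round_trip_min / buses


for spacing_m in (225, 400):
    trip = round_trip_minutes(route_km=10, stop_spacing_m=spacing_m)
    print(f"{spacing_m} m spacing: round trip {trip:.0f} min, "
          f"a bus every {headway_minutes(trip, buses=6):.1f} min")
```

Under these assumptions, widening the spacing from 225 m to 400 m cuts the round trip from roughly 84 to 65 minutes, so the same six buses arrive about every 11 minutes instead of every 14.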

A research study from my city of Montreal even gets a shout out. As a big public transit user, I avoid buses where possible in favour of the metro and walking, because these modes of transportation tend to be much more reliable and less variable when it comes to the question of “how long will it take for me to get from point A to point B”. Stop balancing could go a long way toward addressing one of the main complaints about public transit: too many routes are not frequent or reliable enough to let riders stop worrying about the schedule.

Manitoba introduces bill to ban algorithmic price discrimination

Wed, 18 Mar 2026 07:30:00 -0400

The Canadian province of Manitoba has introduced a bill to ban algorithmic price discrimination (also known as surveillance pricing), i.e., the use of personal data to set prices for individual consumers:

New Democrats announced in December they would begin cracking down on what’s known as differential or predatory pricing. That is when retailers charge different amounts for the same products based on the timing of customer purchases, where they live or other personal data. […] The proposed legislation would render the use of “personalized algorithmic pricing,” both online or in store, an unfair business practice.

Okay, I guess there’s a lot of different names for this particular practice. Whatever we call it, I believe bills cracking down on algorithmic price discrimination will be very popular, as it constitutes a very clear example of companies using our data against us to rip us off. The most famous recent exposé of this practice is Groundwork Collaborative’s report on how grocery delivery service Instacart charges users different prices depending on who they are.

Manitoba isn’t the only jurisdiction introducing bills targeting this practice, but I don’t believe anywhere in the US or Canada has actually managed to ban it yet. However, New York has made it mandatory for companies to disclose when they use personal data to set prices.

Prediction markets incentivize bad behaviour

Tue, 17 Mar 2026 18:19:00 -0400

The Times of Israel journalist Emanuel Fabian is claiming that Polymarket gamblers (sorry, “traders”) have threatened his life over a report he released about an Iranian missile attack on Israel on March 10. According to the rules, this bet resolves as true if Iran strikes Israel using a drone, a missile, or an air strike on this date. At issue here is this specific rule:

Missiles or drones that are intercepted and surface-to-air missile strikes will not be sufficient for a “Yes” resolution, regardless of whether they land on Israeli territory or cause damage.

On March 10, Fabian reported a single missile had hit an open area outside the Israeli city of Beit Shemesh; he included in the report a video of the strike. This would resolve the bet as “Yes”. Evidently, holders of “No” shares would very much like him to change his report to say that the missile was intercepted, which would resolve the bet as “No”, according to the rule referenced above. This bet has seen more than 23 million USD in trading volume.

If you look at the vitriol in the comments of the bet on Polymarket, I have no trouble believing people would send threats to a journalist demanding that he change his story, whether out of desperation to change their fortunes or just in an attempt to be edgy.

Some insight into writing a book using Quarto

Mon, 16 Mar 2026 20:48:00 -0400

Prof. Kieran Healy (Sociology, Yale University) shares some nice insight into the process of writing a book in Quarto using R in this post. The output screenshots he shares look beautiful, and the idea of deploying the same content as a clean PDF and a responsive website is awesome. A full draft of the book, Data Visualization: A Practical Introduction (Second Edition), is available as a website here.

I have grown increasingly tired of writing in any format other than a plain text file I can easily version control and move around, so the idea of writing a book in Quarto is appealing to me (as long as it has enough technical content to justify the format).

Using Claude Code for cross-package statistical audits

Sun, 15 Mar 2026 22:49:00 -0400

Economist Scott Cunningham shared an important example of why we should always report the statistical package and version used in our analyses, as he used Claude Code to produce six versions of the exact same analysis using six different packages in R, Python, and Stata. In a difference-in-differences analysis of the effect of mental health hospital closures on homicide using the standard Callaway and Sant’Anna estimator (for DiD with multiple time periods), he got very different results for some model specifications.

Since the specifications and the data were identical between packages, he discovered the divergences occurred due to how the packages handled problems with propensity score weights. Packages were not necessarily transparent about issues with these weights. If you were not running multiple analyses and comparing results across packages, or else carefully checking propensity score diagnostics, you might never have realized how precarious your results were.
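
The post does not say which diagnostics to check, so here is one common option sketched in Python: Kish’s effective sample size, computed on the odds weights p / (1 − p) that inverse-probability-weighted estimators of effects on the treated typically apply to comparison units. The propensity scores are hypothetical, and this is not Prof. Cunningham’s code or the internals of any of the packages he compared.

```python
def odds_weights(propensity_scores):
    """Weight p / (1 - p) for each comparison unit with propensity score p."""
    return [p / (1 - p) for p in propensity_scores]


def effective_sample_size(weights):
    """Kish's effective sample size: (sum of w)^2 / (sum of w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical propensity scores for five comparison units.
well_behaved = [0.10, 0.20, 0.30, 0.20, 0.25]
one_extreme = [0.10, 0.20, 0.30, 0.20, 0.99]

for label, scores in (("well-behaved", well_behaved), ("one extreme", one_extreme)):
    ess = effective_sample_size(odds_weights(scores))
    print(f"{label}: effective sample size {ess:.2f} of {len(scores)}")
```

A single propensity score near 1 produces a weight so large that the comparison group behaves like roughly one observation, which is exactly the kind of fragility that can stay hidden if a package handles such weights silently.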

Prof. Cunningham closes with the following advice:

The fifth point, and the broader point, is that this kind of cross-package, cross-language audit is exactly what Claude Code should be used for. Why? Because this is a task that is time-intensive, high-value, and brutally easy to get wrong. But just one mismatched diagnostic across languages invalidates the entire comparison, even something as simple as sample size values differing across specifications, would flag it. This is both easy and not easy — but it is not the work humans should be doing by hand given how easy it would be to even get that much wrong.

Getting citizenship just got a lot harder for those of Italian descent

Sat, 14 Mar 2026 22:18:00 -0400

Many people in the Americas would probably be surprised to learn that, in much of the rest of the world, being born in a country does not by itself make you a citizen. In most of the Americas, citizenship is automatically granted on the basis of jus soli (“right of soil”): birth on the territory. Elsewhere, citizenship is more often based on jus sanguinis (“right of blood”): descent. This is the case in most of the EU.

Citizenship in an EU country is considered unusually desirable because of the mobility rights and powerful passport it confers. However, the rules concerning exactly what kind of descent confers citizenship vary widely among member states. Italy used to be considered among the easiest, requiring only that an applicant prove they had an Italian ancestor alive after March 17, 1861, when the Kingdom of Italy was founded. That changed last year, when the country passed a new law significantly tightening the requirements for citizenship, which was recently upheld by the country’s Constitutional Court. The new law brings requirements more in line with the norm among EU member states:

Now, only individuals with at least one parent or grandparent born in Italy will automatically qualify for citizenship by descent. The amended law does not affect the 60,000 applications currently pending review. Additionally, dual nationals risk losing their Italian citizenship if they “don’t engage” by paying taxes, voting or renewing their passports.

geoBoundaries: An open database of political administrative boundaries

Fri, 13 Mar 2026 17:05:00 -0400

Today I discovered geoBoundaries, a CC BY 4.0-licensed database of political administrative boundaries covering the entire world. It is notable for its high level of detail, going from ADM0 (country), ADM1 (states/provinces), ADM2 (counties/departments or municipalities), to ADM3 (municipalities or sub-municipalities) for many countries. My go-to source for world map files is Natural Earth, which is limited to ADM0 and ADM1 but is in the public domain. Natural Earth also includes some physical geography like water and bathymetry, while geoBoundaries is focused solely on political administrative boundaries. Both datasets deal with disputed boundaries, which is an endless source of tension in the Natural Earth GitHub.

An R package for retrieving data from geoBoundaries, geobounds, was released in February. A similar package for Natural Earth, rnaturalearth, has long been maintained by rOpenSci.

Open banking comes to Canada

Thu, 12 Mar 2026 22:03:00 -0400

Canada’s banking sector is legendarily stable. However, this stability comes at the cost of innovation. Canada lags behind peers such as the EU, UK, US, and Australia in an area I care a lot about: open banking.

The premise of open banking is that consumers should be free to share their financial data with the third parties of their choosing, such as a budgeting app. I have been following open banking in Canada for years now, ever since I started closely tracking my own finances. For a long time, I have been looking for a better way to export transactions than logging into my bank’s website and manually downloading a CSV file representing a certain time range.

Over the years, people have tried to solve this problem by writing third-party packages to retrieve data from specific banks. However, these packages were fragile and prone to breaking, and they usually relied on you providing your full account credentials, granting them the ability to impersonate a login to your account. Shockingly, this is actually the default security model for Canadian fintech companies: even a humble budget app must be given your username, password, and (implicitly) the ability to take any action on your behalf. Needless to say, this is at best a grey zone for liability, since you are willingly handing over the keys to the kingdom to a third party.

The other half of the ATM–bank teller story

Wed, 11 Mar 2026 23:49:00 -0400

David Oks had a great post yesterday on the classic parable of how the adoption of ATMs did not lead to the predicted job losses among bank tellers. In fact, the opposite occurred: the number of bank tellers rose. I heard this story recounted several times in early discussions I had about the anticipated effect of AI on labour. I think I first heard it from Ryan Khurana. More recently it has been trotted out by US Vice President JD Vance.

The problem with this story is that the key statistic quoted alongside it, namely that there are more bank tellers than ever before, is no longer true. The famous graph supporting this assertion stops in 2010, and with good reason: the number of bank tellers has sharply fallen since then.

I think I had come across this fact before, this second half of the famous ATM–bank teller story, but it wasn’t until I read David Oks’s post that I understood the reason behind it. Quite simply, mobile banking ate physical banks. The ATM didn’t reduce the demand for bank tellers because it simply changed the kind of labour they did inside the bank. The iPhone made it so we didn’t need to go to the bank at all. It changed the paradigm. Explained this way, it seems obvious. Many new banks (including my own) do not have physical locations and never did.

What will the paper of the future look like?

Tue, 10 Mar 2026 23:48:00 -0400

I am sharing today a short blog post by the Institute for Replication: “What will the paper of the future look like?”

In short: research looking more like software development (as presaged by Prof. Richard McElreath, author of the excellent Statistical Rethinking), with the ability to reuse common material, formalize results, and remix analyses built into the pipeline.

Changes in acetaminophen use after the White House Tylenol briefing

Mon, 09 Mar 2026 18:17:00 -0400

Back in September 2025, US President Donald Trump and Health and Human Services Secretary Robert F. Kennedy, Jr. held a White House briefing linking Tylenol (acetaminophen, or paracetamol to Europeans) use in pregnancy to autism. A new study in The Lancet looks at what happened to acetaminophen prescriptions during emergency room encounters for pregnant females aged 15–44. They used data from a large database covering over 1,633 hospitals and 37,000 clinics.

Here is panel A from the figure in the study, with the vertical dashed line marking the date of the White House briefing (September 22, 2025) and the other dashed line showing the expected prescribing rates (compared to the observed ones).

Canada exports a lot of coal, but not for power generation

Sun, 08 Mar 2026 14:05:00 -0400

This provocatively titled piece in The Hub (“Why the world needs even more Canadian coal”) made me realize I know very little about one of Canada’s most important exports: coal.

Coal is often villainized because it is an incredibly dirty way of generating power. I vaguely recall an article from maybe 20 years ago claiming something along the lines of “if everyone in Canada replaced their incandescent bulbs with energy-efficient ones, the greenhouse gas savings would be cancelled out by a single coal plant that China is building every [some shockingly short amount of time]”. That said, China’s dependence on coal for power has been falling for the past two decades.

It turns out LLM-assisted search is fantastic for finding these half-remembered quotes. Here is the exact article and quote I was remembering, from a 2008 Macleans magazine article (I was pretty close):

Even if every household in the U.S. screwed in an energy-efficient light bulb today, the savings in greenhouse gas emissions would be wiped out by fewer than two medium-sized coal plants - the kind of plant that is being built in China at a rate of one a week.

But coal is also used to make most of the world’s steel (“metallurgical coal”), and this is the kind of coal that Canada (or specifically, British Columbia) overwhelmingly exports. The article goes on to claim that Canada’s production of metallurgical coal is among the cleanest (by greenhouse gas emissions) in the world.

Open By Default: A database of access to information requests to the Canadian government

Sat, 07 Mar 2026 14:32:00 -0500

In Canada, any person or corporation in the country can make a request for general records to any agency of the federal government through the Access to Information Act (the equivalent in the United States is the Freedom of Information Act). The government provides a searchable database of completed requests, but includes only a summary of the request and the number of pages of responsive material. The actual documents turned over are not included. However, completed request packages may be informally re-requested, and should you do so, someone from the relevant agency will (usually) send them to you eventually.

This re-request process has its limits. It can take weeks or months for the documents to be sent, and the database itself only goes back to January 2020 (they used to delete records older than two years, but stopped doing this some time after 2020). Occasionally, they will never send the documents at all, and all you can do is either re-request them again or open a formal access to information request (which will cost you $5).

Making it easier to access completed access to information requests is why the Investigative Journalism Foundation built Open By Default, “the biggest database of internal government documents never before made publicly accessible”. It includes documents from completed access to information requests obtained using both automated (presumably the re-request form) and manual processes (donations from trusted partners, particularly of documents from before the online re-request form was available). The files are cleaned and OCRed into one beautiful, searchable database.

The surprising whimsy of the Time Zone Database

Fri, 06 Mar 2026 21:07:00 -0500

Time zones are hard. As a well-known Computerphile video so eloquently puts it:

What you learn after dealing with time zones, is that what you do is you put away your code, you don’t try and write anything to deal with this. You look at the people who have been there before you. You look at the first people, the people who have dealt with this before, the people who have built the spaghetti code, and you thank them very much for making it open source, and you give them credit, and you take what they have made and you put it in your program, and you never ever look at it again. Because that way lies madness.

The Canadian province of British Columbia recently decided to switch to permanent daylight time. I wanted to see if this update made it to the IANA Time Zone Database yet. Luckily, we can now view updates to this database as commits on GitHub. And there it was in the news file!

I’ve perused the tz repository before, and I always learn something interesting. For example, during WWII Britain adopted double summer time, adding two hours to the clock in the summer and one hour in the winter. The bulk of the comments in the database are dedicated to documenting this extensive history of time zone changes across the world.

Homeownership rate doesn't mean what you think it does

Wed, 04 Mar 2026 20:15:00 -0500

This thread from demographer Lyman Stone on the definition of the US homeownership rate has stuck in my head for a couple of years now. Reading it produced a pretty profound “oh” for why this particular metric didn’t line up with my perception of the issue.

To put it simply, the definition of the homeownership rate is:

Take the number of households where the home is owned by the household head, divide by the total number of households.

The homeownership rate is based on households, not individuals. If an adult child lives with their parents (and their parents own their own home), they are counted as “homeowners” for the purpose of the homeownership rate. If more and more people in their 20s and their 30s move in with their parents (or never move out in the first place) rather than renting an apartment, this has the effect of increasing the homeownership rate, because you have reduced the denominator (number of households) without changing the numerator (number of owner-occupied households).
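To make the denominator effect concrete, here is a minimal sketch of the arithmetic. The town and its household counts are invented purely for illustration; only the definition (owner-occupied households divided by all households) comes from the text above.

```python
def homeownership_rate(owner_households: int, renter_households: int) -> float:
    """Share of all households that are owner-occupied."""
    total_households = owner_households + renter_households
    return owner_households / total_households


# Hypothetical town: 60 owner-occupied households and 40 renter households.
before = homeownership_rate(owner_households=60, renter_households=40)

# Ten renter households dissolve because the renters move in with parents
# who own their home. No one bought a house, so the numerator is unchanged.
after = homeownership_rate(owner_households=60, renter_households=30)

print(f"before: {before:.1%}, after: {after:.1%}")
```

The rate rises even though nobody became a homeowner, which is exactly why the metric can diverge from the lived experience of housing affordability.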

Canada uses the same definition:

The homeownership rate refers to the proportion of all households that are owner occupied.

The productivity shock coming to academic publishing

Tue, 03 Mar 2026 19:33:00 -0500

Today, I wanted to share this piece from economist Scott Cunningham (Baylor University), who wrote about how AI is widening the gap between research and publishing. Or, in economics terms (emphasis mine):

But what happens when the same productivity shock hits a system where the bottleneck was never really production in the first place, but rather was a hierarchical journal structure that depended immensely on editor time, skill, discretion and voluntary workers with the same talents called referees for screening quality deemed sufficient for publication?

The post mentions the Autonomous Policy Evaluation project—the end-to-end AI paper pipeline I wrote about a few weeks ago—and discusses the likely consequences of this flood of AI-generated papers. Assuming the number of publication slots in reputable journals is relatively fixed, AI-generated papers should add a very large amount of mass to the left side of the paper quality distribution. Acceptance rates will plummet and journals may rely on other signals of quality (name recognition, pedigree, institution) to thin the herd before actually reviewing content. As always, the rich get richer!

But this is imperfect, not to mention unfair, and so desk rejection gets noisier: good papers get killed by tired editors and marginally lower quality papers slip through to referees. It’s a cascading failure: volume breaks editors, broken editing wastes referees, wasted referees slow science.

Will you peruse this post?

Sat, 28 Feb 2026 13:38:00 -0500

I learned a new word today: contronym. It means a word whose definitions contradict each other. The example, thanks to a random Silicon Valley clip, is “peruse”. I’ve always used this word synonymously with “skim”, but Merriam-Webster presents two contradictory definitions:

to examine or consider with attention and in detail
to look over or through in a casual or cursory manner

I think I was vaguely aware of this definitional confusion, but only today did I learn that there was a term for this category of words.

Another one that annoys me is “sanction”…to sanction a behaviour can either mean to endorse it or to punish it…not helpful!

Agentic engineering patterns

Wed, 25 Feb 2026 16:15:00 -0500

Simon Willison is building a library of posts covering best practices for using agentic coding tools like Claude Code and OpenAI’s Codex. The existing articles cover test-driven development (red/green—ensure tests fail before the change and succeed after it) and AI-assisted code walkthroughs.

Comparing the Claw-like agent ecosystem

Tue, 24 Feb 2026 22:44:00 -0500

Chrys Bader has created ClawCharts to track the popularity and growth of OpenClaw and its growing number of competitors.

I have an unused Raspberry Pi 4 4GB that I’ve been meaning to test one of these Claw-like personal agents on (locked down to prevent the security nightmare scenarios we’ve seen play out since OpenClaw took off).

OpenClaw is a bit of a resource hog (which is why so many people are running out to buy Mac Minis), so I’ve been looking at the list of lightweight competitors. There is no obvious reason to prefer one over the other, so I’ll probably go with the fast-growing ZeroClaw.

ZeroClaw offers OAuth connectors for OpenAI and Anthropic subscription plans, but presently neither company is clear on whether this usage is permissible or not. Anthropic recently blew up the OpenClaw community by updating their docs to specifically ban using OAuth outside of Claude Code. An Anthropic employee partially walked this back on Twitter, but there is still no clear statement whether this use case is permitted. Regarding the use of OAuth from OpenAI for OpenClaw (specifically, GPT Codex), Peter Steinberger, creator of OpenClaw, stated on Twitter: “that already works, OAI publicly said that”. No one can seem to find this public statement, but it’s worth noting that Steinberger himself is now an OpenAI employee. So, will you get banned for using your ChatGPT Plus/Pro or Claude Pro/Max subscriptions with OpenClaw? Nobody knows.

LLMs automate the erosion of online anonymity

Mon, 23 Feb 2026 22:37:00 -0500

Economist Florian Ederer linked a new preprint describing the creation of an automated LLM-based pipeline for linking anonymous users across datasets based on unstructured text written by or about them. Prof. Ederer is himself famous for unmasking the IP addresses of users of the infamous (but influential) Economics Job Market Rumors message board, exploiting a flaw in how usernames were assigned to anonymous posters. For platforms not encoding a user’s IP address in their “anonymous” username, the LLM-based approach involves:

Extracting structured features from free text
Encoding extracted features to embeddings to compare to candidate profiles
Reasoning using all available context to identify the most likely match among top candidates
Calibrating the quality of the match by asking the LLM to report confidence

I guess it’s only a matter of time before someone uses this strategy to unmask Reviewer 2. (Currently this is only possible if Reviewer 2 insists you cite all of the work of the brilliant Dr. X.)

Oral texts

Sun, 22 Feb 2026 13:18:00 -0500

A major intellectual current in the post-social media age is the rediscovery of media theorists like Marshall McLuhan, Walter Ong, and Neil Postman, whose works seem incredibly prescient in the age of the Internet and the instantaneous and omnipresent mass communication it enables.

A particular sub-current of this trend is the return to orality, a culture rooted in the spoken rather than written word. Indeed, the vast majority of human history is defined by oral culture, and the world’s brief sojourn to the written tradition may have finally ended thanks to the Internet.

One of the most impressive projects to come out of this domain is Havelock.AI, a tool created by journalist Joe Weisenthal and entirely vibe coded with Claude. The tool analyzes text to give an “orality score” with supporting analysis. For example, qualified assertions are considered literate, whereas categorical statements are considered oral. The tool defines 68 oral/literate markers based on the framework of Walter Ong. It really is an impressive tool that I recommend checking out.

I plugged a few of my old articles into the tool and apparently my writing is very much rooted in the written tradition! (This post also scores as strongly literate.)

The increasingly inevitable social media ban for kids

Fri, 20 Feb 2026 23:57:00 -0500

Jon Haidt writes on his Substack about the increasingly popular movement to ban social media for kids, following the implementation of Australia’s under-16 social media ban a few months ago.

A brief history of chocolate in the army

Thu, 19 Feb 2026 18:11:00 -0500

I’m almost a week late, but I enjoyed this Valentine’s themed article from Joe Schwarcz of McGill University’s Office for Science and Society giving a brief history of the use of chocolate in the army.

It turns out M&Ms were first sold to the U.S. Army during World War II. Canadians will of course be familiar with Smarties, a similar candy that was invented first.

Democratizing voice cloning scams

Wed, 18 Feb 2026 22:26:00 -0500

Jamie Pine has launched Voicebox, a new voice cloning studio built upon the open weight Qwen3-TTS model. The project is positioned as a free, local alternative to the well-known ElevenLabs voice generator. A short demo video is available.

Obviously, there are legitimate uses for voice cloning technology. But in practice, this will be used to enable AI impersonation scams and spam on a massive scale. The GitHub page for this release isn’t exactly encouraging on this front. Demo screenshots show voice clones of YouTuber Linus Tech Tips, Minecraft creator Markus “Notch” Persson, and deceased streamer twomad.

Make sure you have a secret passphrase set up with your family, since your voice is no longer uniquely your own.

Don't let AI do your thinking for you

Tue, 17 Feb 2026 21:11:00 -0500

Here’s a thought-provoking article from Harry Law on “The last temptation of Claude”—the urge to outsource all of your thinking to AI (and remember, writing is thinking).

A common theme in the AI commentary I’ve been reading lately is the growing importance of taste. AI is sending the cost of creating “content” (articles, analyses, video, etc.) to zero, even as the attention to consume it all remains fixed. If we want to keep living in a world where AI serves us, we need—more than ever—the discernment to choose the questions worth asking.

As I put it in my Globe and Mail op-ed on AI and journalism a few years ago:

AI won’t replace the sort of journalism that holds power accountable, but it could certainly enhance it. After all, you can teach a machine to spot patterns, but you can’t force it to care about your community.

In the multiverse of forking paths

Mon, 16 Feb 2026 22:49:00 -0500

STRANGE: I went forward in time to view alternate modelling decisions, to see all the possible outcomes of the coming analysis.
STAR-LORD: How many did you see?
STRANGE: 14,000,605.
STARK: In how many did we achieve statistical significance?
STRANGE: One.

Prof. Jessica Hullman recently wrote a piece on Andrew Gelman’s blog discussing the use of ‘multiverse analysis’, i.e., what if we could see the results of the many slightly different decisions we could have made when constructing a model. This problem is commonly known as the garden of forking paths—during an analysis, a researcher is forced to make many small, sometimes arbitrary decisions that can lead to a different result if another researcher tries to independently replicate the analysis. While usually an innocent and inevitable part of the modelling process, these ‘researcher degrees of freedom’ can also be manipulated to produce a desired result.

Prof. Hullman points out that multiverse analysis will only become more salient as AI coding tools such as Claude Code make it easier than ever to iterate on how we model our research questions.

Her longer paper with Julia M. Rohrer and Andrew Gelman, “What’s a multiverse good for anyway?” is available here.

Regulatory uncertainty threatens biotech innovation

Sun, 15 Feb 2026 22:32:00 -0500

Another post from the Clinical Trials Abundance blog, this time by Ruxandra Teslo, on how the recent refusal-to-file by the US FDA for Moderna’s new mRNA influenza vaccine increases regulatory uncertainty and threatens innovation across the entire biotechnology sector. The decision reportedly came after the country’s top vaccine regulator, Dr. Vinay Prasad, overruled career staff to quash Moderna’s application. This is just one more blow against mRNA vaccine technology to come from Health and Human Services, the US federal health agency led by the world’s most prominent antivaxxer, Robert F. Kennedy Jr.

US Medicaid data gets DOGE'd

Sat, 14 Feb 2026 10:29:00 -0500

The US Health and Human Services DOGE team (I guess DOGE still exists in some form) just released a new aggregated, provider-level Medicaid claims database covering January 2018 through December 2024. With this dataset, you can track the monthly claims for each procedure (by HCPCS Code) and provider over time.

Even if the framing around this dataset’s release is partisan—tied to allegations of Medicaid fraud in Minnesota—it is a genuine advance in transparency for the US’s third largest spending program. No doubt this accomplishment required a lot of work on the backend to harmonize countless fragmented datasets into one tidy schema. These data were difficult to access before, and now they are free for anyone to use. Journalists, policy researchers, and companies working in the US healthcare sector will benefit the most, but every taxpayer benefits from added transparency about where their tax dollars go.

I would say there is the potential for these data to be misused to spark witch hunts, but this is more or less the stated purpose for this data release. Per Elon Musk: “Medicaid data has been open sourced, so the level of fraud is easy to identify.” If you go on Twitter, you will find several people have already plugged in the dataset to Claude Code and trumpeted their ASCII tables of providers flagged for potential fraud. Inevitably, some of these providers targeted by public scrutiny for their unusual billing patterns will have perfectly innocent explanations. But if ProPublica is excited about the release of this new dataset, then so am I.

More on vibe researching

Fri, 13 Feb 2026 23:49:00 -0500

To follow on yesterday’s post on AI-produced research, here is a reflection on “vibe researching” from Prof. Joshua Gans of the University of Toronto’s Rotman School of Management. Since the release of the first “reasoning” models in late 2024, he has gone all in on experimenting with AI-first research.

One of the key takeaways is that he found himself pursuing low quality ideas to completion more often, precisely because the cost of choosing to continue to pursue a questionable idea has been lowered. Sycophancy is a problem, too. With an AI cheerleader, it is easy to convince yourself you have a result when you do not.

Those ideas were all fine but not high quality, and what is worse, I didn’t realise that they weren’t that significant until external referees said so. I didn’t realise it because they were reasonably hard to do, and I was happy to have solved them.

I will note that (human) peer reviewers cannot be the levee that stops the flood of middling AI research: the system of uncompensated labour that undergirds all of academic publishing is already strained to bursting, as every editor desperate to find referees for a paper will tell you.

Prof. Gans concludes his year-long experiment in “vibe researching” was a failure, despite producing many working papers and publishing a handful of them:

An end-to-end AI pipeline for policy evaluation papers

Thu, 12 Feb 2026 19:11:00 -0500

Prof. David Yanagizawa-Drott from the Social Catalyst Lab at the University of Zurich has launched Project APE (Autonomous Policy Evaluation), an end-to-end AI pipeline to generate policy evaluation papers. The vast majority of policies around the world are never rigorously evaluated, so it would certainly be useful if we were able to do so in an automated fashion.

Claude Code is the heart of the project, but other models are used to review the outputs and provide journal-style referee reports. All the coding is done in R (though Python is called in some scripts). Currently, judging is done by Gemini 3 Flash to compare against published research in top economics journals:

Blind comparison: An LLM judge compares two papers without knowing which is AI-generated
Position swapping: Each pair is judged twice with paper order swapped to control for bias
TrueSkill ratings: Papers accumulate skill ratings that update after each match
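The position-swapping step is easy to illustrate. What follows is a rough sketch, not Project APE's actual code: the `judge` callable, its "first"/"second" return convention, and the two toy judges are all hypothetical stand-ins for an LLM call, and the TrueSkill rating update is left out entirely.

```python
from typing import Callable

# Hypothetical judge interface: receives (first_paper, second_paper) and
# returns "first" or "second" to name the winner by position.
Judge = Callable[[str, str], str]


def judge_pair(paper_a: str, paper_b: str, judge: Judge) -> list[str]:
    """Judge one pair twice with the order swapped; return winners as 'a' or 'b'."""
    verdict_ab = judge(paper_a, paper_b)  # paper A shown first
    verdict_ba = judge(paper_b, paper_a)  # paper B shown first
    winner_round_1 = "a" if verdict_ab == "first" else "b"
    winner_round_2 = "b" if verdict_ba == "first" else "a"
    return [winner_round_1, winner_round_2]


def always_first(first: str, second: str) -> str:
    """Toy judge with pure position bias: whatever is shown first wins."""
    return "first"


def longer_wins(first: str, second: str) -> str:
    """Toy judge that ignores position and prefers the longer text."""
    return "first" if len(first) >= len(second) else "second"


biased = judge_pair("short paper", "a much longer paper", always_first)
consistent = judge_pair("short paper", "a much longer paper", longer_wins)
print(biased, consistent)
```

A judge that only cares about position splits its two verdicts between the papers, so the bias washes out across the pair, while a judge with a real preference picks the same paper both times.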

The project’s home page lists the AI’s current “win rate” at 3.5% in head-to-head matchups against human-written papers.

Prof. Yanagizawa-Drott says “Currently it requires at a minimum some initial human input for each paper,” although he does not specify exactly what. If we look at initialization.json that can be found in each paper’s directory, we see the following questions with user-provided inputs:

Policy domain: What policy area interests you?

Method: Which identification method?

Data era: Modern or historical data?

API keys: Did you configure data API keys?

External review: Include external model reviews?

Risk appetite: Exploration vs exploitation?

Other preferences: Any other preferences or constraints?

The code, reviews, manuscript, and even the results of the initial idea generation process are all available on GitHub. Their immediate goal is to generate a sample of 1,000 papers and run human evaluations on them (at time of posting, there are 264 papers in the GitHub repository).

There is only one statistical test

Wed, 11 Feb 2026 23:58:00 -0500

A classic article by computer scientist Allen Downey on why there is only one statistical test: compute a test statistic from your observed data, simulate a null hypothesis, and finally compute/approximate a p-value by calculating the fraction of test statistics from the simulated data exceeding the test statistic from your observed data.

Downey suggests using general simulation methods over the canon of rigid, inflexible tests invented when computation was difficult and expensive.
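The recipe is short enough to sketch in a few lines. This is one possible instance, assuming a two-group comparison where the test statistic is the absolute difference in means and the null hypothesis is simulated by shuffling group labels (a permutation test). The data are made up, and the comparison counts simulated statistics at least as large as the observed one.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_sim=10_000, seed=0):
    """Return (observed statistic, approximate p-value) for a difference in means."""
    rng = random.Random(seed)

    # Step 1: compute the test statistic from the observed data.
    observed = abs(mean(group_a) - mean(group_b))

    # Step 2: simulate the null hypothesis (group labels do not matter)
    # by shuffling the pooled data and re-splitting it.
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)
        simulated = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if simulated >= observed:
            at_least_as_extreme += 1

    # Step 3: the p-value is the fraction of simulated statistics
    # at least as extreme as the observed one.
    return observed, at_least_as_extreme / n_sim


# Invented measurements for two groups.
treatment = [12.1, 11.8, 13.4, 12.9, 14.2, 13.7]
control = [10.9, 11.2, 12.0, 10.4, 11.7, 11.1]

observed, p_value = permutation_test(treatment, control)
print(f"observed difference: {observed:.2f}, p-value: {p_value:.4f}")
```

Swapping in a different test statistic or a different way of simulating the null changes the test without changing the structure, which is the point of the article.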

Hat tip to Ryan Briggs on Twitter.

The case for sharing clinical trial data

Tue, 10 Feb 2026 19:39:00 -0500

Saloni Dattani of the excellent Works in Progress magazine (and formerly of Our World in Data) launched a new Substack today called The Clinical Trials Abundance blog. The first post is on the case for sharing clinical trial data. We have been gradually moving toward mandatory reporting of clinical trial results (though enforcement is another question), but sharing data would be one step further. Even though clinical trials rely on the trust (and often money) of the public, it can be very difficult to gain access to the raw results, even if journal article authors claim they are “available upon request”. A norm of clinical trial data sharing would not only increase the confidence in published results but also aid future drug development, reduce expensive redundancy, and improve meta-analyses (which are often forced to rely on heterogeneous summary measures).

Why a Canadian news site just launched an AI publishing tool

Mon, 09 Feb 2026 19:49:00 -0500

It’s no secret that Canadian journalism (like journalism everywhere) is in trouble. Newsrooms face a steady stream of layoffs despite a couple hundred million Canadian dollars of direct and indirect government subsidies every year. The vast majority of outlets eligible for these subsidies take advantage of them, and combined they can subsidize half of a journalist’s salary. News organizations are desperate to diversify their revenue streams.

The Hub is a right-leaning publication launched in 2021 with a focus on policy and politics. Notably, the outlet declines or donates their subsidies, citing a valid concern that the scale of such subsidies threatens the perceived trustworthiness and independence of the media.

In late January 2026, The Hub launched NewsBox, an AI-powered publishing tool. NewsBox aims to make it easier for creators to transform their content (written, audio, or video) into other formats, such as speeches, essays, or talking points, while maintaining the author’s distinct voice. You can see examples of the tool’s output on new articles in The Hub, each of which is accompanied by an AI-generated summary and list of quotes at the top of the page. There is also a “Hub AI” chatbot in the sidebar of every article.

The app very much uses The Hub’s branding, prominently featuring the outlet’s co-creators, who also created NewsBox. While their pitch talks about preserving creators’ voices to avoid the “soulless prose” and “slop” outputted by ChatGPT and similar tools, I have to wonder if tighter integration of AI into the news and opinion side of the operation will raise its own issues with trust. The Hub has always been fairly tech-friendly, including a longstanding sponsorship by Meta.

How do you regain access to your computer if you lose your memory?

Sat, 07 Feb 2026 22:05:00 -0500

I read this interesting discussion this morning on Hacker News on the question of how to regain access to your computer if you lose your memory. As always, it starts with figuring out your threat model and responding accordingly.

Anthropic's statistical analysis skill doesn't get statistical significance quite right

Fri, 06 Feb 2026 19:30:00 -0500

Anthropic’s new statistical analysis skill demonstrates a common misunderstanding of statistical significance:

Statistical significance means the difference is unlikely due to chance.

But this phrasing isn’t quite right. The p-value in Null Hypothesis Significance Testing is not about the probability the results are “due to chance”; it is the probability—under the null hypothesis and the model assumptions—of observing results at least as extreme as the ones we obtained. In other words, the p-value summarizes how compatible the data are with the null, given our modelling choices. What it does not tell you is the probability that the null hypothesis is true.

Statistician Andrew Gelman gave a good definition for statistical significance in a 2015 blog post:

A mathematical technique to measure the strength of evidence from a single study. Statistical significance is conventionally declared when the p-value is less than 0.05. The p-value is the probability of seeing a result as strong as observed or greater, under the null hypothesis (which is commonly the hypothesis that there is no effect). Thus, the smaller the p-value, the less consistent are the data with the null hypothesis under this measure.

As some of the commenters in this blog post observe, simply being able to parrot a technically accurate definition of a p-value does not necessarily make us better at applying statistical significance in practice. It is certainly true that statistical significance is widely misused in scientific publishing as a threshold to distinguish signal from noise (or to be fancy, a “lexicographic decision rule”), which is why some scientists have argued that we should abandon it as the default statistical paradigm for research.

The CIA World Factbook has been memory holed

Thu, 05 Feb 2026 16:37:00 -0500

Another staple of my childhood is gone, this time the CIA’s World Factbook. I have fond memories of consulting the World Factbook for school projects in my elementary school computer lab. But as of yesterday, the entire publication, along with all of its archives, has been suddenly and unceremoniously wiped from the agency’s website. At least archives of the website are still available on the Internet Archive, with complete zip files up to 2020 and Wayback Machine snapshots thereafter.

Guinea worm one step closer to eradication

Wed, 04 Feb 2026 23:57:00 -0500

Only 10 cases of guinea worm were reported in 2025, down from an estimated 3.5 million cases per year when the elimination campaign began four decades ago. The disease is an ancient one, believed by some to be the “fiery serpents” that beset the ancient Israelites in The Book of Numbers. It is treated by carefully wrapping the parasite around a small stick as it painfully emerges over the course of weeks. This may be the inspiration for the Staff of Asclepius (⚕), the predominant symbol of medicine, which shows a serpent wrapped around a rod.

When I was studying mathematical modelling of infectious diseases at the University of Ottawa in the mid 2010s, the question was whether Jimmy Carter would outlive the guinea worm. Tragically, he did not, but his life’s work helped to prevent an estimated 100 million cases of the disabling disease and made him a hero in global health.

While we are within spitting distance of zero cases in humans, true eradication will be more difficult due to significant animal reservoirs of the disease. The press release notes nearly 700 reported cases in animals across six countries (and who knows how many unreported cases). For the disease to be truly eradicated, it must die out in these animal populations as well as in humans.

msgvault: A personal email archive and search system to watch

Tue, 03 Feb 2026 08:00:00 -0500

Here’s a new project to watch if you are interested in taking control of your email: msgvault. The tool provides a local, searchable version of all of your Gmail messages and attachments, backed by SQLite and DuckDB.

The author, Wes McKinney, says he may add support for other email services in the future, as well as WhatsApp, iMessage, and SMS. I’ll probably look into it for myself once the project matures a little. Although given that it stores everything in a single giant database file, it won’t fit into my standard backup strategy of versioned, incremental backups. Still, it could be a nice step forward in regaining control over my email archives.

_{Hat tip to j4mie on Hacker News.}

The Divergent Association Task, a measure for creativity

Mon, 02 Feb 2026 12:14:00 -0500

The Divergent Association Task is a short, simple test introduced in 2021 claiming to measure creativity. Taking only a minute and a half, it asks participants to “generate 10 nouns that are as different from each other as possible in all meanings and uses of the words”.

Although the instructions say to “avoid specialized vocabulary (e.g., no technical terms)”, I imagine you might score higher if you’ve just finished cramming wordlists for the GRE. Researchers have used this test to compare human and AI creativity (though the use of GPT-4 in this article with a January 2026 publication date speaks to the incompatibility of AI research with traditional publication timelines).

A/B testing for advertising is not randomized

Sun, 01 Feb 2026 23:09:00 -0500

Florian Teschner writes about a recent paper from Bögershausen, Oertzen, & Bock arguing that online ad platforms like Facebook and Google misrepresent the meaning of “A/B testing” for ad campaigns. In A/B testing, we might assume the platform is randomly assigning users to see ad A or ad B, in an attempt to get a clean causal interpretation about which ad is more likely to drive a click (or whatever outcome you’re tracking).

But according to the paper, this is usually not what is happening. Instead, the platform optimizes delivery for each ad independently, steering each one toward the users most likely to click it. In other words, the two ads may be shown to different groups of users, and differences in click-through rates may be attributable to who is seeing the ad, as opposed to the overall appeal of the ad. Ad platforms convert A/B tests from simple randomized experiments into murky observational comparisons. For example, an ad may appear to do better because it happened to be shown disproportionately to a group with a high click-through rate, not because it presents a more compelling overall message. Advertisers get the warm glow of “experimentally backed” marketing without the assurances of randomization.
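A toy calculation shows how this plays out. The sketch below is in Python and every number in it is invented for illustration: two user segments with different baseline click rates, and two ads that are equally appealing by construction. Nothing here comes from the paper or from any ad platform’s actual delivery logic. The only thing that changes between scenarios is which users see which ad.

```python
# Hypothetical click probabilities by user segment. They are the SAME for
# both ads, so any difference in observed CTR comes from the audience alone.
CLICK_RATE = {"frequent_clickers": 0.05, "rare_clickers": 0.01}


def expected_ctr(audience_mix: dict[str, float]) -> float:
    """Expected click-through rate given each segment's share of impressions."""
    return sum(share * CLICK_RATE[segment] for segment, share in audience_mix.items())


# True randomization: both ads reach the same 50/50 mix of users
randomized_mix = {"frequent_clickers": 0.5, "rare_clickers": 0.5}

# Assumed platform-steered delivery: each ad ends up with a different audience
ad_a_mix = {"frequent_clickers": 0.8, "rare_clickers": 0.2}
ad_b_mix = {"frequent_clickers": 0.2, "rare_clickers": 0.8}

print(f"Randomized, either ad: {expected_ctr(randomized_mix):.3f}")
print(f"Steered delivery, ad A: {expected_ctr(ad_a_mix):.3f}")
print(f"Steered delivery, ad B: {expected_ctr(ad_b_mix):.3f}")
```

Under randomization both ads have the same expected click-through rate, as they should. Under steered delivery, ad A looks more than twice as good as ad B even though the ads are identical in appeal, because the comparison is confounded by audience mix.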

Total electoral wipeout

Sat, 31 Jan 2026 13:01:00 -0500

The 2002 Turkish general election is the canonical example of total electoral wipeout. Every party holding seats in the previous legislature was completely wiped out. Of the two parties that won seats in the 2002 election, the one that formed government didn’t even exist at the time of the previous election (current president Erdoğan’s AK Party, formed in 2001). Of note, it wasn’t a complete changing of the guard: one of the three independent members from the 1999 parliament won his seat again in 2002 (Mehmet Ağar), though it seems he took over as leader of one of the wiped-out parties shortly after the election.

_{Hat tip to kynakwado2 on Twitter.}

Twyman's law

Fri, 30 Jan 2026 19:25:00 -0500

From Wikipedia:

Twyman’s law states that “Any figure that looks interesting or different is usually wrong”

A bit different from that oft-quoted line attributed to Isaac Asimov:

The most exciting phrase in science is not ‘Eureka!’ but ‘that’s funny’

But Twyman’s law is much truer in my experience. Surprising results are usually a signal that something is screwy with my data, my assumptions, or my pipeline.

_{Hat tip to DJ Rich on Twitter.}

Remember that a lot of numbers are fake

Thu, 29 Jan 2026 23:20:00 -0500

David Oks wrote an essay reminding us that in many countries, even the most basic statistic—the population—is often shockingly uncertain or even outright fabricated. It’s a good reminder that many of the numbers we rely on for international comparisons, like crime rates and economic indices, are similarly troubled by incompatible definitions, uneven measurement, and varying degrees of manipulation. Ask Google what the population of Afghanistan is, and it will happily show you an annual timeline of population since 1960, but the tidiness of the chart belies the murkiness of the estimate.

One of the drawbacks of easily accessible international datasets from organizations like the World Bank and Our World in Data is that they paper over the huge differences among the underlying source datasets. Ultimately, you end up with one number from each country and the implication that they are all pointing to a single construct. This makes it far too easy to draw confident comparisons between countries that simply aren’t measuring the same thing. Without being forced to assemble these datasets yourself, it’s difficult to appreciate how messy it is to measure “the same thing” across different places (or even to measure the same thing over time within one place).

When evaluating a statistical claim, it’s always worth asking where the numbers come from and how they were measured. It’s easy to take figures at face value, especially when they’re rarely presented with any explicit uncertainty, which may be large. This goes double for more esoteric constructs like freedom scores or corruption indices, which often show up in social media posts cheerleading (or doom-mongering) one country over another. I remember one slickly produced video uncritically comparing COVID-19 statistics between Australia and Niger on the basis that they have the same population (do they?). Niger is one of the poorest and youngest countries in the world, and differences in demographics and health infrastructure alone invalidate any straightforward comparison with a wealthy Western country.