Research

Publications, working papers, and works in progress.

Working Papers

“Post” Pre-Analysis Plans: Valid Inference for Non-Preregistered Specifications

with Vod Vilfort · Working Paper · 2026

Pre-analysis plans (PAPs) have become standard in experimental economics, but researchers nevertheless commonly deviate from their PAPs to supplement preregistered estimates with non-prespecified findings. While such ex-post analysis can yield valuable insights, there is broad uncertainty over how to interpret—or whether to even acknowledge—non-preregistered results. In this paper, we consider a truth-seeking researcher who, after seeing the data, earnestly wishes to report additional estimates alongside those preregistered in their PAP. We show that, even absent “nefarious” behavior, conventional confidence intervals and point estimators are invalid because non-preregistered estimates are reported only in a subset of potential data realizations. We propose inference procedures that account for this conditional reporting. We apply these procedures to Bessone et al. (2021), who study the economic effects of increased sleep among the urban poor. We demonstrate that, depending on the reason for deviating, the adjustments from our procedures range from negligible to economically significant relative to conventional practice. Finally, we consider the robustness of our procedures to certain forms of misspecification, motivating possible heuristic checks and norms for journals to adopt.
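
For intuition, the distortion is easy to reproduce in a stylized setting of my own devising (a normal estimator reported only when its t-statistic clears 1.96; an illustration of the problem, not the paper’s procedure). Conditional on being reported, the conventional interval undercovers, while an interval that inverts the truncated-normal conditional distribution restores 95% coverage:

```python
# Illustrative only -- a stylized model of conditional reporting, not the
# paper's procedure. An estimate is reported only when its t-statistic
# exceeds 1.96; conventional 95% CIs then undercover conditional on
# reporting, while CIs inverting the truncated-normal conditional
# distribution achieve 95% conditional coverage.
import numpy as np
from scipy import stats
from scipy.optimize import brentq

rng = np.random.default_rng(0)
mu, se, c = 0.5, 1.0, 1.96            # true effect, std. error, cutoff
draws = rng.normal(mu, se, 100_000)
reported = draws[draws / se > c]      # estimates that get reported

# Coverage of the conventional CI, conditional on being reported.
naive = np.mean(np.abs(reported - mu) <= 1.96 * se)

def cond_cdf(x, m):
    # CDF at x of N(m, se^2) truncated below at c*se, in stable log form.
    xi, a = (x - m) / se, (c * se - m) / se
    return 1.0 - np.exp(stats.norm.logsf(xi) - stats.norm.logsf(a))

def cond_ci(x):
    # Invert the conditional CDF in the location parameter m.
    lo_end = x - 10 * se
    while cond_cdf(x, lo_end) < 0.975:  # widen bracket near the cutoff,
        lo_end -= 100 * se              # where the lower bound diverges
    lo = brentq(lambda m: cond_cdf(x, m) - 0.975, lo_end, x + 10 * se)
    hi = brentq(lambda m: cond_cdf(x, m) - 0.025, lo_end, x + 10 * se)
    return lo, hi

cover = np.mean([lo <= mu <= hi for lo, hi in map(cond_ci, reported[:2000])])
print(f"conventional conditional coverage: {naive:.3f}")  # well below 0.95
print(f"truncated-normal coverage:         {cover:.3f}")  # ~0.95
```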

Integrating Diagnostic Checks into Estimation

with Vod Vilfort · Working Paper · 2026

Empirical researchers often use diagnostic checks to assess the plausibility of their modeling assumptions, such as testing for covariate balance in RCTs, pre-trends in event studies, or instrument validity in IV designs. While these checks are traditionally treated as external hurdles to estimation, we argue they should be integrated into the estimation process itself. In particular, we propose residualizing one’s baseline estimator against the vector of diagnostic check statistics to remove the component of baseline sampling variation explained by the diagnostic checks. This residualized estimator offers researchers a “free lunch,” delivering three properties simultaneously: (i) eliminating inference distortions from check-based selective reporting; (ii) reducing variance without changing the estimand when the baseline model is correctly specified; and (iii) minimizing worst-case bias under bounded local misspecification within the class of linear adjustments. We apply our method to the RCT in Kaur et al. (2024) and find that, even in a setting where all balance checks pass comfortably, residualization increases the magnitude of the baseline point estimate and reduces its standard error, equivalent to approximately a 10% increase in sample size.
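
As I read the abstract, the adjustment has a simple linear-projection form: theta_tilde = theta_hat - Cov(theta_hat, D) Var(D)^{-1} D, where D stacks the diagnostic statistics. A minimal sketch on simulated RCT data, with the covariance pieces estimated by bootstrap (my own toy setup, not the authors’ implementation):

```python
# Sketch of the residualization idea on a simulated RCT (my own toy
# setup, not the authors' code): subtract from the difference-in-means
# estimator its linear projection onto the covariate balance checks.
import numpy as np

rng = np.random.default_rng(1)
n, tau = 2_000, 0.3
x = rng.normal(size=(n, 3))                  # baseline covariates
d = rng.binomial(1, 0.5, size=n)             # random assignment
y = tau * d + x @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

theta_hat = y[d == 1].mean() - y[d == 0].mean()            # baseline
checks = x[d == 1].mean(axis=0) - x[d == 0].mean(axis=0)   # balance checks

# Estimate Cov(theta_hat, checks) and Var(checks) by bootstrap, then
# residualize: theta_tilde = theta_hat - Cov @ Var^{-1} @ checks.
B = 500
stats_b = np.empty((B, 1 + x.shape[1]))
for b in range(B):
    idx = rng.integers(0, n, n)
    yb, db, xb = y[idx], d[idx], x[idx]
    tb = yb[db == 1].mean() - yb[db == 0].mean()
    cb = xb[db == 1].mean(axis=0) - xb[db == 0].mean(axis=0)
    stats_b[b] = np.concatenate(([tb], cb))
S = np.cov(stats_b, rowvar=False)
gamma = np.linalg.solve(S[1:, 1:], S[1:, 0])
theta_tilde = theta_hat - checks @ gamma
print(f"baseline:     {theta_hat:.4f}")
print(f"residualized: {theta_tilde:.4f}")   # closer to tau, lower variance
```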

“Narrative Hacking”

Working Paper · 2025

Economists seldom base conclusions on isolated hypothesis tests. Rather, it is common to combine multiple atomic tests in a logical structure to serve higher-order purposes, such as distinguishing between competing theories, defending causal claims, diagnosing mechanisms, and arranging disparate facts into coherent stories. These economic “narratives” are foundational to how findings are framed, interpreted, and communicated—governing a paper’s overarching message and ultimate societal impact. Yet a single set of test outcomes can support many narratives, not all of which are true. While practitioners today recognize the importance of multiple testing corrections to guard against practices such as p-hacking, these procedures target atomic errors, not the errors of downstream narratives built upon them. This paper presents a general model for the construction and testing of narratives that admits a formal definition of Type I “narrative error.” After partitioning narratives into two classes—monotonic (e.g., impact evaluations) and non-monotonic (e.g., balance checks)—I first show a positive result: if a narrative admits a representative test that is (weakly) increasing in atomic rejections, any procedure that controls the family-wise error rate (FWER) over the underlying atomic tests at level alpha automatically delivers uniform narrative size control at alpha. A corollary is a “free narrative shopping” guarantee: once atomic tests are fixed or preregistered, researchers may explore any monotone narratives ex post without inflating size, immunizing them against concerns of ex post “narrative hacking.” I then establish an impossibility result: when testing sets include non-monotone narratives—such as a set containing both a narrative and its negation—atomic FWER control cannot achieve uniform narrative size control at any level alpha < 0.5. To accommodate arbitrary collections of narratives, I provide a novel procedure that relates uniform size control to the construction of joint confidence sets, and I show this approach is both necessary and sufficient for uniform narrative error control. I demonstrate practical implementation of this procedure by replicating several key findings in Dell (2010).
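
The flavor of the positive result is easy to see in a toy simulation (my own construction, not from the paper): for a conjunctive narrative such as “every listed effect is present,” falsity means some atomic null is true, so asserting the narrative requires falsely rejecting that null, and the narrative error rate is capped by the atomic FWER:

```python
# Toy check of the positive result as I understand it (my own setup,
# not the paper's): Bonferroni controls the FWER over K atomic tests,
# so a monotone narrative ("every listed effect is present") that is
# false rejects with probability at most alpha.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
alpha, K, sims = 0.05, 2, 200_000
mu = np.array([0.0, 3.0])              # effect 1 absent => narrative false
z = rng.normal(mu, 1.0, size=(sims, K))
crit = norm.ppf(1 - alpha / (2 * K))   # two-sided Bonferroni cutoff
atomic_reject = np.abs(z) > crit
narrative_reject = atomic_reject.all(axis=1)  # monotone in rejections
print("narrative error rate:", narrative_reject.mean())  # <= alpha
```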


Publications

Online Estimation of DSGE Models

with Michael Cai, Marco Del Negro, Edward Herbst, Ethan Matlin, and Frank Schorfheide · The Econometrics Journal · Vol. 24, Issue 1, January 2021.

This paper illustrates the usefulness of sequential Monte Carlo (SMC) methods in approximating dynamic stochastic general equilibrium (DSGE) model posterior distributions. We show how the tempering schedule can be chosen adaptively, document the accuracy and runtime benefits of generalized data tempering for ‘online’ estimation (that is, re-estimating a model as new data become available), and provide examples of multimodal posteriors that are well captured by SMC methods. We then use the online estimation of the DSGE model to compute pseudo-out-of-sample density forecasts and study the sensitivity of the predictive performance to changes in the prior distribution. We find that making priors less informative (compared with the benchmark priors used in the literature) by increasing the prior variance does not lead to a deterioration of forecast accuracy.
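
The adaptive tempering idea can be sketched in a few lines (a generic illustration of the technique, not the paper’s implementation; the function names and toy likelihood are mine): each stage raises the likelihood exponent phi just enough that the effective sample size of the incremental weights hits a fixed target, found by root-finding:

```python
# Generic sketch of adaptive tempering in SMC (illustrative; not the
# paper's code): each stage increases the likelihood exponent phi until
# the effective sample size (ESS) of incremental weights hits a target.
import numpy as np
from scipy.optimize import brentq

def ess(log_like, phi_new, phi_old):
    # ESS of incremental weights exp((phi_new - phi_old) * log_like),
    # computed stably by subtracting the max log-likelihood.
    w = np.exp((phi_new - phi_old) * (log_like - log_like.max()))
    return w.sum() ** 2 / (w ** 2).sum()

def next_phi(log_like, phi_old, target):
    if ess(log_like, 1.0, phi_old) >= target:  # can jump to the posterior
        return 1.0
    return brentq(lambda p: ess(log_like, p, phi_old) - target,
                  phi_old + 1e-10, 1.0)

# Toy example: particle log-likelihoods from some proposal distribution.
rng = np.random.default_rng(3)
log_like = rng.normal(-50, 5, size=4_000)
phi, target = 0.0, 0.5 * log_like.size
while phi < 1.0:
    phi = next_phi(log_like, phi, target)
    print(f"phi = {phi:.4f}")
    # ...reweight, resample, and mutate particles here in a full SMC...
```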

Hindsight and Sequential Rationality of Correlated Play

with Dustin Morrill, Ryan D’Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy R. Greenwald, and Michael Bowling · Proceedings of the AAAI Conference on Artificial Intelligence · 35(6), 5584-5594, May 2021.

Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decision-making settings. To this end, we re-examine mediated equilibrium and deviation types in extensive-form games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others in the literature. Examining CFR in greater detail further leads to a new recursive definition of rationality in correlated play that extends sequential rationality in a way that naturally applies to hindsight evaluation.
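
As a concrete anchor for the regret-minimization ideas the abstract builds on, here is regret matching, the local update used by the CFR family, in a toy normal-form game (my own minimal example, unrelated to the paper’s code):

```python
# Toy illustration of hindsight rationality via regret matching, the
# local update behind the CFR family (my own example, not the paper's
# code): two learners self-play rock-paper-scissors; average external
# regret goes to zero and average play approaches the uniform equilibrium.
import numpy as np

A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])  # row player's payoff

def strategy(regret):
    # Play actions in proportion to positive cumulative regret.
    pos = np.maximum(regret, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(3, 1 / 3)

rng = np.random.default_rng(4)
r1, r2, avg1, T = np.zeros(3), np.zeros(3), np.zeros(3), 50_000
for t in range(T):
    s1, s2 = strategy(r1), strategy(r2)
    a1, a2 = rng.choice(3, p=s1), rng.choice(3, p=s2)
    r1 += A[:, a2] - A[a1, a2]       # regret vs. each fixed action
    r2 += -A[a1, :] + A[a1, a2]
    avg1 += s1
print("average strategy:", avg1 / T)          # approaches (1/3, 1/3, 1/3)
print("avg external regret:", r1.max() / T)   # -> 0: hindsight rational
```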

Estimating HANK for Central Banks

with Sushant Acharya, Marco Del Negro, Ethan Matlin, Reca Sarfati, William Chen, Keshav Dogra, Shlok Goyal, Donggyu Lee, and Sikata Sengupta · Heterogeneity in Macroeconomics: Implications for Monetary Policy, 1st ed., Central Bank of Chile · 2024.

We provide a toolkit for efficient online estimation of heterogeneous agent (HA) New Keynesian (NK) models based on Sequential Monte Carlo methods. We use this toolkit to compare the out-of-sample forecasting accuracy of a prominent HANK model, Bayer et al. (2022), to that of the representative agent (RA) NK model of Smets and Wouters (2007, SW). We find that HANK’s accuracy for real activity variables is notably inferior to that of SW. The results for consumption in particular are disappointing since the main difference between RANK and HANK is the replacement of the RA Euler equation with the aggregation of individual households’ consumption policy functions, which reflects inequality.
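
For readers outside macro, the contrast in the final sentence can be written in standard New Keynesian notation (my gloss, not an excerpt from the paper): RANK determines aggregate consumption from a single Euler equation, whereas HANK aggregates household-level consumption policies over the cross-sectional distribution.

```latex
% RANK: a single representative-agent Euler equation determines C_t
% (sigma = risk aversion, beta = discount factor, R_t = nominal rate,
% Pi_{t+1} = gross inflation).
C_t^{-\sigma} \;=\; \beta\,\mathbb{E}_t\!\left[\frac{R_t}{\Pi_{t+1}}\,C_{t+1}^{-\sigma}\right]

% HANK: aggregate consumption instead integrates household policy
% functions c_t(a, z) over the distribution mu_t of assets a and
% idiosyncratic productivity z.
C_t \;=\; \int c_t(a, z)\, d\mu_t(a, z)
```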


Works in Progress