Research
Publications, working papers, and works-in-progress.
Working Papers
“Post” Pre-Analysis Plans: Valid Inference for Non-Preregistered Specifications
Pre-analysis plans (PAPs) have become standard in experimental economics research, but it is nevertheless common to see researchers deviating from their PAPs to supplement preregistered estimates with non-prespecified findings. While such ex-post analysis can yield valuable insights, there is broad uncertainty over how to interpret—or whether to even acknowledge—non-preregistered results. In this paper, we consider the case of a truth-seeking researcher who, after seeing the data, earnestly wishes to report additional estimates alongside those preregistered in their PAP. We show that, even absent “nefarious” behavior, conventional confidence intervals and point estimators are invalid because non-preregistered estimates are only reported in a subset of potential data realizations. We propose inference procedures that account for this conditional reporting. We apply these procedures to Bessone et al. (2021), who study the economic effects of increased sleep among the urban poor. We demonstrate that, depending on the reason for deviating, the adjustments from our procedures can range from negligible to economically significant relative to conventional practice. Finally, we consider the robustness of our procedure to certain forms of misspecification, motivating possible heuristic checks and norms for journals to adopt.
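The selection problem behind this abstract can be illustrated with a minimal sketch. This is my own simplification, not the paper's actual procedure: suppose a non-preregistered estimate theta_hat ~ N(theta, sigma^2) is reported only when its z-statistic clears a threshold c. Conditional on reporting, theta_hat follows a two-sided truncated normal, and a valid confidence interval inverts that conditional distribution rather than the unconditional one. All names (`conditional_cdf`, `conditional_ci`) and the reporting rule are illustrative.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def conditional_cdf(x, theta, sigma, c):
    """CDF of theta_hat given |theta_hat| > c*sigma, when theta_hat ~ N(theta, sigma^2)."""
    lo = norm.cdf((-c * sigma - theta) / sigma)        # mass on (-inf, -c*sigma]
    hi = 1 - norm.cdf((c * sigma - theta) / sigma)     # mass on [c*sigma, inf)
    p_report = lo + hi                                 # prob. the estimate is reported
    if x <= -c * sigma:
        mass = norm.cdf((x - theta) / sigma)
    elif x < c * sigma:
        mass = lo                                      # no mass inside the excluded band
    else:
        mass = lo + norm.cdf((x - theta) / sigma) - norm.cdf((c * sigma - theta) / sigma)
    return mass / p_report

def conditional_ci(theta_hat, sigma, c, alpha=0.05, search=50.0):
    """Invert the truncated-normal CDF in theta to get a CI valid conditional on reporting."""
    # The conditional CDF is decreasing in theta, so each endpoint is a root:
    # the upper endpoint solves P(X <= theta_hat | report; theta) = alpha/2, and
    # the lower endpoint solves P(X <= theta_hat | report; theta) = 1 - alpha/2.
    upper = brentq(lambda t: conditional_cdf(theta_hat, t, sigma, c) - alpha / 2,
                   theta_hat - search, theta_hat + search)
    lower = brentq(lambda t: conditional_cdf(theta_hat, t, sigma, c) - (1 - alpha / 2),
                   theta_hat - search, theta_hat + search)
    return lower, upper
```

For an estimate near the threshold (say theta_hat = 2.5, sigma = 1, c = 1.96) the conditional interval differs sharply from the conventional theta_hat ± 1.96; far from the threshold the truncation is negligible and the two coincide.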
Integrating Diagnostic Checks into Estimation
Empirical researchers often use diagnostic checks to assess the plausibility of their modeling assumptions, such as testing for covariate balance in RCTs, pre-trends in event studies, or instrument validity in IV designs. While these checks are traditionally treated as external hurdles to estimation, we argue they should be integrated into the estimation process itself. In particular, we propose residualizing one’s baseline estimator against the vector of diagnostic check statistics to remove the component of baseline sampling variation explained by the diagnostic checks. This residualized estimator offers researchers a “free lunch,” delivering three properties simultaneously: (i) eliminating inference distortions from check-based selective reporting; (ii) reducing variance without changing the estimand when the baseline model is correctly specified; and (iii) minimizing worst-case bias under bounded local misspecification within the class of linear adjustments. We apply our method to the RCT in Kaur et al. (2024) and find that, even in a setting where all balance checks pass comfortably, residualization increases the magnitude of the baseline point estimate and reduces its standard error, equivalent to approximately a 10% increase in sample size.
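The linear adjustment described above can be sketched in a stylized simulation. The setup and names here are mine, not the paper's: given a baseline estimate and a vector of diagnostic-check statistics T with known joint covariance, the adjustment subtracts the projection of the baseline on T. When the model is correctly specified, E[T] = 0, so the estimand is unchanged while the variance falls by the explained component.

```python
import numpy as np

def residualize(theta_hat, T, cov_theta_T, cov_T):
    """Residualize a baseline estimator against diagnostic-check statistics T:
    theta_resid = theta_hat - Cov(theta_hat, T) @ Var(T)^{-1} @ T."""
    beta = np.linalg.solve(cov_T, cov_theta_T)   # projection coefficients
    return theta_hat - T @ beta

# Toy simulation: baseline estimate correlated with one balance-check statistic.
rng = np.random.default_rng(0)
n_sims = 20000
cov = np.array([[1.0, 0.6],                      # joint cov of (theta_hat, T)
                [0.6, 1.0]])
draws = rng.multivariate_normal([2.0, 0.0], cov, size=n_sims)
theta_hat, T = draws[:, 0], draws[:, 1:]

theta_resid = residualize(theta_hat, T, cov[0, 1:], cov[1:, 1:])
# Population values: variance drops from 1.0 to 1 - 0.6**2 = 0.64,
# while the mean stays at 2.0 because E[T] = 0 under correct specification.
```

The variance reduction 1 - 0.64 = 0.36 in this toy example corresponds to the kind of "free" precision gain the abstract describes, here driven entirely by the correlation between the estimator and the check statistic.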
“Narrative Hacking”
Economists seldom base conclusions on isolated hypothesis tests. Rather, it is common to combine multiple atomic tests in a logical structure to serve higher-order purposes, such as distinguishing between competing theories, defending causal claims, diagnosing mechanisms, and arranging disparate facts into coherent stories. These economic “narratives” are foundational to how findings are framed, interpreted, and communicated—governing a paper’s overarching message and ultimate societal impact. Yet, a single set of test outcomes can support many narratives, not all of which are true. While practitioners today recognize the importance of multiple testing corrections to guard against practices such as p-hacking, these procedures target atomic errors, not those of downstream narratives built upon them. This paper presents a general model for the construction and testing of narratives that admits a formal definition of Type I “narrative error.” After partitioning narratives into two classes—monotone (e.g., impact evaluations) and non-monotone (e.g., balance checks)—I first show a positive result: if a narrative admits a representative test that is (weakly) increasing in atomic rejections, any procedure that controls the family-wise error rate (FWER) over underlying atomic tests at level alpha automatically delivers uniform narrative size control at alpha. A corollary is a “free narrative shopping” guarantee: once atomic tests are fixed or preregistered, researchers may explore any monotone narratives ex post without inflating size, thereby immunizing them against potential concerns of ex post “narrative-hacking.” I then establish an impossibility result: when testing sets include non-monotone narratives—such as a set containing both a narrative and its negation—atomic FWER control cannot achieve uniform narrative size control with alpha < 0.5.
To accommodate arbitrary collections of narratives, I provide a novel procedure that relates uniform size control to the construction of joint confidence sets. I show this approach is necessary and sufficient for uniform narrative error control. I demonstrate practical implementation of this procedure by replicating several key findings in Dell (2010).
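The monotone case admits a simple toy illustration. This construction is mine, not the paper's: atomic tests are corrected with the Holm step-down procedure (which controls FWER), and a narrative is "supported" only through a function that is weakly increasing in atomic rejections, so atomic FWER control passes through to the narrative.

```python
import numpy as np

def holm_reject(pvals, alpha=0.05):
    """Holm step-down procedure: controls FWER over atomic tests at level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for k, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (m - k):    # compare k-th smallest p to alpha/(m-k)
            reject[idx] = True
        else:
            break                         # step-down: stop at the first failure
    return reject

def monotone_narrative(rejections):
    """Illustrative monotone narrative: 'the treatment worked through mechanism A',
    supported iff the main-effect AND mechanism tests both reject. This is weakly
    increasing in rejections, so FWER control over atoms bounds narrative error."""
    return bool(rejections[0] and rejections[1])

pvals = [0.001, 0.012, 0.40]   # main effect, mechanism A, placebo
rej = holm_reject(pvals)
supported = monotone_narrative(rej)
```

A narrative involving a *non*-rejection (e.g., "the placebo test passes") is not monotone in rejections, which is exactly the class where the impossibility result above bites.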
Publications
Online Estimation of DSGE Models
This paper illustrates the usefulness of sequential Monte Carlo (SMC) methods in approximating dynamic stochastic general equilibrium (DSGE) model posterior distributions. We show how the tempering schedule can be chosen adaptively, document the accuracy and runtime benefits of generalized data tempering for ‘online’ estimation (that is, re-estimating a model as new data become available), and provide examples of multimodal posteriors that are well captured by SMC methods. We then use the online estimation of the DSGE model to compute pseudo-out-of-sample density forecasts and study the sensitivity of the predictive performance to changes in the prior distribution. We find that making priors less informative (compared with the benchmark priors used in the literature) by increasing the prior variance does not lead to a deterioration of forecast accuracy.
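The adaptive tempering idea can be sketched compactly. This is a generic illustration in the spirit of adaptive SMC, not the paper's implementation, and the target-ESS rule and names are mine: the next tempering exponent phi is chosen by bisection so that the effective sample size (ESS) of the incremental particle weights stays at a fixed fraction of the particle count.

```python
import numpy as np

def next_phi(loglik, phi_prev, ess_target_frac=0.5, tol=1e-8):
    """Choose the next tempering exponent adaptively.

    Incremental weights are w_i ∝ exp((phi - phi_prev) * loglik_i); we bisect on
    phi so that ESS = (sum w)^2 / sum w^2 hits ess_target_frac * n particles."""
    loglik = np.asarray(loglik, dtype=float)
    n = len(loglik)
    target = ess_target_frac * n

    def ess(phi):
        logw = (phi - phi_prev) * loglik
        logw -= logw.max()               # stabilize before exponentiating
        w = np.exp(logw)
        return w.sum() ** 2 / (w @ w)

    if ess(1.0) >= target:               # weights even enough: jump to the posterior
        return 1.0
    lo, hi = phi_prev, 1.0               # invariant: ess(lo) >= target > ess(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess(mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo

# Demo: heterogeneous log-likelihoods force an intermediate tempering step.
rng = np.random.default_rng(1)
loglik = rng.normal(scale=5.0, size=1000)
phi = next_phi(loglik, phi_prev=0.0)
```

Small increments (uneven likelihoods) yield phi close to phi_prev; when the particles already agree, the schedule jumps straight to phi = 1.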
Estimating HANK for Central Banks
We provide a toolkit for efficient online estimation of heterogeneous agent (HA) New Keynesian (NK) models based on Sequential Monte Carlo methods. We use this toolkit to compare the out-of-sample forecasting accuracy of a prominent HANK model, Bayer et al. (2022), to that of the representative agent (RA) NK model of Smets and Wouters (2007, SW). We find that HANK’s accuracy for real activity variables is notably inferior to that of SW. The results for consumption in particular are disappointing since the main difference between RANK and HANK is the replacement of the RA Euler equation with the aggregation of individual households’ consumption policy functions, which reflects inequality.
Works in Progress
Inference with Selected Instruments (with Vod Vilfort)
Optimal Pre-Analysis Plan Specification for Ex Post Deviations (with Vod Vilfort)