Running Stata within a Jupyter Notebook and Rendering to HTML with Quarto.

Author

Bernhard Bieri

Published

September 2, 2022

This notebook illustrates how to run Stata code within a Jupyter notebook by leveraging PyStata and rendering a Quarto HTML output. It is based on the following tutorial. Note that you could also directly use a .qmd file and specify advanced formatting options within its YAML header.

Install the Components:

In your conda console, type the following commands to create an environment and install the required packages to run the script.

# Create a new environment and activate it:
conda create --name StataPythonQuarto
conda activate StataPythonQuarto
# To run the Jupyter Notebook:
conda install ipykernel
# Stata bridge itself:
pip install --upgrade --user stata_setup
# For matplotlib graphs:
pip install -U matplotlib

Set the YAML:

The YAML header contains the metadata of your document such as the title, the name of the author and the output format. In your Jupyter notebook, add a raw chunk and adapt the following code.

---
title: Running Stata within a Jupyter Notebook and Rendering to HTML with Quarto.
author: Bernhard Bieri
date: "09-02-2022"
theme: "cosmo"
format: html
---

Wrap things up by loading the required packages and setting up PyStata with the code below.

# Setup Stata from within Python
import stata_setup
stata_setup.config("C:/Program Files/Stata17", "be")
# Import other utilities
import json
from pandas import json_normalize;

Run your Data Analysis

You’re all set to run the data analysis part with Python and Stata. Simply copy and paste the code below to get it set up and running. Note: you can find the .did dataset on the Stata website.

# Import the json file into a Python dataframe
with open("did.json") as json_file:
    data = json.load(json_file)
data = json_normalize(data, 'records', ['hospital_id', 'month'])
# Load Python dataframe into Stata
from pystata import stata
stata.pdataframe_to_data(data, True)
# Run block of Stata code
stata.run('''
destring satisfaction_score, replace
destring hospital_id, replace
destring month, replace

gen proc = 0
replace proc = 1 if procedure == "New"
label define procedure 0 "Old" 1 "New"
drop procedure
rename proc procedure
label value procedure procedure
''', quietly=True)

stata.run('''
didregress (satisfaction_score) (procedure), group(hospital_id) time(month)
''', echo=True)


. 
. didregress (satisfaction_score) (procedure), group(hospital_id) time(month)

Number of groups and treatment time

Time variable: month
Control:       procedure = 0
Treatment:     procedure = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
 hospital_id |        28         18
-------------+---------------------
Time         |
     Minimum |         1          4
     Maximum |         1          4
-----------------------------------

Difference-in-differences regression                     Number of obs = 7,368
Data type: Repeated cross-sectional

                           (Std. err. adjusted for 46 clusters in hospital_id)
------------------------------------------------------------------------------
             |               Robust
satisfacti~e | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
   procedure |
       (New  |
         vs  |
       Old)  |   .8479879   .0321121    26.41   0.000     .7833108     .912665
------------------------------------------------------------------------------
Note: ATET estimate adjusted for group effects and time effects.

. 
# Load Stata saved results to Python
r = stata.get_return()['r(table)']
# Use them in Python
print("\n")
print("The treatment hospitals had a %5.2f-point increase." % (r[0][0]), end=" ")
print("The result is with 95%% confidence interval [%5.2f, %5.2f]." % (r[4][0], r[5][0]))


The treatment hospitals had a  0.85-point increase. The result is with 95% confidence interval [ 0.78,  0.91].
# Generate Stata graph in Python
stata.run("estat trendplots", echo=True)
stata.run("graph export did.png, replace", quietly=True)
. estat trendplots

To render the report and activate the live preview hit ... in the Jupyter notebook options bar on VSCode and select Render to HTML. You can also render your document by hitting the CTRL + Shift + K key-binding.

Check out Jamel’s tutorial to see how this can be done without using Quarto. It makes outputs less straightforward to customise though.