Running Stata within a Jupyter Notebook and Rendering to HTML with Quarto.

Author

Bernhard Bieri

Published

September 2, 2022

Introduction:

This notebook illustrates how to run Stata code within a Quarto file by leveraging PyStata and rendering a Quarto HTML output. It is based on the following tutorial. Note that you could also use a Jupyter notebook.

Install the Components:

In your conda console, type the following commands to create an environment and install the required packages to run the script.

# Create a new environment and activate it:
conda create --name StataPythonQuarto
conda activate StataPythonQuarto
# To run the Jupyter Notebook:
conda install ipykernel
# Stata bridge itself:
pip install --upgrade --user stata_setup
# For matplotlib graphs:
pip install -U matplotlib

Set the YAML:

The YAML header contains the metadata of your document such as the title, the name of the author and the output format. In your Jupyter notebook, add a raw chunk and adapt the following code.

---
title: Running Stata within a Jupyter Notebook and Rendering to HTML with Quarto.
author: Bernhard Bieri
date: "09-02-2022"
theme: "cosmo"
format: html
---
# Setup Stata from within Python
import stata_setup
stata_setup.config("C:/Program Files/Stata17", "be")
# Import other utilities
import json
from pandas import json_normalize

  ___  ____  ____  ____  ____ ®
 /__    /   ____/   /   ____/      17.0
___/   /   /___/   /   /___/       BE—Basic Edition

 Statistics and Data Science       Copyright 1985-2021 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-STATA-PC        https://www.stata.com
                                   979-696-4600        stata@stata.com

Stata license: Single-user  perpetual
Serial number: 301706398922
  Licensed to: Bernhard Bieri
               IHEID

Notes:
      1. Unicode is supported; see help unicode_advice.

Run the Data Analysis:

You’re all set to run the data analysis part with Python and Stata. Simply copy and paste the code below to get it set up and running. Note: you can find the .did dataset on the Stata website.

# Import the json file into a Python dataframe
with open("did.json") as json_file:
    data = json.load(json_file)
data = json_normalize(data, 'records', ['hospital_id', 'month'])
# Load Python dataframe into Stata
from pystata import stata
stata.pdataframe_to_data(data, True)
# Run block of Stata code 
stata.run('''
destring satisfaction_score, replace
destring hospital_id, replace
destring month, replace

gen proc = 0
replace proc = 1 if procedure == "New"
label define procedure 0 "Old" 1 "New"
drop procedure
rename proc procedure
label value procedure procedure
''', quietly=True)

stata.run('''
didregress (satisfaction_score) (procedure), group(hospital_id) time(month)
''', echo=True)

. 
. didregress (satisfaction_score) (procedure), group(hospital_id) time(month)

Number of groups and treatment time

Time variable: month
Control:       procedure = 0
Treatment:     procedure = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
 hospital_id |        28         18
-------------+---------------------
Time         |
     Minimum |         1          4
     Maximum |         1          4
-----------------------------------

Difference-in-differences regression                     Number of obs = 7,368
Data type: Repeated cross-sectional
                           (Std. err. adjusted for 46 clusters in hospital_id)
------------------------------------------------------------------------------
             |               Robust
satisfacti~e | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
   procedure |
       (New  |
         vs  |
       Old)  |
   .8479879   .0321121    26.41   0.000     .7833108     .912665
------------------------------------------------------------------------------
Note: ATET estimate adjusted for group effects and time effects.

. 
# Load Stata saved results to Python
r = stata.get_return()['r(table)']
# Use them in Python
print("\n")
print("The treatment hospitals had a %5.2f-point increase." % (r[0][0]), end=" ")
print("The result is with 95%% confidence interval [%5.2f, %5.2f]." % (r[4][0], r[5][0]))


The treatment hospitals had a  0.85-point increase. The result is with 95% confidence interval [ 0.78,  0.91].
# Generate Stata graph in Python
stata.run("estat trendplots", echo=True)
stata.run("graph export did.png, replace", quietly=True)
. estat trendplots

To render the report and activate the live preview hit ... in the Jupyter notebook options bar on VSCode and select Render to HTML. You can also render your document by hitting the CTRL + Shift + K key-binding.

Check out Jamel’s tutorial to see how this can be done without using Quarto. It makes outputs less straightforward to customise though.