R Package for “Policy Targeting under Network Interference”

Reference: "Policy Targeting under Network Interference" (Viviano, forthcoming in Restud). R package developed by Jacob Carlson and Davide Viviano

Description

Dependencies

Exact optimization procedure method = 'MILP'

In the following example we generate a simple DGP and illustrate the functions of interest.

Example of data generating process to illustrate the method

library(NetworkTargeting)
#> Loading required package: tidyverse
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.3     ✔ readr     2.1.4
#> ✔ forcats   1.0.0     ✔ stringr   1.5.0
#> ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
#> ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
#> ✔ purrr     1.0.2     
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

## Hyperparameters
n <- 50
p = 4
mu = 1

## Simulate coefficients 
set.seed(99)
b_1 <- sample(c(-1,1), replace = T, size = p)
b_2 <- sample(c(-1,1), replace = T, size = p)
b_3 <- sample(c(-1,1), replace = T, size = p)
b_4 <- sample(c(-1.5,1.5), replace = T, size = p)
b_5 <- sample(c(-1,1), replace = T, size = p)

## Simulate covariates 
X <- runif(n*p, min = -1, max = 1)
X <- matrix(X, nrow=n, ncol=p)

## Simulate geometric network 
edges <- matrix(NA, nrow = n, ncol = n)
for(i in 1:(n-1)){ 
  edges[c((i + 1):n), i] <- sapply( abs(X[i,2] - X[ (i + 1):n,2])/2 +  abs(X[i,4] - X[ (i + 1):n,4])/2 , function(x) ifelse(x <= sqrt(4/(2.75*n)), 1, 0 ) )
}
diag(edges) <- rep(1,n)
edges[upper.tri(edges)] <- t(edges)[upper.tri(edges)]
edges[is.na(edges)] <- rev(edges[upper.tri(edges)])

## Simulate W
W <- edges
W_0 <- W
diag(W_0) <- rep(0, n)

## Draw neighbors 
D <- sapply(X%*%b_1 + rnorm(n), function(x) ifelse(x > 0, 1, 0))
spill1 <- X%*%b_3
spill2 <- mu + X%*%b_5
cate <-   X%*%b_4
errors <- rnorm(n)

## Local dependence 
errors <- errors/sqrt(2) + W_0%*%errors/(sqrt(2*max(1, apply(W_0, 1, sum))))
neighbors <-apply(W_0, 1, sum)
neighbors <- sapply(neighbors, function(x) ifelse(x > 0, x, 1))
y <-   as.vector(W_0%*%D)*spill2/neighbors + D*cate +  as.vector(D)*as.vector(W_0%*%D)*spill1/neighbors + errors

Exact Optimization method = 'MILP'

In the following lines we implement the method invoking the function NetworkTargeting. The user may select help(NetworkTargeting) for details. See below also for details on the parameters.

Inputs

In this example we use the first two columns of \(X\) to design treatments. We estimate the conditional mean function and the propensity score using all information we have about the \(X\).

#' @return final_coef: coefficients of the policy;
#' @return policies: individual treatment assignments

opt1 <- NetworkTargeting(
  Y = y, X = X[, c(1,2)], Z = X, 
  Z_int = X, D = D, W_0 = W_0, W_0_train = W_0, 
  cost = 0, method = 'MILP', maximum_time = 100, 
  focal_units = rep(1, length(y)), 
  params = list(numcores = 2, verbose = F))

Extracting the policy

We can extract the policies as follows:

opt1$final_coef
#> [1]  0.5158864  1.0000000 -0.2729417

Above, we print the coefficients of the estimated policy (the first coefficient is the intercept):

\[ \pi(X)= 1\Big\{ X^\top \beta > 0\Big\}. \]

Prediction

To predict the treatment status we can use the following code:

## Policies 
policies1 <- PredictPolicies(opt1, X[, c(1,2)])

## Compute welfare associated with the policy
objective1 <- NetworkWelfare(
  policies = policies1, Y = y, X = X[, c(1,2)], Z = X, 
  Z_int = X, D = D, W_0 = W_0, W_0_train = W_0)
print(objective1$welfare)
#> [1] 2.039243

## Compare with welfare with treating none of the individuals 
policies_none_treated = rep(0, n)
objective_not_treated <- NetworkWelfare(
  policies = policies_none_treated, Y = y, X = X[, c(1,2)], Z = X, 
  Z_int = X, D = D, W_0 = W_0, W_0_train = W_0)
print(objective_not_treated$welfare)
#> [1] -0.1609397

Plot the network with red denoting individuals to be treated and blue individuals to keep under control:

suppressMessages(library(igraph))

network <- graph_from_adjacency_matrix(W_0, mode='undirected', diag=F )
color = sapply(policies1, function(x) ifelse(x == 1, 'red', 'blue'))
my_plot = plot(network, vertex.size     = degree(network) +4, vertex.label    = NA,
     vertex.color    = color)

Parameters for exact optimization method = 'MILP'

It is possible to select additional parameters and pass them under params_list. Here is a list of parameters that the user can choose.

params_default = list(
  # fitted model for conditional mean(if NA - default estimation is performed)
  ## if cross fitting = T, m1 needs to be a list of models for each obs
  m1 = NA,
  # remove effects on units with high degree from computations for increasing speed
  cut_high_neighb = F,
  # tolerance constraints for the MILP program (do not set less than 10**(-6))
  tolerance_constraint1 = 10**(-3),
  tolerance_constraint2 = 10**(-3),
  ## Upper and lower bound for the parameters
  B = 1, low_b = - 1,
  ## list of parameters to pass to gurobi
  params_gurobi = NA,
  ## introduce monotonicity constraint in the optimization (default FALSE)
  monotonicity_c = F,
  ## use double robust estimation
  doubly_robust = T,
  ## penalized method for estimating the propensity score
  penalized= F,
  ## number of strata for propensity score estimation (aggregate individuals with similar percentage of treated neighbors)  If NA each number of neighbor correspond to a different treatment status
  num_strata = 4,
  ## trimming on the propensity score (extrapolation is performed)
  trimming = 0.01,
  ## if not NA, a matrix with nrow = dim(X)[2] + 1 indicating upper bound (left column) and lower bound (right column) for each parameter
  additional_param_constraints = NA,
  ## return the gurobi model output without computations
  model_return = F,
  ## names of the column of the model passed to a random forest (keep NA, active only if m1 != 0 and is a random forest)
  names_rf = NA,
  ## print checking values for model stability
  print_checking = F,
  ## return the estimated conditional mean functions
  return_m1 = F,
  ## upper and lower bounds on the outcme variable
  bound_outcomes = c(-Inf, Inf),
  ## removed the subpopulation that is trimmed
  trimmed_subpopulation = F,
  ## minimum number of treated units
  min_treated = 0,
  ## number of cores for computations
  numcores = 1,
  ## cross_fitting: boolean note: m1 if not NA has to be a list with models for each observation (done via mincut)
  cross_fitting = F,
  ## when computing partitions of the graph for cross fitting
  ## use a slack parameter to find approximately equal sized
  ## partitions up to +- slack number of obs
  slack = 10,
  ## n_folds: number of folds for cross fitting
  n_folds = 10,
  ## print output from optimization
  verbose = F
)

Approximation procedure for medium scale \(n\) or block-diagonal graph MILP = 'MILP_graph_clustering'

When the adjacency matrix presents an (almost but not necessarily exact) block diagonal structure, it is possible to solve the welfare maximization problem in each block separately. The following function uses an approximation device that maximizes the empirical welfare by first partitioning the network into approximately independent sub-components (solved via a max-cut program) and then estimating the policy in each sub-network. The function returns the policy with largest welfare across all networks.

This implementation can be used to reduce the time complexity, while still using a mixed integer linear program formulation.

#' @return final_coef: coefficients of the policy;
#' @return policies: individual treatment assignments
#' @return solutions_clustering: matrix with solutions when MILP_graph_clustering is selected
#' @return partitions: list of indexes with partitions of the graph if MILP_graph_clustering is selected

opt2 <- NetworkTargeting(
  Y = y, X = X[, c(1,2)], Z = X,
  Z_int = X, D = D, W_0 = W_0, W_0_train = W_0, 
  cost = 0, method = "MILP_graph_clustering", maximum_time = 100, 
  params = list(
    numcores = 2, verbose = F,
    ## maximum time for the partition used to cut the network (active if MILP_graph_clustering is selected)
    maxtime_partitions = 100,
    ## size of each sub-partition of the network (active if MILP_graph_clustering is selected)
    length_small_samples = 20))

## Policies 
policies2 <- PredictPolicies(opt2, X[, c(1,2)])

## Compute welfare associated with the policy
objective2 <- NetworkWelfare(
  policies = policies2, Y = y, X = X[, c(1,2)], Z = X, 
  Z_int = X, D = D, W_0 = W_0, W_0_train = W_0)

Three important parameters for MILP_graph_clustering

Three important parameters for the approximate method are:

Exact optimization with very large networks on internal machine method = 'MILP_sparse'

With a very large network you can have RAM constraints. We implemented a slower version of the MILP algorithm that uses a sparse matrix of constraints, saving memory.

To implement this function you need to make sure that Gurobi works in R through the R package ROI and ompr. If you have issues see below for optimization using the original Gurobi model.

#' @return final_coef: coefficients of the policy;
#' @return policies: individual treatment assignments

opt4 <- NetworkTargeting(
  Y = y, X = X[, c(1,2)], Z = X, 
  Z_int = X, D = D, W_0 = W_0, W_0_train = W_0, 
  cost = 0, method = 'MILP_sparse', maximum_time = 100, 
  focal_units = rep(1, length(y)), 
  params = list(numcores = 2, 
  ## pass the solver used 
  solver = 'gurobi', verbose=F))

## Policies 
policies4 <- PredictPolicies(opt4, X[, c(1,2)])

## Compute welfare associated with the policy
objective4 <- NetworkWelfare(
  policies = policies4, Y = y, X = X[, c(1,2)], Z = X,
  Z_int = X, D = D, W_0 = W_0, W_0_train = W_0)
print(objective4$welfare)

Exact optimization with very large networks on external machine (selected model_return = T in the params list)

We can be interested in optimizing the model exactly with a large network. In this case, we suggest to run the following commands.

model <- NetworkTargeting(
  Y = y, X = X[, c(1,2)], Z = X, 
  Z_int = X, D = D, W_0 = W_0, W_0_train = W_0, 
  cost = 0, method = 'MILP', maximum_time = 100, 
  params = list(model_return = T, verbose = F))

## The function creates a Gurobi model. The model can be saved as follows: 
## Uncomment line 

# written_model <- gurobi_write(model, filename = './model.mps')

The above function saves a .mps file. The file can now be solved calling it from the shell of an external machine calling the function with the shell script:

gurobi_cl InputFile=./MS.prm ResultFile=./model.sol ./model.mps

The file MS.prm must contain the parameters. The default .prm can be found on the paper replication package available on the author’s webpage dviviano.github.io/projects.