A lot of genomic knowledge are often curated in the form of genesets where a combination of genes are involved in certain biological process/function/pathway. In other words, a pathway is often enriched with different combination of genes. Similarly this can also be related to patient-genesets data structure.

This can be thought of as a bipartite graph representing relation between individual pathway to a gene. Often bioinformaticians are interested to visualize if there is any kind of relation between the pathways. This problem can be modeled as the conversion of two-mode network (bipartite graph) to one-mode network of pathways and visualize the resulting one-mode network.

`# Load required libraries library("stringr") library("igraph") library("RColorBrewer") library("gplots") library("cluster")`

`jColFun <- colorRampPalette(brewer.pal(n = 9, "YlOrRd"))`

`# Load input data file file.dat <- url("https://dl.dropboxusercontent.com/u/30823824/dat_genesets.txt") dat <- read.table(file.dat, sep = "\t", header = T, as.is = c("Pathway", "Genesets")) str(dat)`

`## 'data.frame': 244 obs. of 4 variables: ## $ Pathway : chr "P-1" "P-2" "P-3" "P-4" ... ## $ pvalue : num 6.31e-12 8.51e-10 4.90e-08 6.03e-08 1.55e-07 ... ## $ BHpvalue: num 1.58e-09 1.07e-07 3.72e-06 3.72e-06 7.59e-06 ... ## $ Genesets: chr "Gene-1,Gene-2,Gene-3,Gene-4,Gene-5,Gene-6" ...`

`### Filter data using some p-value cutoff. ### (pvalue is for the significance of the geneset enrichment)) threshold <- 0.005 dat.sub <- subset(dat, dat$BHpvalue <= threshold)`

`pathways <- dat.sub$Pathway genes <- sort(unique(unlist(str_split(dat.sub$Genesets, ","))), decreasing = F) # Compute bipartite graph bip <- matrix(0, nrow = length(pathways), ncol = length(genes)) colnames(bip) <- genes rownames(bip) <- pathways list.genes <- str_split(dat.sub$Genesets, ",") list.gene.index <- lapply(list.genes, function(x) which(colnames(bip) %in% x)) for (i in 1:length(list.gene.index)) { bip[i, list.gene.index[[i]]] <- 1 }`

`dat.sub$TotalGenes <- unlist(lapply(list.gene.index, length)) dat.sub$Color <- rev(jColFun(nrow(dat.sub)))`

`heatmap.2(bip, col = jColFun(256), Colv = TRUE, Rowv = TRUE, dendrogram = "both", trace = "none", hclustfun = function(x) hclust(x, method = "ward"), distfun = function(x) dist(x, method = "binary"), colsep = c(1:25), rowsep = c(1:25), sepcolor = "white", sepwidth = c(0.05, 0.05), key = "FALSE", cexRow = 0.8, cexCol = 0.8)`

```
# Compute Distance Matrix
dat.dist <- dist(bip, method = "binary", diag = TRUE, upper = TRUE)
plot(hclust(dat.dist, method = "ward"), hang = 0.1)
```

## Compute Connection Specific Index (CSI)

In a bi-partite graph, the Y-type hub nodes may confer artificially high levels of interaction-profile similarity for the X-type nodes which may not be very informative and create biases when clustering the data (Green *et al.*, 2011; Fuxman Bass *et al.*, 2013). The connection specificity index (CSI) provides a context-dependent measure that mitigates the effect of nonspecific interactions by ranking the significance of similarity between two X-type nodes according to the specificity of their shared interaction partners (Green *et al.*, 2011).

`### First compute pearson correlation dat.pcc <- cor(t(bip), method = "pearson") ### Function to compute CSI: Connection Specific Index ### get.csi <- function(dat) { mat <- matrix(0, nrow = nrow(dat), ncol = ncol(dat)) colnames(mat) <- colnames(dat) rownames(mat) <- rownames(dat) for (i in 1:nrow(dat)) { a <- rownames(dat)[i] for (j in 1:ncol(dat)) { b <- colnames(dat)[j] pcc.ab <- dat[a, b] - 0.05 conn.pairs.a <- colnames(dat)[which(dat[a, ] >= pcc.ab)] conn.pairs.b <- rownames(dat)[which(dat[, b] >= pcc.ab)] conn.pairs.ab <- length(union(conn.pairs.a, conn.pairs.b)) n <- nrow(dat) csi <- 1 - (conn.pairs.ab/n) mat[i, j] <- csi } } return(mat) } dat.csi <- get.csi(dat.pcc)`

`heatmap.2(as.matrix(dat.csi), col = jColFun(256), Colv = TRUE, Rowv = TRUE, dendrogram = "both", trace = "none", hclustfun = function(x) hclust(x, method = "ward"), distfun = function(x) dist(x, method = "minkowski", p = 1), colsep = c(1:25), rowsep = c(1:25), sepcolor = "white", sepwidth = c(0.05, 0.05), key = "FALSE", cexRow = 0.8, cexCol = 0.8, main = "CSI")`

## Visualizing the network

Here I’ve used igraph R package to visualize the network of pathways. The correlation matrix computed using the CSI index serves as an adjacency matrix for the graph and the value in the matrix is converted as a weight to the corresponding edge in the graph. (For more on the topic: Working with Bipartite/Affiliation Network Data in R)

`# Convert the adjacency (correlation) matrix to an igraph object g <- graph.adjacency(dat.csi, mode = "undirected", weighted = TRUE, diag = FALSE) g <- simplify(g, edge.attr.comb = list(weight = "sum"))`

`# get the node size ratio for (ctr in 1:length(V(g)$name)) { index <- which(dat.sub$Pathway == V(g)$name[ctr]) V(g)$size[ctr] <- dat.sub$TotalGenes[index] V(g)$color[ctr] <- dat.sub$Color[index] } V(g)$size <- V(g)$size * 3 # Set vertex attributes V(g)$label <- V(g)$name V(g)$label.color <- "black" V(g)$label.cex <- 0.8 V(g)$label.family <- "Helvetica" V(g)$frame.color <- "grey" #Set edge attributes E(g)$color <- rgb(0.6, 0.6, 0, E(g)$weight) E(g)$width <- E(g)$weight * 2`

`plot(g, layout = layout.kamada.kawai)`

## Using PAM Clustering with k=2

```
samplecenters <- 2
pr.pam <- pam(dat.csi, metric = "euclidean", k = samplecenters)
clusplot(pr.pam, shade = FALSE, labels = 2, cex.txt = 0.7, col.p = "black",
col.txt = "black", cex = 1.25, color = TRUE, plotchar = FALSE, span = FALSE,
pch = 19, main = "PAM Cluster")
```