Friday, August 3, 2012

Graph combination software (Shin, Tsuda, Schölkopf)

Sometime ago I was driven to write a conference paper on positive semi-definite matrix combination bent towards binary classification and its application to gene functional prediction. The team leader (yes, AMG) made me program a competing method so we could test against it. The competing method was the one proposed in the paper "Protein functional class prediction with a combined graph" by Hyunjung Shin.

I have worked on a number of papers with my old team, but this one was the one with the most potential. Up until that date, I had rarely worked on these techniques before and due to poor management, that paper got rejected. The underlying idea was great, but I was left with all the work and with little experience, all to do within the week. The competing method I programmed was never used and that weighed on the reviewers' decision to reject the paper.

Considering that the nodes are present in all graphs (they refer to the same objects), but are interconnected differently, the method defines a diffusion process on a graph, but penalizes the dissimilarity between adjacent nodes. Let $L_k$ be the Laplacian matrix for each graph, $K$ the total number of graphs, and the vector $y$ that we try to approximate
$$
\min_{\beta,f} = \sum\limits_{k=1}^K \beta f^T L f - \log \det \left( I - \sum\limits_{k=1}^K \beta_k L_k \right) + \mu (f-y)^T C (f-y)
$$
subject to $\beta_k\geq 1$ and $\beta^T 1 < 0.5$
where $\mu$ regularizes the approximation term. Notice that the coefficient computed $f$ are the same for each Laplacian.

So here I left it to you. Try it with small graphs because there is something missing. If you modify it please let me know.

A testing program example is
load("mat.RData")

source("graph_comb_shin.R")

d1<-apply(G1,1,sum)
id1<-d1
id1[id1>0]<-1/id1[id1>0]
iD1<-diag(sqrt(id1))
D1<-diag(d1)
L1<-iD1%*%(D1-G1)%*%iD1
EL1<-eigen(L1)
rm(d1,id1,D1,iD1,G1)

d2<-(apply(G2,1,sum))
id2<-d2
id2[id2>0]<-1/id2[id2>0]
iD2<-diag(sqrt(id2))
D2<-diag(d2)
L2<-iD2%*%(D2-G2)%*%iD2
EL2<-eigen(L2)
rm(d2,id2,D2,iD2,G2)

d3<-(apply(G3,1,sum))
id3<-d3
id3[id3>0]<-1/id3[id3>0]
iD3<-diag(sqrt(id3))
D3<-diag(d3)
L3<-iD3%*%(D3-G3)%*%iD3
EL3<-eigen(L3)
rm(d3,id3,D3,iD3,G3)

LL=list(L1,L2,L3)
rm(L1,L2,L3)

for (c in c(1:50)) {gc()}

mu=10
delta=0.4
tol=0.1

y=as.matrix(variables[,1]==1)
res=graph_comb_shin(LL,y,mu,delta,tol)

save.image("shin.Rdata")

Download the software.
Checkout the paper.

No comments:

Post a Comment