#!/usr/bin/python
#
# Copyright (C) Christian Thurau, 2010.
# Licensed under the GNU General Public License (GPL).
# http://www.gnu.org/licenses/gpl.txt
"""
PyMF Semi Non-negative Matrix Factorization.

    SNMF(NMF) : Class for semi non-negative matrix factorization

[1] Ding, C., Li, T. and Jordan, M. I. Convex and Semi-Nonnegative Matrix Factorizations.
IEEE Trans. on Pattern Analysis and Machine Intelligence 32(1), 45-55, 2010.
"""

import numpy as np

from nmf import NMF

__all__ = ["SNMF"]


class SNMF(NMF):
    """
    SNMF(data, num_bases=4)

    Semi Non-negative Matrix Factorization [1]. Factorize a data matrix into
    two matrices W and H s.t. F = ||data - W H||_F (Frobenius norm of the
    residual, W H being the matrix product) is minimal. Only H is
    constrained to be non-negative; W may contain entries of either sign.

    Parameters
    ----------
    data : array_like, shape (_data_dimension, _num_samples)
        the input data
    num_bases : int, optional
        Number of bases to compute (column rank of W and row rank of H).
        4 (default)

    Attributes
    ----------
    W : "data_dimension x num_bases" matrix of basis vectors
    H : "num_bases x num_samples" matrix of non-negative coefficients
    ferr : Frobenius norm (after calling .factorize())

    Example
    -------
    Applying Semi-NMF to a small toy data set:

    >>> import numpy as np
    >>> data = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
    >>> snmf_mdl = SNMF(data, num_bases=2)
    >>> snmf_mdl.factorize(niter=10)

    The basis vectors are now stored in snmf_mdl.W, the coefficients in snmf_mdl.H.
    To compute coefficients for an existing set of basis vectors, simply copy W
    to snmf_mdl.W and set compute_w to False:

    >>> data = np.array([[1.5], [1.2]])
    >>> W = np.array([[1.0, 0.0], [0.0, 1.0]])
    >>> snmf_mdl = SNMF(data, num_bases=2)
    >>> snmf_mdl.W = W
    >>> snmf_mdl.factorize(niter=1, compute_w=False)

    The result is a set of non-negative coefficients snmf_mdl.H s.t.
    np.dot(W, snmf_mdl.H) approximates data.
    """

    def update_w(self):
        # Closed-form least-squares update of the unconstrained basis [1]:
        # W = X H^T (H H^T)^-1. H H^T is symmetric, so solving the linear
        # system gives the same result as multiplying by the explicit
        # inverse, but is numerically more stable.
        W1 = np.dot(self.data[:, :], self.H.T)
        W2 = np.dot(self.H, self.H.T)
        self.W = np.linalg.solve(W2, W1.T).T

    def update_h(self):
        # Split a matrix into its positive and negative parts, both >= 0,
        # such that m = separate_positive(m) - separate_negative(m).
        def separate_positive(m):
            return (np.abs(m) + m) / 2.0

        def separate_negative(m):
            return (np.abs(m) - m) / 2.0

        XW = np.dot(self.data[:, :].T, self.W)

        WW = np.dot(self.W.T, self.W)
        WW_pos = separate_positive(WW)
        WW_neg = separate_negative(WW)

        # Numerator: (X^T W)^+ + H^T (W^T W)^-
        XW_pos = separate_positive(XW)
        H1 = (XW_pos + np.dot(self.H.T, WW_neg)).T

        # Denominator: (X^T W)^- + H^T (W^T W)^+, plus a small constant
        # that guards against division by zero.
        XW_neg = separate_negative(XW)
        H2 = (XW_neg + np.dot(self.H.T, WW_pos)).T + 1e-9

        # Multiplicative update from [1]; it keeps H non-negative as long
        # as H starts non-negative.
        self.H *= np.sqrt(H1 / H2)


if __name__ == "__main__":
    import doctest
    doctest.testmod()
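

# Standalone sketch (hypothetical helper, not part of the PyMF API).
# The two update rules in SNMF.update_w and SNMF.update_h can be
# exercised on plain NumPy arrays without the NMF base class. This is a
# minimal sketch under stated assumptions: H starts from strictly
# positive random values (Ding et al. [1] suggest a k-means based
# initialisation instead), and a fixed number of iterations runs with no
# convergence test. `_semi_nmf_sketch` is an illustrative name only and
# is deliberately left out of __all__.
import numpy as np


def _semi_nmf_sketch(data, num_bases, niter=100, seed=0):
    """Return (W, H) with np.dot(W, H) ~= data, H >= 0, W unconstrained."""
    if niter < 1:
        raise ValueError("niter must be at least 1")
    data = np.asarray(data, dtype=float)
    rng = np.random.RandomState(seed)
    # Assumption: strictly positive random start for H; the multiplicative
    # update below then keeps every entry of H >= 0.
    H = rng.uniform(0.1, 1.0, size=(num_bases, data.shape[1]))

    def pos(m):
        return (np.abs(m) + m) / 2.0

    def neg(m):
        return (np.abs(m) - m) / 2.0

    for _ in range(niter):
        # W-step: W = X H^T (H H^T)^-1, computed as the transpose of the
        # solution of (H H^T) W^T = H X^T.
        W = np.linalg.solve(np.dot(H, H.T), np.dot(H, data.T)).T
        # H-step: multiplicative update built from positive/negative parts,
        # mirroring SNMF.update_h.
        XW = np.dot(data.T, W)
        WW = np.dot(W.T, W)
        numer = (pos(XW) + np.dot(H.T, neg(WW))).T
        denom = (neg(XW) + np.dot(H.T, pos(WW))).T + 1e-9
        H *= np.sqrt(numer / denom)
    return W, H

# Hypothetical usage:
#     W, H = _semi_nmf_sketch(np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]), 2)
# Each W-step is the exact least-squares solution for the current H, and
# the H-step does not increase the residual for fixed W [1], so the
# residual ||data - W H|| should not grow as niter increases.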