a. Consider minimizing the least squares problem
$$ \{C,\ \boldsymbol{V}_{1},\dots,\boldsymbol{V}_{d}\} = \arg\min\left\Vert Y-\sum_{i=1}^{p} X_{i} B_{i}\right\Vert \\ B_{j} = C_{j} \times_{1} U_{j1} \times_{2} U_{j2} \times_{3} \dots \times_{l_{j}} U_{jl_{j}} \times_{l_{j}+1} V_{1} \times_{l_{j}+2} \dots \times_{l_{j}+d} V_{d} \\ V_{i}^{T} V_{i} = I_{\tilde{Q}_{i}} $$where $Y \in R^{Q_{1}\times\dots\times Q_{d}}$ and $X_{i} \in R^{P_{i1}\times\dots\times P_{il_{i}}}$; $C_{j} \in R^{\tilde{P}_{1}\times\dots\times\tilde{P}_{l_{j}}\times\tilde{Q}_{1}\times\dots\times\tilde{Q}_{d}}$ is a core tensor with $\tilde{P}_{ji} \ll P_{ji}$ and $\tilde{Q}_{i} \ll Q_{i}$; $\{U_{ji}: j=1,\dots,p;\ i=1,\dots,l_{j}\}$ is a set of bases that spans the $j^{\text{th}}$ input space; and $\{V_{i}: i=1,\dots,d\}$ is a set of bases that spans the output space.
Prove:
When $U_{ji}$, $V_{i}$ and $R_{j}$ are known, a reshaped version of the core tensor $C_{j}$ can be estimated as:
\begin{equation*} \tilde{C}_{j} =R_{j} \times _{1}\left(\boldsymbol{Z}_{j}^{T}\boldsymbol{Z}_{j}\right)^{-1}\boldsymbol{Z}_{j}^{T} \times _{2}\boldsymbol{V}_{1}^{T} \times _{3}\boldsymbol{V}_{2}^{T} \dots\times _{d+1}\boldsymbol{V}_{d}^{T} \end{equation*}where $Z_{j} = X_{j(1)}(\boldsymbol{U}_{jl_{j}} \otimes \boldsymbol{U}_{jl_{j}-1} \otimes \dots \otimes \boldsymbol{U}_{j1})$ and $R_{j} = Y-\sum_{i\neq j}^{p} B_{i} * X_{i}$ is the residual after removing every other predictor's contribution.
1) Prove: $$ \arg\min _{C}\left\Vert R_{j( 1)} -X_{j( 1)} B_{j}\right\Vert _{F}^{2} = \arg\min _{C}\left\Vert vec( R_{j( 1)}) -( V_{d} \otimes V_{d-1} \otimes \dots \otimes V_{1} \otimes Z_{j})\, vec( C_{j})\right\Vert _{F}^{2} $$ where $vec( X)$ stacks the columns of matrix $X$ on top of each other. Hint: $vec( ABC^{T}) = ( C\otimes A)\, vec( B)$.
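Before using the hint, it is easy to check numerically. The following sketch (plain NumPy, illustrative only and not part of the proof) verifies $vec( ABC^{T}) = ( C\otimes A) vec( B)$ on random matrices; note that $vec$ stacks columns, i.e. column-major ('F') order in NumPy:
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))
C = rng.standard_normal((6, 5))

vec = lambda M: M.flatten(order='F')  # column-stacking vec operator
lhs = vec(A @ B @ C.T)
rhs = np.kron(C, A) @ vec(B)
print(np.allclose(lhs, rhs))  # True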
Starting from the left-hand side, we first apply the $vec$ representation, then replace $X_{j(1)}$ (via the definition of $Z_{j}$, so that $X_{j(1)} = Z_{j}( U_{jl_{j}}\otimes\dots\otimes U_{j1})^{-1}$) and $B_{j}$ by their definitions above:
\begin{equation*} \arg\min_{C}\left\Vert vec( R_{j( 1)}) -vec\left( Z_{j}( U_{jl_{j}} \otimes U_{jl_{j}-1} \otimes \dots \otimes U_{j1})^{-1}\,( C_{j} \times _{1} U_{j1} \times \dots \times _{l_{j} +d} V_{d})_{( 1)}\right)\right\Vert _{F}^{2} \end{equation*}We can now use the hint once to cross out the $U$s: matricizing $B_{j}$ along its input modes gives $( U_{jl_{j}}\otimes\dots\otimes U_{j1})\, C_{j(1)}\,( V_{d}\otimes\dots\otimes V_{1})^{T}$, whose leading Kronecker product cancels against its inverse. Using the hint a second time then pulls $vec( C_{j})$ out of the remaining product:
\begin{gather*} \arg\min_{C}\left\Vert vec( R_{j( 1)}) -vec\left( Z_{j}\, C_{j( 1)}\,( V_{d} \otimes V_{d-1} \otimes \dots \otimes V_{1})^{T}\right)\right\Vert _{F}^{2}\\ \arg\min_{C}\left\Vert vec( R_{j( 1)}) -( V_{d} \otimes V_{d-1} \otimes \dots \otimes V_{1} \otimes Z_{j})\, vec( C_{j})\right\Vert _{F}^{2} \end{gather*}2) Prove that $ \arg\min_{C}\Vert vec( R_{j( 1)}) -( V_{d} \otimes V_{d-1} \otimes \dots \otimes V_{1} \otimes Z_{j}) vec( C_{j})\Vert _{F}^{2} $ has the optimal solution: $$ vec( C_{j}) =\left( V^{T}_{d} \otimes V^{T}_{d-1} \otimes \dots \otimes V^{T}_{1} \otimes \left( Z_{j}^{T} Z_{j}\right)^{-1} Z_{j}^{T}\right) vec( R_{j( 1)}) $$
This is an ordinary least squares problem in $vec( C_{j})$. Writing $W=(V_{d} \otimes V_{d-1} \otimes \dots \otimes V_{1} \otimes Z_{j})$, the minimizer follows from the normal equations by applying $\left( W^{T} W\right)^{-1} W^{T}$ to $vec( R_{j(1)})$:
\begin{gather*} vec( C_{j}) \ =\ \left( W^{T} W\right)^{-1} W^{T} \ vec( R_{j( 1)})\\ =\ \left( V_{d}^{T} V_{d} \otimes V_{d-1}^{T} V_{d-1} \otimes \dots \otimes V_{1}^{T} V_{1} \otimes Z_{j}^{T} Z_{j}\right)^{-1}\left( V_{d}^{T} \otimes V_{d-1}^{T} \otimes \dots \otimes V_{1}^{T} \otimes Z_{j}^{T}\right) vec( R_{j( 1)}) \end{gather*}by the transpose and mixed-product properties of the Kronecker product, $( A\otimes B)^{T} = A^{T}\otimes B^{T}$ and $( A\otimes B)( C\otimes D) = AC\otimes BD$. Since each $V_{i}^{T} V_{i}$ is the identity by the orthonormality constraint, and the inverse of a Kronecker product is the Kronecker product of the inverses, the formula simplifies to: \begin{equation*} vec( C_{j}) =\left( V_{d}^{T} \otimes V_{d-1}^{T} \otimes \dots \otimes V_{1}^{T} \otimes \left( Z_{j}^{T} Z_{j}\right)^{-1} Z_{j}^{T}\right) vec( R_{j( 1)}) \end{equation*}
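This simplification can be spot-checked numerically. The sketch below (illustrative only) builds $W$ for $d=2$ from orthonormal-column $V_{1}$, $V_{2}$ and a generic tall $Z$, and confirms $\left( W^{T} W\right)^{-1} W^{T} = V_{2}^{T}\otimes V_{1}^{T}\otimes\left( Z^{T} Z\right)^{-1} Z^{T}$:
import numpy as np

rng = np.random.default_rng(0)
V1 = np.linalg.qr(rng.standard_normal((5, 2)))[0]  # orthonormal columns
V2 = np.linalg.qr(rng.standard_normal((6, 3)))[0]  # orthonormal columns
Z = rng.standard_normal((7, 4))                    # generic tall matrix

W = np.kron(np.kron(V2, V1), Z)
lhs = np.linalg.inv(W.T @ W) @ W.T
rhs = np.kron(np.kron(V2.T, V1.T), np.linalg.inv(Z.T @ Z) @ Z.T)
print(np.allclose(lhs, rhs))  # True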
Finally, reversing the $vec$ operation on this formula (folding the vector back into tensor form) gives:
\begin{equation*} \tilde{C}_{j} =R_{j} \times _{1}\left(\boldsymbol{Z}_{j}^{T}\boldsymbol{Z}_{j}\right)^{-1}\boldsymbol{Z}_{j}^{T} \times _{2}\boldsymbol{V}_{1}^{T} \times _{3}\boldsymbol{V}_{2}^{T} \dots\times _{d+1}\boldsymbol{V}_{d}^{T} \end{equation*}b. Show that the CP decomposition is a special case of the Tucker decomposition
From SIAM Review, Vol. 51, No. 3, p. 475 (Kolda & Bader, 2009):
Consider the elementwise form of the Tucker decomposition of a three-dimensional tensor: $$ x_{ijk} \approx \sum _{p=1}^{P}\sum _{q=1}^{Q}\sum _{r=1}^{R} g_{pqr} a_{ip} b_{jq} c_{kr} \quad \text{for} \ i=1,\dots,I,\ j=1,\dots,J,\ k=1,\dots,K, $$ where $g_{pqr}$, $a_{ip}$, $b_{jq}$, and $c_{kr}$ are entries of the core $G$ and the factor matrices $A$, $B$, and $C$.
If we take $P=Q=R$ and impose superdiagonality on the core, i.e. $g_{pqr}=1$ when $p=q=r$ and $0$ otherwise, every term with non-matching indices vanishes and the triple sum collapses to $x_{ijk} \approx \sum _{r=1}^{R} a_{ir} b_{jr} c_{kr}$, which is a CP decomposition by definition. A numerical check of this equivalence is sketched below.
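As a quick sanity check (a plain-NumPy sketch, not part of the derivation), the following builds a random rank-3 CP model and verifies that a Tucker reconstruction with a superdiagonal core of ones yields the identical tensor:
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

# CP reconstruction: x_ijk = sum_r a_ir * b_jr * c_kr
X_cp = np.einsum('ir,jr,kr->ijk', A, B, C)

# Tucker reconstruction with an R x R x R superdiagonal core of ones
G = np.zeros((R, R, R))
G[np.arange(R), np.arange(R), np.arange(R)] = 1.0
X_tucker = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)

print(np.allclose(X_cp, X_tucker))  # True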
import numpy as np
import pandas as pd
import scipy.io as sio
import tensorly as tl
import tensorly.decomposition as td
import utils_ as ut
rs = ut.sparam(rs=True)
# Load aminoacid.mat
d_mat = sio.loadmat('aminoacid.mat')
data_X = d_mat['X'][0][0][0]
data_Y = d_mat['Y']
def cp_als(X, tol=1e-4, maxiter=200, random_state=rs):
np.random.seed(random_state)
i, j, k = X.shape
# Initialize A~1, A~2, A~3
U = np.random.rand(i, 3)
V = np.random.rand(j, 3)
W = np.random.rand(k, 3)
    # Mode-0, 1 and 2 unfoldings X_1, X_2, X_3
X_1 = tl.unfold(X, 0)
X_2 = tl.unfold(X, 1)
X_3 = tl.unfold(X, 2)
# Initialize error, lambda and iter
error, err_1 = tol + 1, 500
lam = [1,1,1]
iter = 1
# ALS iterations
while error > tol and iter < maxiter:
# Update U
vect = V.T.dot(V) * W.T.dot(W)
U_ = X_1.dot(tl.kr([V,W]))\
.dot(np.linalg.inv(vect.T.dot(vect)))\
.dot(vect.T)
U_e = np.sum(U - U_)
U = U_
# Update V
vect = U.T.dot(U) * W.T.dot(W)
V_ = X_2.dot(tl.kr([U,W]))\
.dot(np.linalg.inv(vect.T.dot(vect)))\
.dot(vect.T)
V_e = np.sum(V - V_)
V = V_
# Update W
vect = U.T.dot(U) * V.T.dot(V)
W_ = X_3.dot(tl.kr([U,V]))\
.dot(np.linalg.inv(vect.T.dot(vect)))\
.dot(vect.T)
W_e = np.sum(W - W_)
W = W_
# Normalize U, V, W and store norm as lambda
lam = [np.linalg.norm(U, axis=0),
np.linalg.norm(V, axis=0),
np.linalg.norm(W, axis=0)]
U = U/lam[0]
V = V/lam[1]
W = W/lam[2]
iter += 1
# Update error (factor change, lambda change)
err_fac = np.abs(U_e + V_e + W_e)
err_lam = np.abs(np.sum(lam[1:])-6)
error = np.min([err_lam, np.abs(err_1 - err_fac)])
err_1 = err_fac
print('Convergence in %d iterations. (error: %.2e)' %
(iter-1, error))
return (U, V, W), lam[0]
cp_fac, cp_w = cp_als(data_X, tol=1e-6, maxiter=200)
Convergence in 119 iterations. (error: 9.81e-07)
Using TensorLy for comparison.
tl_w, tl_fac = \
td.parafac(data_X, rank=3, init='random',
tol=1e-6, random_state=40,
normalize_factors=True)
# Plot score matrix my implementation vs tensorly
ut.plt_score(cp_fac[0],'My Implementation',
tl_fac[0], 'w/ TensorLy Library',
data_Y, 'Normalized Y')
# Comparison of Weights (my implementation vs tensorly)
pd.DataFrame({'My Implementation':cp_w.astype(int), 'TensorLy Library':tl_w.astype(int)}).T
| | 0 | 1 | 2 |
|---|---|---|---|
| My Implementation | 33487 | 23483 | 21192 |
| TensorLy Library | 33455 | 23515 | 21193 |
Since both runs used the same random state, the slight difference in results can be explained by the convergence criterion: the TensorLy implementation converged in far fewer iterations than mine, which indicates that it applies its tolerance to a different stopping measure.
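As an additional check (a sketch, not part of the original run), the two solutions can also be compared by relative reconstruction error rather than by raw factor values. This assumes a recent TensorLy where tl.cp_to_tensor is available, and reuses cp_w, cp_fac, tl_w, tl_fac and data_X from the cells above:
# Relative reconstruction error ||X - X_hat||_F / ||X||_F for both solutions.
# (cp_w holds only the mode-0 norms; the other factors are approximately
# unit-norm at convergence, so this is approximate for my implementation.)
X_mine = tl.cp_to_tensor((cp_w, list(cp_fac)))
X_tlib = tl.cp_to_tensor((tl_w, tl_fac))
rel_err = lambda Xh: np.linalg.norm(data_X - Xh) / np.linalg.norm(data_X)
print('mine: %.4e, tensorly: %.4e' % (rel_err(X_mine), rel_err(X_tlib)))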
# Plot of the emission and excitation matrices
ut.plt_emex(cp_fac[1], cp_fac[2])
This decomposition recovers three distinct emission and excitation patterns; this kind of interpretable factor structure is what makes the approach so useful for analyzing high-dimensional data.
import numpy as np
from numpy.linalg import svd
import tensorly as tl
from tensorly.decomposition import tucker
import scipy.io as sio
import utils_ as ut
ut.sparam()
# Load data
data = sio.loadmat('ATT.mat')['A']
# Compute the Tucker decomposition as A ~= G x_1 U^(1) x_2 U^(2) x_3 U^(3)
def tk_als(X, ranks, tol=1e-4, maxiter=50):
i, j, k = X.shape
    # Mode-0, 1 and 2 unfoldings X_1, X_2, X_3
X_1 = tl.unfold(X, 0)
X_2 = tl.unfold(X, 1)
X_3 = tl.unfold(X, 2)
# Initialize A~1, A~2, A~3
U = svd(X_1)[0][:, :ranks[0]]
V = svd(X_2)[0][:, :ranks[1]]
W = svd(X_3)[0][:, :ranks[2]]
# Initialize Error
error, err_1 = tol + 1, 2*(tol + 1)
iter = 1
while error > tol and iter < maxiter:
# Update U
Y = ut.ndot(ut.ndot(X, V.T, mode=1), W.T, mode=2)
U_ = svd(tl.unfold(Y, 0))[0][:i, :ranks[0]]
U_e = np.sum(U - U_)
U = U_
# Update V
Y = ut.ndot(ut.ndot(X, U.T, mode=0), W.T, mode=2)
V_ = svd(tl.unfold(Y, 1))[0][:j, :ranks[1]]
V_e = np.sum(V - V_)
V = V_
# Update W
Y = ut.ndot(ut.ndot(X, U.T, mode=0), V.T, mode=1)
W_ = svd(tl.unfold(Y, 2))[0][:k, :ranks[2]]
W_e = np.sum(W - W_)
W = W_
iter += 1
err_ = np.abs(U_e + V_e + W_e)
error = np.abs(err_1 - err_)
err_1 = err_
G = ut.ndot(ut.ndot(ut.ndot(
X, U.T, mode=0),
V.T, mode=1),
W.T, mode=2)
tuck_err = ut.err(X, G)
print('Tucker Decomposition converged in %d iterations' % iter)
print('Relative Error: ~%.5f' % tuck_err)
return G, (U, V, W)
tk_cor, tk_fac = tk_als(data, [30, 30, 20])
Tucker Decomposition converged in 5 iterations
Relative Error: ~0.04244
# Tensorly Tucker Decomposition for Comparison
tl_cor, tl_fac = tucker(
data.astype(float), [30, 30, 20], verbose=1)
reconstruction error=0.04243982149101271, variation=7.911678256977694e-11.
converged in 2 iterations.
My implementation reaches the same reconstruction error as TensorLy's tucker function; a subspace-level comparison of the factors is sketched below.
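Since Tucker factors are only identifiable up to an invertible transformation within each mode, a fair factor-level comparison checks the spanned subspaces rather than the raw matrices. The sketch below (illustrative only, reusing tk_fac and tl_fac from above) compares the orthogonal projectors U U^T per mode; it may print False if the two runs land in different local optima:
# Equal projectors <=> equal column spaces, regardless of sign/rotation.
for mode, (U_mine, U_tl) in enumerate(zip(tk_fac, tl_fac)):
    same = np.allclose(U_mine @ U_mine.T, U_tl @ U_tl.T, atol=1e-6)
    print('mode %d subspaces match: %s' % (mode, same))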
data_rec = ut.tk_rec(tk_cor, tk_fac)
ut.plt_sbs(data[:, :, 4], "Original", data_rec[:, :, 4], "Reconstructed")
ut.plt_th(data, tk_fac, ut.corth(tk_cor, 10),
ut.corth(tk_cor, 20), ut.corth(tk_cor, 30),
ut.corth(tk_cor, 40), ut.corth(tk_cor, 50))
import scipy.io as sio
import matplotlib.pyplot as plt
from tensorly.decomposition import tucker
from tensorly.tenalg import multi_mode_dot
import utils_ as ut
rs = ut.sparam(True)
# Loading Labels
tst_labs = sio.loadmat('test_lab.mat')['test'].ravel()
trn_labs = sio.loadmat('train_lab.mat')['train'].ravel()
# Importing All Images and Creating Tensors
trn_files = ['CatsBirds\\train%s.jpg' % str(i+1) for i in range(28)]
tst_files = ['CatsBirds\\Test%s.jpg' % str(i+1) for i in range(12)]
trn_tens, tst_tens, trn_tens_G, tst_tens_G = \
ut.l_img(trn_files, tst_files)
# Verifying that the tensors are loaded correctly
ptss = [trn_tens_G[:,:,0], tst_tens_G[:,:,3],
trn_tens[:,:,:,19], tst_tens[:,:,:,9]]
plt.figure(figsize=(3, 3))
for i, pts in enumerate(ptss):
plt.subplot(2, 2, i+1), plt.imshow(pts, cmap='gray')
plt.axis('off')
plt.subplots_adjust(wspace=0, hspace=0), plt.show()
print('Gray Train & Test Shapes: ', trn_tens_G.shape, tst_tens_G.shape,
'\nRGB Train & Test Shapes: ', trn_tens.shape, tst_tens.shape)
Gray Train & Test Shapes: (500, 500, 28) (500, 500, 12)
RGB Train & Test Shapes: (500, 500, 3, 28) (500, 500, 3, 12)
Part 1. Classification of Grayscale Image Tensor
_, trn_factG = tucker(trn_tens_G, rank=[10,10,28], random_state=rs)
transf_G = [trn_factG[0].T, trn_factG[1].T]
tst_featG, trn_featG =\
multi_mode_dot(tst_tens_G, transf_G, [0,1]),\
multi_mode_dot(trn_tens_G, transf_G, [0,1])
trn_featG, tst_featG =\
trn_featG.reshape(100,28).T,\
tst_featG.reshape(100,12).T
%%time
bg_paramsG = {'n_estimators': [14,15,16],
'max_features': [0.6,0.65,0.7]}
ut.tBagger(trn_featG, tst_featG, trn_labs,
tst_labs, bg_paramsG, rs)
Best Parameters: {'max_features': 0.65, 'n_estimators': 15}
Best Score: 0.9940 (4-fold Stratified CV Repeated 6 times.)

Test Classification Report
              precision    recall  f1-score   support

           0       0.86      1.00      0.92         6
           1       1.00      0.83      0.91         6

    accuracy                           0.92        12
   macro avg       0.93      0.92      0.92        12
weighted avg       0.93      0.92      0.92        12

CPU times: total: 1.28 s
Wall time: 1.97 s
Part 2. Classification with RGB Image Tensor
_, trn_fact = tucker(trn_tens.astype(float), rank=[10,10,3,28], random_state=rs)
transf_ = [trn_fact[0].T, trn_fact[1].T, trn_fact[2].T]
tst_feat, trn_feat =\
multi_mode_dot(tst_tens, transf_, [0,1,2]),\
multi_mode_dot(trn_tens, transf_, [0,1,2])
trn_feat, tst_feat =\
trn_feat.reshape(300,28).T,\
tst_feat.reshape(300,12).T
%%time
bg_params = {'n_estimators': [12,13,14],
'max_features': [0.55,0.6,0.65]}
ut.tBagger(trn_feat, tst_feat, trn_labs,
tst_labs, bg_params, rs)
Best Parameters: {'max_features': 0.6, 'n_estimators': 13}
Best Score: 1.0000 (4-fold Stratified CV Repeated 6 times.)

Test Classification Report
              precision    recall  f1-score   support

           0       0.86      1.00      0.92         6
           1       1.00      0.83      0.91         6

    accuracy                           0.92        12
   macro avg       0.93      0.92      0.92        12
weighted avg       0.93      0.92      0.92        12

CPU times: total: 688 ms
Wall time: 1.3 s
It appears that the model has not seen enough very white birds during training: in all cases it misclassified the white bird as a cat, likely because there are a few white cats in the training data. Adding the color channels nevertheless improved the average cross-validation accuracy of the best-parameter model to 100% (averaged over 24 different training-data splits).
# utils_.py: helper functions used throughout this notebook
import matplotlib.pyplot as plt
import numpy as np
from numpy.linalg import norm
import warnings
from tensorly.tenalg import mode_dot
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
import tensorly as tl


# Setting up default parameters.
def sparam(rs=False):
    warnings.filterwarnings("ignore")
    plt.rcParams['figure.dpi'] = 144
    np.random.seed(40)
    if rs: return 40

def ndot(a, b, mode): return mode_dot(a, b, mode)

####_________________ T2 plot functions _________________####
def plt_score(score1, title1, score2, title2, y, y_title):
    y_normd = (y - np.mean(y, axis=0)) / np.std(y, axis=0)
    _ = plt.figure(figsize=(9, 5))
    plt.subplot(131), plt.title(title1), plt.imshow(score1)
    plt.subplot(132), plt.title(title2), plt.imshow(score2)
    plt.subplot(133), plt.title(y_title), plt.imshow(y_normd)
    plt.suptitle('Score Matrices', size=15)
    plt.show()

def plt_emex(em, ex):
    plt.figure(figsize=(10, 3))
    plt.subplot(121), plt.title('Emissions')
    _ = plt.plot(np.arange(250, 451), em), plt.xlabel('nm')
    plt.subplot(122), plt.title('Excitations')
    _ = plt.plot(np.arange(240, 301), ex), plt.xlabel('nm')
    plt.show()
####_____________________________________________________####


####________________ T3 helper functions ________________####
def tk_rec(core, fact):
    X_rec = ndot(ndot(ndot(core,
                 fact[0], mode=0),
                 fact[1], mode=1),
                 fact[2], mode=2)
    return X_rec

def plt_sbs(img1, title1, img2, title2):
    plt.figure(figsize=(6, 5))
    plt.subplot(121), plt.title(title1)
    plt.imshow(img1, cmap='gray'), plt.axis('off')
    plt.subplot(122), plt.title(title2)
    plt.imshow(img2, cmap='gray'), plt.axis('off')
    plt.show()

def corth(core, perc):
    core_th = np.where(np.abs(core) > np.percentile(
        np.abs(core), 100 - perc), core, 0)
    return core_th

def err(X, core):
    return np.sqrt(norm(X)**2 - norm(core)**2) / norm(X)

def plt_th(X, fact, c1t, c2t, c3t, c4t, c5t):
    err_ = err(X, c1t), err(X, c2t), err(X, c3t), \
           err(X, c4t), err(X, c5t)
    ptt = tk_rec(c1t, fact), tk_rec(c2t, fact), tk_rec(
        c3t, fact), tk_rec(c4t, fact), tk_rec(c5t, fact)
    fig, ax = plt.subplots(3, 5, figsize=(9, 7))
    for i in range(5):
        for j in range(3):
            ax[j, i].imshow(ptt[i][:, :, j*5+4], cmap='gray',
                            vmin=0, vmax=255)
            ax[j, i].axis('off')
        ax[0, i].set_title("Top %d0%% Kept\nError: %.4f" %
                           (i+1, err_[i]))
    fig.tight_layout()
    fig.show()
####_____________________________________________________####


####________________ T4 helper functions ________________####
def l_img(trn, tst):
    trn_tens, tst_tens = [], []
    trn_tens_G, tst_tens_G = [], []
    for file in trn:
        img = plt.imread(file)
        trn_tens.append(img.T)
        trn_tens_G.append(toG(img).T)
    for file in tst:
        img = plt.imread(file)
        tst_tens.append(img.T)
        tst_tens_G.append(toG(img).T)
    trn_tens = np.array(trn_tens).T
    tst_tens = np.array(tst_tens).T
    trn_tens_G = np.array(trn_tens_G).T
    tst_tens_G = np.array(tst_tens_G).T
    return trn_tens, tst_tens, trn_tens_G, tst_tens_G

def toG(img):
    return np.dot(img, [0.299, 0.587, 0.114])

def tBagger(trn_feat, tst_feat, trn_labs, tst_labs,
            bg_params, rs=42):
    rskf = RepeatedStratifiedKFold(
        n_splits=4, n_repeats=6, random_state=rs)
    dt_clf = DecisionTreeClassifier(
        max_depth=5, min_samples_leaf=2, random_state=rs)
    bg_clf = BaggingClassifier(
        base_estimator=dt_clf, n_jobs=-1,
        warm_start=True, random_state=rs)

    # Training and Testing Bagging
    clf_BG_GS = GridSearchCV(bg_clf, bg_params, cv=rskf,
                             scoring='accuracy', n_jobs=-1)
    clf_BG_GS.fit(trn_feat[:, 1:], trn_labs)
    print('Best Parameters: ', clf_BG_GS.best_params_)
    print('Best Score: %.4f' % clf_BG_GS.best_score_,
          '(4-fold Stratified CV Repeated 6 times.)\n')
    tst_pred = clf_BG_GS.predict(tst_feat[:, 1:])
    print('Test Classification Report\n',
          classification_report(tst_labs, tst_pred))
####_____________________________________________________####