Homework for Course MSBD5004 Mathematical Methods for Data Analysis, BDT, HKUST

# Question 1

Consider the vector space $\mathbb{R}^{n}$

(a)

Check that $\|x\|_{\infty}=\max_{1 \leq i \leq n} |x_{i}|$ is indeed a norm on $\mathbb{R}^n$

According to the definition of a norm, we check the three axioms.

(1)
\begin{aligned} \| x\|_{\infty} =\max _{1 \leq i \leq n}\left|x_{i}\right| \geq 0, \forall x \in \mathbb{R}^{n}\end{aligned}

and $\| x\|_{\infty} = 0 \Leftrightarrow |x_i|=0 \text{ for all } i \Leftrightarrow x=\overrightarrow{0}$

(2)
\begin{aligned} \|\alpha x\|_{\infty}=\max _{1 \leq i \leq n}|\alpha x_{i}|=|\alpha| \cdot \max _{1 \leq i \leq n}\left|x_{i}\right|=|\alpha| \cdot \| x\|_{\infty}, \forall x \in \mathbb{R}^{n}, \alpha \in \mathbb{R}\end{aligned}

(3)
\begin{aligned}\|x+y\|_{\infty}=\max _{1 \leq i \leq n}\left|x_{i}+y_{i}\right| \leq \max _{1 \leq i \leq n}\left(\left|x_{i}\right|+\left|y_{i}\right|\right) \leq \max_{1 \leq i \leq n} \left| x_i \right| +\max_{1 \leq i \leq n}\left| y_{i}\right|=\|x\|_{\infty}+\|y\|_{\infty}, \quad \forall x, y \in \mathbb{R}^{n}\end{aligned}

Therefore, $\|x\|_{\infty}=\max _{1 \leq i \leq n}\left|x_{i}\right|$ is a norm on $\mathbb{R}^{n}$
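As a quick numerical sanity check (not part of the proof), the three axioms can be verified on random vectors with plain Python; the helper name `norm_inf` is my own:

```python
import random

def norm_inf(x):
    """Infinity norm: the largest absolute entry of x."""
    return max(abs(xi) for xi in x)

random.seed(0)
for _ in range(100):
    x = [random.uniform(-10, 10) for _ in range(5)]
    y = [random.uniform(-10, 10) for _ in range(5)]
    a = random.uniform(-10, 10)
    assert norm_inf(x) >= 0  # (1) nonnegativity
    # (2) absolute homogeneity
    assert abs(norm_inf([a * xi for xi in x]) - abs(a) * norm_inf(x)) < 1e-9
    # (3) triangle inequality
    assert norm_inf([xi + yi for xi, yi in zip(x, y)]) <= norm_inf(x) + norm_inf(y) + 1e-9
assert norm_inf([0.0] * 5) == 0.0  # definiteness at the zero vector
```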

(b)

Prove that for any $x \in \mathbb{R}^{n}, \quad\|x\|_{\infty}= \lim_{p \rightarrow \infty} \|x\|_p$

Proof:

\begin{aligned} &\|x\|_{p}=\left(\sum_{i=1}^{n}\left|x_{i}\right|^{p}\right)^{1 / p} \\ &=\left(\left|x_{1}\right|^{p}+\left|x_{2}\right|^{p}+\cdots+\left|x_{n}\right|^{p}\right)^{1 / p} \end{aligned}

Let

$$\left|x_{m}\right|=\max \left\{\left|x_{1}\right|,\left|x_{2}\right|, \cdots,\left|x_{n}\right|\right\}$$

Assume $x \neq 0$ (for $x = 0$ the identity is trivial), so $\left|x_{m}\right|>0$ and

$$\frac{\left|x_{1} \right|}{\left|x_{m} \right|}, \frac{\left|x_{2} \right|}{\left|x_{m}\right|}, \ldots, \frac{\left|x_{n}\right|}{\left|x_{m}\right|} \leq 1$$

Then we can get this inequality

$$1 \cdot |x_m| \leq \|x\|_{p}=\left[\left(\frac{\left|x_{1}\right|}{\left|x_{m}\right|}\right)^{p}+\left(\frac{\left|x_{2}\right|}{\left|x_{m}\right|}\right)^{p}+ \cdots + \left(\frac{\left|x_{n}\right|}{\left|x_{m}\right|}\right)^{p} \right]^{\frac{1}{p}} \cdot |x_m| \leq n^{\frac{1}{p}} \cdot |x_m|$$

where the lower bound holds because the bracketed sum contains the term $\left(\frac{|x_m|}{|x_m|}\right)^{p}=1$.

When $p \rightarrow \infty$, $n^{\frac{1}{p}} \rightarrow 1$, so by the squeeze theorem $\| x \|_p \rightarrow |x_m|$

Finally
$\lim _{p \rightarrow \infty}\|x\|_{p}=\left|x_{m}\right|=\max _{1 \leq i \leq n}\left|x_{i}\right| = \|x\|_\infty$. Hence proved.
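This limit can also be checked numerically. The sketch below (helper name `norm_p` is my own) factors out the largest entry exactly as in the proof, which conveniently avoids floating-point overflow for large $p$:

```python
def norm_p(x, p):
    """p-norm computed by factoring out the largest entry, as in the proof."""
    m = max(abs(xi) for xi in x)
    if m == 0.0:
        return 0.0
    return m * sum((abs(xi) / m) ** p for xi in x) ** (1.0 / p)

x = [3.0, -7.0, 2.0]
inf_norm = max(abs(xi) for xi in x)  # = 7
n = len(x)
for p in (1, 2, 10, 100, 1000):
    # squeeze bound: |x_m| <= ||x||_p <= n^(1/p) |x_m|
    assert inf_norm <= norm_p(x, p) <= n ** (1.0 / p) * inf_norm + 1e-12
# at p = 1000 the p-norm is already very close to the infinity norm
assert abs(norm_p(x, 1000) - inf_norm) < 1e-2
```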

(c)

Prove the equivalence
$$\|x\|_{\infty} \leqslant \|x\|_{1} \leqslant n\|x\|_{\infty}, \quad \forall x \in \mathbb{R}^{n}$$

Proof: According to the definition,

obviously $\max_{1 \leq i \leq n} \left|x_{i}\right|$ is one of the values in $\left\{\left|x_{1}\right|,\left|x_{2}\right|, \cdots,\left|x_{n}\right|\right\}$

So, $\| x \|_{1}=\sum_{i=1}^{n}\left|x_{i}\right| \geqslant \max _{1 \leq i \leq n}\left|x_{i}\right|= \|x\|_\infty$

Then

$$\left|x_{1}\right| \leq \max_{1 \leq i \leq n} |x_i|\\ \left|x_{2}\right| \leq \max_{1 \leq i \leq n} |x_i|\\ \vdots \\ \left|x_{n}\right| \leq \max_{1 \leq i \leq n}|x_i|$$

Therefore $\|x\|_{1}=\sum_{i=1}^{n}\left|x_{i}\right| \leq n \cdot \max_{1 \leq i \leq n} |x_i|=n \cdot\|x\|_{\infty}$

Altogether, $\|x\|_{\infty} \leqslant\|x\|_{1} \leqslant n \cdot\|x\|_{\infty}$
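The two-sided bound is easy to confirm on random vectors; a minimal check in plain Python (helper names are my own):

```python
import random

def norm_1(x):
    """1-norm: sum of absolute entries."""
    return sum(abs(xi) for xi in x)

def norm_inf(x):
    """Infinity norm: largest absolute entry."""
    return max(abs(xi) for xi in x)

random.seed(1)
n = 4
for _ in range(100):
    x = [random.uniform(-5, 5) for _ in range(n)]
    # ||x||_inf <= ||x||_1 <= n * ||x||_inf
    assert norm_inf(x) <= norm_1(x) <= n * norm_inf(x) + 1e-12
```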

# Question 2

For any $A \in \mathbb{R}^{m \times n}$, we have defined

$$\|A\|_{2}=\sup _{x \in \mathbb{R}^{n}, x \neq 0} \quad \frac{\|A x\|_{2}}{\|x\|_{2}}$$

(a)

Prove that

$$\|A\|_{2}=\max _{x \in \mathbb{R}^n, \|x\|_2=1}\|A x\|_{2}$$

Proof:

\begin{aligned} \|A\|_2 &= \sup _{x \in \mathbb{R}^{n}, x \neq 0}\frac{\|A x\|_{2}}{\|x\|_{2}}\\ &= \sup _{x \in \mathbb{R}^{n}, x \neq 0}\left\|\frac{A x}{\|x\|_{2}}\right\|_{2} \\ &= \sup _{x \in \mathbb{R}^{n}, x \neq 0}\left\|A \cdot \frac{x}{\|x\|_{2}}\right\|_{2} \end{aligned}

Note that $\frac{x}{\|x\|_2}$ has unit length, and conversely every unit vector can be written this way. So we can express

$$\|A\|_2=\sup _{x \in \mathbb{R}^{n},\|x\|_{2}=1} \|A x \|_{2}$$

Moreover, $\|A x\|_{2}$ is a continuous function of $x$, and the unit sphere $\{x \in \mathbb{R}^n : \|x\|_{2}=1\}$ is closed and bounded. On a closed and bounded set, a continuous function always attains its maximum and minimum values.

Hence, the 'sup' can be replaced by 'max', i.e.

$$\|A\|_{2}=\max _{x \in \mathbb{R}^{n},\|x\|_{2}=1}\|A x\|_{2}$$
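To make the maximum concrete, here is a small pure-Python sketch (helper names like `norm2_mat` are my own, not from the course) that estimates $\|A\|_2$ by power iteration on $A^{\top}A$ and checks that it dominates $\|Ax\|_2$ over random unit vectors:

```python
import math, random

def matvec(A, x):
    """Matrix-vector product for A given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def norm2(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(xi * xi for xi in x))

def norm2_mat(A, iters=200):
    """Estimate ||A||_2 via power iteration on A^T A (largest singular value)."""
    At = [list(col) for col in zip(*A)]   # transpose
    x = [1.0] * len(A[0])
    for _ in range(iters):
        y = matvec(At, matvec(A, x))
        x = [yi / norm2(y) for yi in y]
    return norm2(matvec(A, x))

A = [[2.0, 0.0], [0.0, 1.0]]   # diagonal, so ||A||_2 = 2
assert abs(norm2_mat(A) - 2.0) < 1e-6
random.seed(0)
for _ in range(50):            # ||Ax||_2 <= ||A||_2 for every unit vector x
    v = [random.gauss(0, 1) for _ in range(2)]
    u = [vi / norm2(v) for vi in v]
    assert norm2(matvec(A, u)) <= norm2_mat(A) + 1e-6
```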

(b)

Prove that $\|\cdot\|_2$ is a norm on $\mathbb{R}^{m \times n}$

Proof: According to the definition of a norm, we check

(1) $\forall A \in \mathbb{R}^{m \times n}$

\begin{aligned} \|A\|_{2}=\max _{x \in \mathbb{R}^n, \|x\|_2=1}\|A x\|_{2} \geqslant 0 \end{aligned}

When $\max _{x \in \mathbb{R}^n, \|x\|_2=1}\|A x\|_{2} = 0$, since $\|A x\|_{2} \geq 0$ we have

$$\|A x\|_{2} = 0 \quad \text{for all } x \in \mathbb{R}^n \text{ with } \|x\|_2=1$$

Taking $x = e_1, \ldots, e_n$ (the standard basis vectors) shows that every column of $A$ is zero, so $A=O$.

Thus, $\|A\|_{2} = 0 \Leftrightarrow A=O$

(2) $\forall \alpha \in \mathbb{R}, A \in \mathbb{R}^{m \times n}$

\begin{aligned} \|\alpha A\|_{2} &= \max _{x \in \mathbb{R}^n, \|x\|_2=1}\|\alpha A x\|_{2} \\ &= |\alpha| \cdot \max _{x \in \mathbb{R}^n, \|x\|_2=1} \|A x\|_{2}\\ &= |\alpha| \cdot \|A\|_2 \end{aligned}

(3) $\forall A,B \in \mathbb{R}^{m \times n}$

\begin{aligned} \|A + B\|_{2} &= \max _{x \in \mathbb{R}^n, \|x\|_2=1}\|(A+B)x\|_{2} \\ &= \max _{x \in \mathbb{R}^n, \|x\|_2=1}\|Ax+Bx\|_{2} \\ &\leq \max _{x \in \mathbb{R}^n, \|x\|_2=1} \left(\|Ax\|_{2} + \|Bx\|_{2}\right) \\ &= \max _{x \in \mathbb{R}^n, \|x\|_2=1}\|Ax\|_{2} + \max _{x \in \mathbb{R}^n, \|x\|_2=1}\|Bx\|_{2}\\ &= \|A\|_2 + \|B\|_2 \end{aligned}

Altogether, $\|\cdot\|_{2}$ is a norm on $\mathbb{R}^{m \times n}$

(c)

Prove that $\|A x\|_{2} \leqslant\|A\|_{2}\|x\|_{2}$ for any $A \in \mathbb{R}^{m \times n}$ and $x \in \mathbb{R}^{n}$

Proof: Consider the definition of the matrix 2-norm: for any $A \in \mathbb{R}^{m \times n}$,

$$\|A\|_{2}=\sup _{x \in \mathbb{R}^{n}, x \neq 0} \quad \frac{\|A x\|_{2}}{\|x\|_{2}}$$

For every $x \neq 0$ the supremum dominates the quotient, so

$$\frac{\|A x\|_{2}}{\|x\|_{2}} \leqslant \|A\|_2, \quad \text{i.e.} \quad \|A x\|_{2} \leqslant \|A\|_2 \cdot \|x\|_{2}$$

and for $x = 0$ both sides equal zero.

Hence proved
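A quick numerical illustration, not part of the proof: for a diagonal matrix the 2-norm is the largest absolute diagonal entry, so the inequality can be checked directly in plain Python:

```python
import math, random

def norm2(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(xi * xi for xi in x))

# For A = diag(3, 4), ||A||_2 = 4 (the largest absolute diagonal entry)
normA = 4.0
random.seed(0)
for _ in range(100):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    Ax = [3.0 * x[0], 4.0 * x[1]]
    # ||Ax||_2 <= ||A||_2 * ||x||_2
    assert norm2(Ax) <= normA * norm2(x) + 1e-12
```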

(d)

Prove that $\|A B\|_{2} \leqslant\|A\|_{2} \cdot\|B\|_{2}$ for all $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$

Proof: In part (c) we proved $\|A x\|_{2} \leqslant\|A\|_{2}\|x\|_{2}$ for $A \in \mathbb{R}^{m \times n}$ and $x \in \mathbb{R}^{n}$.

We use this inequality twice. Let $B \in \mathbb{R}^{n \times p}$ and $y \in \mathbb{R}^p$. Then

\begin{aligned} \|A B y\|_{2} &=\|A(B y)\|_{2} \\ &\leqslant\|A\|_{2} \cdot\|B y\|_{2} \\ &\leqslant\|A\|_{2} \cdot \|B\|_{2} \cdot\|y\|_{2} \end{aligned}

Then for any $y \neq 0$,

$$\frac{\|A B y\|_{2}}{\|y\|_{2}} \leqslant\|A\|_{2} \cdot \|B\|_2$$

Taking the supremum over all $y \neq 0$ gives

$$\|A B\|_{2}=\sup _{y \neq 0} \frac{\|A B y\|_{2}}{\|y\|_{2}} \leqslant\|A\|_{2} \cdot\|B\|_{2}$$

Hence proved.
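As a sanity check (not part of the proof): for diagonal matrices the 2-norm is the largest absolute diagonal entry and the product is again diagonal, so submultiplicativity can be verified entrywise:

```python
import random

random.seed(0)
for _ in range(100):
    a = [random.uniform(-5, 5) for _ in range(3)]
    b = [random.uniform(-5, 5) for _ in range(3)]
    # For A = diag(a), B = diag(b): ||A||_2 = max|a_i|, ||B||_2 = max|b_i|,
    # and AB = diag(a_i * b_i), so ||AB||_2 = max|a_i * b_i|.
    normA = max(abs(ai) for ai in a)
    normB = max(abs(bi) for bi in b)
    normAB = max(abs(ai * bi) for ai, bi in zip(a, b))
    assert normAB <= normA * normB + 1e-12
```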

# Question 3

Let $a_{1}, a_{2}, \cdots, a_{m}$ be $m$ given real numbers. Prove that a median of $a_{1}, a_{2}, \dots, a_{m}$ minimizes

$$|a_{1} - b|+\left|a_{2}-b\right|+\cdots+\left|a_{m}-b\right|$$

over all $b \in \mathbb{R}$

Proof: Let $f(b)=\left|a_{1}-b\right|+\left|a_{2}-b\right|+\cdots+\left|a_{m}-b\right|$, and relabel the numbers in increasing order as $a_{(1)} \leq a_{(2)} \leq \cdots \leq a_{(m)}$.

For each pair $\left(a_{(i)}, a_{(m+1-i)}\right)$ with $i < m+1-i$, the triangle inequality gives

$$\left|a_{(i)}-b\right|+\left|a_{(m+1-i)}-b\right| \geq a_{(m+1-i)}-a_{(i)}$$

with equality if and only if $a_{(i)} \leq b \leq a_{(m+1-i)}$.

Summing these bounds over all such pairs (plus, when $m$ is odd, the middle term $\left|a_{((m+1)/2)}-b\right| \geq 0$, which vanishes at $b=a_{((m+1)/2)}$) shows that $f(b)$ attains its lower bound exactly when $b$ lies in every interval $\left[a_{(i)}, a_{(m+1-i)}\right]$.

A median of $a_{1}, a_{2}, \dots, a_{m}$ lies in all of these nested intervals, so it minimizes $f(b)$, i.e. the median minimizes $\left|a_{1}-b\right|+\left|a_{2}-b\right|+\dots+\left|a_{m}-b\right|$
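A quick numerical sanity check, not part of the proof: on random data, no candidate $b$ achieves a smaller sum of absolute deviations than the median (helper name `total_abs_dev` is my own):

```python
import random
import statistics

def total_abs_dev(a, b):
    """f(b) = sum_i |a_i - b|"""
    return sum(abs(ai - b) for ai in a)

random.seed(0)
for _ in range(50):
    a = [random.uniform(-10, 10) for _ in range(7)]
    med = statistics.median(a)
    best = total_abs_dev(a, med)
    # no random candidate b beats the median
    for _ in range(200):
        b = random.uniform(-12, 12)
        assert best <= total_abs_dev(a, b) + 1e-9
```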

# Question 4

Suppose that the vectors $x_{1}, \ldots, x_{N}$ in $\mathbb{R}^{n}$ are clustered using the K-means algorithm, with group representatives $z_1, \cdots, z_{k}$

(a) Suppose the original vectors $x_i$ are nonnegative, i.e., their entries are nonnegative. Explain why the representatives $z_j$ output by the K-means algorithm are also nonnegative.

Answer: Consider Step 2 of the algorithm:

Fix the groups $G_{1}, G_{2}, \cdots, G_{k}$ and solve

$$\min _{z_1, \cdots, z_k} \sum_{j=1}^{k}\left(\sum_{i \in G_j}\left\|x_{i}-z_{j}\right\|_{2}^{2}\right)$$

whose solution is the group mean

$$z_{j}=\frac{1}{\left|G_{j}\right|}\left(\sum_{i \in G_j} x_{i}\right)$$

Because the $x_i$ are nonnegative, this equation shows that $z_j$ is an average of nonnegative vectors, and hence also nonnegative.

(b) Suppose the original vectors $x_i$ represent proportions, i.e., their entries are nonnegative and sum to one. Explain why the representatives $z_j$ output by the K-means algorithm also represent proportions.

Answer: Still considering the equation above, rewrite it in vector form:

$$z_{j}=\frac{1}{\left|G_{j}\right|} \sum_{i \in G_{j}}\left(\begin{array}{l} x_{i1} \\ x_{i2} \\ \vdots \\ x_{in} \end{array}\right) =\frac{1}{\left|G_{j}\right|} \left(\begin{array}{c} \sum_{i \in G_{j}} x_{i1} \\ \sum_{i \in G_{j}} x_{i2} \\ \vdots\\ \sum_{i \in G_{j}} x_{in} \end{array}\right)$$

Since $x_{i1}+x_{i 2}+\cdots+x_{i n}=1$ for each $i$, we have $\sum_{i \in G_{j}} x_{i1}+\sum_{i \in G_{j}} x_{i2}+\cdots+ \sum_{i \in G_{j}} x_{in} = |G_j|$

so the entries of $z_j$ sum to one:

$$\frac{1}{\left|G_{j}\right|} \left(\sum_{i \in G_{j}} x_{i1}+\sum_{i \in G_{j}} x_{i2}+\cdots+ \sum_{i \in G_{j}} x_{in}\right)=1$$

and nonnegativity of $z_j$ follows as in part (a). Hence the entries of $z_j$ are nonnegative and sum to one, i.e. $z_j$ also represents proportions.
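A tiny numerical illustration with hypothetical data: averaging proportion vectors again yields a proportion vector.

```python
# Three proportion vectors (nonnegative entries summing to one) in one group
G = [[0.2, 0.5, 0.3],
     [0.6, 0.1, 0.3],
     [0.25, 0.25, 0.5]]
# Group representative: the entrywise mean
z = [sum(x[i] for x in G) / len(G) for i in range(3)]
assert all(zi >= 0 for zi in z)   # nonnegative, as in part (a)
assert abs(sum(z) - 1.0) < 1e-12  # entries sum to one
```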

(c) Suppose the original vectors $x_i$ are Boolean, i.e., their entries are either 0 or 1. Give an interpretation of $(z_j)_i$, the $i$-th entry of the $j$-th group representative.

Since $z_j$ is the entrywise mean of the Boolean vectors in group $j$, $(z_j)_i$ is the fraction of vectors in group $j$ whose $i$-th entry equals one. In particular:

• If $(z_j)_i$ is close to 1, most vectors in group $j$ have $i$-th entry ONE
• If $(z_j)_i$ is close to 0, most vectors in group $j$ have $i$-th entry ZERO
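This interpretation can be seen directly on a small hypothetical group of Boolean vectors:

```python
# A group of Boolean vectors; the representative is their entrywise mean,
# so (z_j)_i is the fraction of vectors in the group with i-th entry one.
G = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 1],
     [1, 0, 0]]
z = [sum(x[i] for x in G) / len(G) for i in range(3)]
assert z == [0.75, 0.5, 0.5]  # e.g. 3 of the 4 vectors have first entry 1
```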