Chitta Ranjan
2 min read · Jun 8, 2019

Jinwoo Park, thank you for your comment!

Below are the answers.
Q1. What are the 2 small squares at the top of each cell in the figures? Are they ‘hidden state’ and ‘output’?
A.
* A small square is an LSTM cell. This cell is explained in Colah’s blog http://colah.github.io/posts/2015-08-Understanding-LSTMs/. One green square in Colah’s blog is equivalent to one small square in Fig. 2.3. Each horizontal slice of 3 cells is the same as the connected green squares in Colah’s blog. In an LSTM layer of size 128, we will have 128 x 3 squares. In Fig. 2.3, I show only two squares at the top to keep the diagram simpler.

* The square is an LSTM cell. The yellow arcs are its output. The output can include both the hidden state and the cell state, but generally only the hidden state is returned. In Keras, the cell state can also be returned by setting `return_state=True`, and the state can be carried across batches with `stateful=True`. See https://machinelearningmastery.com/stateful-stateless-lstm-time-series-forecasting-python/ for more.
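
Here is a minimal sketch of this in Keras (the input sizes are toy values I am assuming for illustration, not from the article): `return_state=True` exposes both the final hidden state and the final cell state of a size-128 LSTM layer over 3 timesteps.

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

# Toy input: 1 sample, 3 timesteps, 5 features (sizes assumed for illustration).
inputs = Input(shape=(3, 5))
seq_out, h_state, c_state = LSTM(
    128,
    return_sequences=True,  # hidden state at every timestep (the yellow arcs)
    return_state=True       # also return the final hidden state and cell state
)(inputs)

model = Model(inputs, [seq_out, h_state, c_state])
s, h, c = model.predict(np.random.rand(1, 3, 5))
print(s.shape)  # (1, 3, 128) hidden states for all 3 timesteps
print(h.shape)  # (1, 128)    final hidden state
print(c.shape)  # (1, 128)    final cell state
```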

Q2. I am curious to know if the ‘encoded vector’ in Figure 2.3 is the ‘output’ or the ‘hidden state’ of the previous layer.
A. It is the hidden state, which is the typical output of an LSTM cell.

Q3. Shouln’t the encoded vector in the Figure 2.3 should be without those 2 squares at the top?
A. To make it clearer I updated Fig. 2.3. The encoded vector is the output (the hidden state) of the prior LSTM layer.
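
For concreteness, here is a minimal sketch of such an encoder-decoder in Keras (the layer sizes and feature counts are illustrative assumptions, not the exact architecture of the article): the encoded vector is simply the last hidden state of the final encoder LSTM layer.

```python
from tensorflow.keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from tensorflow.keras.models import Model

timesteps, n_features = 3, 5  # illustrative sizes

inputs = Input(shape=(timesteps, n_features))
x = LSTM(128, return_sequences=True)(inputs)
encoded = LSTM(64, return_sequences=False)(x)  # encoded vector = final hidden state, shape (64,)

x = RepeatVector(timesteps)(encoded)           # repeat it as input to the decoder
x = LSTM(64, return_sequences=True)(x)
x = LSTM(128, return_sequences=True)(x)
outputs = TimeDistributed(Dense(n_features))(x)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')
```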

Q4. Compared to the input’s number of features, the layer’s cells span a much bigger space. Why don’t they just copy the inputs?
A. We can look at this as:
* One input (sample) is spread over the entire layer.
* Similarly, the second and all the remaining samples are also spread over the same layer.
* Even if the feature space of the layer is larger than the input size, the inputs are not isolated to any specific part of the layer; that is, the inputs are not separated within the layer.
* Therefore, even with a larger layer size, we do not perfectly mimic the inputs.

I believe what you are asking is whether the LSTM layer weights could form block identity matrices. I am not sure if my understanding is correct, but if it is, this is more likely to happen in a Dense layer. The weight shapes in the sketch below show how every input feature feeds every cell in an LSTM layer.
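
A small check of this in Keras (toy sizes assumed): the kernel of an LSTM layer has shape (n_features, 4 × units), i.e., every input feature connects to every cell’s gates, so no cell “owns” a particular input.

```python
from tensorflow.keras.layers import Input, LSTM

n_features, units = 5, 128     # toy sizes for illustration
inputs = Input(shape=(3, n_features))
lstm = LSTM(units)
_ = lstm(inputs)               # build the layer

kernel, recurrent_kernel, bias = lstm.get_weights()
print(kernel.shape)            # (5, 512): each of the 5 features feeds all 4*128 gate weights
print(recurrent_kernel.shape)  # (128, 512)
```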

Please feel free to leave a comment if you have further questions.
