# RNN for text prediction

![](https://637078585-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MYsi-h_n0zY_8MKKgyu%2Fuploads%2Fgit-blob-f70ae0c42edd48c12e24f24fbcefd973fc1d3c12%2Frnn-3.png?alt=media)

* Input text: “the cat sat on the ma”
* Question: what is the next character?
* The RNN outputs a probability distribution over all characters in the vocabulary.
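
As a rough illustration of what "a distribution over the chars" means, the sketch below turns raw scores into probabilities with a softmax. It assumes the network's last layer is a softmax, which is a common choice but is not stated in these notes; the four-character `vocab` and the `logits` values are made up for the example.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and scores, for illustration only.
vocab = ["a", "n", "p", "t"]
logits = [0.1, 1.2, 0.3, 2.5]

probs = softmax(logits)
distribution = dict(zip(vocab, probs))  # char -> probability
```

Here `distribution` assigns the largest probability to `t`, because `t` has the largest score.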

![](https://637078585-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MYsi-h_n0zY_8MKKgyu%2Fuploads%2Fgit-blob-20bf2a54b1f956fb0068f8357f4b255aa469af21%2Frnn-4.png?alt=media)

* Sample a character from this distribution; we may get ‘t’.
* Append it to the text and take “the cat sat on the mat” as the new input.
* The next sampled character may then be a period ‘.’.
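
The sample-then-append loop described above can be sketched as follows. `toy_predict` is a hypothetical stand-in for a trained RNN and its probabilities are invented; only the loop structure is the point. Sampling uses `random.choices` from the standard library.

```python
import random

def toy_predict(text):
    """Hypothetical stand-in for a trained RNN: returns (chars, probs)."""
    if text.endswith("ma"):
        return ["t", "n", "p"], [0.8, 0.1, 0.1]
    return [".", " "], [0.7, 0.3]

def generate(seed, n_chars, predict, rng):
    """Repeatedly sample the next char and append it to the input text."""
    text = seed
    for _ in range(n_chars):
        chars, probs = predict(text)
        # Draw one char according to the predicted distribution.
        next_char = rng.choices(chars, weights=probs, k=1)[0]
        text += next_char  # the extended text becomes the next input
    return text

rng = random.Random(0)  # fixed seed so runs are repeatable
result = generate("the cat sat on the ma", 2, toy_predict, rng)
```

Because the next character is sampled rather than taken as the most likely one, different seeds can produce different continuations.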

## Training

* Cut the text into overlapping segments, e.g., seg\_len=40 and stride=3.
  * This partitions the text into (segment, next\_char) pairs.
* Each segment is used as the input text.
* The character that follows the segment is used as its label.
* One-hot encode the characters, where $$v$$ is the number of unique characters and $$l$$ is the segment length.
  * Character -> $$v \times 1$$ vector.
  * Segment -> $$l \times v$$ matrix.
* Training data: (segment, next\_char) pairs
* This is a multi-class classification problem: the number of classes equals the number of unique characters.
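
The data preparation steps above can be sketched in plain Python. This is a minimal sketch under a few assumptions: the sample `text` is invented, `seg_len=10` is used instead of 40 so the example stays small, and the helper names `make_pairs` and `one_hot` are hypothetical. Labels are stored as class indices here, which is one common choice; they could equally be one-hot encoded.

```python
def make_pairs(text, seg_len, stride):
    """Slide a window over the text and collect (segment, next_char) pairs."""
    pairs = []
    # Stop early enough that every segment still has a following char.
    for start in range(0, len(text) - seg_len, stride):
        segment = text[start:start + seg_len]
        next_char = text[start + seg_len]
        pairs.append((segment, next_char))
    return pairs

def one_hot(char, char_to_idx):
    """Encode one char as a length-v vector with a single 1."""
    vec = [0] * len(char_to_idx)
    vec[char_to_idx[char]] = 1
    return vec

text = "the cat sat on the mat. the dog sat on the log."
seg_len, stride = 10, 3  # small values for illustration
pairs = make_pairs(text, seg_len, stride)

vocab = sorted(set(text))  # v = number of unique chars
char_to_idx = {c: i for i, c in enumerate(vocab)}

# Each segment becomes an l x v matrix (list of l one-hot rows).
X = [[one_hot(c, char_to_idx) for c in seg] for seg, _ in pairs]
# Each label is a class index in [0, v).
y = [char_to_idx[nxt] for _, nxt in pairs]
```

The first pair is the segment `the cat sa` with label `t`. With stride 3, consecutive segments overlap in all but 3 characters.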
