BERT
BERT is a neural network developed by Jacob Devlin et al. (Google) back in 2018. It significantly improves the performance of neural networks on natural language processing tasks compared to most other network types, including the previous leaders, recurrent neural network (RNN) architectures. BERT addresses RNN issues such as handling long sequences of text, scalability, and parallelism. It does so by building on a special type of architecture called the Transformer.

Transformers apply positional encoding and attention to build their outputs. Positional encoding deals with encoding word-order information into the data itself. Attention determines the relationship between every word in the input and establishes how it relates to each word in the output. This is something that is learned from data by seeing many examples.

BERT stands for:
Bidirectional - it uses left and right context (i.e. the whole input, not just the preceding or following words) when dealing with a word
Encoder Representations from Transformers - it is the encoder part of the Transformer architecture, and its outputs are contextual representations of the input words
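A minimal NumPy sketch of the two ideas described above, positional encoding and attention, assuming sinusoidal position encodings and single-head scaled dot-product attention (the general Transformer formulation; BERT itself uses learned position embeddings and multi-head attention):

```python
# Sketch only: sinusoidal positional encoding + scaled dot-product attention.
# Not BERT's exact implementation.
import numpy as np

def positional_encoding(seq_len, d_model):
    """Encode each position as sines/cosines of varying frequency."""
    positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
    dims = np.arange(d_model)[None, :]               # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                 # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])            # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])            # odd dims: cosine
    return pe

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted sum of V; weights come from Q-K similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # relevance of every word to every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over input positions
    return weights @ V

# Toy usage: 4 "words", model dimension 8. In a real model Q, K, V are
# learned projections of the word embeddings; here we reuse x directly.
seq_len, d_model = 4, 8
x = np.random.randn(seq_len, d_model) + positional_encoding(seq_len, d_model)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8): one context-aware vector per input word
```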
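To see the "bidirectional" part above in practice, here is a quick sketch assuming the Hugging Face transformers package (an assumption, not something this text relies on): BERT predicts a masked word using context on both sides of it.

```python
# Assumes: pip install transformers (plus a backend such as PyTorch).
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# The right-hand context ("the guitar") helps pick the missing word;
# a purely left-to-right model could not use it.
for candidate in fill("She [MASK] the guitar at the concert."):
    print(candidate["token_str"], round(candidate["score"], 3))
```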