Jack Glenn

Vision Transformer

This project was a comprehensive study and implementation of the Vision Transformer (ViT) architecture proposed in Google's 2020 paper. It was undertaken to understand both the general transformer model and its application to the computer vision task of image classification. The transformer was originally developed for natural language processing tasks and later became the basis for models like ChatGPT.