Adam Colton: University of Utah 2023

Projects:
A Theory for Coupling Generation and Compression: I formalize two-stage generation as a bilevel optimization problem and offer a theory to improve the alignment between stage 1 models (VAEs, VQGANs, tokenizers) and stage 2 models (diffusion, autoregressive, or other generative models).
Image Self-Supervised Learning on a Shoestring: IJEPA-Enhanced: I create a technique for quickly training self-supervised image encoders on a single GPU, optimizing for training throughput. The code lets you train a ViT-S at a rate of 1300 medium-sized images per second. I release the training code here: IJEPA-enhanced
IJEPA-Enhanced uses masked latent prediction to teach a machine learning model how to 'see'.
source code available
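A minimal sketch of the masking step behind masked latent prediction, not the IJEPA-Enhanced code itself. It assumes a 14x14 patch grid and a single rectangular target block; the function name `sample_block_mask` and all sizes are hypothetical. An encoder would see only the context patches, and a predictor would be trained to regress the latents of the hidden target patches.

```python
import random

def sample_block_mask(grid_h, grid_w, block_h, block_w, rng):
    """Pick one rectangular block of patches to hide; return (context_ids, target_ids)."""
    top = rng.randrange(grid_h - block_h + 1)
    left = rng.randrange(grid_w - block_w + 1)
    target = {
        r * grid_w + c
        for r in range(top, top + block_h)
        for c in range(left, left + block_w)
    }
    # Every patch outside the target block stays visible as context.
    context = [i for i in range(grid_h * grid_w) if i not in target]
    return context, sorted(target)

rng = random.Random(0)
# Hypothetical sizes: 14x14 patches, one 4x6 target block.
context_ids, target_ids = sample_block_mask(14, 14, 4, 6, rng)
print(len(context_ids), len(target_ids))
```

The split is by patch index, so the same indices can be used to gather context tokens for the encoder and target latents for the loss.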
Generative modelling of compressed image file bits: Do you have trouble saturating your GPU because of data loading overhead? Don't you wish you could train directly on compressed image files? Say no more! I trained Llama to directly generate the bits of a lossy image compression file format called SPIHT. Check out my report! There will be more coming soon.
source code available
spiht-py: An implementation of the SPIHT algorithm in Rust, with Python bindings. SPIHT is a lossy image compression algorithm. Like JPG, it can reduce the number of bits required to store images. Unlike JPG, the SPIHT bitstream can be interrupted at any point and the entire image can still be decoded. There are no 'blocks' in the SPIHT algorithm.
Animation of the SPIHT algorithm
Left: Intermediate decoded image.
Right: Intermediate coefficients from the discrete wavelet transform.
Bits per pixel (BPP) are shown in the top left corner.
As BPP increases, you can see how the encoder assigns information to coefficients at higher frequencies.
source code available
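The "interrupt the bitstream anywhere" property can be illustrated with plain bit-plane coding. This is a simplified stand-in, not SPIHT: there is no wavelet transform and no set partitioning, and it is not the spiht-py API. The coefficient values are hypothetical. Bits are emitted most significant plane first, so any prefix of the stream decodes to a coarser version of every coefficient.

```python
def encode_bitplanes(coeffs, num_planes):
    """Emit bits plane by plane, most significant plane first."""
    bits = []
    for plane in range(num_planes - 1, -1, -1):
        for c in coeffs:
            bits.append((c >> plane) & 1)
    return bits

def decode_bitplanes(bits, num_coeffs, num_planes):
    """Decode any prefix of the stream; bits that never arrived count as zero."""
    coeffs = [0] * num_coeffs
    for i, bit in enumerate(bits):
        plane = num_planes - 1 - i // num_coeffs
        coeffs[i % num_coeffs] |= bit << plane
    return coeffs

coeffs = [200, 37, 12, 3]  # hypothetical non-negative coefficient magnitudes
stream = encode_bitplanes(coeffs, num_planes=8)
for cut in (8, 16, 32):
    # Truncating the stream at `cut` bits still yields a value for every coefficient.
    print(cut, decode_bitplanes(stream[:cut], len(coeffs), 8))
```

Longer prefixes refine the same coefficients rather than adding new regions of the image, which is why there are no block boundaries to stop at.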
Your VAE Sucks: A short foray into the Fourier transform, JPG, image autoencoders, and a new image-autoencoder architecture, inspired by JPG, that produces latent codes with a left-to-right positional bias.
decoded ski jump animation
LLMs and faithfulness to reasoning (blog post): Humans have written a trove of step-by-step explanations and proofs. These exist on the internet, and some of them end up in the pre-training data for large language models. Thus, some modicum of step-by-stepedness exists in the weights of LLMs....
VQ-CLIP: So, you've heard about CLIP, but have you heard about CLIP but with a quantized embedding space??

Rather than using a real number vector as the embedding, VQ-CLIP uses a list of k integer class IDs. The embedding codes from these models can be used for exciting downstream tasks, such as autoregressive CLIP embedding generation.

You can find code+models here
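A minimal sketch of turning a real-valued embedding into a list of k integer IDs. It assumes a product-quantization-style scheme (split the vector into k chunks, each with its own codebook, and pick the nearest code); the actual VQ-CLIP quantizer may differ. The function `quantize`, the codebooks, and the embedding values are all hypothetical.

```python
def quantize(embedding, codebooks):
    """Map a real-valued vector to one integer ID per codebook (nearest code by squared distance)."""
    k = len(codebooks)
    chunk = len(embedding) // k
    ids = []
    for i, codebook in enumerate(codebooks):
        sub = embedding[i * chunk:(i + 1) * chunk]
        dists = [sum((a - b) ** 2 for a, b in zip(sub, code)) for code in codebook]
        ids.append(dists.index(min(dists)))
    return ids

# Hypothetical toy codebooks: k = 2 chunks of 2 dimensions each.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[-1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
]
embedding = [0.9, 0.8, 0.9, 0.2]
codes = quantize(embedding, codebooks)
print(codes)
```

Because the result is a short sequence of class IDs, it can be modelled like any other token sequence, for example by an autoregressive model.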
Image Retrieval: A Picture is worth 8x8x8 Words: SOTA image retrieval models usually use global real-number embeddings, obtained from big neural networks. But why is there no love for more traditional information retrieval techniques such as BoW? Given a 256x256 image, we encode it into an 8x8x8 array of discrete integer tokens. We do this using a ResNet model trained with a learned vector quantization layer. From these tokens, we use 2D Kgrams to obtain global vectors containing term frequencies.
2d kgrams diagram
2D kgrams can be used to produce global representations which capture spatial relationships
Our group's poster. Thanks for voting for us!
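A minimal sketch of counting 2D k-grams, assuming a single 2D grid of token IDs and square k x k windows. How the project handles the third dimension of the 8x8x8 tokens is not stated here, so this covers one 2D slice only; the function name `kgrams_2d` and the toy grid are hypothetical.

```python
from collections import Counter

def kgrams_2d(tokens, k=2):
    """Count every k x k window of a 2D token grid as one 'term'."""
    rows, cols = len(tokens), len(tokens[0])
    counts = Counter()
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            # Flatten the window row by row so it can be used as a dictionary key.
            window = tuple(tokens[r + dr][c + dc] for dr in range(k) for dc in range(k))
            counts[window] += 1
    return counts

# Hypothetical 3x3 token grid.
grid = [
    [1, 1, 1],
    [1, 1, 1],
    [3, 3, 3],
]
tf = kgrams_2d(grid)
print(tf)
```

Each window records which tokens sit next to each other, so the resulting term-frequency vector keeps some spatial structure that a plain bag of single tokens would lose.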
Gravity Market:
A D3 JavaScript web app that uses a physics simulation to display the percent change in value of various stocks from the S&P 500. This project was made for Vis for Data Science 2022 taught by Dr. Alexander Lex at the University of Utah. Source code to be added soon!
Songs from Ukraine :
I scraped several hundred gigabytes of social media posts and analyzed the music found in them. Videos were scraped from the Telegram social media platform. Music metadata was retrieved from the videos using Shazam.

Videos were retrieved over the course of months. The scraping process was scheduled using systemd units on a Linux server. CSV metrics on the data were updated asynchronously via file changes. The code for the scraper and data processing was written in Python.
Visualization of a Computationally Derived Fentanyl Binding Protein :
I used PyMOL to create an animation of a binder enzyme as it transitions to the bound state. The program 'Climber' was used to interpolate between the bound and apo (unbound) states.
Ascii Art Latent Masked Transformer Model:
I trained a variational quantized autoencoder on ASCII art. The characters in the art are represented as one-hot encoded vectors at each 'pixel' in the string. The discrete latents learned by this model were used to train a bidirectional transformer. The transformer was trained to predict masked latent tokens, akin to MaskGIT by Google Research.
Interpolation of the latent space. The top left corner shows the discrete tokens. On the left is the ASCII art decoded from the current embedding; on the right is the original.
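A minimal sketch of the one-hot input representation described above, assuming the vocabulary is simply the set of characters that occur in the art. The function name `one_hot_ascii` and the sample art are hypothetical; the autoencoder and transformer are not shown.

```python
def one_hot_ascii(art, vocab):
    """Encode each character 'pixel' as a one-hot vector over the vocabulary."""
    index = {ch: i for i, ch in enumerate(vocab)}
    return [
        [[1 if index[ch] == i else 0 for i in range(len(vocab))] for ch in line]
        for line in art.splitlines()
    ]

art = "+-+\n| |\n+-+"  # hypothetical 3x3 piece of ASCII art
vocab = sorted(set(art) - {"\n"})
encoded = one_hot_ascii(art, vocab)
print(vocab)
print(encoded[0])
```

The result has shape rows x columns x vocabulary size, which is the same layout as an image with one channel per character.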
Ethminer GUI:
A simple cross-platform GUI app written in Rust for the ethminer CLI program. It captures console output from the program and uses asynchronous channel communication via Tokio.

Adventures:
How to go from Užice to Bajina Bašta, a Backpacker's guide: Explore the Serbian countryside the simple way, by walking.
How to go from Romania to Serbia, a Backpacker's guide: Discusses a bio-ecological, super-low-emission, community-driven form of transport colloquially known as hitchhiking.