NP-Curious

Scaling Elections with GPUs and Mojo across Nvidia and AMD 🔥

Last summer, Chris Lattner, a bunch of other people from across the industry, and I gathered for a GPU-programming hackathon at the AGI House in San Francisco. After one too many LLM optimizations, I decided to accelerate something nobody asked for! Most elections use simple plurality voting: whoever gets the most votes wins. But there are “fairer” methods that consider ranked preferences, like the Schulze method used by the Wikimedia Foundation, Debian, and pirate parties worldwide. The catch? It scales as $O(n^3)$ with the number of candidates. ...
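To make the cubic cost concrete, here is a minimal CPU-only sketch of the Schulze method in plain Python. It is not the GPU or Mojo code from the post. The function names and the input layout (`d[i][j]` is the number of voters who rank candidate `i` above candidate `j`) are assumptions made for illustration.

```python
def schulze_strongest_paths(d):
    """Return p, where p[i][j] is the strength of the strongest path from i to j.

    d[i][j] is the (assumed) count of voters preferring candidate i over j.
    """
    n = len(d)
    # Start from direct pairwise wins only; losses and ties contribute 0.
    p = [
        [d[i][j] if i != j and d[i][j] > d[j][i] else 0 for j in range(n)]
        for i in range(n)
    ]
    # Floyd-Warshall-style widest-path relaxation: three nested loops, O(n^3).
    for k in range(n):
        for i in range(n):
            if i == k:
                continue
            for j in range(n):
                if j == i or j == k:
                    continue
                p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))
    return p


def schulze_winners(d):
    """Candidates whose strongest path to every rival is at least as strong as the reverse."""
    p = schulze_strongest_paths(d)
    n = len(d)
    return [
        i for i in range(n)
        if all(p[i][j] >= p[j][i] for j in range(n) if j != i)
    ]
```

The triple loop over `k`, `i`, and `j` is where the $O(n^3)$ comes from. For fixed `k`, the updates for different `(i, j)` pairs do not depend on each other, which is why this kind of kernel is a natural fit for GPUs.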

Before AI's Kepler Moment - Are LLMs the Epicycles of Intelligence?

I’ve always been fascinated by AI and mega-projects — and as I work on AI infrastructure, you might assume I’m equally fascinated by the current LLM race. In reality, I’m far more skeptical than most. While LLMs are undeniably useful, I’m not convinced “intelligence” is even the right scale to measure them against. The analogy I keep returning to comes not from computer science, but from astronomy: the story of epicycles. ...

USearch Molecules: 28 Billion Chemical Embeddings on AWS ⚗️

TLDR: I’ve finally finished a project that involved gathering 7 billion small molecules, each represented in SMILES notation and having fewer than 50 “heavy” non-hydrogen atoms. Those molecules were “fingerprinted”, producing 28 billion structural embeddings, using MACCS, PubChem, ECFP4, and FCFP4 techniques. These embeddings were indexed using Unum’s open-source tool USearch, to accelerate molecule search. This extensive dataset is now made available globally for free, thanks to a partnership with AWS Open Data. You can find the complete data sheet and scripts for data visualization on GitHub. ...

Combinatorial Stable Marriages for DBMS Semantic Joins 💍

How can the 2012 Nobel Prize in Economics, Vector Search, and the world of dating come together? What are the implications for the future of databases? And why do evaluation datasets for Multi-Modal AI models often fall short? Synopsis: Stable Marriages are generally computed from preference lists. Those consume too much memory. Instead, one can dynamically recalculate candidate lists using a scalable Vector Search engine. However, achieving this depends on having high-quality representations in a shared Vector Space. While this already works well for text-based features using modern BERT-like architectures, the quality decreases significantly for Multi-Modal data. This shortcoming, reflected in OpenAI’s CLIP and Unum’s own UForm, signifies the need for improving modern space-alignment techniques. Their advancement could not only catalyze the integration of AI into databases but also enhance the performance of upcoming Generative Vision-Language models. ...
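For reference, the classic preference-list formulation mentioned in the synopsis can be sketched as the Gale-Shapley algorithm below. This is a minimal illustration in plain Python, not the vector-search variant the post describes. The function name and the input layout (each participant holds a full ranked list of the other side, indexed `0..n-1`) are assumptions.

```python
from collections import deque


def stable_matching(proposer_prefs, receiver_prefs):
    """Gale-Shapley: return a dict mapping each proposer to its matched receiver.

    proposer_prefs[p] and receiver_prefs[r] are full rankings, best first.
    """
    n = len(proposer_prefs)
    # rank[r][p] is the position of proposer p in receiver r's list (lower is better).
    rank = [{p: pos for pos, p in enumerate(prefs)} for prefs in receiver_prefs]
    next_choice = [0] * n      # index of the next receiver each proposer will try
    engaged_to = [None] * n    # receiver -> currently accepted proposer
    free = deque(range(n))     # proposers without a partner

    while free:
        p = free.popleft()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to[r]
        if current is None:
            engaged_to[r] = p
        elif rank[r][p] < rank[r][current]:
            # The receiver prefers the new proposer; the old one becomes free again.
            engaged_to[r] = p
            free.append(current)
        else:
            free.append(p)

    return {p: r for r, p in enumerate(engaged_to)}
```

Storing a full ranking for each of the `n` participants on both sides takes on the order of $n^2$ entries, which is the memory cost the synopsis refers to. Recomputing candidates on demand with a vector search engine would replace the `proposer_prefs[p][next_choice[p]]` lookup.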