Over-engineering 5x Faster Set Intersections in SVE2, AVX-512, & NEON

Set intersections are one of the standard operations in databases and search engines. They are used in TF-IDF ranking in search engines, table joins in OLAP databases, and graph algorithms. Chances are, you rely on them every day, but you may not realize that they are some of the most complex operations to accelerate with SIMD instructions. SIMD instructions make up the majority of modern Assembly instruction sets on x86 and Arm....

September 16, 2024 · 25 min · 5120 words · Ash Vardanian
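
For readers new to the operation named in the excerpt above, a scalar baseline for intersecting two sorted arrays of unique integers can be sketched as follows. This is only a minimal illustration of the operation, not the article's SIMD kernels; the function name `intersect_sorted` is hypothetical.

```python
def intersect_sorted(a, b):
    """Two-pointer intersection of two ascending lists of unique values."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1          # a[i] is too small to match anything left in b
        elif a[i] > b[j]:
            j += 1          # b[j] is too small to match anything left in a
        else:
            out.append(a[i])  # common element found, advance both sides
            i += 1
            j += 1
    return out


print(intersect_sorted([1, 3, 5, 7], [3, 4, 5, 8]))  # [3, 5]
```

The data-dependent branches in this loop are what make the operation awkward to vectorize.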

35% Discount on Keyword Arguments in Python

Python has a straightforward syntax for positional and keyword arguments. Positional arguments are arguments passed to a function in a specific order, while keyword arguments are passed to a function by name. Surprising to most Python developers, the choice of positional vs keyword arguments can have huge implications on readability and performance. Let’s take the cdist interface as an example. It’s a function implemented in SimSIMD, mimicking SciPy, that computes all pairwise distances between two sets of points, each represented by a matrix....

September 8, 2024 · 16 min · 3203 words · Ash Vardanian
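
One way to observe the call-overhead difference the excerpt above refers to is a small `timeit` comparison. The sketch below assumes a hypothetical pure-Python stand-in, `pairwise_distance`, rather than the real `cdist` binding, so the numbers it prints will differ from the article's and vary between machines and Python versions.

```python
import timeit


def pairwise_distance(a, b, metric="cosine"):
    """Hypothetical stand-in: does no real work, so call overhead dominates."""
    return (a, b, metric)


# Same call, once with positional and once with keyword arguments.
positional = timeit.timeit(
    lambda: pairwise_distance(1, 2, "cosine"), number=100_000
)
keyword = timeit.timeit(
    lambda: pairwise_distance(a=1, b=2, metric="cosine"), number=100_000
)

print(f"positional: {positional:.4f}s, keyword: {keyword:.4f}s")
```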

NumPy vs BLAS: Losing 90% of Throughput

Downloaded over 5 Billion times, NumPy is the most popular library for numerical computing in Python. It wraps low-level HPC libraries like BLAS and LAPACK, providing a high-level interface for matrix operations. BLAS is mainly implemented in C, Fortran, or Assembly and is available for most modern chips, not just CPUs. BLAS is fast, but bindings aren’t generally free. So, how much of the BLAS performance is NumPy leaving on the table?...

March 12, 2024 · 9 min · 1729 words · Ash Vardanian

The Painful Pitfalls of C++ STL Strings

Criticizing software is easy, yet the C++ and C standard libraries have withstood the test of time admirably. Nevertheless, they are not perfect. Especially the <string>, <string_view>, and <string.h> headers. The first two alone bring in over 20,000 lines of code, slowing the compilation of every translation unit by over 100 milliseconds. Most of that code seems dated, much slower than LibC, and equally error-prone, with interfaces that are very hard to distinguish....

February 12, 2024 · 11 min · 2262 words · Ash Vardanian

USearch Molecules: Searchable Dataset of 28 Billion Chemical Embeddings on AWS ⚗️

TLDR: I’ve finally finished a project that involved gathering 7 billion small molecules, each represented in SMILES notation and having fewer than 50 “heavy” non-hydrogen atoms. Those molecules were “fingerprinted”, producing 28 billion structural embeddings, using MACCS, PubChem, ECFP4, and FCFP4 techniques. These embeddings were indexed using Unum’s open-source tool USearch, to accelerate molecule search. This extensive dataset is now made available globally for free, thanks to a partnership with AWS Open Data....

November 20, 2023 · 15 min · 3183 words · Ash Vardanian

Binding a C++ Library to 10 Programming Languages 🔟

Experienced devs may want to skip the intro or jump immediately to the conclusions. The backbone of many foundational software systems — from compilers and interpreters to math libraries, operating systems, and database management systems — is often implemented in C and C++. These systems frequently offer Software Development Kits (SDKs) for high-level languages like Python, JavaScript, GoLang, C#, Java, and Rust, enabling broader accessibility. But there is a catch....

November 9, 2023 · 16 min · 3309 words · Ash Vardanian

Python, C, Assembly - 2'500x Faster Cosine Similarity 📐

In this fourth article of the “Less Slow” series, I’m accelerating Unum’s open-source Vector Search primitives used by some great database and cloud providers to replace Meta’s FAISS and scale up search in their products. This time, our focus is on the most frequent operation for these tasks - computing the Cosine Similarity/Distance between two vectors. It’s so common, even doubling its performance can have a noticeable impact on application economics....

October 30, 2023 · 17 min · 3465 words · Ash Vardanian
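
As a reference point for the operation named in the excerpt above, cosine distance can be written in plain Python as one minus the dot product divided by the product of the two vector norms. This is a minimal, unoptimized sketch; the function name `cosine_distance` and the choice to raise on zero vectors are assumptions, not the article's implementation.

```python
import math


def cosine_distance(a, b):
    """Return 1 - (a . b) / (|a| * |b|) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat zero vectors as an error rather than pick a value.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```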

GCC Compiler vs Human - 119x Faster Assembly 💻🆚🧑‍💻

When our Python code is too slow, like most others, we switch to C and often get 100x speed boosts, just like when we replaced SciPy distance computations with SimSIMD. But imagine going 100x faster than C code! It sounds crazy, especially for number-crunching tasks that are “data-parallel” and easy for compilers to optimize. In such spots, the compiler will typically “unroll” the loop, vectorize the code, and use SIMD instructions to process multiple data elements in parallel....

October 23, 2023 · 6 min · 1192 words · Ash Vardanian

Accelerating JavaScript arrays by 10x for Vector Search 🏹

You’ve probably heard about AI a lot this year. Lately, there’s been talk about something called Retrieval Augmented Generation (RAG). Unlike a regular chat with ChatGPT, RAG lets ChatGPT search through a database for helpful information. This makes the conversation better and the answers more on point. Usually, a Vector Search engine is used as the database. It’s good at finding similar data points in a big pile of data. These data points are often at least 256-dimensional, meaning they have many Number-s....

October 21, 2023 · 11 min · 2247 words · Ash Vardanian

Our CPython bindings got 5x faster without PyBind11 🐍

Python’s not the fastest language out there. Developers often use tools like Boost.Python and SWIG to wrap faster native C/C++ code for Python. PyBind11 is the most popular tool for the job, but not the quickest. NanoBind offers improvements, but when speed really matters, we turn to pure CPython C API bindings. With StringZilla, I started with PyBind11 but switched to CPython to reduce latency. The switch did demand more coding effort, moving from modern C++17 to more basic C99, but the result is a 5x lower call latency!...

October 10, 2023 · 12 min · 2556 words · Ash Vardanian

SciPy distances... up to 200x faster with AVX-512 & SVE 📏

Over the years, Intel’s 512-bit Advanced Vector eXtensions (AVX-512) stirred extensive discussions. While introduced in 2014, it wasn’t until recently that CPUs began providing comprehensive support. Similarly, Arm Scalable Vector Extensions (SVE), primarily designed for Arm servers, have also started making waves only lately. The computing landscape now looks quite different with powerhouses like Intel’s Sapphire Rapids CPUs, AWS Graviton 3, and Ampere Altra entering the fray. Their arrival brings compelling advantages over the traditional AVX2 and NEON extensions:...

October 7, 2023 · 13 min · 2607 words · Ash Vardanian

Combinatorial Stable Marriages for DBMS Semantic Joins 💍

How can the 2012 Nobel Prize in Economics, Vector Search, and the world of dating come together? What are the implications for the future of databases? And why do Multi-Modal AI model evaluation datasets often fall short? Synopsis: Stable Marriages are generally computed from preference lists. Those consume too much memory. Instead, one can dynamically recalculate candidate lists using a scalable Vector Search engine. However, achieving this depends on having high-quality representations in a shared Vector Space....

July 18, 2023 · 10 min · 2102 words · Ash Vardanian

StringZilla: 5x faster strings with SIMD & SWAR 🦖

A few years back, I found a simple trick in tandem with SIMD intrinsics to truly unleash the power of contemporary CPUs. I put the strstr of LibC and the std::search of the C++ Standard Template Library to the test and hit a throughput of around 1.5 GB/s for substring search on a solitary core. Not too shabby, right? But imagine that the memory bandwidth could theoretically reach a striking 10-15 GB/s per core....

July 10, 2023 · 9 min · 1816 words · Ash Vardanian

Abusing Vector Search for Texts, Maps, and Chess ♟️

Vector Search is hot! Everyone is pouring resources into a seemingly new and AI-related topic. But are there any non-AI-related use cases? Are there features you want from your vector search engine, but are too afraid to ask? Last week was 🔥 for vector search. Weaviate raised $50M, and Pinecone raised $100M... That's a lot and makes you believe that vector search is hard. But it's not. I have spent the last few days implementing a single-file vector search engine....

May 9, 2023 · 10 min · 2077 words · Ash Vardanian

Counting Strings in C++: 30x Throughput Difference 💬

Some of the most common questions in programming interviews are about strings - reversing them, splitting, joining, counting, etc. These days, having to interview more and more developers across the whole spectrum, we see how vastly the solutions, even to the most straightforward problems, differ depending on experience. Let’s imagine a test with the following constraints: You must find the first occurrence of every unique string in a non-empty array. You are only allowed to use the standard library, no other dependencies....

May 9, 2023 · 8 min · 1622 words · Ash Vardanian

We went through life with a smile 💔

We went through life with a smile. Now I am smiling through tears, alone. Yesterday was the memorial service. One week ago, I didn’t know what that meant. Yesterday I was sitting next to a coffin with the love of my life and our daughter in it. Today I must share their story. Sona There was a girl no one knew. Some have seen her, and some talked to her. Some were friends, and some were relatives, but she was so much more than anyone could have imagined....

April 29, 2022 · 15 min · 3031 words · Ash Vardanian

Mastering C++ with Google Benchmark ⏱️

There are only two kinds of languages: the ones people complain about and the ones nobody uses. – Bjarne Stroustrup, creator of C++. Very few consider C++ attractive, and only some people think it’s easy. Choosing it for a project generally means you care about the performance of your code. And rightly so! Today, machines can process hundreds of Gigabytes per second, and we, as developers, should all learn to saturate those capabilities....

March 4, 2022 · 14 min · 2928 words · Ash Vardanian

Failing to Reach DDR4 Bandwidth 🚌

A bit of history. Not so long ago, we tried to use GPU acceleration from Python. We benchmarked NumPy vs CuPy in the most common number-crunching tasks. We took the highest-end desktop CPU and the highest-end desktop GPU and put them to the test. The GPU, expectedly, won, but not just in Matrix Multiplications. Sorting arrays, finding medians, and even simple accumulation was vastly faster. So we implemented multiple algorithms for parallel reductions in C++ and CUDA, just to compare efficiency....

January 29, 2022 · 6 min · 1215 words · Ash Vardanian

Crushing CPUs with 879 GB/s Reductions in CUDA

GPU acceleration can be trivial for Python users. Follow CUDA installation steps carefully, replace import numpy as np with import cupy as np, and you will often get 100x performance boosts without breaking a sweat. Every time you write magical one-liners, remember a systems engineer is making your dreams come true. A couple of years ago, when I was giving a talk on the breadth of GPGPU technologies, I published a repo....

January 28, 2022 · 10 min · 1996 words · Ash Vardanian

Apple to Apple Comparison: M1 Max vs Intel 🍏

This will be a story about many things: about computers, about their (memory) speed limits, about very specific workloads that can push computers to those limits and the subtle differences in Hash-Tables (HT) designs. But before we get in, here is a glimpse of what we are about to see. A friendly warning, the following article contains many technical terms and is intended for somewhat technical and hopefully curious readers....

December 21, 2021 · 8 min · 1618 words · Ash Vardanian

Hyperscaler Shopping List: 2022 Data Center Tech Frenzy ☁️

A single software company can spend over 💲10 Billion/year on data centres, but not every year is the same. When all stars align, we see bursts of new technologies reaching the market simultaneously, thus restarting the purchasing super-cycle. 2022 will be just that, so let’s jump a couple of quarters ahead and see what’s on the shopping list of your favorite hyperscaler! Friendly warning: this article is full of technical terms and jargon, so it may be hard to read if you don’t write code or haven’t assembled computers before....

December 7, 2021 · 15 min · 3003 words · Ash Vardanian

Only 1% of Software Benefits from SIMD Instructions

David Patterson had recently mentioned that (rephrasing): The programmers may benefit from using complex instruction sets directly, but it is increasingly challenging for compilers to automatically generate them in the right spots. In the last 3-4 years I gave a bunch of talks on the intricacies of SIMD programming, highlighting the divergence in hardware and software design in the past ten years. Chips are becoming bigger and more complicated to add more functionality, but the general-purpose compilers like GCC, LLVM, MSVC and ICC cannot keep up with the pace....

November 21, 2021 · 7 min · 1406 words · Ash Vardanian

Artsakh Must Be Independent 🗺️

A full-scale war started between Armenia, Azerbaijan and Turkey on September 27th, 2020. The disputed region of Artsakh or “Nagorno-Karabakh” hasn’t seen such escalation in over two decades. It is an ethnic conflict, that has a religious taste and an almost perpetual political engine keeping it up. It started as an attempt to redistribute power between different nations of the region, but today is used by local dictators to create an illusion of an external enemy....

October 2, 2020 · 8 min · 1588 words · Ash Vardanian

The 7 Sins of Turkish Autocracy 🇹🇷

While the world is too busy fighting COVID, reassembling the economy and preparing for the most turbulent elections in a century, one man certainly had been using that volatility to push his agenda further. As they say, never underestimate the man with the plan. And Tayyip Erdogan certainly has one. A plan that we must all counteract! Disclaimers. The Middle East is one of the most disputed regions in the world....

October 1, 2020 · 9 min · 1771 words · Ash Vardanian

Armenia, Azerbaijan, Turkey. Who's the Aggressor? ⚔️

On September 27, 2020, Azerbaijan attacked civilian populations of Armenia and Artsakh. Recep Erdoğan of Turkey reacted by calling “Armenia the biggest threat to peace in the region”. Please, look at the numbers below and decide for yourself. Armenia Azerbaijan Turkey Population 3 Million 10 Million 82 Million Leader Nikol Pashinyan since 2018 Ilham Aliyev since 2003 (after father) Recep Erdoğan since 2003 GDP 12 Billion $ 47 Billion $ 771 Billion $ Military Budget 0....

September 27, 2020 · 2 min · 295 words · Ash Vardanian

Come to Armenia 🇦🇲

Borders are closed, people are sitting at home, but I bet most of you dream about traveling again. I want to invite you all to my country of origin - Armenia. It has something to offer to every group of people - tourists, entrepreneurs and investors! ...

August 1, 2020 · 8 min · 1680 words · Ash Vardanian

Positive Outlook on the COVID-19 Crisis 😷

The bad news is coming from every direction. Still, I believe there are enough reasons to stay positive and excited about the future! Absurdly, this positive outlook was caused by 2 seemingly negative factors: Since the age of 15, I was convinced that the human race would likely face extinction in the 21st century. I was pretty confident that the threat would come in the form of a virus, most likely from China....

March 22, 2020 · 12 min · 2500 words · Ash Vardanian

What's Wrong with WWDC 2016 Keynote?

To introduce myself, I am an iOS and macOS developer, and I use Apple products daily. I like what they usually do, but there is always a catch. It’s no surprise that industry giants should keep raising the bar in technology to save their market shares. And as it always happens, the time comes when giants fall. Luckily, it hasn’t happened yet. However, this 2016 WWDC was still a massive disappointment for me....

June 14, 2016 · 7 min · 1481 words · Ash Vardanian