What do you think of turbopuffer, the vector database?

2.

As someone interested in kicking the tires of VectorDBs, where does Pinecone rank? Is there one that will be "future proof"?

3.

This is great! Out of curiosity, what's the difference between choosing a dedicated vector database vs. a traditional database with vector indices (e.g. pgvector with Postgres)?

4.

Congrats to them!

What have your experiences with vector databases been? I've been using https://weaviate.io/ which works great, but just for little tech demos, so I'm not really sure how to compare one versus another or even what to look for really.

5.

Plus, with all the inflated, VC-money-fueled hype around vector databases, they seem to have the only offering in this space that actually makes sense to me. With them you can store your embeddings close to all the rest of your data, in a single Postgres DB.

6.

What are your thoughts around the various vector dbs (pinecone, etc)?

With DuckDB, pgvector, and all of the ongoing work in Arrow already able to support vectors/arrays, it seems that the dedicated "vector" class of DB is hype/marketing.

7.

There's been some chatter recently about how many vector database options exist these days, whether we've reached the peak, etc. At Fixie we recently evaluated several of the options for our own service, so I thought I'd share our conclusions in an entertaining way. Enjoy!

8.

I think the confusing term is "VectorDB", which sounds like the name of an existing product. "A vector DB GUI powered by Postgres"?

9.

I've been in the vector database space for a while (primary author of txtai). I do think vector indexing in traditional databases with tools like pgvector is a good option.

txtai has long had SQLite + Faiss support to enable metadata filtering with vector search. That pattern can take you farther than you think.
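For anyone unfamiliar with the pattern, here is a minimal sketch of what combining a Faiss index with SQLite metadata filtering can look like. This is a generic illustration, not txtai's actual implementation; the table, column names, and random vectors are stand-ins.

```python
# Minimal sketch: Faiss for vector search, SQLite for metadata filtering.
# Generic illustration only -- not txtai's implementation.
import sqlite3
import numpy as np
import faiss

dim = 384
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, source TEXT)")

docs = [("first doc", "blog"), ("second doc", "paper"), ("third doc", "blog")]
vectors = np.random.rand(len(docs), dim).astype("float32")  # stand-in for real embeddings
faiss.normalize_L2(vectors)

index = faiss.IndexFlatIP(dim)  # exact inner-product (cosine on normalized vectors)
index.add(vectors)
for i, (text, source) in enumerate(docs):
    db.execute("INSERT INTO docs VALUES (?, ?, ?)", (i, text, source))

# Query: nearest neighbors from Faiss, then metadata filtering in SQLite.
query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 3)
placeholders = ",".join("?" * len(ids[0]))
rows = db.execute(
    f"SELECT id, text FROM docs WHERE id IN ({placeholders}) AND source = 'blog'",
    [int(i) for i in ids[0]],
).fetchall()
print(rows)
```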

The design decision I've made is to make it easy to plug in different backends for metadata and vectors. For example, txtai supports storing both in Postgres (with pgvector). It also supports sqlite-vec and DuckDB.

I'm not sure there is a one-size-fits-all approach. Flexibility and options seem like a win to me. Different situations warrant different solutions.

10.

This is a fairly good review of the many vector databases that have cropped up recently:

https://towardsdatascience.com/milvus-pinecone-vespa-weaviat...

11.

Is that actually a thing yet? Proper vector DB integration? I sure would like to see some demos of that, as it's been hyped up a lot but I haven't really seen anyone deploy anything proper with it yet.

12.

This looks super interesting. I'm not that familiar with vector databases. I thought they were mostly something used for RAG and other AI-related stuff.

Seems like a topic I need to delve into a bit more.

13.

Postgres has pgvector, an extension that adds vector search.

(amen re the paywalled articles, though perhaps the person subscribes and didn't realize it.)

there was discussion a few days ago answering your questions:

https://news.ycombinator.com/item?id=35308551

14.

First time I've heard of pgvector - for folks with experience, how does it compare to other ANN plugins (e.g. Redis https://redis.io/docs/stack/search/reference/vectors/) and purpose-built vector databases (e.g. Milvus https://milvus.io)?

Curious about both performance/QPS and scale/# of vectors.

15.

pgvector is very nice indeed. And you get to store your vectors close to the rest of your data. I've yet to understand the unique use case for dedicated vector DBs. It seems so annoying to have to query your vectors in a separate database without being able to easily join/filter based on the rest of your tables.

I stored ~6 million Hacker News posts, their metadata, and the vector embeddings on a cheap $20/month VM running pgvector. Querying is very fast. Maybe there's some penalty to pay when you get to billion+ row counts, but I'm happy so far.
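For reference, the kind of query this setup enables looks roughly like the following. It's a sketch assuming a psycopg2 connection and a `posts` table with a pgvector `embedding` column; the table, columns, and vector dimensions are illustrative.

```python
# Sketch: filter on ordinary columns and order by vector distance in one SQL query.
# Assumes the pgvector extension is installed and a posts(id, title, created_at,
# embedding vector(384)) table exists; names/dimensions are made up.
import psycopg2

conn = psycopg2.connect("dbname=example")
cur = conn.cursor()

query_embedding = [0.01] * 384  # stand-in for a real embedding
vec_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"

cur.execute(
    """
    SELECT id, title, embedding <-> %s::vector AS distance
    FROM posts
    WHERE created_at > now() - interval '30 days'   -- regular SQL filter
    ORDER BY distance
    LIMIT 10
    """,
    (vec_literal,),
)
for row in cur.fetchall():
    print(row)
```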

16.

If it could support the pgvector extension, it would be a super fast vector database with all the power of Postgres - the relational aspect brings the ability to add and query using rich, domain-specific metadata usually contained in relational databases.

17.

Also plugging my own crappy vector database - you probably shouldn't use it for anything but a fun project, but it can be set up and used in seconds. https://github.com/corlinp/Victor

18.

Vector is fantastic software. Currently running a multi-GB/s log pipeline with it. Vector agents run as DaemonSets collecting pod and journald logs, then forward via Vector's protobuf protocol to a central Vector aggregator Deployment with various sinks - S3, GCS/BigQuery, Loki, Prometheus.

The documentation is great but it can be hard to find examples of common patterns, although it's getting better with time and a growing audience.

My pro-tip has been to prefix your searches with "vector dev <query>" for best results on Google. I think "vector" is/was just too generic.

A nice recent contribution added an alternative to prometheus pushgateway that handles counters better: https://github.com/vectordotdev/vector/issues/10304#issuecom...

19.

This article is about why you shouldn't enter the vector database field, and it's reasonable.

But I want to comment on another thing I often hear: "You don't need a vector database - just use Postgres or NumPy, etc." As someone who moved to Pinecone from a NumPy-based solution, I have to disagree.

Using a hosted vector database is straightforward. Get an API key from Pinecone, send them your vectors, and then query it with new vectors. It's fast, supports metadata filtering, and scales horizontally.
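The workflow they describe is roughly this. The sketch below uses the Pinecone Python client; exact imports and index setup vary by client version, and the API key, index name, and dimensions are placeholders.

```python
# Sketch of the hosted workflow: upsert vectors with metadata, then query with a filter.
# Assumes a Pinecone API key and an existing index; details vary by client version.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")  # hypothetical 384-dimensional index

# Upsert: ids, vectors, and optional metadata.
index.upsert(vectors=[
    ("doc-1", [0.01] * 384, {"source": "blog"}),
    ("doc-2", [0.02] * 384, {"source": "paper"}),
])

# Query: nearest neighbors of a new vector, filtered by metadata.
results = index.query(
    vector=[0.015] * 384,
    top_k=5,
    filter={"source": {"$eq": "blog"}},
    include_metadata=True,
)
print(results)
```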

On the other hand, setting up pgvector is a hassle - especially since none of the cloud vendors support it natively - and a NumPy-based solution, while great for a POC, quickly becomes a pain when you try to append to it and scale it horizontally.

If you need a vector database, use a vector database. You won't regret it.

20.

This is beautiful, but is there a decent way to plow through, say, 20 TB of text and put that into a vector database (encoder only)?

It would be a great addition, especially if the vectors could then be translated into other forms (a different language, a JSON representation, extracted names/NER, etc.) by just applying a decoder to the database.

21.

If you've been wondering why there's so much hype around vector databases at the moment, this article should help explain that too - embeddings and vector databases both occupy the same space.

22.

I’m finding smaller vector databases can be almost ephemeral if you avoid parsing: https://hushh-labs.github.io/hushh-labs-blog/posts/you_dont_

I can retrieve a query, encode the embedding, load the vector store, calculate KNN, and return rendered results in under 50 milliseconds. It’s even faster if you simply cache the vector store.
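A bare-bones version of that loop (load the vector store, encode the query, brute-force KNN) is only a few lines. The sketch below assumes a plain NumPy setup; a random matrix stands in for the saved embeddings, and a real pipeline would load them from disk and encode the query with an actual model.

```python
# Sketch: the whole "ephemeral vector store" loop with plain NumPy.
# The random matrix stands in for embeddings that would normally be loaded
# from disk (e.g. np.load), and the query vector for a model-encoded query.
import numpy as np

def top_k_cosine(store: np.ndarray, query: np.ndarray, k: int = 5):
    # Normalize so the dot product equals cosine similarity.
    store_n = store / np.linalg.norm(store, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = store_n @ query_n
    idx = np.argsort(-scores)[:k]
    return list(zip(idx.tolist(), scores[idx].tolist()))

store = np.random.rand(100_000, 384).astype("float32")  # "loaded" vector store
query = np.random.rand(384).astype("float32")            # "encoded" query embedding
print(top_k_cosine(store, query, k=5))
```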

Really interested in this space and hoping to hear more ideas.

