What do you think of turbopuffer the vector database

smitty1e over 1 year ago | source

What is a vector database?

https://www.pinecone.io/learn/vector-database/

...was less than informative.

ofermend 4 months ago | source

Somewhat related: I actually think vector databases are not as important as people are led to believe. Read more here:

https://vectara.com/blog/vector-database-do-you-really-need-...

tristanho 5 months ago | source

Honestly just loading all vectors in-memory (and stuff like sqlite, pgvector) is totally fine when you're dealing with O(100k) vectors, but beyond that all the workable options like pinecone get gnarly, slow, and ridiculously expensive.

The best option by far I know of is turbopuffer.com , which is like 100x cheaper than pinecone and seems to actually scale.

Since it's not listed in the suggested vector dbs section of the slides, wanted to lob it in as a solid suggestion :)

isoprophlex 2 months ago | source

Plus, with all the inflated vc money fueled hype on vector databases, they seem to have the only offering in this space that actually makes sense to me. With them you can store your embeddings close to all the rest of your data, in a single postgres db.

pjot 6 months ago | source

What are your thoughts around the various vector dbs (pinecone, etc)?

With DuckDB, pgvector, and all of the ongoing work in Arrow already able to support vectors/arrays, it seems that the specific “vector” class of db is hype/marketing.

MadScientist0 5 months ago | source

There's been some chatter recently about how many vector database options exist these days, whether we've reached the peak, etc. At Fixie we recently evaluated several of the options for our own service, so I thought I'd share our conclusions in an entertaining way. Enjoy!

gk1 almost 2 years ago | source

Weaviate calling themselves a vector database is a fairly new thing.

sandstrom about 1 year ago | source

This is a fairly good review of the many vector databases that have cropped up recently:

https://towardsdatascience.com/milvus-pinecone-vespa-weaviat...

moffkalast about 1 year ago | source

Is that actually a thing yet? Proper vector DB integration? I sure would like to see some demos of that, as it's been hyped up a lot but I haven't really seen anyone deploy anything proper with it yet.

10.

loondri 9 months ago | source

I think the move towards vector databases might be more hype than necessity. Traditional databases, when properly optimized, can handle vector data for many use cases. The push for specialized vector databases could be re-evaluated in terms of efficiency and cost-effectiveness compared to optimizing existing scalar databases.

11.

here4U about 1 year ago | source

Are vector databases (mostly) commodities? Do they have a winner-takes-all property? (Pinecone seems more popular at the moment.)

12.

nighmi 9 months ago | source

We had seen interesting developments around vector databases, but then people stopped hyping them, as you could just save vectors in normal databases without real differences. I wonder what will happen when the models can freely access them though.

13.

blackcat201 11 months ago | source

I have been following the vector database trend since 2020, and I ended up with this conclusion: vector search is a nice-to-have feature that adds more value on top of an existing database (postgres) or text search service (elasticsearch) than an entirely new framework full of hidden bugs does. You can get a much bigger speedup from using the right embedding model and encoding than from using the vector database with the best underlying optimization. And the bonus is that you are using a battle-tested stack (postgres, elasticsearch) rather than the new kids (pinecone, milvus ... )

14.

jjtheblunt about 1 year ago | source

postgres has pgvector, an extension for vector databases

(amen re the paywalled articles, though perhaps the person subscribes and didn't realize it.)

there was discussion a few days ago answering your questions:

https://news.ycombinator.com/item?id=35308551

15.

fzliu over 1 year ago | source

First time I've heard of pgvector - for folks with experience, how does it compare to other ANN plugins (e.g. Redis https://redis.io/docs/stack/search/reference/vectors/) and purpose-built vector databases (e.g. Milvus https://milvus.io)?

Curious about both performance/QPS and scale/# of vectors.

16.

isoprophlex 2 months ago | source

PGvector is very nice indeed. And you get to store your vectors close to the rest of your data. I'm yet to understand the unique use case for dedicated vector dbs. It seems so annoying, having to query your vectors in a separate database without being able to easily join/filter based on the rest of your tables.

I stored ~6 million hacker news posts, their metadata, and the vector embeddings in a cheap $20/month VM running pgvector. Querying is very fast. Maybe there's some penalty to pay when you get to the billion+ row counts, but I'm happy so far.

17.

jmole about 1 year ago | source

I know nothing about vector databases – is this just “replace SQL with a dot product and return a ranked list (with optimizations)”?
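For anyone wondering the same thing, here is a minimal sketch of that mental model in plain Python: score every stored vector against the query with a normalized dot product (cosine similarity) and return the top matches. The `docs` data, the `top_k` helper, and the three-dimensional vectors are made up for illustration and are not taken from any product mentioned in this thread. Real vector databases generally avoid this full scan by using approximate nearest neighbor indexes, which is roughly the "with optimizations" part.

```python
import heapq
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k(query, vectors, k=3):
    """Brute-force search: score every stored vector, return the k best as (score, id)."""
    q = normalize(query)
    scored = ((dot(q, normalize(vec)), doc_id) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored)


# Hypothetical toy "embeddings"; real ones have hundreds or thousands of dimensions.
docs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "stocks": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], docs, k=2)
for score, doc_id in results:
    print(f"{doc_id}: {score:.3f}")
```

This linear scan costs one dot product per stored vector per query, which is why it stays practical at small scale and why larger collections tend to rely on an index instead.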

18.

nborwankar 3 months ago | source

If it could support the pgvector extension it would be a super fast vector database with all the power of Pg - the relational aspect brings the ability to add and query using rich domain specific metadata usually contained in relational databases.

19.

tornato7 about 1 year ago | source

Also plugging my crappy vector database, which you probably shouldn't use for anything but a fun project, however it can be set up and used in seconds. https://github.com/corlinp/Victor

20.

kernelsanderz 7 months ago | source

Very excited about being able to build scalable vector databases on DiskANN like turbopuffer or lancedb. These changes in latency are game changing. The best server is no server. The capability of a low-latency vector database application that runs in Lambda and S3 and is dirt cheap is pretty amazing.

21.

mindvirus 5 months ago | source

Congrats to them!

What have your experiences with vector databases been? I've been using https://weaviate.io/ which works great, but just for little tech demos, so I'm not really sure how to compare one versus another or even what to look for really.

22.

dmezzetti 7 months ago | source

Couple relatively recent HN threads that give a good overview of the vector database landscape.

https://news.ycombinator.com/item?id=36943318

https://news.ycombinator.com/item?id=38416994

https://news.ycombinator.com/item?id=38420554

23.

candiddevmike 7 months ago | source

As someone interested in kicking the tires of VectorDBs, where does Pinecone rank? Is there one that will be "future proof"?

24.

cfors 9 months ago | source

Can anybody speak to how Vespa compares to some other Vector Database solutions? Seems like there's so many options today

25.

miller_joe 3 months ago | source

Vector is fantastic software. Currently running a multi-GB/s log pipeline with it. Vector agents as DaemonSets collecting pod and journald logs then forwarding w/ vector's protobuf protocol to a central vector aggregator Deployment with various sinks - s3, gcs/bigquery, loki, prom.

The documentation is great but it can be hard to find examples of common patterns, although it's getting better with time and a growing audience.

My pro-tip has been to prefix your searches with "vector dev <query>" for best results on google. I think "vector" is/was just too generic.

A nice recent contribution added an alternative to prometheus pushgateway that handles counters better: https://github.com/vectordotdev/vector/issues/10304#issuecom...

26.

wanderingmind 11 months ago | source

When there are so many awesome FOSS vector databases available, I wonder what motivated the airbyte team to use Pinecone, the one database that is anti-FOSS?