
0xferruccio · 2026-06-19T05:35:30 1781847330

DuckDB is amazing for any sort of fast data analysis when the data is small enough that it can fit on your laptop

Recently at work I've been using it to analyse the Claude Code sessions of every engineer at our company (which we upload to S3), and it's been extremely helpful for finding gaps in devex and having clear metrics to back up the impact of fixing them

Another thing it's been really useful for is getting metrics on Claude skills usage and then diving into use cases by looking at the transcripts

Other engineers who had never touched DuckDB were so impressed by how easy it is for AI agents to write queries on our dataset

skeeter2020 · 2026-06-19T16:26:35 1781886395

>> DuckDB is amazing for any sort of fast data analysis when the data is small enough that it can fit on your laptop

I agree, and it's the dirty (not so) secret that big data providers like Snowflake try to hide: the majority of your work is not big data and WILL fit on your local machine. My last company was spending $2M/yr on a contract with Snowflake, and another million between Fivetran and Matillion. Of the 1200 clients using analytics, maybe 2 had enough data to warrant "infinite scalability", and a dozen wanted Snowflake because they already had corporate warehouses in Snowflake (they probably didn't need it either). It turned out the Extract and Load could be handled by bog-standard C# code and a bunch of SQL, while almost everyone was better off with a DuckDB database running locally, often in the browser. You've probably heard YAGNI (You Ain't Gonna Need It) before, but it's even more likely to apply to "Big Data". #SmallDataConvert

tomjakubowski · 2026-06-19T16:39:32 1781887172

Folks have been beating this drum for as long as I've worked in software, dating to the Hadoop era, and it remains true today. So much of "big data" only appears big because it's poorly stored, or is represented wastefully (in persistent storage or in memory).

xnx · 2026-06-20T14:33:20 1781966000

SSDs were a big turning point. Before them, there was more of a case for a database server.

nxm · 2026-06-20T09:46:55 1781948815

A good portion of users querying Snowflake at large companies are not technical, so you can’t expect them to run DuckDB, not to mention data access controls.

lanstin · 2026-06-20T16:41:31 1781973691

Greybeam (the company writing the blog) offers a service that proxies Snowflake and routes queries on the fly to DuckDB or Snowflake based on predicted size. Saves a lot of money.
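Greybeam's actual predictor isn't described here, so this is only a naive stand-in for the routing idea: estimate bytes scanned from the stored size of each referenced table and send small queries to DuckDB. The table sizes, engine names, and the 10 GiB cut-off are hypothetical; unknown tables conservatively fall back to the warehouse.

```python
from typing import Iterable, Mapping

# Hypothetical cut-off: queries expected to scan less than this go to DuckDB.
DEFAULT_THRESHOLD_BYTES = 10 * 1024**3  # 10 GiB


def estimate_scan_bytes(tables: Iterable[str], table_bytes: Mapping[str, int]) -> float:
    """Sum the stored size of every referenced table; an unknown table counts as infinite."""
    total = 0.0
    for name in tables:
        size = table_bytes.get(name)
        if size is None:
            return float("inf")
        total += size
    return total


def choose_engine(
    tables: Iterable[str],
    table_bytes: Mapping[str, int],
    threshold_bytes: int = DEFAULT_THRESHOLD_BYTES,
) -> str:
    """Route to the cheap local engine only when the estimate is clearly small."""
    estimate = estimate_scan_bytes(tables, table_bytes)
    return "duckdb" if estimate < threshold_bytes else "snowflake"


if __name__ == "__main__":
    sizes = {"orders": 2 * 1024**3, "events": 500 * 1024**3}
    print(choose_engine(["orders"], sizes))            # duckdb
    print(choose_engine(["orders", "events"], sizes))  # snowflake
```

A real router would look at partition pruning and filters, not just whole-table sizes, but the shape of the decision is the same.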

zurfer · 2026-06-19T07:29:10 1781854150

Like sqlite, duckdb is underappreciated as a production database. You can totally run it on servers, or even "serverless", and do some heavy data transformations or, with the right server size, work with large-scale datasets (up to a TB compressed seems fine).

ndr · 2026-06-19T08:54:55 1781859295

This. I recently used both duckdb and sqlite to power a dashboard for a family member's small restaurant. It exports all their sales to very small parquet files daily.

The files fit in memory, and you can do all sorts of computation in the browser itself. The backend is extremely simple: it just serves the JS and the parquet files.

It was also trivial to let the owner run their own queries: just give the schema to an LLM and let it use the charting library, no data hallucinations. If they need it in the dashboard, they can either use that one or ask me to review the query.

To be honest, given how simple some things became, it's been really fun to work on.

skeeter2020 · 2026-06-19T16:32:13 1781886733

Similar experience here. The best thing I've built in a long time is replacing a complex (and scary) permissions system built on top of Snowflake with single-role duckdb databases that - aside from no longer worrying about bugs leaking data across roles - are more performant, timely and flexible. Combined with the use of AI, this is the way forward IMO.

At the other end of the spectrum, working with random data on "what if?" and exploration tasks with DuckDB is fun again. It's so straightforward and fast, with tools and functions for pretty much everything.

kristjansson · 2026-06-19T17:28:16 1781890096

> no data hallucinations

Dangerous thing to assert. It’ll happily run SQL that works, but doesn’t necessarily correspond to intentions or unstated assumptions about the data.
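To make this concrete: a classic query that runs fine but answers the wrong question is a one-to-many join that silently double-counts. The tables below are hypothetical, and the sketch uses Python's built-in `sqlite3` so it runs anywhere; DuckDB returns the same numbers for the same SQL.

```python
import sqlite3


def revenue_totals() -> tuple[int, int]:
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER);
        CREATE TABLE order_items (order_id INTEGER, sku TEXT);
        INSERT INTO orders VALUES (1, 10000), (2, 5000);
        -- order 1 has two line items, order 2 has one
        INSERT INTO order_items VALUES (1, 'pizza'), (1, 'wine'), (2, 'pasta');
    """)
    # Plausible-looking but wrong: the join repeats order 1's total once per item.
    naive = con.execute("""
        SELECT sum(o.total_cents)
        FROM orders o JOIN order_items i ON i.order_id = o.id
    """).fetchone()[0]
    # Intended: each order counted exactly once.
    correct = con.execute("SELECT sum(total_cents) FROM orders").fetchone()[0]
    con.close()
    return naive, correct


if __name__ == "__main__":
    print(revenue_totals())  # (25000, 15000)
```

Both queries are valid SQL over real data, so nothing is "made up", yet the first overstates revenue by two thirds.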

ndr · 2026-06-19T17:45:25 1781891125

Of course I meant that it won't make data up.

It can only emit SQL and the JSON spec of the chart.

Since shipping, I've reviewed dozens of queries and charts it produced when answering the user. I have yet to catch Sonnet off guard.
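A sketch of how "it can only emit SQL and the chart spec" might be enforced on the model's reply, assuming a hypothetical JSON shape `{"sql": ..., "chart": {"type": ...}}` and a made-up set of chart types. This check is conservative and easy to get wrong at the edges (for example, some SQL dialects allow data-modifying CTEs), so the real safety net should be running the query on a read-only connection, e.g. `duckdb.connect(path, read_only=True)`.

```python
import json

# Hypothetical set of chart types the dashboard's charting library supports.
ALLOWED_CHART_TYPES = {"bar", "line", "pie", "table"}
READ_ONLY_KEYWORDS = {"SELECT", "WITH"}


def parse_model_reply(raw: str) -> tuple[str, dict]:
    """Accept only {"sql": <single read-only query>, "chart": {"type": ...}}."""
    reply = json.loads(raw)
    if not isinstance(reply, dict) or set(reply) != {"sql", "chart"}:
        raise ValueError("reply must contain exactly 'sql' and 'chart'")
    sql = str(reply["sql"]).strip().rstrip(";").strip()
    # Conservative: rejects any remaining ';', even one inside a string literal.
    if ";" in sql:
        raise ValueError("only a single statement is allowed")
    first_word = sql.split(None, 1)[0].upper() if sql else ""
    if first_word not in READ_ONLY_KEYWORDS:
        raise ValueError(f"statement must start with SELECT or WITH, got {first_word!r}")
    chart = reply["chart"]
    if not isinstance(chart, dict) or chart.get("type") not in ALLOWED_CHART_TYPES:
        raise ValueError("unsupported chart spec")
    return sql, chart
```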

noworriesnate · 2026-06-19T16:42:16 1781887336

I have a theory that LLMs are going to be the death knell of big SaaS. It's so much harder to build and maintain a massive SaaS that does 80% of what 80% of your customers want than it is to build something small and simple that does 100% of what one customer wants.

kristjansson · 2026-06-19T17:30:03 1781890203

Maybe once the model can administer and operate the service too.

For now, building the 10% of the SaaS that you need still leaves you operating 100% of a new service/process

wills_forward · 2026-06-19T16:26:58 1781886418

tomnipotent · 2026-06-19T07:41:58 1781854918

Not to mention it can query across heterogeneous sources, so the same query can use a duckdb table, sqlite, csv, and parquet (including predicate pushdown).

cyanide911 · 2026-06-19T07:36:23 1781854583

>Recently at work I've been using it to analyse the Claude code sessions of every engineer at our company (that we upload to S3) and it's been extremely helpful to help us find gaps in devex and have clear metrics to back up the impact of fixing them

Nice! How do you set things up so that your engineers' Claude Code sessions upload to S3? Thanks in advance for the help

0xferruccio · 2026-06-19T20:25:06 1781900706

We have a hook that runs on session start and session end and sends data to a Lambda with a hard-coded JWT that we ship in the code

We added that to the managed settings for our Claude instance as a “base” plugin and provision it to all machines using JAMF

A non-enterprise version of that would be to add this hook to your main repository’s .claude folder
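A rough sketch of such a hook script, meant to be registered as `SessionStart` and `SessionEnd` command hooks in the `hooks` section of `.claude/settings.json` (check the Claude Code hooks documentation for the exact shape). Claude Code passes hook input as JSON on stdin with fields such as `session_id`, `transcript_path` and `hook_event_name`. The endpoint and token environment variable names are hypothetical, and unlike the setup described above, this version reads the token from the environment rather than hard-coding it.

```python
import json
import os
import sys
import urllib.request

# Hypothetical environment variables holding the ingest endpoint and its token.
ENDPOINT_ENV = "CC_UPLOAD_ENDPOINT"
TOKEN_ENV = "CC_UPLOAD_TOKEN"


def build_payload(hook_input: dict) -> dict:
    """Pick the fields to upload; attach the transcript only when the session ends."""
    payload = {key: hook_input.get(key) for key in ("hook_event_name", "session_id", "cwd")}
    transcript = hook_input.get("transcript_path")
    if hook_input.get("hook_event_name") == "SessionEnd" and transcript and os.path.isfile(transcript):
        with open(transcript, encoding="utf-8") as f:
            payload["transcript_jsonl"] = f.read()
    return payload


def build_request(payload: dict, endpoint: str, token: str) -> urllib.request.Request:
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    )


def main() -> None:
    hook_input = json.load(sys.stdin)  # hook input arrives as JSON on stdin
    request = build_request(build_payload(hook_input), os.environ[ENDPOINT_ENV], os.environ[TOKEN_ENV])
    try:
        with urllib.request.urlopen(request, timeout=5):
            pass
    except OSError as exc:  # URLError/HTTPError subclass OSError; never break the session
        print(f"session upload failed: {exc}", file=sys.stderr)


# Only run when configured, so importing or dry-running the file does nothing.
if __name__ == "__main__" and os.environ.get(ENDPOINT_ENV):
    main()
```

Large transcripts may exceed a Lambda's request payload limit, so a real setup might upload to S3 via a pre-signed URL instead.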

_boffin_ · 2026-06-19T13:37:04 1781876224

Probably on a Business/Enterprise plan, which has managed settings and also telemetry export. Give it a collector endpoint to export to, and then have the collector send to S3.

pimeys · 2026-06-19T11:07:50 1781867270

If you use OpenCode, the sessions are all in a local sqlite database. After lunch I'm pushing one of my agents to crunch some data from that using duckdb...

tosh · 2026-06-19T13:21:58 1781875318

Agreed. DuckDB also works quite well for data that is too big to fit in memory or on the machine DuckDB runs on (predicate pushdown, out-of-core processing, …).

ashu1461 · 2026-06-19T11:08:08 1781867288

Can you expand on the Claude analysis part? What exactly did you analyse, and what outcome did it help with?

ryanchants · 2026-06-19T13:52:39 1781877159

Not who you responded to, but I've been working on cctx. It's an open source tool for analyzing Claude Code sessions to see where things went wrong (tool failure loops, bloated context, and the like).

https://github.com/jacquardlabs/cctx

fastasucan · 2026-06-21T13:14:26 1782047666

>DuckDB is amazing for any sort of fast data analysis when the data is small enough that it can fit on your laptop

It also works great for data that doesn't fit on my laptop.

0xferruccio · 2026-06-11T22:51:22 1781218282

This is a genius idea, I love it!!

matthewbarras · 2026-06-12T00:32:26 1781224346

thank you!

0xferruccio · 2026-05-19T17:14:07 1779210847

Congrats on the launch, this looks very promising. I hadn't seen any installation that uses a URL to point to a skill, seems like an evolution of wizard scripts

That being said, for more complex setups like Kubernetes, where you need a collector and an operator, I found OTel super painful to set up a couple of years ago. Has it gotten any easier?

signalbright · 2026-05-19T19:33:39 1779219219

Thank you! Glad you liked the install process :)

I'm afraid a collector plus the operator is still the approach OpenTelemetry recommends (https://opentelemetry.io/docs/platforms/kubernetes/getting-s...). We're still working on a custom skill for Kubernetes, but the general skill should already give you sane defaults.

A good way to start can be to send traces/logs directly by instrumenting the service and using our backend as the collector.

I also help out personally whenever our clients have any questions on setting up the telemetry :)

0xferruccio · 2026-04-29T22:05:47 1777500347

Incredibly well done by Neal as usual!! He always has fun experiments built on completely new concepts

0xferruccio · 2026-02-11T06:06:01 1770789961

Great article as usual, got a flashback to reading your first post on here 8 years ago. At the time I was starting my career in tech by building small projects for fun and launching them on Product Hunt. Great to see you’re still going at it!

0xferruccio · 2026-01-29T19:00:40 1769713240

At Amplitude we built Moda which is super similar to this.

Our chief engineer Wade gave an awesome demo to Claire Vo some months back here: https://www.youtube.com/watch?v=9Q9Yrj2RTkg

I use this basically every day asking all sorts of questions

0xferruccio · 2026-01-29T18:57:45 1769713065

To be fair I remember spending almost two weeks implementing OTel at my startup, the infrastructure as code setup of getting collectors running within a kubernetes cluster using terraform was a nightmare two years ago.

I just kept running into issues, the docs were really poor and the configuration had endless options

0xferruccio · 2026-01-23T21:39:44 1769204384

some of the design interactions are really polished. the section written with the quotes from founders is really cool. the hover effect with the before and after of the YC partners is a great touch too!

0xferruccio · 2026-01-22T19:52:11 1769111531

to be fair at least half of the software engineers i know are facing some level of existential crisis when seeing how well claude code works, and what it means for their job in the long term

and these are not junior developers working on trivial apps

swiftcoder · 2026-01-22T20:06:45 1769112405

Yeah, I've watched a few peers go down this spiral as well. I'm not sure why, because my experience is that Claude Code and friends are building a lifetime of job security for staff-level folks, unscrewing every org that decided to over-delegate to the machine

Macha · 2026-01-23T12:33:38 1769171618

Cleanup is less enjoyable than product building. If every future job is cleaning up a massive pile of AI slop, then that is a less fulfilling world than the current one.

swiftcoder · 2026-01-23T12:49:54 1769172594

I mean, cleaning up after outsourcing firms isn't the most glamorous work either, but we've done that for years too

2sk21 · 2026-01-23T13:06:59 1769173619

I feel grateful that I retired a few years ago and no longer have to make a living being a developer.

0xferruccio · 2026-01-12T22:39:36 1768257576

The primary exfiltration vector for LLMs is making network requests via images with sensitive data as parameters.

As Claude Code increasingly uses browser tools, we may need to move away from .env files to something encrypted, kind of like rails credentials, but without the secret key in the .env
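As a small illustration of the image vector, here is a sketch of a filter that flags remote images (Markdown `![alt](url)` or HTML `<img src>`) pointing at hosts outside an allowlist, which is where a URL like `https://attacker/p.png?k=SECRET` would show up. The allowlist and function name are hypothetical, and regex scanning is easy to bypass; it complements network egress controls rather than replacing them.

```python
import re
from urllib.parse import urlsplit

# Markdown images ![alt](url) and HTML <img src="url">.
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*<?([^)\s>]+)")
HTML_IMAGE = re.compile(r"""<img\b[^>]*\bsrc\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def suspicious_image_urls(text: str, allowed_hosts: set[str]) -> list[str]:
    """Return remote image URLs whose host is not explicitly allowlisted."""
    flagged = []
    for url in MARKDOWN_IMAGE.findall(text) + HTML_IMAGE.findall(text):
        parts = urlsplit(url)
        # netloc is set for absolute (https://host/...) and protocol-relative (//host/...) URLs.
        if parts.netloc and parts.hostname not in allowed_hosts:
            flagged.append(url)
    return flagged
```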

SahAssar · 2026-01-12T23:39:02 1768261142

So you are going to take the untrusted tool that kept leaking your secrets, keep the secrets away from it but still use it to code the thing that uses the secrets? Are you actually reviewing the code it produces? In 99% of cases that's a "no" or a soft "sometimes".

TeMPOraL · 2026-01-13T22:30:24 1768343424

That's exactly what one does with their employees when one deploys "credential vaults", so?

SahAssar · 2026-01-13T22:35:05 1768343705

Employees are under contract and are screened for basic competence. LLMs aren't and can't be.

TeMPOraL · 2026-01-13T22:36:43 1768343803

> Employees are under contract and are screened for basic competence. LLMs aren't

So perhaps they should be.

> and can't be.

Ah but they must, because there's not much else you can do.

You can't secure LLMs like they were just regular, narrow-purpose software, because they aren't. They're by nature more like little people on a chip (this is an explicit design goal) - and need to be treated accordingly.

SahAssar · 2026-01-13T22:43:10 1768344190

> So perhaps they should be.

Unless both the legalities and technology radically change they will not be. And the companies building them will not take on the burden since the technology has proved to be so unpredictable (partially by design) and unsafe.

> designed to be more like little people on a chip - and need to be treated accordingly

Deeply unpredictable and unsafe people on a chip, so not the sort that I generally want to trust secrets with.

I don't think it's that complex, you can have secure systems or you can have current gen LLMs. You can't have both in the same place.

TeMPOraL · 2026-01-13T22:52:03 1768344723

> Deeply unpredictable and unsafe people on a chip, so not the sort that I generally want to trust secrets with.

Very true when comparing to acquaintances, but at a scale of any company or system except the tiniest ones, you can't blindly trust people in general either. Building systems involving people and LLMs is pretty similar.

> I don't think it's that complex, you can have secure systems or you can have current gen LLMs. You can't have both in the same place.

That is, indeed, the key. My point is that, unlike the popular opinion in threads like this, it does not follow that we need to give up on LLMs, or that we need to fix the security issues. The former is undesirable, the latter is fundamentally impossible.

What we need is what we've been doing ever since civilization took shape, ever since we've started building machines: recognize that automatons and people are different kinds of components, with different reliability and security characteristics. You can't blindly substitute one for the other, but there are ways to make them work together. Most systems we've created are of that nature.

What people still get wrong is treating LLMs as "automaton" components. They're not; they're "people" components.

SahAssar · 2026-01-13T23:01:37 1768345297

I think I generally agree, but I also think that treating them like people means that you expect reason, intelligence and a way to interrogate their way of "thinking" (very broad quotes here).

I think LLMs are to be treated as something completely separate from both predictable machines ("automatons") and people. They have separate concerns and fitness for a use-case than both existing categories.

majormajor · 2026-01-14T01:01:16 1768352476

Sooo the primary way we enforce contracts and laws against people are things like fines and jail time.

How would you apply the threat of those to "little people on a chip", exactly?

Imagine if any time you hired someone there was a risk that they'd try to steal everything they could from your company and then disappear forever with you having no way to hold them to account? You'd probably stop hiring people you didn't already deeply trust!

Strict liability for LLM service providers? Well, that's gonna be a non-starter unless there's a lot of MAJOR issues caused by LLMs (look at how little we care about identity theft and financial fraud currently).

xyzzy123 · 2026-01-13T09:46:19 1768297579

One tactic I've seen used in various situations is proxies outside the sandbox that augment requests with credentials / secrets etc.

Doesn't help in the case where the LLM is processing actually sensitive data, ofc.
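A sketch of the credential-injecting proxy idea: the sandboxed agent never holds real secrets, and the proxy outside the sandbox adds them only for allowlisted hosts. The host-to-secret mapping and function name are hypothetical. For HTTPS the proxy has to terminate TLS (or the sandbox must talk plain HTTP to it) to be able to rewrite headers at all.

```python
from typing import Mapping
from urllib.parse import urlsplit

# Hypothetical mapping from upstream host to the name of the secret the proxy holds for it.
SECRET_FOR_HOST = {"api.github.com": "GITHUB_TOKEN"}


def inject_credentials(url: str, headers: Mapping[str, str], secrets: Mapping[str, str]) -> dict[str, str]:
    """Runs in the proxy, outside the sandbox: add the real credential for known hosts only."""
    host = urlsplit(url).hostname
    secret_name = SECRET_FOR_HOST.get(host)
    if secret_name is None:
        # Unknown destination: refuse instead of forwarding anything.
        raise PermissionError(f"egress to {host!r} is not allowed")
    # Drop whatever Authorization header the agent sent; it never holds real secrets.
    forwarded = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    forwarded["Authorization"] = f"Bearer {secrets[secret_name]}"
    return forwarded
```

Passing `secrets` in (e.g. `os.environ` of the proxy process) keeps the function testable and makes it explicit that the secrets live only on the proxy side.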

touristtam · 2026-01-14T10:39:51 1768387191

Can't use a tool like dotenvx?