Deeply Unsexy: SQL's Redemption Arc — with Tristan Handy

dbt introduced the idea that you should push these versions of the truth centrally and govern them in source control, and that rather than taking your ball and going home if you didn't like the centralized version of the truth, you would argue it out in public and try to come to some consensus that worked for everybody.

Now obviously that's imperfect, but I think we've honestly made a reasonable amount of progress along that continuum.

It does, yeah, it does feel like that has really changed. When I was doing my PhD, about 15 years ago, I was working with scientists, and their data comes in every form you can imagine: Excel spreadsheets, an API, a database, CSV files, whatever. So a big part of the role of an applied statistician or data scientist was just figuring out how to get all that data into one nice, clean representation.

And I remember thinking at the time: this is just so tough in science, it must be so nice to work in an industry job where you can just query the single source of truth and get it. That clearly was not the case 15 years ago, but it does feel like we've gotten much, much closer now. In many organizations there's a decent chance that the data you need is nicely prepared somewhere.

It's all just different ergonomic ways to express the same ideas. And I love the idea that we're getting closer to a place where we've got a universal Babel fish.

It is. I will say too, I worked a little bit on porting dplyr and dbplyr to Python and all that translation stuff. But what's so interesting to me now, in 2026, is that, similar to unnesting JSON, I've been amazed at how expressive SQL is in a lot of databases like DuckDB. Six or seven years ago I used to bring up this example of why dbplyr: if I want to select every column except one, it's a huge nightmare in most databases. But now you have DuckDB with EXCLUDE or EXCEPT, I always forget which; they have all these ways to select things and operate on them.

So it's interesting, to your point about Fusion, being able to translate things down to the AST level. It's funny: seven years ago I feel like I would have wanted a very data-frame-like R or Python tool translating to SQL. But now I could see something like what you described sitting a lot closer to SQL, like an R wrapper around a more SQL-y dialect.

To me, dbt started out a little bit like Rails, where SQL is HTML. Nobody really wants to sit there and hand-write HTML; that's not an efficient use of time. So when you use something like Rails, you go up a level of abstraction, and the same thing happens with dbt. Long before the engines themselves started supporting things like EXCLUDE or EXCEPT in the select list, you could implement that as a function in dbt, and with the macro capabilities you would just have it.
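The "every column except one" problem described above is easy to sketch. Engines like DuckDB now accept `SELECT * EXCLUDE (col)` natively; before that, a dbt macro such as dbt-utils' `star` (which takes an `except` list) filled the gap by looking up a relation's columns and rendering an explicit select list. The Python below is a minimal sketch of that idea under stated assumptions, not dbt's implementation: the standard-library `sqlite3` module stands in for a warehouse, and the `star_except` helper, the `orders` table, and its columns are hypothetical.

```python
import sqlite3


def star_except(conn: sqlite3.Connection, table: str, exclude: set[str]) -> str:
    """Render an explicit SELECT list for every column of `table` except `exclude`.

    Hypothetical helper illustrating what a macro like dbt-utils' `star` does:
    read the relation's column metadata, drop the excluded names, and emit
    plain SQL that any engine can run.
    """
    # PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    unknown = exclude - set(columns)
    if unknown:
        raise ValueError(f"not columns of {table}: {sorted(unknown)}")
    keep = [c for c in columns if c not in exclude]
    if not keep:
        raise ValueError("every column was excluded")
    select_list = ", ".join(f'"{c}"' for c in keep)
    return f'SELECT {select_list} FROM "{table}"'


# Usage with an in-memory stand-in table (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, _loaded_at TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'ada', 9.5, '2026-01-01')")

sql = star_except(conn, "orders", {"_loaded_at"})
rows = conn.execute(sql).fetchall()
print(sql)   # SELECT "id", "customer", "amount" FROM "orders"
print(rows)  # [(1, 'ada', 9.5)]
```

The design choice is the same one the Rails analogy points at: the abstraction generates ordinary SQL, so it works on engines that never added `EXCLUDE`. In a real warehouse the column lookup would typically go through the adapter's metadata or `information_schema` rather than SQLite's `PRAGMA`.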

As we continue up the layers of abstraction, it lets people forget the implementation details. Here's a thing that goes on right now. There are companies that have spent literally millions of human hours writing Spark pipelines, and, in different teams, millions of human hours writing stored procedures. These things fundamentally do the same stuff; they are not different. They've just, for whatever reason, been built in different technologies. So these companies maintain separate data infrastructures for the two, and then at some final stage they make them all available to each other. That's not very sensible. If you had a translation layer that could read and operate on these things regardless of how they were originally expressed, all those walls would fall away.

It also echoes the story of Hadoop to Spark to now, where people just write SQL, right? In the early days, SQL databases couldn't handle that scale; they didn't know how to split jobs across hundreds of machines. As a data scientist or data engineer, you were forced to do that yourself and explicitly manage all that computation. I'm sure some people enjoyed doing that, but for most people it was just something in the way of doing their actual job. And as the databases have become more capable over time, all of that just gets swept away into the background.

Building the dbt community

dbt has this incredible community. I know we've talked a lot about dbt the tool and Fusion, but one thing that struck me is that the dbt Slack is so hopping; there's so much going on. I'm really curious to hear what you think went into creating such a nice community. I'm not trying to butter you up, but dbt Slack is so happening and Coalesce is so bumping. Is it just that analytics engineers are wild people, or what do you think makes the dbt community so nice?

So I actually don't know that much about the R community. Is there a place where people gather to talk about best practices and stuff like that? Or is it so widely used that there are a million separate communities? Is there a place where R users get together in person?

So I think there was an online place, which was Twitter; that's where people shared knowledge. The R community has largely abandoned Twitter for fairly obvious reasons, and there's less of a central online place now. Apart from that, there are quite a few regional R conferences, and useR! and posit::conf, but these are all conferences on the order of hundreds to maybe 1,500 or 2,000 people, scattered all over the place. So there isn't really one central in-person R event.

Yeah. And I will say there's a big hex sticker crowd in R. People love hex stickers for packages, and there's a real frenzy to pick them up at conferences when they get dropped. So if you create a package in R or dbt and you want it to be a real package, it has to have a sticker associated with it. And then at the conferences you're trading stickers with people. I think that's one of the things that's pretty unique about the R community.

It's hard. Probably anybody who has been the seed crystal for a reasonably large community would tell you it's an emergent phenomenon, and you never know exactly what made it happen. I would say the biggest trait, going back to what we were talking about before with analytics engineers, is that they were often previously data analysts who were leveling up in their careers. And oftentimes the most common emotion as they went through that was a sense of overwhelm and a sense of imposter syndrome.

In so many technical communities, the people who run them are highly technical, and there's a sense of RTFM: don't ask this question until you've researched to the ends of the universe, and only then bother me. We just acknowledged that for many of the people who started using dbt between 2016 and 2020, this stuff felt overwhelming, and they needed some support along the way. So we seeded the community to be helpful, supportive, and friendly, and we were very serious about moderating out any behaviors that conflicted with that. That creates a virtuous cycle: when people feel like they've been supported in their journey, they go on to support other people in theirs.

Yeah. I think there are interesting parallels to the R community, because 10 or 15 years ago R was made for people with PhDs in statistics by people with PhDs in statistics. You'd go on the R-help mailing list and ask a stupid question, and someone would basically tell you what a fucking idiot you are. So when I thought about the R community, that was something I wanted to do the opposite of. And with each technology transition, from mailing lists to Stack Overflow and from Stack Overflow to Twitter, there was an opportunity to reinvent the community a little bit and move toward a friendlier, more welcoming environment.

And I think that was a tremendous net benefit to the R community as a whole. We're also lucky because the R community tends to be more diverse, since people come from all branches of science: diverse both in their backgrounds and in their applications. Cultivating that feeling of being welcomed creates, as you said, that virtuous cycle: I felt really welcomed by the community, and now I'm going to pay that forward and welcome the next generation of people. It really led to a pretty remarkable transformation in the R community.

AI's impact on community and open source

Yeah, I was saying very positive things about AI and how I'm hopeful about its impact on analytics engineers. I'm a little more up in the air about AI's impact on community formation.

The funny thing about communities is that they not only help people get things done, they also build social capital. Really meaningful social relationships were built in the early days of the dbt community, when I was a super, super active member, but they happened as part of asking and answering kind of boring technical questions. Now I would never ask those types of technical questions of a community in Slack. I would go to Claude or ChatGPT or whatever, and it would give me immediate answers that were probably of as high or higher quality.

And the other thing, and we still have to see how it plays out, is that even open source feels a little less essential now. Obviously open source on the order of R or dbt, or something like Linux, is not going away. But there's also this entire package ecosystem. I spent a lot of time curating a package called dbt-utils, which was macros for useful utilities, and now you could just ask Claude, "Hey, make me a macro that does this thing." So I don't know. I worry about that stuff a little bit, but then I feel like a grumpy old man.

But I do. Yeah, I feel similar. The other thing I worry about is that we're the incumbents, right? We're the people who created the software, so if you ask Claude, it knows how to do it; that's all in the training data. But think about a young person today. The way I promoted ggplot2 and dplyr was that someone would ask a question on the internet, and I'd be there: "Hey, I'm going to answer your question, and I'm going to be friendly about it." You're learning something new, and you're not going to get that from a chatbot. And there's that interpersonal relationship, which is also gone now. So it's pretty clear this is going to have profound implications for how these communities form and what people get out of them.

I mean, maybe it frees us to focus more on community for the sake of community, not just to solve your R or SQL programming problems. But I don't know. Yeah, I worry about that loss of connection.

Yeah. I remember searching a lot too, and really appreciating finding a blog post, or finding out someone had done a deep dive into exactly what I was looking for. I remember some of those to this day. Hilary Parker, one person in the R community, wrote way back about making an R package, and somehow, even though it was over a decade ago, it's burned into my mind.

I do wonder whether people will still feel as encouraged to write. Maybe they'll write just as much, but I feel like dbt also really distinguished itself through so many great blog posts and deep dives.

Yeah, there was this post we wrote in 2017 or 2018 about how we configure Snowflake for our clients with dbt, and that was used to configure so many Snowflake instances.

Certainly you can still write that type of content as much as you ever could, but I think the economic incentives for it are changing very quickly. In some ways that's not entirely bad: those same incentives also led to all this content farming, creating tons of pretty useless content, so that whenever you search for something you get someone selling ads to you. I'm not sad to see that go, but the blogs that people poured so much heart and soul into, those I'll miss.

And I think the other thing I'm nostalgic for is that you'd read someone's really cool technical blog, think "Oh, that's awesome," and then go follow them on Twitter, and as well as the technical side, you'd get a snapshot of their personality and everything else. Which is why, in the R ecosystem, and we don't have to get into it, I could tell you what people's collective thoughts were on US elections, because all of this stuff bled together once you followed somebody on Twitter.

Yeah, for better or worse.

Yeah, right. Did everyone go to Bluesky? Is that where the community is now, or is it somewhere else?

I mean, not everyone, but it feels like there's a strong enough nucleus there that you can go and interact and people interact back, and it's enjoyable in the same ways as early Twitter. I tried Mastodon and never got the same thing. And I'm on LinkedIn, which I kind of hate everything about, but people use it, and I've gotten better feedback there than in other places.

LinkedIn was not on my bingo card as the number one social site that I use, and I'm still troubled that that's the case. But somehow it's so tame that I'm like, this is fine as a social place. I do miss reading posts by, like, an account that claims to be a raccoon digging through garbage.

Yeah, I'm getting that from Bluesky now: just weird personas where you're like, this is clearly such a totally different person from me, and I get to experience a little bit of that.

Personal projects and coding agents

I'm curious if either of you has a fun personal project going on right now in data. I feel like with coding agents, I've been doing more personal projects than ever. My current thing is that I'm trying to create my own iOS app that pulls data out of HealthKit, using the very hard-to-access SDK, so that I can get my health data into a Postgres database and screw around with it.

I love this. I just have to say, four years ago this sentence would have sounded insane as a project, but somehow it makes so much sense to me now: even if you've never built an iOS app, you can have a good time making a usable one.

Yeah. I also did an iOS app, but it's just a talk timer, for when you're giving a talk at a conference or chairing a session. I've always had in my head this platonic ideal of what I want from a timer, and I've tried so many different apps, and they're always bloated or ugly or full of ads. Then I was like, oh wow, actually I can just create this now. And I did. It was so fulfilling to create this thing in Swift, which I'd never used before. It works, and I like it.

I think I mentioned this in the last episode we had, but I've been doing a lot more Cantonese study. My dad speaks Cantonese, and it's a tough language because it's not traditionally written. It's rare to have transcribed Cantonese; it's mostly just spoken, and people learn to read and write Mandarin. But today it's so easy to get Whisper to transcribe Cantonese videos that it's kind of mind-blowing. And then agents are really good at writing out the Cantonese.

So it's been really nice to be able to generate sentences and have a tutor that can take the vocab I've studied and remix it. But the nicest thing about this activity for me has been this: right now I put so much focus on using things like Claude Code to generate code, and language study has been a way of getting back in touch with picking up a skill and building fluency. That's felt good in the way that coding fast used to feel good. Just being able to produce words, and read and understand, somehow feels nice.

Wait, how far along is this iOS app? Did you run into any hitches, or did it get off the ground pretty easily?

It's going okay so far.

Is it Claude Code, or what are you using?

Yeah, I'm using Claude Code. It's not complete yet; I have a late night tonight. Actually, as we're recording, I have Claude working in the background.

Right, nice. I feel that constantly: you're like, I might as well have you working while I'm doing other stuff. Last week I gave an internal talk about using Claude Code, and during it someone polled the Zoom audience on how many people were watching the talk while Claude Code was doing something in the background. It was about 20% of the people watching. So yeah, geez.

I actually realized, and this is definitely toxic, that for a while I was thinking of meetings as not being real work. Whenever I was in a meeting, it didn't count as work. And one of the reasons I found Claude Code so appealing was that it suddenly made meetings feel like work, because Claude Code was working for me in the background.

Tristan, have you fired Claude Code at dbt projects? Have you had any reckoning moments where you just turned it loose on dbt? What's that been like?

It's shockingly good. As you mentioned before, Hadley, there's enough dbt code in the training data that it just knows how to do it. And we built an MCP server and shipped it in, I think, April of last year, and it has seen very rapid adoption. That now allows Claude to pretty straightforwardly execute stuff and also validate its own code.

So yeah, it's good. I think it was maybe two months ago that we were able to go from zero to a pretty sophisticated project in the space of one hour. It felt like kind of a moment.

We have an internal thing we call demo bot, which our sales team uses. When they go to talk to a customer, instead of showing them something generic, like the New York bike share data, they use demo bot, which is a very simple Claude script that says: create a sample data set for this industry, make a dashboard, make an API, make a report. Even though the data is completely made up, it's so much more compelling to see something related to your own industry. People really like stuff that's customized just for them, and it's easier than ever to do that now.

Predictions for 2026

Do you want to make some famously bad predictions for us, so that in a year's time we can say, "Tristan thought we only needed one LLM for the entire United States"?

I think this is the year that Iceberg goes from a topic of conversation at CIO dinners to actually being implemented in the wild. I also think AI is going to be layered on top of things we already do; it is not going to be the death of dashboards or any such catastrophic thing. Those are the two things I mostly expect from this year.

People keep saying this is the year of agents, and I do see the same thing. I think the reason agents have come for coding first and best is that, one, they're developed by software engineers, and it's easy to automate your own work. But two, software is often the least stateful thing, so it's actually easier to dummy up data and still do real work. It always takes a little longer in the world of data because state is just harder. But I think we're starting to resolve the permissions problems and the other things that let agents safely get at the massive repositories of structured data that companies have.

Yeah, I think there are a lot of companies that make tools, whether you're talking about Salesforce or Workday or whatever, these purpose-built applications. Most of these companies want you to build agents within the context of their software. You can do that, and there are certain advantages if you do: your agents probably have a lot more context about what they're operating on, and maybe they can take action as well. But there are real advantages to building agents in a more horizontal, generic way on top of your data lake, because then they can access any data, not just the data in that one application, and they're a lot more flexible.

But when you put an agent on top of a data lake, you have to think about it just like a human. You're going to say, go have at it: you know how to read Parquet, do whatever. So you need to make sure you give it access to data the same way you would give a human employee access to certain specific data and not other data. And I think we're just starting to figure out how to think about doing that.

People have a lot of expectations of agents; they think of them as an automated version of a human. Previously, we would have a service token, and that token could control things within one specific application. But in an agent workflow, you now expect it to interact with four to 10 different tools. So all of a sudden it has an auth profile that looks a lot like mine as an employee, and you've got to map it to a bunch of different applications. It's not trivial.
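The idea of scoping an agent's access the way you would scope an employee's can be made concrete. The Python below is a minimal, hypothetical sketch, not any vendor's auth model: a `Grant` names one action on one resource in one tool, and `delegate` derives an agent's profile as a subset of the delegating employee's grants, limited to the tools and actions the workflow needs. Every tool, table, and function name here is an illustrative assumption; a real deployment would map this onto each application's own permission system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """One permission: an action on a resource inside one tool (all names hypothetical)."""
    tool: str      # e.g. "warehouse", "crm"
    resource: str  # e.g. a table or object name inside that tool
    action: str    # "read" or "write"


@dataclass
class Principal:
    """A human employee or an agent, holding a set of grants."""
    name: str
    grants: frozenset = field(default_factory=frozenset)

    def can(self, tool: str, resource: str, action: str) -> bool:
        return Grant(tool, resource, action) in self.grants


def delegate(user: Principal, agent_name: str, tools: set, actions: set) -> Principal:
    """Give an agent a subset of the user's grants: never more than the user holds,
    and only for the listed tools and actions (least privilege)."""
    scoped = frozenset(g for g in user.grants if g.tool in tools and g.action in actions)
    return Principal(agent_name, scoped)


class AccessDenied(Exception):
    pass


def read(principal: Principal, tool: str, resource: str) -> str:
    """Check the grant before touching data; the return value stands in for a real query."""
    if not principal.can(tool, resource, "read"):
        raise AccessDenied(f"{principal.name} may not read {tool}:{resource}")
    return f"rows from {tool}:{resource}"


analyst = Principal("analyst", frozenset({
    Grant("warehouse", "orders", "read"),
    Grant("crm", "accounts", "read"),
    Grant("crm", "accounts", "write"),
    Grant("hr", "salaries", "read"),
}))

# The agent works across two tools on the analyst's behalf, read-only.
agent = delegate(analyst, "analyst-agent", tools={"warehouse", "crm"}, actions={"read"})

print(read(agent, "warehouse", "orders"))     # rows from warehouse:orders
print(agent.can("crm", "accounts", "write"))  # False
```

Deriving the agent's grants from the employee's, rather than issuing a broad service token, keeps the agent's reach across several tools bounded by what the person it acts for could already do.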

I don't know. It just feels so hard to make predictions right now. I think we're going to see a lot of change; some of it we can anticipate, and some of it is just going to be surprising, like second-order effects. To me, when I think about AI, it's all about being nimble, continuing to experiment and try stuff out, and accepting that whatever I believe today might be wrong in six months' time. But compared to six months ago, I feel more optimistic, I guess. I still feel like software engineering is valuable and useful, and that so many of the skills we have continue to be useful. And now I'm starting to think: what does this mean for data scientists? What skills do you need to apply, even if you're no longer handwriting all of the code? So I have no spicy predictions.


Yes, I now have six fruit trees planted. I planted them mid-season last year, so this coming year will be the first growing season. I have a big deer fence around everything, and I built a bunch of raised beds. So in a year, if we talk again, I'll tell you whether I was successful.

Is this also kind of your backup plan? If AI does take your job, at least you can still eat. Bushels of fruit.

I know you're partially kidding, but the further I go down the road of AI and everything that's happening right now, there's a kind of digital dysphoria that makes me want more and more to get my fingernails dirty. And this scratches that itch.

Yeah, I'm partially kidding and partially not. I think my husband and I are going to take a welding course later this year.

Oh, cool, because that's a fun... I've looked at doing some metalworking stuff, but you need a lot of equipment and access to somebody else's workshop, and I couldn't figure out how to make that happen.

Yeah, I just discovered there's a cool maker space pretty close to us that does a bunch of stuff like that.

No, I have no need to weld. I don't even know what I'll do with these skills, but it seems like a fun thing to learn.

Tristan, thanks so much for coming on. Honestly, it's a dream to be able to talk about dbt in this space. As you mentioned, it's kind of two worlds, and it's been so helpful to hear about the similarities and differences between them. I'm such a big fan of dbt and all the work y'all are doing, so I really appreciate you coming on. It's been a lot of fun.

The Test Set is a production of Posit PBC, an open source and enterprise tooling data science software company. This episode was produced in collaboration with creative studio Adji. For more episodes, visit thetestset.co or find us on your favorite podcast platform.