Transcript
This transcript was generated automatically and may contain errors.
Thank you. It's good to be here. So I'm going to talk about the sort of final stage of research. You've done all this great work, but you actually need to tell people about it.
So how do you communicate what you've done and your results? I think a really powerful way to reach people is to personalize things. There's this great, pretty classic example from the New York Times, an article called The Best and Worst Places to Grow Up. When I go there from home, the first thing I see, the first thing I read, the lead, is about where I live. It says Benton County is pretty bad for income mobility for children in poor families. And they've drawn me in by making this about something I care about, where I live.
As another example, the Washington State Soil Health Initiative is responsible for the State of the Soils Assessment. This assessment relies on the participation of farmers, who have to agree to have their soil sampled to build out the assessment. The way they encourage that participation, and the way they reward it, is that every single one of the farmers who participates, and there are about 300, receives a completely custom report that's all about how their farm compares to the other farms that have participated.
So it's a really powerful way to connect with people: give them something that's customized and personal. How you do it is another question. This particular example was built with Quarto, and it was put together by a data scientist called J.D. Ryan at the Washington Department of Ag.
Introducing Quarto
So I'm going to sort of talk through an example of doing this with Quarto. And we're going to start with one notebook. And when I say notebook today, I really am specifically talking about a Jupyter notebook. So I have this Jupyter notebook, and it answers the question, was it hot last month in the town that I live in, Corvallis, Oregon?
So that's sort of our one notebook. And there's some work going on in that notebook, right? I'm grabbing some data, I'm subsetting it, I'm taking a look at it, I'm crafting a little sentence that actually summarizes the answer to that question, and I'm creating a figure. We'll see that a little bit in a second. And what we're going to do is we're going to take that one notebook, and we're going to produce first sort of one report, like what does this look like in a highly polished fashion for my one city, for Corvallis? But we're going to not only just do one, we're going to do this for the top 50 cities in Oregon. So there'll be a report for Portland, and there'll be a report for Eugene. And we're going to do all of that with Quarto.
So the first thing you might be wondering is, what's Quarto? So let's start there. I should introduce myself. My name is Charlotte Wickham. I work at Posit PBC. I'm a developer educator there. And Quarto is one of our open source projects. It is a command line tool. So once it's installed, you get access to quarto on the command line. And one of its commands is called render. So you say quarto render, and it is expecting some kind of computational notebook. In my example, it's going to be this Jupyter notebook, corvallis.ipynb. If you run that, what Quarto returns is something called corvallis.html, and it's just an HTML version of this notebook. Everything that was in that notebook is here now in that HTML file. So there's that figure I alluded to.
So you could think about Quarto as a tool to turn notebooks into HTML documents. But it's more than that. If you add the argument --to docx, it's now a tool to turn notebooks into Word documents. And you get out a Word document. It's got all those same components. What I'm actually going to focus on today is getting out PDF documents. So here I'm saying --to typst. And that returns me a PDF that's going through Typst. Typst is a modern alternative to LaTeX.
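A minimal sketch of the three render commands just described, written as a small Python helper that builds the argument list for the Quarto command line. The helper names `render_command` and `render` are made up for illustration; only `quarto render` and `--to` come from the talk. Actually running `render` assumes the Quarto CLI is installed and on the PATH.

```python
import subprocess


def render_command(notebook, to=None):
    """Build the argument list for `quarto render`, optionally with `--to FORMAT`."""
    command = ["quarto", "render", notebook]
    if to is not None:
        command += ["--to", to]
    return command


def render(notebook, to=None):
    """Run the command. Requires the Quarto CLI to be installed and on PATH."""
    subprocess.run(render_command(notebook, to), check=True)


# The three invocations from the talk, as argument lists.
html_cmd = render_command("corvallis.ipynb")             # HTML output
docx_cmd = render_command("corvallis.ipynb", "docx")     # Word document
typst_cmd = render_command("corvallis.ipynb", "typst")   # PDF via Typst

print(" ".join(typst_cmd))
```

Building the list separately from running it keeps the sketch usable even on a machine without Quarto: printing the list shows the exact command that would be run.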
I think there are a lot of good reasons to use it. One of the best reasons to use it with Quarto is that Quarto bundles Typst. So you don't need to install anything else. If you've got Quarto, you can immediately go to PDF without having to manage, for instance, a LaTeX distribution. The other great thing about Typst is it's really fast. If you've ever sat there watching LaTeX run and compile your document, Typst is a much faster way to get PDFs. And I find it's much easier to work with. So there are customizations that I would create for a PDF with Typst that I would never consider even attempting in LaTeX.
Quarto document options
There are a lot of options that Quarto has for controlling how this notebook shows up as our document. The way you add them, if you're using a notebook, is to put a raw cell at the top. Inside that raw cell, you have three dashes to open it and three dashes to close it. And between those dashes, Quarto is expecting options. One of those options, for example, could be the format. So I'm saying format: typst. That means now when I call quarto render, I don't have to tell it I'm going to Typst. It's going to pick that up from that header. And out I get my PDF.
Another really great option for this kind of work where you're taking something computational and trying to produce something quite polished is echo. And I'm going to set it to false. And that means, like, just don't show me any code in that output, in the output document. I just want the results and any markdown that's there.
And then a final option to be aware of, particularly if you do work in Jupyter notebooks, is by default, Quarto won't run the code in the cells. It's just going to take whatever outputs are already in that notebook and use them. Here I am enabling execution so that actually when I render, that code is always being executed. That's just a great way to ensure reproducibility. I'm guaranteed that the document I get out at the end is the result of running that code right now.
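As a rough sketch of what that header looks like, here are the three document-level options described above (format, echo, and enabling execution) placed in a raw cell. The cell is built as a plain dictionary following the Jupyter notebook JSON layout (nbformat 4); the helper name `raw_cell` is made up for illustration.

```python
import json

# The options block: three dashes to open, the options, three dashes to close.
FRONT_MATTER = """---
format: typst
echo: false
execute:
  enabled: true
---"""


def raw_cell(source):
    """Return a raw cell in the Jupyter notebook (nbformat 4) JSON layout."""
    return {"cell_type": "raw", "metadata": {}, "source": source}


header_cell = raw_cell(FRONT_MATTER)
print(json.dumps(header_cell, indent=2))
```

In practice you would usually type this header straight into a raw cell in the Jupyter interface rather than generate it from code.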
The other thing you can do is set options at the code cell level. What I was talking about just before was at the document level. You can also put options in code cells. And the way you do that is with a special comment. So you have a hash, the usual comment symbol, followed by a pipe. And then you can put in options. One of those options is called include. If you set that to false, nothing from that code cell ends up in the output document. So that's really nice. In this example notebook, I have this call to head because sometimes I just need to see that data to know what to do next. It's a great way to inspect what's there. But I do not need that in my output report. So I'm just going to say include: false. I can keep that code there for when I'm playing around developing. And it doesn't show up in my output.
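A small sketch of such a cell. The hash-pipe comment on the first line is the cell option; to Python it is an ordinary comment, so the cell runs exactly as before. The rows below are made-up stand-in data, not the notebook's real data, and the slice stands in for the call to head.

```python
#| include: false
# Made-up stand-in for the notebook's data. The real notebook works with a
# data frame and calls head() on it.
rows = [
    {"city": "Corvallis", "day": 1, "temp_max": 21.5},
    {"city": "Corvallis", "day": 2, "temp_max": 19.0},
    {"city": "Corvallis", "day": 3, "temp_max": 23.5},
]

# A quick look at the first rows while developing. Because of the option on
# the first line, neither this code nor its output reaches the rendered report.
preview = rows[:2]
print(preview)
```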
What else can Quarto do?
Okay. So at this point, we have one single notebook that's generating one PDF. And that PDF isn't terrible looking at the moment, I don't think. It's okay. So that's sort of what Quarto is. And if you take away one thing, if you leave right now, Quarto is a way to avoid copying and pasting into a Word document. If you've ever done that, you know what kind of hell you're setting yourself up for. So that's one way to think of it. But it's so much more. You can use Quarto to build entire websites. So this is the nbdev website, which is kind of cool because this website is built with Quarto. But nbdev also helps you make documentation websites built with Quarto. Kind of cool. There are books built with Quarto. So the web version of Python for Data Analysis was built in Quarto. This presentation was built in Quarto. So you can make presentations.
And another really nice thing about Quarto is that it makes publishing really easy. The way I got my slides onto GitHub, and the way I updated those slides every time I changed something, was to run quarto publish on this slides source document. And I was sending it to GitHub Pages. In fact, once I'd sent it to GitHub Pages the first time, I don't even need that additional argument there. It'll ask me if I want to update GitHub Pages. It's super easy.
Parameterized reports
Okay. But what I'm really here to talk about today is parameterized reports in Quarto. This term, parameterized reports, is the way Quarto describes this process of using a single document to create many documents. And we're going to walk through exactly how to do that. This is how we're going to get there. We're going to start with a notebook that works for a single value. I have a notebook that now works for me for Corvallis. I'm going to turn the hard-coded value into a variable. I'm going to do this for one value, but you could potentially have many parameters, many variables. I'm going to make that variable a parameter, and then I'm going to show you how you can render with different parameter values, and then we're going to automate that rendering so we can do this for all the parameter values that we're interested in.
Okay. So if you dived into the code in that notebook, there are three places I currently use a hard-coded value for Corvallis. There is a level one markdown heading in a markdown cell that says Corvallis. There's a line in a code cell that filters to just the rows in my data where the city is Corvallis. And then when I'm creating my figure, I'm using plotnine, and I'm adding a title to the plot that says Corvallis, Oregon.
So the first thing I'm going to do is I'm going to actually just rename this notebook so it's not so confusing. This is no longer a notebook just about Corvallis. So I'm going to call it climate.ipynb, and I'm going to add a code cell to it where I'm setting up this variable, right? There is going to be a city variable that controls which report we're producing.
And then I just need to replace each of these hard-coded values with something more general. So for the level one heading, I'm replacing what was a markdown cell with a code cell that produces markdown. This code cell is calling the Markdown function. It's using an f-string to set up that level one heading. The filtering, the subsetting, is a super easy substitution. I'm just no longer doing it with a string. I'm passing in the variable city. And then in that plot title, it's another case of using an f-string to plug in the city in place of the string Corvallis.
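The three substitutions can be sketched like this. The rows are made-up stand-in data and the filter is a plain list comprehension; the real notebook filters a data frame and passes the title to plotnine. The variable names other than `city` are hypothetical.

```python
city = "Corvallis"  # the one variable that controls which report is produced

# Made-up stand-in rows. The real notebook filters a data frame instead.
weather = [
    {"city": "Corvallis", "temp_max": 21.5},
    {"city": "Portland", "temp_max": 23.0},
    {"city": "Corvallis", "temp_max": 19.0},
]

# 1. Level one heading: in the notebook this string would be wrapped in
#    IPython.display.Markdown(...) so the cell's output is treated as markdown.
heading = f"# {city}"

# 2. Filtering: compare against the variable rather than a hard-coded string.
city_rows = [row for row in weather if row["city"] == city]

# 3. Plot title: built with an f-string and handed to the plotting library.
plot_title = f"{city}, Oregon"

print(heading, len(city_rows), plot_title)
```

Changing the first line to `city = "Portland"` and rerunning is the quick check described next: everything downstream should follow the variable.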
So at this point, like, it's a good, if you're working through this with an example, it's a good idea to actually change that variable. Like, what happens if I set city to Portland? Everything should work, and my notebook should reflect the Portland values.
This is just a variable at the moment, so we have to tell Quarto that this is actually a parameter. The way Quarto does parameterized reports is it relies on another Python project called Papermill. Papermill does this for notebooks: if you want to turn one notebook into many notebooks, that's what it does. So Quarto just uses that tool, and the way you set parameters there is you add a tag to the code cell where you're setting the variables that are parameters, and that tag is parameters. So that's how Quarto is going to identify that that particular variable, city, is a parameter that we might want to change when we render.
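For reference, this is roughly what a tagged cell looks like inside the notebook file. It is a sketch of the Jupyter notebook JSON layout (nbformat 4) built as a plain dictionary; the helper name `parameters_cell` is made up for illustration.

```python
def parameters_cell(source):
    """Return a code cell (nbformat 4 layout) carrying the `parameters` tag."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["parameters"]},  # the tag Papermill and Quarto look for
        "outputs": [],
        "source": source,
    }


cell = parameters_cell('city = "Corvallis"')
print(cell["metadata"])
```

In the Jupyter interface you would normally add the tag through the cell's tag editor rather than by editing the JSON.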
And once that's set up, we can now render with different values. So if we just run quarto render on this climate.ipynb, we get back out our Corvallis report, because we've got that as our default value. City is Corvallis. If we want to render with different parameters, we can add this -P option and now say, okay, let city be Portland. And then out we get our Portland report. It's still going to come out as climate.pdf, which isn't ideal. There's another option called --output-file that we could provide and say, actually, no, put that in Portland.pdf so I don't get super confused.
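A sketch of that command as an argument list, built by a small hypothetical helper (`render_with_params` is not part of Quarto). The flags are the ones named above: `-P` takes a `name:value` pair and `--output-file` sets the output name.

```python
def render_with_params(notebook, params, output_file=None):
    """Build a `quarto render` argument list with -P parameters and an output file."""
    command = ["quarto", "render", notebook]
    for name, value in params.items():
        command += ["-P", f"{name}:{value}"]
    if output_file is not None:
        command += ["--output-file", output_file]
    return command


portland_cmd = render_with_params("climate.ipynb", {"city": "Portland"}, "Portland.pdf")
print(" ".join(portland_cmd))
```

The list can be passed to `subprocess.run(portland_cmd, check=True)` on a machine where the Quarto CLI is installed.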
Automating rendering for many cities
So to automate this, if we need to do this for 50 cities, it's just a matter of rerunning quarto render 50 times. And how you do that is sort of up to you. You probably have a favorite way of doing that. I'm going to show you one way. This is how I would do it. I like to have some kind of object that has everything I want to do, all my cities.
So I've got cities. This is a Polars data frame. I've got a column called city. And I've got a column for the output file name. The reason I've got both there is that cities are annoying. Sometimes they have spaces in them. Sometimes they have dots in them. I do not want those in my file name. So there's just a little processing I'm doing to take a city string and turn it into something that should be a nice file name. That data frame has about 50 rows in it.
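One way that file name processing might look, as a sketch. The talk does not show the exact rule, so the function `to_filename` and its choice of lowercase with hyphens are assumptions, and a plain list of dictionaries stands in for the Polars data frame. The city names are just examples with a space or a dot in them.

```python
import re


def to_filename(city, extension=".pdf"):
    """Turn a city name into a file name with no spaces or dots in the stem."""
    # Lowercase, then collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", city.lower()).strip("-")
    return f"{slug}{extension}"


# Stand-in for the two-column data frame: one row per report.
cities = [
    {"city": name, "output_file": to_filename(name)}
    for name in ["Corvallis", "St. Helens", "Lake Oswego"]
]

for row in cities:
    print(row["city"], "->", row["output_file"])
```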
The way I'm going to iterate over it is with this iter_rows function. So basically I'm just running down each row of that data frame. And I am using the quarto package. There's a function in there called render. It literally is just running quarto render on the command line. It just happens to be a Python function. So each time I'm rendering, I'm passing in that one notebook. I'm setting up the parameters based on the city column. And I'm specifying the output file based on that output file column. And that's that. So I would then run that script, sit back, and out come my 50 reports. So that gets us from one notebook to many reports.
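A sketch of that loop. To keep it runnable without Quarto installed, the render function is passed in as an argument and a stand-in that only records its calls is used here. The keyword names `output_file` and `execute_params` are an assumption about the quarto Python package's `render` function and are worth checking against the installed version. With a Polars data frame, the rows could come from `cities.iter_rows(named=True)`.

```python
def render_all(cities, render, notebook="climate.ipynb"):
    """Render one report per row by calling `render` once for each city."""
    for row in cities:
        render(
            notebook,
            output_file=row["output_file"],
            execute_params={"city": row["city"]},
        )


# Stand-in renderer that only records what it was asked to do.
calls = []


def fake_render(input, output_file=None, execute_params=None):
    calls.append((input, output_file, execute_params))


cities = [
    {"city": "Corvallis", "output_file": "corvallis.pdf"},
    {"city": "Portland", "output_file": "portland.pdf"},
]

render_all(cities, fake_render)
print(calls)
```

For real reports, the idea would be to pass the package's own function instead, for example `render_all(cities, quarto.render)` after `import quarto`, assuming the keyword names match.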
Making reports look polished
The bit that's perhaps missing from this picture compared to the picture I showed right at the start is that these reports aren't particularly pretty. They are utilitarian. They work. But they're not particularly pretty. So the other thing I want to talk a little bit about is making pretty reports.
So the first thing you can do to make a report prettier is just to adjust things like colors and fonts. The easiest way to do that with Quarto is to make use of another project called brand.yml. This is separate from Quarto. Also built by Posit. And it's just a description of how you might write brand information in a file. So this is an example of a brand.yml file. You can see we do things like set up some colors. So there's a color palette that just lists some, in this case, hex colors and gives them names. And then we can say things like, well, the foreground should be charcoal gray and the background should be white. We're going to see in a moment that's going to be translated roughly to: the page color is white and the font is charcoal gray. You can do things like describe fonts. So here there's a font being described that's Montserrat from Google. And I'm going to use that in the headings. And it's going to be forest green. Forest green is another one of those colors we set up in the color section. And you can do things like describe where your logos are living.
So if you just drop that file, a _brand.yml file, into the same folder as the notebook that's being rendered, Quarto will pick it up. It will automatically see it. And that's going to change everything in that output Typst PDF that was generated from markdown-like elements. So this will affect, in this case, the heading. Corvallis is now forest green. It's using that Montserrat font. The little summary statement is in that charcoal gray. The page is white and the logo is on this page. There's a little bit more control you have over that logo for the Typst format. And that's something that you can do in the document header. So here I'm saying put the logo up in the top right. And it should be about an inch in size.
The thing that Quarto can't do for you is change the colors in the things you're producing in your code cells. So there it's a little bit more on you, but there's a lot of help you can get. There's a brand_yml package that will read in that same file. And then you can pull out the elements from your brand and use them in your code. So for instance, I had a variable called highlight_color and I am now setting that to the secondary color of my brand that I defined in that brand file. That happened to be an orange color. So that's what I'm using for the trace for this year, which shows up in orange. I'm pulling that color from my brand. Also in the code, though I'm not showing it right there, I'm pulling out the forest green to use for the lines that are the last 30 years. So that's a simple way to get a little bit of customization in that PDF.
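To illustrate the idea of pulling colors from the brand into plotting code, here is a sketch that uses a plain dictionary as a stand-in for the parsed color section of the brand file. It deliberately does not call the brand package itself. The palette key names, the hex values, and the names `resolve` and `history_color` are all made up for illustration.

```python
# Hypothetical stand-in for the color section of a brand file.
# The key names and hex values are invented for this sketch.
brand_color = {
    "palette": {
        "charcoal-gray": "#36454F",
        "forest-green": "#228B22",
        "orange": "#FF8C00",
        "white": "#FFFFFF",
    },
    "foreground": "charcoal-gray",
    "background": "white",
    "secondary": "orange",
}


def resolve(color, name):
    """Follow a role such as 'secondary', or a palette name, to its hex value."""
    value = color.get(name, name)             # role -> palette name, if it is a role
    return color["palette"].get(value, value) # palette name -> hex value


highlight_color = resolve(brand_color, "secondary")    # this year's trace
history_color = resolve(brand_color, "forest-green")   # lines for the last 30 years

print(highlight_color, history_color)
```

Keeping the colors in one place means a change to the brand file carries through to both the page styling and the figures.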
The way to get things looking really sharp is to dive into Typst. And in the interest of time, I'm not going to teach you everything about Typst. I'm really going to say, here's an example of the kinds of things you can do. So for instance, like now this report, the big change is there's this big green banner at the top. Corvallis is now white on that banner and there's this little map showing you where Corvallis is in the state of Oregon. So that little map is being produced by code in my notebook. It's Python code to produce it. But it's Typst code that's saying, put it up in the top right inside this big green banner. So Typst is super powerful for that kind of customization and so much more approachable than LaTeX. You can have a look at the source for this particular one in the repo. And there are a few bonus tips at the end of the slide deck about how to work with Typst combined with Quarto.
Accessibility of PDFs
So it's important to acknowledge here the accessibility of PDFs. I think we should all be aiming for accessible PDFs. There is in fact legislation coming for some of us that will mandate accessible PDFs. But currently neither the typst format nor the pdf format, the one that goes through LaTeX, produces tagged PDFs. So PDFs produced this way will not currently meet these standards.
There are some solutions. I've seen some people have quite good success by not targeting PDF directly, so not using Typst or PDF, but going to Word first and then using Word's tools to output a PDF. That generally creates PDFs that will meet these standards. And the other option is to not use PDF. PDF is great for these really visual static documents, possibly even ones you are literally going to print on a printer. But if you don't need that, use HTML. That is generally much easier to get to an accessible level. And starting with Quarto version 1.8, websites produced with Quarto will pass the axe-core checks by default, axe-core being one tool for assessing the accessibility of something.
Summary and other features
Okay, so that's sort of Quarto for parameterized reports. The advantages are like you only have to manage one notebook, and you can be playing around in that notebook tweaking things. It's a really nice way to work. You have one notebook that you tweak until you've got a really nice report. You can render to one or many formats. So I showed you rendering to PDFs, but you might use the same notebook to render to PDF and HTML. And it's really sort of you do you automation, right? You have to call that Quarto render many times if you want to do it with many values of the parameter. But there's lots of ways to do that. And it's sort of you do the way you'd like to do it.
There are lots of other things that I haven't been able to talk about but that work great with parameterized reporting. So tables work great. For instance, tables that come out of the Great Tables package work great with Quarto. They work great in these kinds of reports. There's a thing called the include shortcode. That is a way to include other files, for example markdown files, in your notebook. Basically, if you don't want to have a ton of markdown in your notebook, you can use this to pull in that markdown content from somewhere else. There's also this thing called the contents shortcode that just lets you rearrange content. The biggest use case for that is when you have a code cell that creates some output: rather than having it show up just where you put that code cell, you can take that output and put it somewhere else. And often that means including it in a markdown cell.
You can also show content conditional on format. So if you're doing two versions of a report, one that goes to HTML and one that goes to PDF, you might want different things. So it might be that you want some kind of interactive plot in your HTML version versus just a static plot in the PDF version. So there's a way to have those objects only show up in one format while still using that one notebook. And then everything I've shown you has been using a Jupyter notebook as the source document. But Quarto also has a plain text format that has an extension .qmd. It's a variant of markdown. The great thing about that is that it's much easier to use version control with. It's much easier to copy and paste examples if you're trying to teach people about it. So if that exists, consider it.
And that's all I have. So thank you very much. There's some links here about how to get started with Quarto, where to find these examples. If you've got general Quarto questions, the GitHub discussion is a great place to ask them. If you have anything here you want to ask me, feel free to email me. A big thanks again to JD Ryan for her inspiring example. And she gave a talk a couple of years ago that is a great way to see if you want to see how this looks in the R ecosystem. So thank you.
Q&A
Okay. Excellent talk. We have quite a few questions that are in the Slack. And so for those of you joining, please go ahead and reply to the talk announcement in the track general channel with any questions you might have. Going by emoji responses, we have a question from Ariana. So in this example, you show simple operations like filtering and axis labeling as your parameterization. But can you parameterize more complex operations like using an entirely different dataset or running a model with maybe multiple different tuning parameters?
Yes. I think the answer is yes. I kept this example intentionally simple so you could see how it sort of looks. But anything that you can translate to a single variable value could be a parameter. It could be many parameters. So, yes. I think is the answer.
And we also have a question from Jay. When previously trying to parameterize reports like this, I struggled to use the parameter in the YAML metadata. For example, getting the metadata title to be Corvallis and Portland. Is something like that possible?
That is an excellent question and something I also struggled with. There are some fields in the YAML metadata that you cannot use a parameter in because it gets processed before the computation happens. So you sort of saw one approach here. The approach with the header was to just find a different way to set the title. You can set titles by setting level one markdown headers. That's one approach. There is another possibility where you can actually create YAML blocks. You can have more than one YAML block in your document. So it's possible that you could create another YAML block. But again, it depends. Some of these things Quarto just processes before you hit the computation. So it depends. Depends on the actual option.
This is a question for me. You mentioned that you built this talk with Quarto. Very cool. You showed also the Quarto render. But I also happen to know that there is a Quarto serve. So if you're building a slide deck like this, can you use Quarto serve to render your talk live in the browser while you're writing it? Or can you only generate the slides statically afterwards with Quarto render?
I only showed you Quarto render. In this instance, Quarto serve and Quarto preview would do the same thing. I tend to use Quarto preview because the Quarto extension in VS Code has a preview shortcut. And yes, that's exactly how you use it. So when I'm working on my slides, I did these in a .qmd. So I had that open. I was running Quarto preview. And there's a setting where it will render on save. So I'm actively editing the slides, I hit save, and the preview of the slides updates instantly. Almost instantly.
Very cool. Okay. So going down the list now, we have a question from Lauren. Does Quarto have a way to render to HTML that is specifically formatted for email?
Yeah. It does. With a very large caveat. And that caveat is that it's designed for emails that will be sent in Posit Connect, which is one of Posit's professional products.
And then we've got a question from Kadar. Is the Quarto CLI installation necessary before you can use Quarto inside of notebooks?
Yes. Yeah. It is. But there's more than one way to get it, right? You can go to quarto.org and grab an installer. It's also available through pip. So you can pip install quarto-cli. I think also conda-forge? Yes. Yeah. Awesome.
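A quick way to check from Python whether the Quarto CLI is available before trying to render, as a sketch. It only looks for an executable named `quarto` on the PATH; the function name `quarto_available` is made up for illustration.

```python
import shutil


def quarto_available():
    """Return True if an executable named `quarto` can be found on PATH."""
    return shutil.which("quarto") is not None


if quarto_available():
    print("Quarto CLI found at", shutil.which("quarto"))
else:
    print("Quarto CLI not found; install it before rendering.")
```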
Okay. Joshua asks, can Quarto also render ordinary Markdown .md files? And what about collections of Markdown and IPython notebook files that make a single document?
Was the last part into a single document?
I think that's what it was intended to mean. So can you take a collection of multiple Markdown files, or I think it actually has to be .qmd files, and .ipynb files and make those into a single document? So, taking multiple inputs.
Yeah. So, okay. Let me answer those in part. You can definitely use Quarto to render a .md, a plain Markdown document, for sure. You can have a project that mixes Markdown and Python notebooks, and Quarto will be happy with that. If you want to get them all into a single document, that's where things may get more complicated. You can certainly render them all into a single website where those documents are separate pages. It will be a little bit more work to make them all into a single page.
Okay. Great. So, there's still a few more questions, but unfortunately we are at time. So, maybe you can jump in the Slack and answer those if you have any time. But let's thank Charlotte once again for a fantastic talk.
