Data Exploration Quick Wins with VisiData

VisiData turns five!

[...] even some things you might not think of at first: like filesystem metadata and API results and packet captures. Because everything is data.

VisiData is useful to me in so many situations that "just throw it to VisiData" has become an instinct.

Tour of a quick win - browsing Python package information

I was looking at the Python Package Index recently to browse release history for the Cloud Custodian package. Some contributors were discussing the total number of releases, and I was curious about the count from the PyPI side. I noticed that the release history page listed all releases but didn't show a total count. What next?

Step 1 - Browser Dev Tools

I was already in the browser, and the data I needed was right there. So I opened Firefox dev tools and found a usable CSS selector for a release label:

And checked how many times that class showed up on the page:
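For a sense of what that check involves, here is a minimal Python sketch of the same idea outside the browser, using only the standard library. The class name `release-label` and the sample HTML are made up for illustration; they are not PyPI's actual markup.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; class values are space-separated
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_class(html, css_class):
    """Return how many elements in `html` have `css_class`."""
    parser = ClassCounter(css_class)
    parser.feed(html)
    return parser.count


# Hypothetical markup, not PyPI's real page structure
SAMPLE_HTML = """
<div class="release"><p class="release-label">0.9.1</p></div>
<div class="release"><p class="release-label latest">0.9.2</p></div>
"""

print(count_class(SAMPLE_HTML, "release-label"))
```

In the browser, the dev tools console does the same job with far less ceremony, which is why it made a good first stop.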

That answered my immediate question, and would have been a fine place to stop.

But I figured this would be a handy thing to check from the terminal. And PyPI has a JSON API. That makes my VisiData sense start to tingle.

Step 2 - Throw it to VisiData

I can get a JSON blob of Python package details by hitting PyPI's /pypi/<project>/json endpoint. If I already know what I'm looking for, I might use a combination of HTTPie and jq to get it. But in recon mode, I blindly throw it to VisiData:

vd https://pypi.org/pypi/c7n/json

And as is often the case, it just works! It gives me a single row of data for the c7n package, including a count of releases.
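If I wanted that release count in a script rather than an interactive sheet, a rough Python equivalent might look like the sketch below. It assumes the response keeps its `releases` key, which maps each version string to a list of uploaded files. The helper names `fetch_package` and `count_releases` and the sample payload are mine, for illustration only.

```python
import json
from urllib.request import urlopen


def fetch_package(project):
    """Fetch PyPI's JSON document for a project (needs network access)."""
    with urlopen(f"https://pypi.org/pypi/{project}/json") as resp:
        return json.load(resp)


def count_releases(payload):
    """Count the versions listed under the 'releases' key."""
    return len(payload.get("releases", {}))


# Made-up, trimmed-down payload in the same shape as the API response
sample = {
    "info": {"name": "example"},
    "releases": {"0.1.0": [], "0.2.0": [], "0.2.1": []},
}
print(count_releases(sample))
```

Against the live API, that would be `count_releases(fetch_package("c7n"))`. The difference from the VisiData route is that this only works once I already know which key I want.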

The bits of nested data are all collapsed by default, which effectively provides a summary view. If I want to dig deeper I can open one of those cells in its own sheet, or expand its children into their own rows/columns.

Again, this would be a fine place to stop.

Step 3 - Make it easier for next time

Up until this point I wasn't really sure how useful VisiData would be for this use case. But it feels like a pretty good fit, and I want to make it a bit smoother for next time. VisiData is extensible, and one of the simplest ways to extend its functionality is to add little bits of helper code to a ~/.visidatarc file. For example, I can use this snippet to tell VisiData how to handle a URL with a pypi scheme:

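from visidata import VisiData, vd

# The openurl_pypi name is what ties this function to the pypi:// scheme
@VisiData.api
def openurl_pypi(vd, path, filetype='json'):
    # Map the project name onto PyPI's JSON endpoint and load the result
    sheet = vd.openPath(vd.Path(f'https://pypi.org/pypi/{path.name}/json'), filetype=filetype)
    # Use a more descriptive sheet name
    sheet.name = f'pypi_{path.name}'
    return sheet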

And then, I can run:

vd pypi://c7n

And VisiData knows what to do - I get the same sheet I saw before, only with a more descriptive name and less typing.
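The translation from the short URL to the real endpoint is simple enough to check on its own. Here is a standalone sketch of that mapping using only the standard library. The `pypi_json_url` helper is hypothetical and not part of VisiData; it only mirrors what the snippet above does with `path.name`.

```python
from urllib.parse import urlsplit


def pypi_json_url(url):
    """Translate pypi://<project> into the matching PyPI JSON API URL."""
    parts = urlsplit(url)
    if parts.scheme != "pypi":
        raise ValueError(f"expected a pypi:// URL, got {url!r}")
    # For pypi://c7n the project name lands in the netloc component
    project = parts.netloc or parts.path.lstrip("/")
    if not project:
        raise ValueError("no project name given")
    return f"https://pypi.org/pypi/{project}/json"


print(pypi_json_url("pypi://c7n"))
```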

The compounding value of quick wins

One great thing about VisiData is that it's easy to accrue quick wins over time, and their value can stack. As one example of this, consider that the JSON structure we get from the PyPI API has some embedded Markdown in it. That's viewable in VisiData as-is:

But it certainly looks prettier rendered using something like glow!

And that's quick to slot into my workflow because a while back I wanted to view cells with an external program. As in this post, I started with a little helper in ~/.visidatarc. Eventually it seemed generally useful enough to justify a life outside my home directory, so I split it out into vpager.

And that's how it can often go with VisiData. There's a beautiful exploration and evolution path, from local fiddling to personal helper code to full-fledged plugins. You can stick to the out-of-the-box experience, or take a ride on the extensibility train for as many stops as you want. VisiData doesn't judge; it just handles whatever you throw at it and leaves you with a lollipop and a smile. Enjoy :).
