Parsing Protobuf Definitions with Tree-sitter

relistan.com
submitted a year ago by joseph to programming

Summary

If you work with Protocol Buffers (protobuf), parsing your definitions can save real time and headache. The usual tool for doing that is protoc, which supports plugins that generate output of various kinds, such as language bindings and documentation. But if you want to do anything custom, you are faced with either using something limited like protoc-gen-gotemplate or writing your own plugin.

In Go, the generated bindings are not really native structs and take a lot of work to use directly. We use a mapping layer to paper over this and make them easier to work with. We had been maintaining those custom mappings by hand. That's a waste of time, and even getting GPT to write the transformations back and forth is annoying.
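To make the mapping-layer idea concrete, here is a minimal sketch of the kind of hand-written conversion described above. The `UserPB` type only loosely imitates the shape of generated bindings (pointer fields and nil-safe getters); it and the `User`, `FromPB`, and `ToPB` names are hypothetical and do not come from the original article.

```go
package main

import "fmt"

// UserPB loosely imitates a generated protobuf binding: optional fields are
// pointers and access goes through nil-safe getters. Hypothetical type.
type UserPB struct {
	Name  *string
	Age   *int32
	Email *string
}

func (u *UserPB) GetName() string {
	if u == nil || u.Name == nil {
		return ""
	}
	return *u.Name
}

func (u *UserPB) GetAge() int32 {
	if u == nil || u.Age == nil {
		return 0
	}
	return *u.Age
}

func (u *UserPB) GetEmail() string {
	if u == nil || u.Email == nil {
		return ""
	}
	return *u.Email
}

// User is the plain Go struct the rest of the application works with.
type User struct {
	Name  string
	Age   int
	Email string
}

// FromPB converts the generated-style struct into the native one.
func FromPB(pb *UserPB) User {
	return User{Name: pb.GetName(), Age: int(pb.GetAge()), Email: pb.GetEmail()}
}

// ToPB converts the native struct back into the generated-style one.
func ToPB(u User) *UserPB {
	age := int32(u.Age)
	return &UserPB{Name: &u.Name, Age: &age, Email: &u.Email}
}

func main() {
	name := "Ada"
	u := FromPB(&UserPB{Name: &name})
	fmt.Printf("%+v\n", u)
	fmt.Println(ToPB(u).GetName())
}
```

Every field needs a line in both directions, which is why keeping this in sync by hand gets tedious as messages grow.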

Tree-sitter has bindings that enable parsing languages and data formats. The library supports various methods of access to the parsed tree, but the one we'll use here is a query expression that extracts only the data we care about. We can use an S-expression to query the parsed trees.
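As a rough illustration of what such a query looks like, the sketch below holds one as a Go string and lists its capture names. The parenthesized pattern and `@capture` syntax are standard Tree-sitter query syntax, but the node names (`message`, `message_name`, `message_body`, `field`, `type`, `identifier`) are assumptions about the protobuf grammar and should be checked against the grammar you actually load. `CaptureNames` is a hypothetical helper, not part of any Tree-sitter binding.

```go
package main

import (
	"fmt"
	"regexp"
)

// messageQuery is a Tree-sitter query written as an S-expression.
// ASSUMPTION: the node names below match the protobuf grammar in use.
const messageQuery = `
(message
  (message_name (identifier) @message_name)
  (message_body
    (field
      (type) @field_type
      (identifier) @field_name)))
`

// captureRe matches "@name" capture markers inside a query string.
var captureRe = regexp.MustCompile(`@([A-Za-z_][A-Za-z0-9_.]*)`)

// CaptureNames lists the capture names in a query, in order of appearance.
func CaptureNames(query string) []string {
	var names []string
	for _, m := range captureRe.FindAllStringSubmatch(query, -1) {
		names = append(names, m[1])
	}
	return names
}

func main() {
	fmt.Println(CaptureNames(messageQuery))
}
```

The capture names are what the calling code later switches on, so it helps to keep them short and unambiguous.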

If I highlight things in the document, I see them reflected in the tree, and vice versa. From Neovim's :InspectTree panel, you can open the query editor by typing :EditQuery. This brings up another pane where we can type queries and see their matches highlighted in the original document.

This is an example of a single query that will extract all of the required data from the protobuf definition. We can then walk the results to generate a structure that is easier to reference in code. Here you can see me traversing a query that I built, and how the editor highlights the matches.

The query and cursor code is the last piece to show. Here we define the query and ask Tree-sitter to run it. We then loop over the matches, inspecting their names and building up the maps.
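The loop described above might be sketched as follows. This is not the article's code and it does not call a Tree-sitter binding: `Capture` is a hypothetical stand-in for one captured node of a query match, and the capture names are assumed to be the ones chosen in the query.

```go
package main

import "fmt"

// Capture is a hypothetical stand-in for one captured node in a query match.
type Capture struct {
	Name string // capture name from the query, e.g. "field_name"
	Text string // source text covered by the captured node
}

// Message collects what the query extracted.
type Message struct {
	Name   string
	Fields map[string]string   // field name -> field type
	Enums  map[string][]string // enum name -> value names
}

// BuildMessage walks the matches in order and fills in a Message.
// It assumes a field's type capture precedes its name capture within a
// match, mirroring the "type name = N;" order in protobuf source.
func BuildMessage(matches [][]Capture) Message {
	msg := Message{Fields: map[string]string{}, Enums: map[string][]string{}}
	for _, match := range matches {
		var fieldType, enumName string
		for _, c := range match {
			switch c.Name {
			case "message_name":
				msg.Name = c.Text
			case "field_type":
				fieldType = c.Text
			case "field_name":
				msg.Fields[c.Text] = fieldType
			case "enum_name":
				enumName = c.Text
			case "enum_value":
				msg.Enums[enumName] = append(msg.Enums[enumName], c.Text)
			}
		}
	}
	return msg
}

func main() {
	matches := [][]Capture{
		{{Name: "message_name", Text: "User"}},
		{{Name: "field_type", Text: "string"}, {Name: "field_name", Text: "email"}},
		{{Name: "enum_name", Text: "Role"}, {Name: "enum_value", Text: "ADMIN"}},
	}
	fmt.Printf("%+v\n", BuildMessage(matches))
}
```

With a real binding, the outer slice would come from iterating the query cursor and `Text` from slicing the source bytes by each node's byte range.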

QueryTree runs a Tree-sitter query over a pre-existing tree. We can call ParseMessage() and get back a Message{} struct that is populated with our message name, fields, and enums. What you do with it from there is up to you, but that gets you started.
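One possible next step, shown purely as an illustration, is to generate native Go struct source from the parsed message. The trimmed `Message` type, the `goTypes` table, and the `GenerateStruct` and `exportName` helpers below are all hypothetical; the article's own `Message{}` may be shaped differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Message is a trimmed, hypothetical version of the parsed result.
type Message struct {
	Name   string
	Fields map[string]string // field name -> protobuf type
}

// goTypes maps a few protobuf scalar types to Go types. Not exhaustive.
var goTypes = map[string]string{
	"string": "string", "bool": "bool",
	"int32": "int32", "int64": "int64",
	"float": "float32", "double": "float64",
}

// exportName turns a snake_case field name into an exported Go identifier.
func exportName(s string) string {
	parts := strings.Split(s, "_")
	for i, p := range parts {
		if p != "" {
			parts[i] = strings.ToUpper(p[:1]) + p[1:]
		}
	}
	return strings.Join(parts, "")
}

// GenerateStruct renders a Go struct declaration for the message.
// Fields are sorted by name so the output is deterministic.
func GenerateStruct(m Message) string {
	names := make([]string, 0, len(m.Fields))
	for n := range m.Fields {
		names = append(names, n)
	}
	sort.Strings(names)

	var b strings.Builder
	fmt.Fprintf(&b, "type %s struct {\n", m.Name)
	for _, n := range names {
		t, ok := goTypes[m.Fields[n]]
		if !ok {
			t = m.Fields[n] // assume a message or enum type of the same name
		}
		fmt.Fprintf(&b, "\t%s %s\n", exportName(n), t)
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	m := Message{Name: "User", Fields: map[string]string{"user_name": "string", "age": "int32"}}
	fmt.Print(GenerateStruct(m))
}
```

Because the fields live in a map, declaration order is lost; keeping a slice of fields instead would preserve the order from the .proto file.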


3 Comments

3
throwschen
a year ago
Everywhere I've worked, the steel thread schema definitions have been the pipe dreams that never happened. I would love to see how they accomplished that orchestration!
2
joseph (OP)
a year ago
It's always a dream for anyone dealing with data but, like you said, really hard to orchestrate. I've never been able to get everyone on board.
2
justadev
a year ago
Always been scared of relying on protobufs, and only somewhat understood their importance.