Data exchange standards

Reposting the discussion about existing catalogs from Code for All Slack:

Boris van Hoytema:

Hey everyone. I’m working with some Dutch and Italian public/gov programmers to create a standard publiccode.yml file for metadata in codebases of public code projects. This can cover the legal context, who is maintaining it and until when, etc. Some Italian collaborators could mandate this for Italian public administrations if we can create a working version before March. I’d really love your contributions (in issues or pull requests).


It seems very close to civic.json, which has been in development for 3-4 years. How about reusing it? It seems current development is under Code for DC:

Boris van Hoytema [11:44 AM]:

The publiccode.yml is aimed more at public administrations and institutions, so it is more about usability for other orgs, which is not really solved by civic.json. However, it is worth looking at and thinking about how to learn/combine.

Milo van der Linden [11:51 AM]:

flashback -> see the #global-collaboration channel. We have discussed the civic.json and g0v.json formats before. I even sent a pull request to the g0v repository with an i18n proposal. So there is g0v.json, civic.json, a proposal for i18n, and publiccode.yml. One more wheel and we can build a cart! :partyparrot:

I’d propose the following next steps:

  • discuss things that we want to model and the user needs
  • co-create schema for each model (putting aside serialization, whether it will be YAML or JSON or other)
  • co-create an API
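
To illustrate what "putting aside serialization" could look like, here is a minimal Python sketch of a model defined once and then written out in one format. The field names (`name`, `maintainer`, `license`, `maintained_until`) are made up for illustration and are not taken from publiccode.yml, civic.json or g0v.json.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ProjectRecord:
    """Hypothetical catalog model; field names are illustrative only."""
    name: str
    maintainer: str
    license: str
    maintained_until: Optional[str] = None  # ISO date string, if known


def to_json(record: ProjectRecord) -> str:
    """Serialize the model to JSON; a YAML writer could consume the same dict."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json(text: str) -> ProjectRecord:
    """Rebuild the model from its JSON form."""
    return ProjectRecord(**json.loads(text))
```

The point is only that the schema lives in the model; JSON versus YAML becomes a detail of the last step.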

We’ve massively used Wikidata as a source of information on software in the OGP Toolbox.
We use a wikidata field to put the URL of the matching Wikidata item (for an example see Loomio, under “Additional informations”).
We documented this approach with a table between our properties and the matching properties in Wikidata
Loic Dachary helped us as part of his Wikiproject on software

We also have a sourceCode attribute for tools documented on, which is used to add repo URLs such as GitHub’s, and a developer attribute* that can be used to add the name of organizations or individuals that are known to contribute code for the tool. Most of the data that we have for these attributes comes from scraping Wikidata, but not always.
For example for Your Priorities (I documented it myself a while ago):
Developer: Citizens Foundation

*In fact our tool attributes are community-driven, anybody can create them and use them as they please to describe tools; it’s even more open than OpenStreetMap!

What do you guys think about using Wikidata as a common ground for our different projects?

Why not? But that puzzle piece fits the “hosting + API” slot, right? The schema still has to be agreed between us.

I’d step back and discuss first what we want to catalog and why -> Civic tech catalog’s goals & user needs

We (mercifully) don’t have anything to add in the standards battle, so will look to see which gets the most adoption as we consider how to integrate. Wikidata is particularly interesting, to the extent it covers the data fields we’re looking to collect. As most of you know, we’re in the midst of figuring out how to upgrade to a more database-driven platform, and the discussion of technical exchange standards is very useful context as we decide.


This is the API now (browsable, HAL). I would love to use Wikidata standards. How does this work?

@johan how do you map the “Used for” and “Used by” relations in Wikidata? Also, do you store “Pros & Cons Arguments” on Wikidata?

I’m wondering whether you make any distinction between the data you host externally on Wikidata and the data you host internally.

I also wonder how you store user data: collections, tag votes, etc.

Wikidata is just for factual information, I wouldn’t use it for arguments, tags, or even use cases.

Wouldn’t it be wonderful if all catalogs, toolboxes, and databases of tools could just share a common base of information on software (name, logo, description, URL, developer, license…) instead of manually adding it from scratch and, worse, maintaining it?

Wikidata is just a central structured database of Wikimedia projects, and first and foremost Wikipedia. For example compare Loomio on Wikipedia and Wikidata.

I think the best way to approach this is the way Framasoft did with their Framalibre catalog I mentioned earlier. They created a Framalibre ID property in Wikidata and just added the ID to every software item that matched in Wikidata (tools like Mix n match help to automate that part).
After this match is done, interoperability is possible and you can harvest data from Wikidata to update your own database. (execute this request to see a list of wikidata items with the Framalibre ID property)
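
As a rough sketch of that harvesting step, the Python below builds a SPARQL query for every item carrying a given external-ID property and parses the standard SPARQL JSON results. The property number is deliberately a parameter: `P0000` in the usage note is a placeholder, not the real Framalibre ID property, and the function names and User-Agent string are my own assumptions.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(property_id: str, language: str = "en") -> str:
    """Build a SPARQL query listing items that have the given property."""
    return (
        "SELECT ?item ?itemLabel ?externalId WHERE {\n"
        f"  ?item wdt:{property_id} ?externalId .\n"
        "  SERVICE wikibase:label { "
        f'bd:serviceParam wikibase:language "{language}". }}\n'
        "}"
    )


def parse_bindings(payload: dict) -> list:
    """Turn SPARQL JSON results into plain dicts (QID, label, external ID)."""
    rows = []
    for binding in payload["results"]["bindings"]:
        rows.append({
            # Entity URIs end with the QID, e.g. .../entity/Q42
            "qid": binding["item"]["value"].rsplit("/", 1)[-1],
            "label": binding.get("itemLabel", {}).get("value", ""),
            "external_id": binding["externalId"]["value"],
        })
    return rows


def fetch(property_id: str) -> list:
    """Run the query against the live service (needs network access)."""
    params = urllib.parse.urlencode(
        {"query": build_query(property_id), "format": "json"}
    )
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": "catalog-sync-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

Calling `fetch("P0000")` with the real property number in place of the placeholder would return one dict per matched item, which a catalog could then use to update its own records.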

The idea is that an ID in Wikidata allows every one of us to benefit from the wealth of information already in Wikipedia/Wikidata, while at the same time preserving what makes us complementary, because some of the properties/fields in our databases aren’t in Wikidata or other databases (use cases, special info, and whatnot) and we might want to keep it that way because every one of our projects responds to special needs!
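
A minimal sketch of that split, assuming each catalog keeps its records as plain dicts: the shared factual fields are refreshed from Wikidata, and everything else stays local. The field list and the `merge_record` name are hypothetical, chosen to mirror the fields mentioned earlier in the thread.

```python
# Factual fields we would be happy to take from Wikidata (illustrative list).
SHARED_FIELDS = ("name", "logo", "description", "url", "developer", "license")


def merge_record(local: dict, from_wikidata: dict) -> dict:
    """Return a copy of `local` with shared fields refreshed from Wikidata.

    Fields outside SHARED_FIELDS (use cases, arguments, tags...) are never
    overwritten, and empty Wikidata values do not erase local ones.
    """
    merged = dict(local)
    for field in SHARED_FIELDS:
        value = from_wikidata.get(field)
        if value:
            merged[field] = value
    return merged
```

Whether Wikidata or the local value should win on conflict is a policy choice each project would have to make; this sketch simply lets Wikidata win for the shared fields.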

Something I still have to explore is the possibility, in return, to update Wikidata from an external source with bots.

Bottom line: Wikidata is neat. :blush:

So that would be for the “tools” right, the things of type “software” (I don’t have software in Clarity). How about things of type “use cases”, any ideas for that?

(hello all!)

if you want to stick to Wikidata, you’d have to think of proxies for “use case”. as far as I understand it, Wikidata is an inherently materialistic database. it’s a database of things and their properties. if users were willing to put in the work, you could do things like “is used by” and then filter based on that.

but there’s also a “use” item and a “use” property on Wikidata, both of which could be subverted to fit our preferred ontology.

use cases could also subclass “event” items (no link because new user)
