At the end of IODC16, Roderick Besseling of Cordaid asked me a simple question: Hurricane Matthew has hit Haiti, with well over 800 casualties reported already. Can we see in the IATI data which humanitarian responses have been started? Can we make that data available on HDX, the Humanitarian Data Exchange?
Sure, I thought, with the data stores and interfaces available, that should be possible, right? It turns out to be “more complicated”.
My first call was to d-portal. I had a look at all activities in Haiti that are active. It’s possible to download that list as a spreadsheet, but then you still need to figure out which ones are a response to Hurricane Matthew. 952 activities to go…
I also had a look at the Generator for widgets, but couldn’t find a way to answer my question there either.
There is also the IATI datastore. I first used the query builder, trying to get a list of activities. Again, I can basically only look for all activities in Haiti, this time not even “just the active ones”. So now I have a spreadsheet with some 7,047 activities to filter further…
The datastore also has an API with more options: basically, the query builder just helps create the API call for a limited set of scenarios.
So let’s limit it to activities where Haiti is mentioned as recipient-country, and that were started after September 15. Or, if you, like me, prefer XML:
curl -o haiti.xml "http://datastore.iatistandard.org/api/1/access/activity.xml?recipient-country=HT&start-date__gt=2016-09-15"
It only results in one activity, and that’s not a response to Hurricane Matthew. (At least it’s easy to check that for a single activity…)
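To try other countries or dates without hand-editing the link, the same datastore call can be put together in a few lines of Python. This is only a sketch: the endpoint and the two parameters are the ones from the curl command above, while the function name `datastore_url` is my own invention.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl command above.
DATASTORE_ACTIVITIES = "http://datastore.iatistandard.org/api/1/access/activity.xml"


def datastore_url(country: str, started_after: str) -> str:
    """Build the datastore query for activities in a country starting after a date."""
    params = {
        "recipient-country": country,    # ISO country code, e.g. "HT" for Haiti
        "start-date__gt": started_after  # ISO date, e.g. "2016-09-15"
    }
    return DATASTORE_ACTIVITIES + "?" + urlencode(params)


print(datastore_url("HT", "2016-09-15"))
```

The printed link can be passed to curl exactly as before.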
I used the OIPA API with the same question: activities in Haiti started after September 15. OIPA does JSON, and I haven’t yet found how to indicate the return format in the link:
curl -H "accept: application/json" -o haiti.json "https://oipa.nl/api/activities/?recipient_country=HT&planned_start_date_gte=2016-09-15"
OIPA also says there is one result (the same as the IATI datastore above). A search with actual_start_date rather than planned_start_date again returned the same activity.
OIPA also lets you search for text, so I did an alternative search for any activity with a planned start date from September 15 on, with the text “Matthew” in either the title or description.
curl -H "accept: application/json" -o matthew.json "https://oipa.nl/api/activities/?q=matthew&q_fields=title,description&planned_start_date_gte=2016-09-15"
This yielded no results (and the same question with actual_start_date_gte also gave no results).
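Since I ran the text search twice, once per date field, here is a small Python sketch that prepares both requests in one go. It only builds the requests and does not send them. The endpoint, parameters and Accept header are the ones from my curl commands; the helper name `oipa_search` is made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the curl commands above.
OIPA_ACTIVITIES = "https://oipa.nl/api/activities/"


def oipa_search(text: str, date_field: str, since: str) -> Request:
    """Prepare a text search in title and description, limited by one date field."""
    params = {"q": text, "q_fields": "title,description", date_field: since}
    # safe="," keeps the comma in q_fields readable instead of encoding it as %2C
    url = OIPA_ACTIVITIES + "?" + urlencode(params, safe=",")
    return Request(url, headers={"Accept": "application/json"})


# The two variants I tried: planned and actual start date.
requests_to_try = [
    oipa_search("matthew", field, "2016-09-15")
    for field in ("planned_start_date_gte", "actual_start_date_gte")
]
for req in requests_to_try:
    print(req.full_url)
```

Each prepared request can then be handed to `urllib.request.urlopen` to actually run the search.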
Update Mon Oct 10, 12:40:
Vincent reported that OIPA now does return results:
— Vincent (@vwestende) October 10, 2016
It also helped me discover how to ask for more results and in JSON format via the URL:
curl -o matthew-2.json "https://oipa.nl/api/activities/?format=json&q=matthew&q_fields=title,description&planned_start_date_gte=2016-09-15&page_size=100"
And it also makes it possible to download the information in a spreadsheet.
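If there are more results than fit in one page, the pages need to be followed one by one. Below is a rough Python sketch of that. It assumes the JSON response has a `results` list and a `next` link to the following page, which I have not verified against the OIPA documentation, so treat those two key names (and the helper names) as assumptions.

```python
import json
from typing import Callable, Iterator
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Download one page and parse it as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def iter_results(first_url: str, fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every result, following the (assumed) 'next' link until it is empty."""
    url = first_url
    while url:
        page = fetch(url)
        # 'results' and 'next' are assumed key names, see the note above
        yield from page.get("results", [])
        url = page.get("next")
```

Calling `list(iter_results(url))` with the link from the curl command above would then collect all activities, whatever the page size.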
My own data snapshot
I then had a look at my own daily snapshot of IATI files: I updated my local database, and did the text search for all activities with the word “Matthew” in the title or description, starting after September 15.
//iati-activity[contains(lower-case(string-join((title,description)," ")),"matthew") and activity-date[@type=("1","2","start-planned","start-actual") and @iso-date gt '2016-09-15']]
This actually results in 14 activities! All are published by a secondary publisher: 13 through GlobalGiving, 1 through InterAction’s NGO Aid Map.
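For those without an XML database at hand, the same filter can be approximated in plain Python on a downloaded IATI file. This is a sketch of the idea, not the query I actually ran: the function names are made up, and it uses the same start-date type codes as the XPath above ("1" and "2" next to the older "start-planned" and "start-actual").

```python
import xml.etree.ElementTree as ET

# Same start-date type codes as in the XPath query above.
START_TYPES = {"1", "2", "start-planned", "start-actual"}


def text_of(activity: ET.Element, tag: str) -> str:
    """All text inside the given child elements, including nested narratives."""
    return " ".join("".join(el.itertext()) for el in activity.findall(tag))


def matches(activity: ET.Element, word: str, after: str) -> bool:
    """True if the word is in title or description and a start date is after the given date."""
    haystack = (text_of(activity, "title") + " " + text_of(activity, "description")).lower()
    if word.lower() not in haystack:
        return False
    # ISO dates (YYYY-MM-DD) can be compared as plain strings
    return any(
        d.get("type") in START_TYPES and d.get("iso-date", "") > after
        for d in activity.findall("activity-date")
    )


def find_activities(xml_text: str, word: str = "matthew", after: str = "2016-09-15") -> list:
    """Return the matching iati-activity elements from one IATI file."""
    root = ET.fromstring(xml_text)
    return [a for a in root.iter("iati-activity") if matches(a, word, after)]
```

Running `find_activities` over each file in a snapshot and collecting the `iati-identifier` of every hit gives a list comparable to the one above.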
I looked at some additional queries: anything with the humanitarian flag? No new activities found. Anything in Haiti with the start date after September 15? Also no new activities.
In my visualisation, each activity is linked to its page on d-portal, but the activities in red seem to be missing there at this moment. Perhaps because the activities were published after d-portal took a snapshot, and before I did: we’ll see if they are available in the coming day(s).
Update Mon Oct 10, 12:40: all activities are now on d-portal as well, so that indeed was probably due to a timing difference in making the snapshot.
It does lead to some further questions:
- Maybe the datastore and OIPA don’t publish data from secondary publishers? (I couldn’t find information on this quickly).
- How does everyone deal with errors? I know my database accepts anything that is valid XML, so it probably includes more activities than the databases of others who also validate against the IATI Schema, or do even more checks.
- How many IATI activities are out there now…?
  - If I ask for all activities at OIPA, it says there are 604,583 results.
  - If I do count(distinct-values(//iati-identifier)) on my snapshot, I get 632,905 activities.
  - If I ask the datastore via http://datastore.iatistandard.org/api/1/access/activity.xml it reports 571,746 results.
  - If I look at the IATI dashboard, it says there are 625,729 unique activities (and 648,920 in total, which is close to the 648,633 in total in my snapshot: the difference could be due to the timing of the snapshot again).
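To make those gaps a bit more tangible, here is a quick Python calculation that compares each count with the number of unique activities on the IATI dashboard. The numbers are the ones listed above; taking the dashboard as the baseline is just my choice for this comparison, not a claim that it is the correct figure.

```python
# Activity counts as reported above.
counts = {
    "OIPA": 604_583,
    "my snapshot": 632_905,
    "datastore": 571_746,
    "dashboard (unique)": 625_729,
}


def differences(counts: dict, baseline: str) -> dict:
    """Difference of every source against the chosen baseline source."""
    base = counts[baseline]
    return {name: n - base for name, n in counts.items() if name != baseline}


diffs = differences(counts, "dashboard (unique)")
for name, diff in diffs.items():
    share = 100 * diff / counts["dashboard (unique)"]
    print(f"{name}: {diff:+,} ({share:+.1f}%)")
```

So the datastore is missing the most activities compared to the dashboard, and my snapshot is the only source that has more.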
Roderick’s question is simple. Getting an answer is not (unless I am missing something here). The answers themselves leave things to be desired. We’re not serving this use case very well.
And I think that’s a problem for IATI…
As an IATI community of practitioners, we need to discuss how to serve users better: we’re not done when we have produced XML files that contain relevant data. We need to provide simple and reliable interfaces too.
— John Adams (@johnthegeo) October 9, 2016
— Leigh Mitchell (@leighhmitchell) October 9, 2016
Update Fri Oct 28, 20:00: there is now a call-out to the IATI publishers with recommendations on how to add classification information to their data.
Also, already after my previous update, Wendy suggested how to do text search on d-portal:
— Wendy Rogers (@whrogers) October 11, 2016