
2010

Let’s build a “Debian for Development Data”

I just returned from an intense week in the UK: an IKM Emergent workshop in Oxford and the Open Government Data Camp in London had me almost drowning in “open data” examples and conversations, with a particular angle on aid data and the perspectives of international development.

As a result, I think we’re ready for a “Debian for Development Data”: a collection of data sets, applications and documentation to serve community development, curated by a network of people and organisations who share crucial values on the democratisation of information and the empowerment of people.

“Open data” is mainstream newspaper content now

Mid 2009, after the 1%EVENT, a couple of innovative Dutch platforms came together to explore the opportunities of opening up our platforms: wouldn’t it be great if someone in an underdeveloped community had access to our combined set of services and information?

We had a hard time escaping new jargon (federated social networks, data portability, privacy commons, linked open data, the semantic web) and sketching what it would look like in five years. But then again, suppose it was five years earlier: in mid-2004, no-one could have predicted what YouTube, Facebook and Twitter look like today, even though many of us already felt the ground shaking.

  • The technical web was embracing the social web, of human connections.
  • The social web pushed “literacy”: people wanted to participate and they learned how to do that.

A year and a half later, “open data” is catching up with us, and going through a similar evolution. Governments and institutions have started to release data sets (the Dutch government will too; the UK released data on all spending over £25,000 on Friday). So when will the social dimension be embraced in open data?

A week of open data for development

At an IKM Emergent workshop in Oxford, on Monday and Tuesday, around 25 people came together to talk about the impact of open data on international development cooperation. We discussed when we would consider “linked open data” a success for development. One key aspect was: getting more stakeholders involved.

Then at Open Government Data Camp (#OGDCamp) in London, on Thursday and Friday, around 250 people worked in sessions on all kinds of aspects of open data. Several speakers called for a stronger social component: both in the community of open data evangelists and in reaching out to those for whom we think open data will provide new opportunities for development.

At IKM, Pete Cranston described how his perception of access to information changed when a person approached him in a telecentre to ask how the price of silk had changed on the international market: the man was a union representative, negotiating with a company that wanted to cut worker salaries because of a decline in the market price. Without access to the internet or the skills to use it, you don’t have the same confidence we have that such a question can be answered at all.

Then at OGDCamp, David Eaves reminded us that libraries were (partly) built before the majority of the population knew how to read, as an essential part of the infrastructure to promote literacy and culture 1.

Telecentres fulfil a role in underdeveloped communities as modern-day libraries, providing both access to information and communication tools via the internet, and the skills to use them.

But we don’t have “open data libraries” or an infrastructure to promote “open data literacy” yet.

How open source software did it

It shouldn’t be necessary for people to become data managers just to benefit from open data sets. Intermediaries can develop applications and services to answer the needs of specific target groups based on linked open data, much as librarians help make information findable and accessible.

There are also parallels with open source software. Not every user needs to become a developer in order to use it. Although it is sometimes tempting to think otherwise, the open source movement has managed to provide easier interfaces to work with the collective work of developers.

The open data movement can identify a few next steps by looking at how the open source movement evolved.

| Open source | Open data |
| --- | --- |
| Software packages (operating systems, word processors, graphics editors, and so on) are developed independently. Each software package can choose the programming language, development tools, and the standards and best practices it uses. | Data sets (budget overviews, maps, incident reports) are produced independently as well. Data formats and delivery methods can be chosen freely, and there are various emerging standards and best practices. |
| Communities around software packages usually set up mailing lists, chat channels and bug trackers for developers and users to inform each other about new releases, problems, and the roadmap for new versions. The mantra is “many eyes make all bugs shallow”: let more people study the behaviour or the code of software, and errors and mistakes will be found and repaired more easily. | Data sets are mainly just published. As Tim Davies noted in one of the conversations, there don’t seem to be mailing lists or release notes around data sets yet. To deliver on the promise of the “wisdom of the crowds”, users of data sets should have more and better ways to provide feedback and report errors. |
| Open source software is mostly used via distributions like Debian, Red Hat and Ubuntu, separating producers and integrators. A distribution is a set of software packages, compiled and integrated in a way that makes them work well together, thereby lowering the barrier of entry to use the software. Distributions each have a different focus (free software, enterprise support, user-friendliness) and thus make different choices on quality, completeness, and interfaces. | Perhaps the current data sets released by governments could be considered “distributions”, although the producer (a department) and the integrator (the portal manager) usually work for the same institution. CKAN.net could be considered a distribution as well, although it does not (yet?) make clear choices on the type and the quality of data sets it accepts. |

Software distributions make it possible to pool resources to make software interoperable, set up large-scale infrastructure, and streamline collaboration between “upstream” and “downstream”. The open character stimulates an ecosystem where volunteers and businesses can work together, essential to create new business models.

Towards a “Debian for Development Data”

To sum up several concerns around open data for development:

  • Open data is currently mainly advocated for by developers and policy makers, without strong involvement of other stakeholders (most notably: those we would like to benefit in underdeveloped communities). It tends to be driven mostly by web technology and is mostly focused on transparency of spending. It does not take into account the (political) choices behind why activities were chosen, and it also falls short in recording results.
  • Data sets and ontologies are hard to find, not very well linked, with few generic applications working across data sets, and few examples of good use of multiple data sets. Once you want to make a data set available, it is hard to promote its use, provide feedback loops for improvements, administer dependencies, and keep track of what was changed along the way and why.
  • There are hardly any structural social components around current open data sets, repositories and registries.

So why don’t we start a “Debian for Development Data”?

  • A Social Contract and Open Data Guidelines like those for Debian can capture essential norms and values shared by community members, and inform decisions to be made. The contract can for instance value “actionable opportunities” over financial accountability. The Agile Manifesto is another example to draw from.
  • The community should set up basic communication facilities such as a mailing list, website, and issue tracker, to ease participation. Decision-making is essentially based on meritocracy: active participants choose who has the final say or how to reach consensus.
  • The data sets should be accompanied by software and documentation, to take away the problem of integration for most end users. Each data set and tool should have at least one “maintainer”, who keeps an eye on updates and quality, and is the liaison for “upstream” data set publishers, offering a feedback loop from end-users to producers.
  • The CKAN software (powering the CKAN.net website mentioned before) draws on the lessons from distributions like Debian for its mechanisms to keep track of dependencies between data sets, and has version control, providing some support to track changes.
  • Ubuntu divides packages into categories like “core”, “non-free” and “restricted” to deal with license issues, and to express the commitment of the community towards maintaining quality. (A rough sketch of what such package metadata might look like follows this list.)
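
To make the packaging analogy concrete, here is a purely hypothetical sketch of the metadata one curated data set could carry, loosely modelled on a Debian control file. Every field name and value is invented for illustration; none of this refers to an existing standard.

```yaml
# Hypothetical metadata for one data set in a "Debian for Development Data".
# These fields are not an existing standard; they mirror the roles described above:
# a maintainer, an upstream source, dependencies, and a licence/commitment category.
name: health-facilities-gh
version: "2010.11"
maintainer: "Jane Doe <jane@example.org>"   # keeps an eye on updates and quality, liaises upstream
upstream: http://data.example.gov/health/facilities
category: core                              # cf. the "core" / "non-free" / "restricted" split
licence: open
depends:
  - admin-boundaries-gh (>= 2010.06)        # other data sets this one is integrated with
changelog: "2010.11: district codes corrected after reports on the mailing list"
```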

We stimulate the social component by providing more stakeholders a point of entry to get involved through socio-technical systems. We stimulate literacy by offering the stakeholders ways to get open data, publish their own, experiment with applications, and learn from each other. And we circumvent the tendency towards over-standardisation by doing this in parallel with other initiatives with sometimes overlapping goals and often different agendas.

1 A quick check on Wikipedia indicates this seems to have mainly been the case in North America, though.

Smarter crowdsourcing

Paul Currion has written a critique of Ushahidi and crowdsourcing in humanitarian crises. I think he misses quite a bit of what actually went on; it’s like me judging the effectiveness of institutional aid based on what I see and hear on TV. Robert Munro has answered Paul’s critique with a more in-depth review of what happened and didn’t show up on Ushahidi.

I do agree with Paul’s (somewhat hidden) observation that tapping into an existing infrastructure (in the case of Haiti: the Open Street Map community) is a next step. I’d generalise that: tap into an existing social infrastructure. Consider the Haitian diaspora as such.

One way to look at crowdsourcing is as "a random group of people connected by technology figuring out processes to address a one-off goal". But that’s still a rather centralised view: an unconnected mass of people coming together like a flash mob.

A better way would be to consider socio-technical architectures: groups of people connected by technology, establishing (new) patterns of collaboration for on-going goals. That’s more a peer-to-peer view: an ad-hoc configuration of groups of people with different skills coming together to address a complex situation.

ResRap 3, 2, 1, Go!

Two weeks ago the #resrap 2009-2010 project kicked off at the Dutch Ministry of Foreign Affairs: the biennial reporting of results of Dutch international development aid. It’s the second time the Ministry works together with civil society (this time at a more ambitious level through Partos 1) to report on our joint Dutch contributions to the Millennium Development Goals as completely as possible.

Earlier, I used the 6-minute film “A Case For Open Data In Transit” to illustrate my drive, as a member of the #resrap web advisory group, to not just collect data for analysis, but also make it available as raw data. Using the approach presented by Joshua Robin at the Gov 2.0 Expo 2010 last May: Focus on 3-2-1.

The current focus in the plan is to collect raw data, and then

  1. Let various groups analyse the data and write chapters on each of the Millennium Development Goals.
  2. Publish those as a book and a website by September 2011, and
  3. Then look at how to make more available on a joint website, perhaps enabling others to download the data in raw form.

But why let the still mostly untapped reservoir of positively motivated talent and expertise (data analysts, programmers, journalists, and so on) wait a year? They can come into action right now!

  1. Make the raw data available while collecting it. The tedious task for ICT and Monitoring & Evaluation staff of providing data dumps turns into an engaging conversation on what is needed to make the data available publicly, in real time, and on what can then be done. Let’s start our own sector-wide /Open campaign!
  2. More people can help debunk myths, find new angles to present what we do, compare between countries, create benchmarks on topics, provide new services to organisations or companies.
  3. And we will have more tools, presentations, engagement, and insights available by the time the ResRap is ready.

To quote Tim O’Reilly once more: governments should be a platform, not a service. “Do the least possible, not the most possible, to enable others to build on what you do.”

We are already quite a bit on the way, so it’s not a radically different approach, just a radically different output.

  • Data on outputs and financial inputs of projects is relatively easy to get (of course, still with plenty of problems to solve). It will help to compare what is available and try to build mash-ups, to feed the discussion towards joint standards.
  • Data on outcomes is harder to aggregate, and requires a priori agreements between parties on how projects will be evaluated.
    • Within the Ministry, there are discussions on how to establish indicators that might be required from grantees under the Dutch MFS2 Programme 2.
    • Also, several people involved in the field of Monitoring & Evaluation formed an informal network to discuss “M&E 2.0” and how to create new ways of reporting results in a peer-to-peer and low-threshold manner.

Let’s go!

Next Thursday at PartosPlaza, transparency and open data are hot topics. There will be a workshop with the “M&E 2.0” people, and one with our ISHub friends and fans. And Tony German of AidInfo will explain why the International Aid Transparency Initiative is relevant for private aid organisations. Then in November, Dutch open data fans will go to the Open Government Data Camp in London. Plenty of opportunities to make progress on this topic!


1 “Partos is the national platform for Dutch civil society organisations in the international development cooperation sector”

2 In November, the Ministry will announce the budgets allocated to around 30 coalitions of organisations for the period 2011-2015, in total up to around 450 million Euro per year.

“A Case For Open Data”

Yesterday, Adam DuVander wrote on ProgrammableWeb about “A Case For Open Data In Transit”, a 6-minute film about public transit agencies opening up their data. The Streetfilm production provides some excellent examples and quotes to also make the case for (more) open data in international development aid. As Tim O’Reilly puts it: government should be a platform for society to build on.

The Dutch government spends quite a bit of money on international development aid (0.8% of GNI, one of a handful of countries to live up to their commitment 1). As everywhere, people want to know more and more how that money is spent, and what results are achieved. So the Ministry of Foreign Affairs regularly produces a Resultatenrapportage (Results report, or ResRap).

The report-writing for the period 2009-2010 has just begun, and should also result in “an interactive website”. As a member of the web advisory group, I hope to help push the notion of “interactive” a bit beyond clickable maps and animated charts.

Following initiatives like DFID’s project information in the UK and the Open Government policy as applied in the US, this would allow Dutch information on development aid to be aggregated in detail on websites like AidData, or to build on the work on standards and data sets done in groups such as OKFN’s Working Group on Open Knowledge in Development.

“A Case For Open Data In Transit”:

  • Transit agencies for a long time felt that they needed to be the source of information for their customers. But once they opened up their data, customers suddenly got a lot more choice in getting the information they needed. Without extra cost for the transit agency.
  • Would you have thought of developing an electronic sign board for a coffee shop to show the times of upcoming buses at the bus stop in front of them?
  • The New York MTA went from suing people for re-using their data to engaging with developers.
  • Tim O’Reilly asks governments to stop seeing themselves as a vending machine for services and instead to be a platform, letting others deliver those services. “Do the least possible, not the most possible, to enable others to build on what you do.” You’ll create “the capabilities for people to say: ‘We did it ourselves’”.

1 Both within the UN and the EU, countries have pledged to spend at least 0.7% of their GNI on official development assistance, http://en.wikipedia.org/wiki/Development_aid

Tasktop to improve a knowledge worker’s productivity

I’ve been using Mylyn for quite some years now. Mylyn introduced the concept of task-focused work: activate a task in your to-do list, and only see the files relevant to that task. Tasktop, the company behind Mylyn, extends it into the Tasktop product, with even more features, and promises “improved productivity, guaranteed”. It works great when I am developing software, and it could also support me as a knowledge worker, for instance by managing bookmarks and browser tabs in Firefox. But I’d like to see it offer more support for task management within Firefox too. A bit like this.

Mylyn and Tasktop

If you’re a developer, you have probably at least had a look at Eclipse at some point. And perhaps at the Mylyn extension, which connects it to various bug trackers to populate your to-do list. It provides a standardised way of keeping track of issues in bug trackers, and helps to focus on the files relevant to a specific task. Switching from one task to another becomes a lot easier, and so the cognitive overhead of managing tasks (especially finding the files associated with a task) is reduced significantly, as Mik Kersten, one of Mylyn’s masterminds, demonstrated in his PhD thesis 1.

Mik founded Tasktop, a company that takes the “task-focused desktop” approach even further, connecting Mylyn also to email (Outlook, Gmail, IMAP) and to the Firefox web browser.

And Mylyn and Tasktop have a lot more features, many of which I am not – or perhaps: not yet – using.

From developer to knowledge worker

These days, I hardly get to spend time on programming, but instead spend most of my time as a knowledge worker. Firefox is now my predominant “desktop”, and I find myself flipping back and forth between Eclipse and Firefox quite a bit.

The people at Tasktop have been very responsive to feedback, and thereby also encouraged me to make regular contributions to their issue tracker, mostly with suggestions for improvements or new features. So I filed a suggestion to expose more of Mylyn and Tasktop in a Firefox extension.

I recently decided to try out WireframeSketcher, another Eclipse plugin, developed by Petru Severin, to sketch how I think an extended Tasktop add-on for Firefox might support me better.

Use case: stumbling upon a new thing for my to-do list

![Mockup: creating a new task from a page open in Firefox](../../../assets/posts/1d38dfe92a46d352bd21fc468c10fcd0_MD5.png)

Step 1: See an interesting page

I somehow end up on a page in my browser for a conference, event, or topic I am interested in. For instance via email, a chat, a twitter message, a phone call, a visitor in my office. I want to quickly make a new task to capture what I’m looking at.

![Mockup: collecting pages and notes into the task context](../../../assets/posts/e7c043f1bf8cff670187fa7c14122d24_MD5.png)

Step 2: Collect information

Once I have created or activated the task, I may want to add a few pages to the task context, and perhaps some notes or a copy of some text from a page. And maybe schedule it or add a due date, for instance a deadline for registration, or for submitting a proposal or a paper.

Ideally, downloaded files could be added to the task context as well.

![Mockup: switching back to the previously active task](../../../assets/posts/11b8f5199af1a4d7291f9fecfe576b95_MD5.png)

Step 3: Back to previously active task

And then I want to go back to the task I was working on, or another one on my schedule.

In summary, my wishes:

  • create a new task or switch to another task within Firefox, without switching to Eclipse;
  • have access to notes, task context (especially bookmarks and visited pages), and scheduled and due dates;
  • be able to add or remove pages that I have open to my task context.

1 Mik Kersten, “Focusing knowledge work with task context” (Vancouver, BC, Canada: University of British Columbia, 2007), http://www.tasktop.com/docs/publications/2007-01-mik-thesis.pdf.

All hands on deck: building civil society 2.0

I’ve been invited to talk at the World Congress on IT 2010, in the eGovernment track. Together with Beth Noveck, Ivo Gormley, and Greg Clark, we’ll have a session and panel called “Hey gov, can you hear me?”, moderated by Dom Sagolla. Arnout Ponsioen invited me to present a case from the perspective of civil society, and I chose to illustrate the possibilities of people all over the world working together in a moment of crisis: the Haiti earthquake. Here is what I had to say.

Ivo Gormley showed a summarised version of his film “Us Now”, with many examples of people working together in new ways. I want to add to that with a more in-depth look at one particular case, from the perspective of a “world citizen”.

On January 12, an earthquake destroyed large parts of Haiti, killed hundreds of thousands of people and devastated the lives of many more.

An international emergency response was immediately launched. We know the sort of images that come with that: get people and supplies to Haiti. But the disaster also had a new type of first responders: citizens from around the world helping from their own homes, offices and schools.

The internet and mobile phones have made it possible to contribute to disaster response from anywhere in the world, because, as concerned citizens, we can help in three areas:

  • We can help collect information from various sources.
  • We can help map that information and make it available and useful to the first responders at the scene.
  • And we have the means to self-organise, mobilise the skills and talents we need, and distribute the tasks to people who want to help.

Here’s how it worked for Haiti.

Data collection

Ushahidi is a platform that came into existence after the elections in Kenya, in 2007. Violence broke out, people had to flee their homes, and it was hard to get an overview of what was happening. A couple of programmers set up a system to gather eyewitness reports coming in through SMS, email, twitter, and the web, and place the incident reports on a map 1.

The software for the platform has been made freely available, and has been used numerous times since.

Within hours of the earthquake, a group of people at Tufts University had set up an Ushahidi platform, and worked with a mobile provider to set up a shortcode, 4636. Radio stations then helped to spread that number.

Messages came in from people trapped under buildings, asking for help. And also for instance from a hospital where they had 200 beds free, but no victims coming in yet. Being able to filter the messages in various ways helped people on the ground, and assessing the reports can be done by people around the world, for instance by giving a “thumbs up” or “thumbs down”.

Often the messages were in Kreyol, the local language, which few first responders spoke, and they often indicated locations based on landmarks that had disappeared. So Ushahidi reached out to the diaspora community to help translate and contextualise reports. For instance, they kept a Skype chat channel open to quickly get translations for urgent-looking messages.

Mapping

The OpenStreetMap community came together to produce up-to-date maps of the area. OpenStreetMap is a “Wikipedia for maps”, started in the UK to make street maps available for free 2.

OpenStreetMap had already shown the power of an online community in January 2009, when Israel invaded the Gaza Strip. The maps from Google and Microsoft were not detailed enough, and OpenStreetMap’s coverage was actually even worse. After a call-out to help improve the map, it took around 48 hours to make OpenStreetMap the most detailed map available.

Al Jazeera set up an Ushahidi system on top of those OpenStreetMap maps, to get information for their reporting.

In the case of Haiti, various companies and organisations such as GeoEye, DigitalGlobe, and Google quickly made their satellite images available to the OpenStreetMap community. The next day, someone wrote a program to import those images into OpenStreetMap automatically.

It took around 6 hours to get the community started, and someone in Germany set up a server the next day, so that the available maps would be updated every 5 minutes, and also made them available to be downloaded on the Garmin GPS devices used by search and rescue teams. Those search and rescue teams also started indicating which areas they were going to move into next, to help the community around the world prioritise work.

UNOCHA asked specifically to look out for new refugee camps appearing on the images. You can just barely see a few camps on the image here. It also posed a new problem: the GPS devices did not understand the tag “earthquake:damage:spontaneous_camp”, so the community quickly decided to also tag them as “tourist campsites”, to have them appear as little tents on those GPS devices.
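
As an aside (my illustration, not a slide from the talk): in OpenStreetMap’s XML format such a doubly-tagged point would look roughly like the sketch below. The coordinates are made up, and the exact earthquake key used at the time may have differed.

```xml
<!-- Illustrative only: a camp carrying both the damage tag and the
     tourism=camp_site workaround so it shows up as a tent on GPS devices -->
<node id="-1" lat="18.54" lon="-72.34">
  <tag k="earthquake:damage" v="spontaneous_camp"/>
  <tag k="tourism" v="camp_site"/>
</node>
```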

The result was that in 3 weeks, one of the best maps of Haiti was produced by people who mostly never had been there. Here’s an image of the Situation Room at the World Bank, with a huge printout of that map on the wall.

The World Bank understood the power of such communities, and also did something new for the damage assessment. They normally have a small team look at photos of before and after the disaster, to make a detailed damage report. This time, they worked with various universities around the world, where experts each took on part of that huge task. Normally, such a process takes 6 weeks or 2 months to complete. This time, it was done in a few days. 3

Which shows the potential of the third element.

Mobilising and organising

It has become easy to mobilise and organise.

The owner of the domain name haiti.com quickly made that site available as a starting point for people to find ways to help.

People who had been working on maps before already organised themselves in for instance the Crisis Mappers Network, and a lot of the initiation and coordination of activities happened through their mailing list.

Wikis help to quickly organise information, and, as you can see here, there was a whole ecosystem of communication tools: websites, Google groups, Twitter hashtags, IRC chats, and groups on Ning, Facebook and LinkedIn.

And you can also see so-called CrisisCamps: people gathered for a day or a weekend, in schools, offices, homes, to work together to sift through the information, work on maps, and so on. With name tags, because many of those people had never met before. These happened in at least 25 cities around the world.

The Extraordinaries provide a platform for what they call micro-volunteering: things you can do for the greater good, even if you only have 10 or 20 minutes. So they started pulling in pictures from Flickr, with the tag Haiti, to filter out the ones that may contain useful information about people or places. You just answer a few questions, like “Is this picture related to the Haiti quake?”, and you can even do one or two on your smart phone while waiting for your coffee to be ready.

Another, bigger system is Sahana, a whole suite of disaster management software first developed in Sri Lanka after the 2004 tsunami. It helps to create maps, to keep track of which organisations are working where, and to match supply and demand between them, for hospitals or for other types of requests, such as generators, fuel, transport, and so on.

It helped for instance produce a list of 697 organisations working in Haiti, with contact details and so on, to help coordination.

Summarising

So hey gov, can you hear me? What can you do to tap into this potential? First and foremost: actively break through the currently dominant view of “us” and “them”, of supplier and client.

  • Work with people’s passion. Whether it is in response to a crisis, or in relation to the local or hyperlocal environment in which we live, we want to collaborate. Share responsibilities.
  • Make information available. Suppress the reflex to keep data in-house. We already paid for it once, through our taxes; trying to monetise it again generally doesn’t work out for government.
  • Be what politics should be about. Affinity groups and self-selecting communities tend to create their own echo chambers, and reinforce their own beliefs. Actively connect those echo chambers, facilitate debate and bring opposing views and conflicting interests together.

Most of all: make legislation that enables the technological innovation that drives this type of citizen engagement. Sometimes legal solutions are easier and more effective than technical solutions.

One satellite per child

In the meantime, citizens will develop new ways to use technology. This picture doesn’t show a blueprint for a bomb, but rather a set of materials put together by the MIT Media Lab, and already jokingly called the One Satellite Per Child project. It enables even kids in, for instance, the slums of Lima, Peru, to start making aerial photographs and map their own neighbourhood. The sky is the limit.

1 “Ushahidi,” in Wikipedia, n.d., http://en.wikipedia.org/wiki/Ushahidi.

2 “OpenStreetMap,” in Wikipedia, n.d., http://en.wikipedia.org/wiki/Open_Streetmap.

3 Justin Mullins, “How crowdsourcing is helping in Haiti,” New Scientist, January 27, 2010, http://www.newscientist.com/article/mg20527453.600-how-crowdsourcing-is-helping-in-haiti.html?full=true.

Switching from Beagle to Tracker and solving the performance problems

When I update my laptop to the next version of Ubuntu (Lucid Lynx or 10.04 this time), I usually have a look at the general direction for some of the “desktop core elements”, like desktop search. I decided to switch from Beagle to Tracker and hopefully have tackled the performance problems it seems to come with.

The Ubuntu community has been shipping Tracker desktop search for some time already. It seemed to often freeze up my computer completely while it was indexing files. Beagle, the best alternative, did not, and also seemed to have a better feature set. For instance, it indexed my chat logs in Pidgin nicely.

But Beagle doesn’t seem to be developed very actively any more, whereas Tracker, as part of the Gnome desktop, is actively moving towards supporting the Nepomuk semantic desktop. That paves the way for other applications to use Tracker to retrieve information. And to store information!

Adding tags or a description to a photo in one application will make it available to other applications as well. Instead of letting each photo application build their own application-specific database.

All these applications also periodically want to go through my files and directories to see if there is new content that they can handle. It just adds to potential performance problems. But switching back to Tracker also meant switching back to its performance problems. As soon as some sort of disk-intensive activity started, my whole system froze.

But Ralf Nieuwenhuijsen gave an explanation of the background of the problem on Ubuntu Brainstorm.

“Currently, what happens is that linux saves a last-read-timestamp on every file. So when tracker indexes it, it also has to write it. Hence the trashing. This has become worse over time. Although most of you associate this with tracker, all file-io with lots of small files is horrible at the moment in linux. Nothing tracker-specific about it.”

That led me to explore this “last read timestamp” a bit more: do I need it at all? Apparently not: a pointer to discussions in the Linux community suggests that it might be switched off by default in the future, and led me to an article by Kushal Koolwal explaining the different options: atime, noatime and relatime.

So I edited /etc/fstab, replaced relatime with noatime, and remounted the disk. And started Tracker again. Had Rhythmbox running. Asked Eclipse to compare a project in CVS with its repository. All tasks that read (sometimes a lot of) files on disk. Without any hiccups so far.
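
For reference, the change is a one-word edit in /etc/fstab; the UUID, mount point and other options below are placeholders, your own line will look different.

```
# /etc/fstab entry (illustrative; your UUID, mount point and options will differ)
# before:
#   UUID=1234-abcd  /  ext4  relatime,errors=remount-ro  0  1
# after: noatime stops every read from triggering a timestamp write
UUID=1234-abcd  /  ext4  noatime,errors=remount-ro  0  1
```

After saving, `sudo mount -o remount /` applies the new options without a reboot.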

Let’s hope the search results that Tracker delivers are useful too, in practice 🙂



Tools that work: OpenOffice as a blog writing tool

Syndicated

This is a version of my earlier post, written for the short-lived “Tools that work” blog.

As my first contribution to the Tools That Work blog, why not present a tool to make the process of writing a blog post easier: the Sun Weblog Publisher for OpenOffice. I still prefer to write text in a word processor, with the best tools for spell-checking, the simplest ways to add links, and Zotero to manage references to sites and literature.

![Uploading a post as a draft to a blog](https://toolsthatwork.wordpress.com/wp-content/uploads/2010/04/screenshot-send-to-weblog-2.png)

Uploading a post as draft to a blog

But then the text needs to go into the blog: copy-pasting creates endless battles with an online “WYSIWYG” (What You See Is What You Guess) editor, reformatting lists and headers, or losing carefully crafted sentences through browser hiccups and poor form handling.

Editing the HTML itself isn’t fun either. OpenOffice produces poor HTML output, “Export as” even worse than “Save as”. An unappealing alternative: coding HTML in a text editor, adding tags by hand. I’ve looked at HTML editors that would let me focus on writing instead of coding. KompoZer (follow-up to Nvu) is nice for producing reasonable HTML, but it is yet another tool and does not really support the writing process. And then it is still an effort to get the proper part of the HTML into the blog.

I also tried ScribeFire, which lets you write a blog post and interact with the blog software from within Firefox, but it offers no simple way to add simple structural mark-up like a header. And, no support for references.

The Sun Weblog Publisher extension for OpenOffice seems to change the way I work: it adds a button to publish a document to a blog, using a variety of protocols to support different blog software, like WordPress. And another button lets you download existing posts from that blog, to edit them. The process of pushing a post into the blog has become a lot smoother.

![Upload and download posts from a blog](https://toolsthatwork.wordpress.com/wp-content/uploads/2010/04/screenshot-3.png)

Upload and download posts from a blog

There is still some online processing to do, like adding tags and minor HTML cleanup. And I’m not sure how well it will handle images or complex layouts. But writing a post and pushing it into the blog has become a lot easier!


OpenOffice as (blog) writing tool

I’m a geek. So when it’s not a writer’s block keeping me from producing a blog post, I’ll dive into tools and techniques to “optimise” my writing experience before I start typing out sentences. Let’s call it preventive productivity: getting a lot of related things done in order to be more efficient later. Like getting the tools and the work flow right. Perhaps I managed that, now that I can really use OpenOffice to write blog posts, with Zotero to manage my references, and the Sun Weblog Publisher to push the result towards my website.

A writer’s technical block

I don’t really have a lack of stuff to write about (yet). But so far, I never was happy with how the writing went, as a process, as a work flow.

  • I keep quick notes in TomBoy: easy-to-use, always at hand, just enough formatting and wiki-style linking to add a bit of structure. Even somewhat useful for basic writing.
  • But then it needs to go into the web: endless battles with the “WYSIWYG” (What You See Is What You Guess) editor in Drupal and losing my content over a disconnect; or hand-coding HTML in TomBoy or a text file with a regular editor.
  • And not to mention adding images, links and references… usually this means more hand-coding, copy-and-pasting, and so on.

Adding it all up, I usually end up spending 2 to 3 hours to get a blog post done, and around half of that is on the technical stuff.

That creates an additional problem: I don’t always spend those hours in one session. I want to be able to stop now, and continue later.

Failed contenders

I’ve looked at specific tools, like HTML editors that would let me focus on writing instead of coding. ScribeFire offers a few nice features for publishing a blog post from within Firefox, but offers no simple way for simple mark-up like a header. KompoZer (follow-up to Nvu) is nice for building complete HTML, but yet another tool and not really focused on writing.

The most obvious choice to write texts is to use something like OpenOffice, especially when using Zotero to easily add bibliographical references. It has the best tools for spell-checking, the simplest ways to add links to sites, and so on.

But in its basic form, it doesn’t do a great job of producing HTML. “Save as” produces better HTML than “Export as”, but then you lose the special fields that allow you to later change references etc. And I’d still have to copy-and-paste the proper part of an HTML file into a new blog post.

Combining (new) tools

As part of today’s preventive productivity, I found the Sun Weblog Publisher extension for OpenOffice. Which adds a nice button to publish a document to a blog, using a variety of protocols. And also to download existing posts from that blog to edit.

It still produces problematic HTML, but at least my writing experience is improving a lot: quickly adding links, inserting references and footnotes, and so on.

In addition, having my preliminary blog posts in OpenOffice files makes it easier to use Tasktop, which promises better productivity by making task switching a lot easier. Their Firefox add-on tracks the sites I visit while working on a specific task.

A smoother process

Whenever I have an idea for a blog post, I add it as a task in Tasktop. That enables me to do a bit of planning for it, and also to activate it whenever I am looking for information to include. If I can’t finish it all in one session, I just stop the task, and when I reactivate it later, my browser tabs and the file I was editing are back in focus.

OpenOffice allows me to focus on writing texts while adding both links to sites and bibliographical references, and checking spelling and basic layout of headers, tables, and so on. I still need to decide whether a good template will help improve this step.

The Weblog Publisher extension lets me push my text to my website as a draft post, repeatedly if needed. I then need to do the last part of publishing on my website: adding appropriate tags, cleaning up HTML, and perhaps adding one or more images.

Does it work? Yes!

This is the first post I did this way. No references in this one, but adding the links was definitely lots easier. And the publishing part now only took 15 minutes (cleaning up some of the superfluous HTML, mostly). It definitely feels a lot smoother.

FOSDEM 2010, getting up to speed again

![FOSDEM 2010](../../../assets/posts/5187bcb73db0da20810292ba75c7dd92_MD5.jpg)

Racing back to Amsterdam at 270 km/h, time to consolidate my takeaways from this year’s FOSDEM in Brussels. More geeks (5,000+ expected), more lectures (200+) and more topics I wanted to follow: succeeded with OpenOffice, Drupal, and CouchDB, but not with Mozilla and XMPP. A geeky overview of my takeaways.

OpenOffice

Already some 5 years ago, we put PDF and Word output of surveys into our WebEnq online survey software, and recently we added more extensive reporting options as well, giving partners in IICD’s Monitoring and Evaluation programme a text document with all major calculations and tabulations already filled in. And right now, Jaap-Andre and Bart are working on a way to generate hundreds of personalised PDF reports from student data on courses they took and evaluated.

In our first approach we worked with LaTeX, since it offered the kind of formatting options we were looking for (keep a question and all answer options on a single page, for instance). LaTeX is a complex beast to tame, and the conversion to PDF and Word is far from flawless. Bart found ODTPHP, a library that might help us use OpenOffice documents as report templates instead, which definitely would help, but again, it leaves a lot of functionality to be desired.

Yet, the XML-based ODF standard feels like the best way forward. What I learned at FOSDEM:

  • Sun apparently offers a commercial “converter in a box” to transform documents from one format into another.
  • Alfresco seems to have something like that as well.
  • The code for OpenOffice is slowly being split between the filter parts and the UI parts, to allow headless services to be built more easily.

Bart Hanssens gave a high-level overview of some ODF aspects, with my takeaways:

  • ODF version 1.2 is still a draft specification, although OpenOffice 3 already uses it. The book OpenDocument Essentials by J. D. Eisenberg is freely available and, although starting to get outdated, still a good introduction to working with ODF.
  • It enables RDF-enhanced metadata, XML-DSIG digital signatures, syntax and semantics for OpenFormula, and front-end database functions. All things we could use sooner or later.
  • Officeshots.org is a web service to compare document renderings in various word processors (like Browsershots, Browsercam or Litmus app for web pages)
  • Apache Lucene has a Tika project to extract metadata and content from “almost anything”: ODF, Microsoft Office, HTML, PDF, mbox, multimedia files, …
  • He referred to OSOR, the Open Source Observatory and Repository: a European Union repository of projects aimed at public administrations.

Svante Schubert (Sun) talked about the ODFDOM project:

  • ODFDOM tries to break down the complexity of getting content into ODF, and to create testable parts. It separates an ODF schema layer from an ODF package layer.
  • They develop the API from test cases and test scenarios, which keeps the discussion about API elements focused on concrete cases with business value, rather than nice designs.
  • He also mentioned schema2template, a tool or library to create models from schemas, that enables easier comparison of schema versions.

And he mentioned lpOD, a project about which Jerome Dumonteil (Ars Aperta) spoke next. lpOD is written for Python, Perl and Ruby, and focuses partly on using ODF for XML repositories, more than on individual documents. Think of large multimedia collections like those of the Louvre, with combinations of text, film, photo, and so on.

Drupal

The list of takeaways from the Drupal talks I attended is considerably shorter:

  • Thanks to Károly Négyesi for an update on how Drupal 7 differs from Drupal 6. The debate whether Drupal is a CMS or a programming platform will probably not die for a while, and it is interesting to see the direction Drupal is taking, compared to Typo3 (which started by developing a whole new framework for their next version CMS). Given where Drupal 7 and Typo3 5 both are in their development and future direction right now, I think I’ll concentrate on Drupal as the platform of choice.
  • Roel de Meester offered a few new tips (masquerade, reroute email, and schema modules), but mostly I’m already using more or less the same toolset he proposed. And together with catching a last bit of the “upgrading” talk, I have to conclude that it’s still mainly a mess to get a real development-staging-live work flow running. DevelopmentSeed seems to be furthest with this.

CouchDB

Stephane Combaudon gave a nice introduction to CouchDB for people used to working with SQL/RDBMS. Document-oriented, working with JSON, RESTful, and then written in Erlang and using MapReduce functions… It definitely seems to make a lot more sense than the current struggle with database tables, but I haven’t done any functional languages or lambda calculus since my university courses.
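
To give an idea of what that looks like in practice (my own rough sketch, not an example from the talk; the database, field and view names are made up): documents are plain JSON, and a design document holds map/reduce functions, stored as strings of JavaScript, that CouchDB runs over all documents.

```json
{
  "_id": "_design/reports",
  "views": {
    "responses_per_country": {
      "map": "function (doc) { if (doc.type === 'response' && doc.country) { emit(doc.country, 1); } }",
      "reduce": "_count"
    }
  }
}
```

Querying the view with `?group=true` (e.g. `GET /surveys/_design/reports/_view/responses_per_country?group=true`) then returns one row per country with a count, roughly what a GROUP BY would give you in SQL.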

No XMPP, no Mozilla 🙁

The XMPP room was full when I arrived, so I thought I’d have a look at Mozilla’s talk about HTML5 instead, but I ended up in an even bigger crowd that could not fit inside that room. Mozilla’s update on Thunderbird on Sunday turned out to have been rescheduled to an earlier slot, so it was already over when I arrived on time, and Mark Surman’s talk about Drumbeat in Europe was at the same time as a Drupal talk I wanted to see.

So I only heard a few things about the XMPP sessions from a friend who did get in, and exchanged a few words with Mark in the hallway, but even our plan to hook up at one of the many parties at night fell through.

Thanks, FOSDEM

Again a year of getting up to speed with the latest and greatest in a short time, in enjoyable Brussels.