I just got an email complaining about web scraping of a common data presentation portal to fetch data for the Android app “Global Nuclear Watch”, and bragging about some steps they had taken to stop it. I felt a reply was in order:
“Just my $0.02 – and (partially) not the official view of NRPA.
There is a saying that goes “You should be the best source of your own data”. If you are not, and the data are interesting to someone, then somebody, somewhere is going to present them in a better way.
Are the data public or not? If they are public, we should just accept that things like this happen. In fact, at NRPA we are currently cooperating with some people doing web scraping, taking the stance that they are adding value to our data, and we are in fact happy that someone finds our data so interesting that they actually do some work on them. (They also make the data more readily available for our own use: those of us who have Android phones have Global Nuclear Watch installed.) We are also planning to offer the data as XML or JSON or some other format, to make it easier for those adding value to our data.
If the data are not public, well, then they should not have been made public in the first place; they should have been hidden in a password-protected system. Even though the present way of “scraping” has been stopped for the moment, if the interest is big enough, someone will find out how to do it in another way, and this time they will probably not get in contact and try to cooperate. In the worst case, if the interest is high enough, someone may start a project hiring low-paid people, or start a kind of internet campaign where people around the world manually read and type out the data. How do we stop that? It may also be pretty bad PR for our organisations: “They say they share the data, but they only want to have it presented their own way. What are they hiding?”
Of course each of you decides (within the limits of your local legislation) what to do with your data. But once the data are out there, in my opinion it is much better to cooperate with those who try to make something out of them that we had not foreseen or planned. If there are “non-legal” applications around using our data, we can theoretically hunt them down and try to stop them, but that will cost a lot of resources and will probably in the end only give us even more bad PR. If we make the data available with some kind of requirements for use, of course someone may still take the data and use them in other ways, but there will probably be many more people who use them according to our requirements, which will more or less drown out the few people who (in our opinion) misuse our data.
For an example, see the Norwegian met office’s page on their data policy: http://www.yr.no/verdata/1.6810075 (if it comes up in Norwegian, there is a link in the middle, a bit further down, to switch to the English version).”