Super-viral Creative Commons licenses

As Richard notes, the new Yahoo! Creative Commons search returns any matching result that includes a link to one of the Creative Commons licenses, including his posts (and mine) that talk about someone else’s use of a particular license. Creative Commons’ own search apparently looks only for embedded RDF license statements, and as a result finds fewer pages, since more and more people are abandoning the hack of putting RDF in HTML comments, both for licenses and for TrackBack. Because people usually indicate their licensing just by linking to the license, Yahoo! assumes that anyone linking to a license is licensing something under it, and that what they are licensing is the broadest possible URL: if Yahoo! catches you with an entry on your main page that links to a license, it will return your main page as what’s licensed. That makes for a rather viral license, if you can’t even mention it with a link without being infected.
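A minimal Python sketch of the difference, assuming the only signal available is the link itself. A crawler that counts any link to a license URL (which appears to be what Yahoo! does; this is not their code) flags a post that merely talks about a license, while one that honors rel="license" flags only the deliberate declaration. Neither tells you what on the page is licensed. The `CC_PREFIX` substring test and the names `LicenseLinkFinder` and `classify` are hypothetical simplifications.

```python
from html.parser import HTMLParser

# Hypothetical, simplified test: real CC license URLs look like
# http://creativecommons.org/licenses/by-nc/2.0/
CC_PREFIX = "creativecommons.org/licenses/"


class LicenseLinkFinder(HTMLParser):
    """Collects links to CC licenses, noting which carry rel="license"."""

    def __init__(self):
        super().__init__()
        self.any_links = []  # every <a>/<link> pointing at a CC license
        self.declared = []   # only those explicitly marked rel="license"

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        if CC_PREFIX not in href:
            return
        self.any_links.append(href)
        # rel may hold several space-separated tokens, e.g. "license nofollow"
        rels = (attrs.get("rel") or "").lower().split()
        if "license" in rels:
            self.declared.append(href)


def classify(html):
    """Return (all CC license links, links declared with rel="license")."""
    finder = LicenseLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.any_links, finder.declared
```

Fed a post that only mentions a license, `classify` returns the link in the first list and nothing in the second; fed a footer with `rel="license"`, the link shows up in both.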

Eiffel Tower picture by Norman Walsh

If everyone who was actually using a license followed the Creative Commons advice to use their logo, display it prominently, and clearly explain exactly what is and isn’t licensed, that wouldn’t be too big a deal. However, the example that Yahoo! gives in their documentation, searching for Eiffel Tower, returns as its first result an entry on Gothamist, which links to its license with a teeny tiny copyright symbol, and no discussion of what is and isn’t licensed. I happen to know that I’m looking at a weblog entry, and since the linked comment policy doesn’t mention licensing your comment under their license, whatever that copyright symbol intends to CC license, it doesn’t cover comments, but I’ll bet that J. Random “What’s a Blog?” Searcher doesn’t know that, and would think he could use text from a comment under the same license as any other text on the page. Then, there’s the photo: I suspect that it isn’t actually CC licensed, because it isn’t actually Gothamist’s to license. That’s why I grabbed a copy of Norm’s Eiffel Tower picture from which to create my derivative work. Even there, I’m on rather shaky ground: the HTML page links to a by-nc Creative Commons license (which I’m then required to link to as well, though only that one picture is under that license, no matter what a search engine tells you), but the metadata he extracts from the RDF he embeds in his JPEGs says “All rights reserved.” Luckily, I don’t expect him to sue me, and I’m at least reasonably sure that he does hold copyright in that picture; the Gothamist picture is certainly copyrighted by someone, but I have no idea by whom, or what rights they do or don’t reserve.

I don’t expect the ocean to boil, and I don’t expect people to add precise and accurate metadata: after all, even Flickr, which is quite capable of getting it right, claims in the embedded RDF of that page that the CC license applies to <Work rdf:about="">, which says that the HTML page containing the photo is what’s licensed, rather than to a <Work> whose rdf:about is the photo’s own URL, which would say that the photo is what’s licensed.
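Why an empty rdf:about points at the page: in RDF/XML, rdf:about is a URI reference resolved against the document’s base URI, which (absent an xml:base) is the URL of the document the RDF was found in. Python’s urljoin shows the same resolution rule; the URLs below are made-up stand-ins, not Flickr’s.

```python
from urllib.parse import urljoin

# Hypothetical URLs standing in for a photo page and the image file itself.
page_url = "http://www.example.com/photos/norm/12345/"
photo_url = "http://photos.example.com/12345.jpg"

# An empty reference resolves to the base URI itself: the page, not the photo.
subject = urljoin(page_url, "")
assert subject == page_url
assert subject != photo_url

# To talk about the photo, rdf:about has to name the photo's own URL;
# an absolute reference is returned unchanged.
print(urljoin(page_url, photo_url))  # prints http://photos.example.com/12345.jpg
```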

But if I were the Yahoo! lawyer who vetted their Creative Commons search, and had let it loose without any disclaimer that “Yahoo! makes no assertion about what, if any, content in these results is actually offered under a Creative Commons license”, I’d be hanging my head in shame.


Comment by Geodog #
2005-03-30 01:04:29

S/He will be hanging his/her head in shame as soon as your post makes its way through the blogosphere.

Comment by Norman Walsh #
2005-03-30 09:58:57

Hmm. More recent photographs explicitly give the CC license. I’ve made a note to update the license for all the old photos too.

You’re welcome to the picture under the Creative Commons Attribution-NonCommercial License. :-)

Comment by Firas #
2005-03-30 10:26:12

Where should the RDF go instead? I mean, if one insists on using RDF and not link rel="license" or meta name="copyright" etc.

Comment by Mike Linksvayer #
2005-03-30 11:32:40

The Creative Commons search finds far fewer licensed pages mostly because it is a small project and has only crawled a few million pages so far (the current index has 1.2 million licensed pages).

Of course you can link to a license without licensing the content containing the link. Any search result that includes such content when doing a CC-only query is inaccurate. The Y! search for CC is just a (very valuable) start.

Regarding imprecise metadata (subject is always the current page) — it’s the best we can do without deep integration into whatever publishing software a user is using. It’d be great to have explicit license statements about every image and other non-page resource licensed, but requiring users to generate such statements would prevent most users from publishing metadata at all. See my related comments here and here.

As a partial workaround, the CC search engine takes dcmitype assertions as hints that a page contains licensed images, video, audio, etc. (That is what the “format” option restricts on, if you choose one.)
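A rough sketch of that kind of hinting. The DCMI Type Vocabulary URIs are real; the format buckets they map to, and the name `format_hints`, are hypothetical and not taken from the CC search engine’s code.

```python
# DCMI Type Vocabulary namespace (real); the bucket names are hypothetical.
DCMITYPE = "http://purl.org/dc/dcmitype/"

FORMAT_FOR_TYPE = {
    DCMITYPE + "StillImage": "image",
    DCMITYPE + "Image": "image",
    DCMITYPE + "MovingImage": "video",
    DCMITYPE + "Sound": "audio",
    DCMITYPE + "Text": "text",
    DCMITYPE + "InteractiveResource": "interactive",
}


def format_hints(dc_types):
    """Return the format buckets suggested by a page's dc:type values.

    These are only hints: the subject of the assertion is still the page,
    so a StillImage type says the page probably contains a licensed image,
    not which image it is. Unrecognized types are ignored.
    """
    return {FORMAT_FOR_TYPE[t] for t in dc_types if t in FORMAT_FOR_TYPE}
```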

Firas: One can link to an external RDF file from a page’s head, but this is beyond most people. I hope to replace the admittedly ugly practice of embedding RDF/XML in HTML comments with RDF-A once that is finalized.
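For the linking approach, a consumer has to find the link first. This sketch assumes the FOAF-style autodiscovery convention `<link rel="meta" type="application/rdf+xml" href="...">`; that convention is my assumption rather than anything specified here, and a publisher could use a different rel. `RdfLinkFinder` and `find_rdf_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RdfLinkFinder(HTMLParser):
    """Finds <link> elements that point at external RDF/XML documents."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        href = attrs.get("href")
        if "meta" in rels and mime == "application/rdf+xml" and href:
            # Resolve relative hrefs against the page's own URL.
            self.rdf_urls.append(urljoin(self.base_url, href))


def find_rdf_links(html, base_url):
    finder = RdfLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.rdf_urls
```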

Comment by Firas #
2005-03-30 13:06:48

Ah. I was concerned about a tool I write. Putting a link to an RDF file in the head, and making the tool output the RDF when that link is requested, would be quite simple. Will do.

Comment by Phil Ringnalda #
2005-03-31 00:49:20

Some very nice things you can do when you’re dynamic, aren’t there?

Given that most people who CC license a weblog actually intend to license only the body of the entries (and, whether or not they intend to license the text of the comments, rarely actually have the right to), I wonder if it would also be worthwhile to generate a just-the-title-and-entry HTML view, which could then be the subject of the <Work rdf:about>. Getting included images right without it being a PITA would take some doing, but it would be very, very nice for the hypothetical future consumer that we write RDF in expectation of.
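A sketch of what such a tool might serve at the linked URL: a minimal RDF/XML document whose subject is an entry-only view rather than the whole page. The namespace `http://web.resource.org/cc/` is the 2005-era CC namespace as I recall it (check before relying on it); the function name and the entry-only URL are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CC_NS = "http://web.resource.org/cc/"  # assumed 2005-era CC namespace

# Serialize with the familiar rdf: and cc: prefixes instead of ns0:, ns1:.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("cc", CC_NS)


def license_rdf(work_url, license_url):
    """Return RDF/XML stating that work_url is offered under license_url."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    work = ET.SubElement(root, f"{{{CC_NS}}}Work", {f"{{{RDF_NS}}}about": work_url})
    ET.SubElement(work, f"{{{CC_NS}}}license", {f"{{{RDF_NS}}}resource": license_url})
    return ET.tostring(root, encoding="unicode")


# Hypothetical entry-only view: the subject of the license statement is just
# the title and body of the post, not the comments or the page around them.
print(license_rdf("http://example.com/archives/entry-body.html",
                  "http://creativecommons.org/licenses/by-nc/2.0/"))
```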

Comment by Firas #
2005-04-01 21:55:58

So the HTML links to RDF which links to yet another HTML document—the UA follows to the end and then compares the document in rdf:about to the current document to see which parts are the same and hence under the license?

(I kinda like the idea of inserting an arbitrary attribute in elements, defining meaning for a certain value of class, for example, so that anything inside is subject to so-and-so license; but the end result is the same as with your proposal.)

Sounds like we’re putting (cc) under burdens that the venerable but past-its-prime (c) never had.

On the other hand, I do realize that (cc) brings murkiness into the matter by its very nature: (c) means don’t go beyond fair use with anything here, while (cc) says there’s something here that is mine and that you can take, and some things that are other people’s. And by using RDF you’re pretty much committing to taking on the burden of precision.

Comment by Phil Ringnalda #
2005-04-01 22:50:35

Roughly, though I’d put it a little differently: the HTML (say, this very page) links to RDF, which contains statements about resources, because after all that’s what RDF does. So something, like the combination of the words Eiffel Tower and a <link rel="license"> found by a search engine smart enough to recognize those, told you that you might be interested in content licensed here. You send your RDF consumer to see what statements are being made, and find that (hypothetically) it says “the license for is Creative Commons Attribution/Noncommercial, and the license for is CC:By/NC/ND and the license for is All Rights Reserved”, and, since I don’t do things by half-measures (well, I don’t intend to), it also makes license statements about the CSS, and the MT templates, and the JavaScript. What the user has to do to get from “ooh, pretty, me want!” to the right license would depend on how good their agent’s programmer is: if it was something that I wrote, they’d probably get a page of links to each separate resource with the associated license, and they’d have to work out which was what by guessing at the URLs and then looking for themselves; if it was someone better than me, they’d probably get to right-click the part of an HTML page they want, and get a list of the things that make it up: “Image: by-nc; CSS: (c)”.
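The simple agent described above, sketched in Python: parse RDF/XML that makes one statement per resource and list each resource with its license. The sample statements and URLs are invented, the CC namespace is the same 2005-era assumption, and the parser understands only the typed-node `cc:Work` form, not every way RDF/XML can say the same thing. Treating a resource with no license statement as all rights reserved is a conservative choice of mine, not something the RDF says.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
CC = "{http://web.resource.org/cc/}"  # assumed 2005-era CC namespace

# Hypothetical statements for one weblog page: each resource gets its own
# license instead of one blanket statement about the whole page.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cc="http://web.resource.org/cc/">
  <cc:Work rdf:about="http://example.com/eiffel.jpg">
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0/"/>
  </cc:Work>
  <cc:Work rdf:about="http://example.com/entry-body.html">
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc-nd/2.0/"/>
  </cc:Work>
  <cc:Work rdf:about="http://example.com/style.css"/>
</rdf:RDF>"""


def licenses_by_resource(rdf_xml):
    """Map each cc:Work's rdf:about to its license URL (None if unstated)."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for work in root.iter(CC + "Work"):
        lic = work.find(CC + "license")
        result[work.get(RDF + "about")] = (
            lic.get(RDF + "resource") if lic is not None else None
        )
    return result


for resource, lic in licenses_by_resource(SAMPLE).items():
    # No license statement means no grant: treat it as all rights reserved.
    print(f"{resource}: {lic or 'all rights reserved'}")
```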

You can make statements about URIs including fragments, saying that has a particular license, but I’m not sure exactly what that would mean. The entire DOM node is under that license, including any inherited CSS? How does JavaScript cascade down into a particular DOM node?

If the part of Creative Commons licensing that appeals to you is the ability to tell people what of yours they may use without needing to ask, rather than the warm fuzzy of belonging to the tribe that displays CC badges, then you really need to have separate resources that you are licensing, with uniform resource identifiers that precisely identify just what you are talking about.

Or, never ever use anything that isn’t purely yours, never even quote anyone or mention their trademarked names, and put absolutely everything under one license. I guess that would work, too.

Comment by Shelley #
2005-03-30 12:00:32

Mike, we discussed this at Practical RDF, and I also discuss this at my weblog in a new entry today — I’m not sure why you all are fixating on RDF-A, or even that there’s any activity underway on this. Anywhere.

When you say ‘finalized’, by whom?

And I think, bluntly, you’re relying on technology to hide critical details from the user, and then using “But the users would reject this” as the excuse, primarily because no one wants to do the work to do this right.

A CC license should require the user to go through extraordinary means to apply it — it’s a very serious legal document with very serious consequences.

Comment by Shelley #
2005-03-30 12:06:26

You know, using RDF/A is about equivalent to replacing MySQL with an Excel spreadsheet as the data storage mechanism driving Phil’s weblog, just because the latter is ‘easier’ for the users to understand.

What say, Phil? Want to port to Excel?

Comment by Phil Ringnalda #
2005-03-30 15:36:08

Oy. I don’t know why I keep forgetting what things are inescapable frustration-sinks.

Based on following the RDF-in-XHTML Working Group’s mailing list since June 2003 (!), I think the only reasonable solution is don’t. If you want to say something in RDF, say it in RDF, and <link> to it from your HTML. RDF/A may be workable in the XHTML 2 timeframe, but that’s outside my planning horizon. The things it lets you say, and the way it makes you say them in XHTML 1 or HTML 4 just don’t suit me at all.

I wouldn’t say it’s quite Excel, so much as a db that only runs on Longhorn, and only serves content to IE 8.

And while of course I know linking to a CC license isn’t really viral, it’s still quite likely to have a viral effect; I can’t imagine that nobody will ever get confused by the difference between a post that links to a license for licensing and a post that links to a license for illustration. The CC vision, machine readable licensing that tells you what content you can use without asking, appeals to me. If the CC reality is going to be machine readable flagging that indicates that some content somewhere near it may or may not be usable, and you have to ask to find out what’s licensed, and how, then good or bad, best-possible or not, it’s just uninteresting.

Comment by Shelley #
2005-03-30 16:07:52

Well, all a person can do is put out a warning, a tap on the shoulder, and a nudge, and hope for the best.

What can I say? I enjoyed your post and my own little Technorati discovery, and hope that it caused some thought. So don’t feel frustrated.

BTW, I created a refrigerator magnet just for you…I’ll post it online in the next day or so.

Comment by Mike Linksvayer #
2005-03-30 16:33:57

Regarding “A CC license should require the user to go through extraordinary means to apply it”: well, they have to explicitly choose it and, assuming their software doesn’t support it directly, they have to manually copy the license HTML into their web pages/templates. I could understand the argument that we should make people click “yes I understand, I really want to do this” a bunch of times (an extreme version would present the chosen license with checkboxes next to each clause, and a quiz at the end to make sure they understood), but requiring people to learn about metadata is just not relevant, and would be completely unusable on a planet that has overwhelmingly voted for GUI and WYSIWYG as opposed to CLI and markup.

RDF/A is not dumbed down, it is just another RDF serialization.

The machine readable stuff helps you find potentially licensed stuff. You may or may not want to visit the site to see if there’s a matching visual license notice, you may or may not want to verify by writing the putative copyright holder, you may or may not want to get a signed contract with same, all depending upon your risk profile.

CC metadata is not meant to make any guarantees (though it might eventually be annotated with further assertions concerning provenance, signed by third parties, or whatever), nor is it meant as an enforcement mechanism. It is merely a helper.

Even if most people made explicit assertions about how each resource on a page is licensed (as I hope they will, as enabling software is adopted), you’d still want to take one or more of the steps above depending upon your risk profile, as the publisher could be making a false assertion.

In other words, metadata precision and trustworthiness/legal certainty are (mostly) orthogonal. I’m happy to see progress on both fronts so long as they are add-ons that don’t impact the license choosing experience for joe user.

Comment by Shelley #
2005-03-30 16:51:06

No, you misunderstood, Mike (or more likely, I wasn’t clear). When I said that the user should go through extraordinary efforts to add a license, I didn’t mean pushing a button and clicking “I do” a bunch of times.

I meant, as Phil wrote in my weblog comments, that there should be something that highlights all the entries in the page, so people are aware of what the CC license covers. People should be precise about what they’re adding the license to. Is it the writing? The images? The stylesheet used for the page?

You’re interested in making the CC license easy to add; I would be more interested in making it more precise.

And, no I didn’t say that RDF/A is dumbed down, necessarily. I do think that the growing reliance on the rel attribute is dumbing down semantics, generally.

Comment by Mike Linksvayer #
2005-03-30 21:22:59

Right, you weren’t clear. :-)

Highlighting all the resources on a page, either for licensing them or for seeing what is licensed, is a good idea. Someone should write those apps; I will add the idea to our tech challenges page, which is badly in need of an update. I don’t see the first being the primary interface for choosing a license, at least in the foreseeable future, but it would be a neat feature to build into a CMS or similar tool. It would be useful not just for licensing, but for other types of annotation as well.
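A starting point for the highlighting app: enumerate the separately addressable resources a page pulls in (images, scripts, stylesheets), each of which could carry its own license statement. `ResourceLister` and `list_resources` are hypothetical names; the page’s own text is the one remaining resource, identified by the page URL itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceLister(HTMLParser):
    """Lists the separately addressable resources an HTML page pulls in."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []  # (kind, absolute URL) pairs, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url, kind = None, None
        if tag == "img":
            url, kind = attrs.get("src"), "image"
        elif tag == "script":
            # Inline scripts have no src and are part of the page itself.
            url, kind = attrs.get("src"), "script"
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            url, kind = attrs.get("href"), "stylesheet"
        if url:
            self.resources.append((kind, urljoin(self.base_url, url)))


def list_resources(html, base_url):
    lister = ResourceLister(base_url)
    lister.feed(html)
    lister.close()
    return lister.resources
```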

All that said, thanks Phil and Shelley for bringing this issue up. I’m all for more precise metadata (it will make our search better, for one thing), and you’ve gotten me thinking about ways CC can encourage more precise metadata without paying a high usability cost.

Trackback by Andreas Haugstrup #
2005-04-12 19:09:03

Creative Commons HTTP Header

Why not just use the same technique that pingbacks use? Why isn’t there a Creative Commons HTTP header that designates a license for any web resource? It would be dead simple: all you need to do is point to the license address.
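HTTP can already carry something close to this: the Link header (later standardized in RFC 5988, now RFC 8288) with the registered `license` relation, so `Link: <http://creativecommons.org/licenses/by-nc/2.0/>; rel="license"` could label any resource, images included. The parser below is deliberately minimal and assumes parameters never contain `<`; it is not a complete RFC 8288 implementation and does not resolve relative URIs. The name `license_links` is hypothetical.

```python
import re

# One link-value: <URI>, then its parameters up to the next "<".
_LINK = re.compile(r"<([^>]*)>([^<]*)")
# The rel parameter, quoted or not; may hold several space-separated types.
_REL = re.compile(r';\s*rel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


def license_links(link_header):
    """Return the URIs in an HTTP Link header whose rel includes "license"."""
    found = []
    for uri, params in _LINK.findall(link_header):
        m = _REL.search(params)
        if m and "license" in m.group(1).lower().split():
            found.append(uri)
    return found


header = ('<http://creativecommons.org/licenses/by-nc/2.0/>; rel="license", '
          '<http://example.com/>; rel="home"')
print(license_links(header))  # ['http://creativecommons.org/licenses/by-nc/2.0/']
```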

2005-11-06 21:59:15

[…] You might remember that back when Yahoo launched their search for Creative Commons licensed content, it would return anything which linked to a Creative Commons license, whether it was under that license or just talking about the license. […]
