Got bandwidth?

Can you stop hotlinking of MP3s in embed/bgsound? I can’t see a way, offhand.

9 Comments

Comment by Roger Benningfield #
2005-04-09 23:49:26

An easy way? No. But how ’bout storing the MP3 outside the webroot, and accessing it via your scripting language of choice? From there, leverage whatever session management you’ve got available to restrict access to local users.

In CF, it’d look something like:

mymp3.cfm
=========
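<!--- Serve the file only if mypage has already flagged this session --->
<cfif structKeyExists(session, "allowDownload") and session.allowDownload>
<cfcontent type="audio/mpeg" file="c:\somedir\foo.mp3" /><cfabort>
</cfif>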

mypage.html
===========
<cfset session.allowDownload = true />
<a href="mymp3.cfm">download</a>

Now, the hotlinker in question could still get around it if he’s determined, but it would stop casual abuse.

Comment by Phil Ringnalda #
2005-04-10 00:07:45

Unfortunately, “local users only” won’t work for Jim, since the file in question is (repurposed as) a podcast, so all sorts of random UAs that have never shown their faces before are welcome. The problem is just IE (Firefox apparently sends a referrer with embed requests, at least mostly) requesting an embed/bgsound on someone else’s page, and doing it in its particularly tacky way: how nice, a request with a fake UA string, where the body is ignored, made only for embedded media, which is usually the biggest and most expensive thing to send or, Ghu help you, generate.

 
 
Comment by Jim Kloss #
2005-04-10 01:31:33

Just to clarify, here are a few more details:

I did spend several weeks at one point trying to get between the client’s request for an .mp3 and Apache sending it. I took control of all client requests for .mp3 files with my own PHP scripts. Although it worked, it was a miserable failure for the following reasons:

1) I was basically having to write a server and I’m not sophisticated enough to handle all the various HTTP headers; Partial Content requests in particular were difficult. I also couldn’t figure out how to correctly respond to all the cache variations. There were so many different clients (podcatchers as well as browsers) sending all sorts of HTTP headers that I couldn’t figure out how to handle them all. It gave me real appreciation for what Apache does under the covers. I could only get iPodder, IE and Firefox to work reliably – and only under the limited versions I have here on my test machine.

2) PHP was much slower at transmitting the large audio files than just letting Apache handle it. It also bogged down the server CPU significantly when 10-20 people were simultaneously running the PHP script to send the MP3s. I don’t think simple PHP fputs calls are optimized for sending data over the internet quite as well as Apache’s TCP/IP threads etc.

Although I’ve scrapped that project, one of the best advantages was that I was able to dynamically decide exactly what audio content (if any) I would send to the client based on their realtime download history, which I was keeping in a database: “Oh, you get the ‘Quit Abusing Our Bandwidth’ sound clip” or “Hi Phil – this one is just for you” prepended to the actual MP3 requested. I had dreams of creating dynamic MP3s made of nothing but clips or music all strung together based on a passed URL. I still get excited thinking of those possibilities!

Another great advantage was that I was able to track the exact number of bytes transmitted. Apache has a problem with its logging when the client cancels a request: it still logs the full file size as being transmitted instead of reporting exactly how many bytes were sent before the receiver canceled.

I’d love to find a well-written, robust PHP script I could modify that communicates directly with the client at HTTP header level and correctly / efficiently handles sending mpeg type content out to a variety of clients. I sure wasn’t able to write one…

In the meantime, I run a PHP script that analyzes Apache logs after the fact, then finds and locks out/redirects abusers by dynamically modifying .htaccess in the audio files directory. (Scrolling to the bottom of http://www.wholewheatradio.org/wwrss.php shows detailed stats I just started keeping for each podcast, including aborts, partial contents, and abuse attempts.) The number of podcast aggregator sites that keep coming in every few hours, doing a GET followed by an immediate cancel (instead of the more appropriate HEAD request) for every MP3 in our library ensures that the .htaccess file has a long and growing list of denied IPs. Not to mention all the podcatcher clients that are unwilling to keep a simple list of downloaded basename($thefile) by URL so that even if the user is subscribed to 50 feeds that share some of the same MP3s, they won’t keep re-downloading the same ones over and over and over.
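
The after-the-fact log analysis described above could look roughly like the following Python sketch. It is not Jim's script: it assumes Apache's common/combined log format, and the `max_repeats` threshold and the function names `find_abusers` and `deny_lines` are made up for illustration. It counts how often each IP fetched the same MP3 and produces `Deny from` lines for the repeat offenders.

```python
import re
from collections import Counter

# Leading fields of an Apache common/combined log line (assumed format).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)


def find_abusers(lines, max_repeats=3):
    """Return sorted IPs that fetched any single MP3 more than max_repeats times."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like access-log entries
        if m["method"] == "GET" and m["path"].lower().endswith(".mp3"):
            hits[(m["ip"], m["path"])] += 1
    return sorted({ip for (ip, _path), n in hits.items() if n > max_repeats})


def deny_lines(ips):
    """Format the offenders as .htaccess deny directives."""
    return [f"Deny from {ip}" for ip in ips]
```

Writing the resulting lines into .htaccess, and expiring them again later, is left out here; that is the part where concurrent writers need care.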

Comment by Roger Benningfield #
2005-04-10 01:43:39

Jim: If you’re having CPU load problems due to the script, then that’s probably a deal-breaker right there.

But out of curiosity, what would happen if you had the script:

(1) Check for a “contype” user-agent.
(2) If it sees “contype”, add the IP to a FIFO list of “bad IPs”.
(3) Check to see if the current IP is on the list, and if it isn’t, send the file.

If you combined the above with an exemption for clients that have an active session, it seems like you’d have a block that only impacts hotlinked files served to IE.
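
The three steps above, plus the session exemption, can be sketched in a few lines of Python. This is only an illustration of the idea, not a working handler: the class name `BadIPList`, the `capacity` default and the `has_session` flag are assumptions, and treating a session as exempt from being listed at all (rather than only from being blocked) is one reading of the suggestion.

```python
from collections import deque


class BadIPList:
    """Fixed-size FIFO of IPs that have sent a "contype" request."""

    def __init__(self, capacity=1000):
        # With maxlen set, appending to a full deque drops the oldest entry.
        self._ips = deque(maxlen=capacity)

    def should_serve(self, ip, user_agent, has_session=False):
        """Return True if the file should be sent for this request."""
        if has_session:
            return True  # clients with an active session are exempt
        # Steps (1) and (2): remember any IP that shows up as "contype".
        if "contype" in user_agent.lower() and ip not in self._ips:
            self._ips.append(ip)
        # Step (3): send the file only if the IP is not on the list.
        return ip not in self._ips
```

Because the list is first-in, first-out, an IP falls off once enough newer offenders have been added, so bans expire without a separate cleanup job. The list lives in memory in this sketch, so a real script would have to share it between requests somehow.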

 
Comment by Roger Benningfield #
2005-04-10 01:50:51

“Not to mention all the podcatcher clients that are unwilling to keep a simple list of downloaded…”

Jim: Are they ignoring your guids?

Comment by Jim Kloss #
2005-04-10 01:59:42

Some are, some aren’t, Roger. A few people (hackers?) have written simple clients that just look at feeds and shell out to “wget enclosure.mp3” for every enclosure they find. I don’t think they’re keeping any sort of history. I’ve seen the same IP come in and download the same MP3 pointed to by a single feed several hundred times over a three-month span.

 
 
 
Comment by Jim Kloss #
2005-04-10 01:53:53

Yup, Roger, that would do it. If only I could write the rest of the handler (all the HTTP header stuff) or find a great one already written. But I suspect you’re right – rewriting Apache in PHP is a deal-breaker…

 
Comment by Matthias Bauer #
2005-04-15 07:59:36

You can use mod_setenvif to set an environment variable if the User-Agent is “contype”. Then, if that variable is set, call a script that adds the current IP to a .htaccess file (Deny from 1.2.3.4). Subsequent requests will be blocked.

Problem solved?

Comment by Phil Ringnalda #
2005-04-15 22:48:11

Whee! That sounds like fun. Your script needs to be capable of rewriting your .htaccess several times per second (among other things, Xanga is used by teenage girls writing about their sex lives: that can be quite popular), all the while not colliding with your other script that expires IP addresses (because not only might I be getting an IP from the dialup pool that was last used by a dirty Xanga-reader, I might be checking out a Live Bookmark bug with a Xanga feed, see a link saying “background music ganked from Whole Wheat Radio” and realize that I’m now banned until I beg Jim to let me back in), and possibly sometimes needing to complete in negative time: just fooling around with requests for empty files, with logs that don’t do microtime, I was sometimes seeing the non-contype request show up a second before the contype request. Maybe contype got poorly routed, or maybe it was stuck in the slower of two persistent connections, but either way, it doesn’t look like a pretty problem with a neat solution.

 
 