Thanks anyway

It’s remarkable how well asking for help works, even when you don’t get an answer, or even finish asking. Probably everyone is familiar with the way that you can struggle with a problem for hours, finally give up and post a question on a message board, and then within a couple of minutes after posting, figure it out for yourself. I’ve managed to extend that to the point where I just start to ask (this post started out asking for help with a regexp to convert any number of mixed tabs and/or spaces, plus a newline, to a <br>), and in the process of asking figured out what I was doing wrong. So thank you for the help you would have given, if I’d needed to finish asking.
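For the curious, the regexp in question can be sketched roughly like this. It is in Python rather than the PHP the post is about, and it assumes that "any number" of tabs and spaces includes zero, so a bare newline is converted too. The function name is made up for illustration.

```python
import re

# Any run of spaces and/or tabs (possibly empty) followed by a newline.
# The optional \r is an assumption, so Windows line endings are handled too.
TRAILING_WS_NEWLINE = re.compile(r"[ \t]*\r?\n")

def newlines_to_br(text: str) -> str:
    """Replace trailing tabs/spaces plus the newline with a single <br>."""
    return TRAILING_WS_NEWLINE.sub("<br>", text)
```

Spaces and tabs in the middle of a line are left alone, because the pattern only matches when the run of whitespace ends in a newline.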

Converting Marcus’ Comment Archive and Interchange Format as produced by YACCS to dotcomments format, if you’re curious. For all its advantages, XML is a PITA to parse in PHP. Give me a single line with a separator character that I can just explode() into an array any day (yeah, I know, I know, I don’t really mean it).

10 Comments

Comment by Hossein #
2002-04-20 16:46:01

I love writing XML, but I hate reading it :) It’s a pain to parse in nsd too, even with the really nice ns_xml module that does a lot of the work for you.

Let me know when you’re done with your conversion script; I’ve already added 3 new export formats to YACCS. I’ll link to yours as well, when you’re ready.

I kind of feel bad that I wrote direct output to other systems instead of a more useful CAIF to xxx system export, but I have had a lot of requests for exports, so I wanted to get them done this weekend. Of course, now that I have half the work done, maybe I’ll go back and do the CAIF conversion as well.

 
Comment by Phil Ringnalda #
2002-04-20 17:01:10

It’s done but…

Unless I’m being stupid, you didn’t actually get CAIF output obeying display order yet, so that part of my script is in flux. Looking at my Pro FAQ comments as CAIF, they come out newest-first, even though the display is set to oldest-first.

I guess if you’re linking to it, and I’m making it public, I should throw in a UI with a choice of input order and output order and input filename (right now, you edit and run the PHP script, and that’s it: the UI consists of the word “Done”).

 
Comment by Hossein #
2002-04-20 17:17:50

Well, before you make it public, let me ask you a question. I’ve been concerned with how I’m handling the CDATA ever since I wrote CAIF output. Supposedly, I could output CDATA as straight HTML, and it would be acceptable. But I don’t do that, because it seems that one ]]> would ruin the output. Right now, I’m double-encoding, which is almost certainly incorrect.

What should I do? No encoding? Single encoding? For Snor/Blogkomm/Blogback PHP, none of them support all of the tags that YACCS does, but I just formatted the data like I would in YACCS (supported tags in straight HTML, non-supported tags encoded) and exported it, and it looks fine. It might be confusing for a visitor to see something in bold, even though the comment system doesn’t support it. But I think it’s better than seeing a bunch of extraneous tags.
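On the worry that one ]]> would ruin the output: a common workaround is to split the offending sequence across two adjacent CDATA sections, so the text never contains a literal ]]> inside any one section. The following is only a sketch in Python, not what YACCS or CAIF actually does; `wrap_cdata` and `round_trip` are hypothetical names, and the `<text>` wrapper is just borrowed from the element name mentioned in this thread.

```python
import xml.etree.ElementTree as ET

def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' across two sections."""
    # ']]>' becomes ']]' + end of section + start of section + '>'.
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"

def round_trip(text: str) -> str:
    """Serialize text into a <text> element and parse it back."""
    xml = "<text>" + wrap_cdata(text) + "</text>"
    return ET.fromstring(xml).text
```

A parser hands the adjacent sections back as one run of character data, so `round_trip` should return exactly the string it was given, HTML and stray ]]> included.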

Oh, and I’ll look at the CAIF order again, and make sure it’s fixed this time.

 
Comment by Hossein #
2002-04-20 17:34:56

Ok, CAIF ordering should be fixed now.

 
Comment by Phil Ringnalda #
2002-04-20 17:48:20

Oh, rats. Perhaps I should have looked at what was happening to html before I made my exported comments live? There will be a brief pause while I fix the fact that my Pro FAQ comments are now full of <i> and <a href…

All modesty aside, I’m really damn good at testing other people’s stuff. Why is it that I’m so incredibly terrible at testing my own? Perhaps it’s related to the lack of modesty.

 
Comment by Phil Ringnalda #
2002-04-20 18:32:57

Sigh. Better now. That was truly awful, and incredibly stupid of me. I was so busy parsing out crap like all the whitespace after the </text> that the PHP XML parser thinks should be a part of text’s cdata even though it’s after the damn closing tag… never mind. It’s fixed.

I’m not quite sure whether html in the CDATA section should be encoded or not, but here’s what I did to fix it as quickly as possible: I just un-encoded it all, and then pasted in the dotcomments lines that deal with html in input, so I wouldn’t end up with <style color=”lime”> being interpreted. (Okay, first I failed to un-encode quotes, and ended up with an even worse mess, but after I fixed that, then it worked slick as a whistle.) Right this moment, I’m not sure whether that means you should just export straight html, and expect anyone importing to deal with it in their own style, or that you should export entities, and anyone importing should un-encode and then re-encode. I think I’ll think about it with a beer.
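The un-encode-then-filter approach described above might look something like the sketch below. It is Python, not the actual dotcomments PHP, and the list of allowed tags is a made-up assumption rather than the real dotcomments list. It also keeps the attributes of allowed tags untouched, so treat it as an illustration of the idea, not a safe HTML sanitizer.

```python
import html
import re

# Hypothetical whitelist; the real dotcomments list is not shown in this thread.
ALLOWED_TAGS = {"a", "b", "i", "em", "strong"}

# Matches an opening or closing tag and captures its name.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^<>]*)>")

def clean_comment(encoded: str) -> str:
    """Un-encode entities, then re-encode everything except allowed tags."""
    raw = html.unescape(encoded)
    out = []
    pos = 0
    for m in TAG.finditer(raw):
        # Text between tags is always re-encoded.
        out.append(html.escape(raw[pos:m.start()], quote=False))
        if m.group(2).lower() in ALLOWED_TAGS:
            out.append(m.group(0))  # allowed tag passes through as-is
        else:
            out.append(html.escape(m.group(0), quote=False))
        pos = m.end()
    out.append(html.escape(raw[pos:], quote=False))
    return "".join(out)
```

With this, an encoded `<i>` comes back as a live tag, while an encoded `<style>` stays visible as text instead of being interpreted.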

 
Comment by pixelkitty #
2002-04-20 19:05:52

k10k.net is BACK! WOOTZOR

yes I’m spamming every comment board I can find!

 
Comment by Phil Ringnalda #
2002-04-20 19:46:58

You young kids with your good vision. I looked at it this morning, squinted, put my nose on the screen, and said “Huh. Something for the young folks, I guess.”

 
Comment by ruzz #
2002-04-20 20:44:24

I think I’m young and I too looked at the site and said, yeah pretty, but not readable, and the navigation blows.

 
Comment by Marcus #
2002-04-21 06:57:06

Yeah, whenever I spend a large amount of time parsing XML with PHP I always end up wishing it had never been invented.

There’s a lot of stuff to be sorted out with CAIF and, as everyone knows, I didn’t get round to it… yet. I even set up a Yahoo! Group months ago that compiled everyone’s comments so far, but I never mentioned it to anyone.

So… does anyone want to join?

There are some changes I definitely want/have to make to the format: #1 The root tag. #2 The datetime format. #3 The thread ID. #4 The TEXT element.

My own conversion scripts aren’t even that intelligent because they’re expecting a certain ordering (as far as I remember) and can only work with a single thread. I intended to handle the data in an array at some point.

As for k10k.net, I’ve always liked it but could never put my finger on why. Perhaps it’s just the whole pixelly ethos.

 