in which there are too many circles

Every year, Science gives out awards for the best science images and visualizations. Here is one of this year's runners-up:

This is “Visualizing the Bible” by Chris Harrison and Christoph Romhild. They started at the left with Genesis and put each Bible verse along the white line toward the right. If any verse refers to a previous one, they drew a half-circle connecting the two. The circles are color-coded by how far apart the verses are.

So it's very pretty, but I don't feel like I learn anything new about the Bible by looking at it, and that annoys me. Okay, so scholars have found lots of places it refers to itself, but I knew that. It seemed like you could get a lot more interesting information with this technique.

Okay, here is my much less pretty, much less complicated version. :) To make this picture, I wrote about my day in an
informal, conversational sort of way for about eight hundred words. Then I put all the words I wrote from left to right across
the bottom of the picture, and every time I used the same word twice (and it wasn't a word like "the" or "of") I drew a circle
connecting the spots with those words. (Bigger circles are more transparent; otherwise you wouldn't be able to see what
was underneath them). So you can see, when I'm talking, where I'm repeating words I've already said a lot, which tells you
how related my sentences are to each other.
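
For anyone who wants to try the idea, here is a minimal Python sketch of the pairing step. It is not the code behind these pictures. The stop-word list, the `find_arcs` name, the choice to link every pair of matching positions (rather than only neighboring repeats), and the opacity formula are all assumptions made up for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical stop-word list; the post only says words like "the" and "of" were skipped.
STOP_WORDS = {"the", "of", "a", "an", "and", "to", "in", "i", "it", "is", "was"}

def find_arcs(text, stop_words=STOP_WORDS):
    """Return sorted (left, right, alpha) tuples, one per pair of positions sharing a word."""
    words = re.findall(r"[a-z']+", text.lower())

    # Remember every position at which each non-ignored word appears.
    positions = defaultdict(list)
    for index, word in enumerate(words):
        if word not in stop_words:
            positions[word].append(index)

    span = max(len(words) - 1, 1)  # widest possible arc; avoids dividing by zero
    arcs = []
    for spots in positions.values():
        for left, right in combinations(spots, 2):
            # Assumed opacity rule: wide arcs fade out, narrow arcs stay dark.
            alpha = 1.0 - 0.9 * (right - left) / span
            arcs.append((left, right, round(alpha, 3)))
    return sorted(arcs)

arcs = find_arcs("The crow saw a crow and the other crow flew")
for left, right, alpha in arcs:
    print(left, right, alpha)
```

Each tuple would then be drawn as a half-circle from `left` to `right` along the baseline, with `alpha` as its opacity.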

The section of the graph about two thirds of the way through where it gets really gray and there's a lot of repeating is where
I'm talking about all the different crows I see on my way home from work, but I use a lot of similar words to describe each
crow - "fly" and "caw" and "beak" and things. So it connects all those words up and the area gets pretty shaded.

So we'll call that "normal".

Here I do the same thing with a section from an unpublished story by gement. I picked gement's story because she
writes lots and lots of witty, intricate dialog. Almost every word in this section was said by one character to another, with
clever banter. And see how different it looks! There are a lot MORE circles drawn, which means that more words are
repeated in the story, since every circle connects a word that was said twice. This isn't a surprise at all. The characters are
responding to each other, playing off each other, making jokes based on wording, or answering questions the others have
asked... lots of reasons for words to come up twice.

It also makes sense that more of the circles are shaded darkly, which means words come up twice only a sentence or two
apart. That's what you'd expect, since a lot of the repeats are character B responding to something character A just said.

Here's another go, with some erotica. I do have permission to use it, but the author is anonymous. :) For most of this story,
it looks like the "normal" image we looked at first, especially at the very beginning, which sets up the story and the chance
encounter with the hot member of your preferred gender. But as you get closer to the end it gets greener and greener. The
story gets more repetitious because there are really not very many ways you can say "she verbed my noun with her noun
and it felt sooo adjective" - so words are repeated as you get away from the plot and into the sex. I think most erotica would
come out looking like this. :)

Here's another piece of fiction, one of caladri's "Lies" pieces, which are stream of consciousness meanderings on
events that never happened. There aren't very many "big" circles in this one; after a sentence or two she's already drifted to
a new topic. However, the small connections are so insanely tight you can't see most of them. This is an insanely dense,
poetic, self-referential bit of writing.

And now for some actual poetry! This is tfabris's and vixyish's song Thirteen. See those bits that are all parallel and
connected to each other? Those are the three choruses. (This song is only like 200 words, instead of 1000, so we're
zoomed way in). You can also see that the bit at the very beginning is mostly unconnected to anything (it's sort of an intro),
but that the rest of the verses are all linked up.

Here's another of their songs, Apprentice. (I picked it 'cause it was long. For a song, anyway.) Once again we have three
choruses, though there are more verses between the second and third than the first and second, and you can see that.
Interestingly, in this song, it's the middle verses that mostly stand alone, and the first and last set are both all connected to
themselves, and also connected to each other. This is obviously pretty deliberate on Vix and Tony's part, with lines like:

One night I stole into her garden
And I overheard her sigh...

at the beginning and:

She taught me to look beyond the garden
And she never said goodbye..

at the end. The first bit is "she is so awesome!" the second is "here is our story" and the third "she is so awesome and I
miss her," so the first and third are pretty similar. :)

Oh, and finally here's some C++ code of mine that renders simulated nitrogen uptake in a stand of trees. It is very, very
repetitive. (I counted a->b as two separate words.)

So that was fun. It's very primitive, obviously - I'd like to get a word list and start looking at repeating patterns of synonyms,
instead of just the same word each time, and look at different parts of speech and sentence structure and stuff. caladri did some
diagrams of her stuff last night, too, which might also amuse you. She's looking at what words occur
near each other in a sentence.

Does anyone have a writing sample they wouldn't mind me playing with and posting, of a type I don't have yet? Recipe?
Fiction that is neither stream of consciousness nor dialog-based? Something you've written in another language?
Rhyming poetry not intended to be sung? Blank verse? Non object-oriented code? Noncode technical writing? Ideally
complete and 500-800 words.

ETA: novalis made an online version for you to play with!

Current Mood: busy
I wrote it in Processing because I wanted to see what all the fuss is about - one of my work collaborators will not stop talking about Processing ever. I see now! The code is like sixteen lines [1], and it would be very fast to turn it into a web toy.

[1] Not counting preprocessing the text, and generating the list of words so common I should ignore them, both of which I'm doing with a separate script.
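
A rough idea of what building such an ignore list could look like, sketched in Python. The actual script isn't shown anywhere here, so the approach (take the most frequent words from some reference texts), the `common_words` name, and the `top_n` cutoff are all assumptions.

```python
import re
from collections import Counter

def common_words(reference_texts, top_n=100):
    """Return the top_n most frequent words found across the reference texts."""
    counts = Counter()
    for text in reference_texts:
        # Same crude tokenizer idea: lowercase runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return {word for word, _ in counts.most_common(top_n)}

# Tiny made-up sample; a real list would want far more text and a bigger top_n.
samples = ["The cat sat on the mat.", "The dog ate the rest of the cake."]
ignore = common_words(samples, top_n=1)
```

In practice the reference texts should be separate from the piece being drawn, so that a word the piece itself repeats a lot (like "crow") doesn't get ignored.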

I've got 1500 words on IMAP4 right here:


And here's a little Field Notes I wrote about wackiness with DSL:


Neither have code, tho' the IMAP4 article has IMAP4 keywords in it.

Thank you!

It occurs to me that the original American edition of Dorothy L Sayers' first novel, Whose Body?, has fallen out of copyright and is available here. Lord Peter Wimsey has a vast vocabulary, but he's also rather amusing ;) A bit longer than you asked for, though.

I'm sure I can find a reasonably self-contained section. Thanks for the pointer.

Mmm. This is just _cool_. And I only skimmed! I must read later when I'm not trying to escape work.

I am soooooo shocked you like this. Shocked, I tell you! Shocked!

wispfox assured me I needed to read this, and I did. :) (My thesis was on discourse analysis. I used the occurrence of repeated references as an indicator for software design; it looks at more cumulative statistics rather than the fine-grain structure visible here.)

Great use of this visualization! I'd seen it applied to music before, but not text. Putting it in Processing makes it available to a lot more people.

This of course brings to mind a ton of other things that can be added to it -- stemming words (so goose and geese link up), doing word sense disambiguation (or at least POS tagging), etc. There are some FOSS projects doing tagging, and possibly stemming...haven't looked at it in a while.

You might look at things like sonnets, or poems with specific structure: Villanelles would show a really strong structure...

novalis got it up online and available to everyone way before I did, which is probably for the best. I probably would have wanted to tinker endlessly before actually making it accessible. I'm considering making the baseline almost circular and the connecting lines straighter arcs.

The music visualizations I've seen compare variable-length phrases, which is more complex and interesting. I've been thinking about looking for phrases or other longer increments using something similar to Huffman encoding on a much longer text.

Thanks for your other suggestions. It's a fun problem. :)

It seemed like you could get a lot more interesting information with this technique.
yeah, i was really unimpressed; we already know about probability distributions and markov walks -- not a legit tech term, but ykwim -- in writing.

Does anyone have a writing sample they wouldn't mind me playing with and posting, of a type I don't have yet?
i'll send you a link to a huge corpus i wrote. :)

I'm kind of offended someone decided that was one of the year's best visualizations. I thought maybe it was part of an interactive application or something that picked more pattern out, but it doesn't seem to be. It's a simple, obvious technique that's been done better by others years earlier in ways that display more information, and interesting, nonobvious information at that.

This is amazing. Thank you.

I wonder, somewhat guiltily, how much less beautifully bubbly it would look if you didn't link all the instances of their proper names. But thank you for making my words look so pretty.

Even if that's a contributing factor (I will go down and rerun it stripped of names tonight and post the new image), it is an important characteristic of your story and your writing style that there are only two objects of importance.

In my "normal" story there are lots of different nouns: doorways and crows (each with his own name) and horses and bosses and the tropical plant greenhouse and jeans and busses and the virus lab. In your story, the only context each character needs is the other, and that's most of the nouns we see (that aren't being spoken). It's not "cheating" and you shouldn't feel guilty - it's how your story is put together, and it reflects the uniqueness and strength of your writing style. :)

I did strip both "he" and "said," so you may actually have a bubble handicap. :)

Edited at 2008-10-23 03:41 am (UTC)

You are welcome to use my one complete and posted short story, which has almost no dialog and is not stream of consciousness. Actually it would be really fun to see. It's broken across two posts:


I wish I had things to say in rhyming poetry that took up that many words. :)

Awesome! I will definitely do that, thanks. :)

neat. will have to refer to this later when i have more time.


that's just awesome. I'd love to see what Monty looks like. :-)

Will do! :)

I don't think works need to be used with permission. You're not re-publishing their work or even making a derivative work. Additionally, this falls under fair use (for research purposes.)

It still seems polite. :)

This is very cool. You're welcome to use anything I've written (I'm not clever enough to pick something out myself).

Will do! I expect you'll come out looking halfway like the "normal" story and halfway like gement's "nothing but dialog" story. Though maybe I'll grab one of your folktale ones, because they have very interesting word choices, and see if there's structure there.

I'd be curious to know what you would do with one of my year's end most-liked book review posts:

http://maribou.livejournal.com/158349.html (not coding cause I'm too sleepy and I will break it)

I always feel like they are super repetitive, since I tend to enthuse in fairly similar ways, and I'm curious about whether I'm right.

W00t! I definitely need more prose. I will run that through. Though probably not tonight, as we're having the house inspected tomorrow and I should clean. :)

Uh, as far as I know, I have never met you, and I'm not sure it's legal or anything, or if you'd even want to, but could you consider marrying me?

This is so cool and fun!

You are made of marzipan! How could anyone not marry you, or at least nibble on you? :) Thank you.

