rolanni: (Clan Korval's Tree and Dragon)
[personal profile] rolanni

OK, I don’t have much experience coordinating a project this big. My general approach, when confronted with an Enormous Project is to break it down into bite-sized pieces, but! I’m not a project manager and have none of that Foo.

Below is what I’ve got — comments and advice welcome. I’ll also be looking for volunteers, but that’ll come later.

I’m soliciting ideas on how to implement this in order to achieve the goal without loss of life, and without anyone having to bear an enormous burden of work.

The Goal:  A list, for each novel, of all the Liaden-and-other-Weird-Words that appear in that novel, AND a list of Liaden-and-other-Weird-Names that appear in that novel.

What the lists would look like:

1. Title of Book, Edition
a. Word One, Page Number
b. Word Two, Page Number
Lather, rinse, repeat

2. Title of Book, Edition
a. Name One, Page Number
b. Name Two, Page Number
Lather, rinse, repeat

I’m guessing that there ought to be multiple eyes on each novel, in order to make sure that the maximum number of Weird Words (henceforth WW) are captured. Some of the WW will be English words (we use a smattering of obsolete English words, Just Because), some will be Terran slang, Delgadan words, and the ever-popular etcetera.

For the names — I’m guessing another buncha eyes for each book, so that the maximum number are captured.

Question: Planet and ship names — Different lists? Or folded into the Names List?

Also needed, someone or someplace to receive, and coordinate, the lists.

Ultimately, the lists will be used by Lee and Miller for Something Really Cool, and will play an important role in the Web Pronunciation Guide Project.

There is some time limitation on getting this together, but at the moment, the deadline is squishy.

So! What’s the best way to set this up?

EDITED TO ADD: Please nobody build anything yet. We're still in brainstorming mode. I appreciate everyone's input.


Originally published at Sharon Lee, Writer. You can comment here or there.

Date: 2012-03-29 12:27 pm (UTC)
From: [identity profile] intuition-ist.livejournal.com
if you have electronic versions that are reliably paginated (i.e., *not* Word), running them through a spell-checker would probably get most of the non-English ("weird") words. there may be variations on this that a more-ept programmer than I might tell you about.

Date: 2012-03-29 03:33 pm (UTC)
From: [identity profile] adina-atl.livejournal.com
I used Word on a 23,000 word story with lots of WWs and WNs, removing my existing custom dictionary and creating a new one for that document only. Five minutes of clicking "Add to Dictionary" got me a text file with 123 new words and names in it, including related forms like possessives. Doesn't give page numbers or locations, but does give a word list that could probably be fed into an indexer or even an ebook reader's search function to give the page numbers.

The only thing this doesn't find are names or words that are spelled like standard English words but used for something different, like a person named Cat. The name Jeeves is not flagged by spellcheck, for instance.

There's a free-trial program that will find all the unique words in a document or text file. I haven't downloaded it or tried it yet. You'd wind up wading through a lot of and, or, but, the, a, and said, but it would catch the names that are in the dictionary.

ETA: The free-trial software is Word Patterns by Mysoftwarefactory.net.
Edited Date: 2012-03-29 03:35 pm (UTC)

Date: 2012-03-29 11:08 pm (UTC)
From: [identity profile] intuition-ist.livejournal.com
word's spellcheck is pretty functional, true, but its ability to paginate consistently is ... erratic. especially if you open it on different computers with different printer drivers installed.

Date: 2012-03-29 04:16 pm (UTC)
From: [identity profile] schulman.livejournal.com
Agreed. Doing this manually is a Herculean task. Doing it by using spellcheck or just doing a /usr/dict/words lookup is a fairly simple task, and will also make it easy to have frequency counts.

The only part you'd have to do manually is to separate Names from Words.
Edited Date: 2012-03-29 04:17 pm (UTC)
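The dictionary-lookup approach described in the comments above can be sketched in a few lines of Python. This is a rough illustration only: the regex and the word-list path are my assumptions (on most modern Unix systems the list lives at /usr/share/dict/words rather than /usr/dict/words).

```python
import re

def load_dictionary(path="/usr/share/dict/words"):
    """Load a system word list into a set for fast membership tests."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f}

def weird_words(text, dictionary):
    """Return, sorted, the words in `text` that are not in `dictionary`."""
    words = re.findall(r"[A-Za-z']+", text)
    return sorted({w for w in words if w.lower() not in dictionary})
```

Feeding the same token stream through `collections.Counter` would give the frequency counts mentioned above, and, as noted, names spelled like ordinary English words (Cat, Jeeves) would slip through and still need human eyes.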

Author Needs Help

Date: 2012-03-29 01:11 pm (UTC)
From: [identity profile] spudsmom.livejournal.com
I would assign two people to each book/work and have them go through picking out everything, noting it with page number and definition. Then one person would go back and organize it into separate lists, preferably in a spreadsheet, and someone else would take all of the spreadsheets and create separate ones for each type of word/name. Access would probably be the best tool for organizing the data, but not everybody has it; on the other hand, if everybody used Excel and then the last person used Access to load the Excel files, that might work. I think the definitions for ships should include ownership, like "Juntava ship," and maybe who it was carrying and/or where it fit in the story. Too much?
Book Title
Page Number Word/Name Definition
Toni

Re: Author Needs Help

Date: 2012-03-29 01:28 pm (UTC)
From: [identity profile] jessica brown (from livejournal.com)
If she wants fan help, I would just accept help from everyone, some people literally have nothing to do, and could spend hours a week working on this. Put out what form you would prefer the information in, and let us do it! Do you have a very large email capability, and a good virus protector? << those are important if you are accepting and opening things from random people.

Re: Author Needs Help

Date: 2012-03-29 02:02 pm (UTC)
From: [identity profile] spudsmom.livejournal.com
You have a good point: the lists would get done faster and, with many hands, more thoroughly. It will make putting it all together more challenging, but with volunteers it's probably the way to go. I'm more used to paid people who pretty much have to do the work.
Toni

Re: Author Needs Help

Date: 2012-03-29 03:21 pm (UTC)
From: [identity profile] sb-moof.livejournal.com
GoogleDocs would be the best place for this if you want multiple people to contribute. I believe it even keeps versions (for some definition of version) that can be reverted to if someone manages to make a total hash of it. I'd be happy to look into it further a little later today, if the many hands approach is deemed A Good Thing (TM).

Date: 2012-03-29 01:16 pm (UTC)
From: [identity profile] fullmetal-al.livejournal.com
I'll happily volunteer for WW duty if need be once the details have been nailed down.

I happily volunteer too!

Date: 2012-03-29 01:26 pm (UTC)
From: [identity profile] jessica brown (from livejournal.com)
A spreadsheet would be easy to keep in order. You could have a "main" spreadsheet, accept spreadsheet emails, open them, and copy and paste into the "main" one. Do you need every sighting of each word listed? As in, a city name written in the same book 15 times, or do you want only the first sighting per book? I saw some other comments, and I don't know how safe it would be for anybody to edit information from other people; most people would only help, but there could always be that "one" person who just likes to go around giving out bad days.

Google Docs Form?

Date: 2012-03-29 01:34 pm (UTC)
From: [identity profile] alan miller (from livejournal.com)
If you have a Google account, you can create a form in Google Docs, which then feeds data into a spreadsheet of the same name. Let (many?) people enter data on the form, then sort the resulting form by word, book, etc.

Date: 2012-03-29 02:10 pm (UTC)
From: [identity profile] 1stleadingedge.livejournal.com
I think it will be easiest to work with digital files. Using the browser's Find/Find Again will work quickly for most instances. The key is the collation.
I would go with a spreadsheet for collating. I would not, however, go with Google docs - too many cooks, etc
I volunteer to work on the project.

Date: 2012-03-29 02:26 pm (UTC)
From: [identity profile] ebartley.livejournal.com
I own your ebooks and can export them to text and run them through a spelling dictionary. Can probably write a Perl script to get some more information than that from there (not specific edition information, of course, but the context each word appeared in and the like), but I can't figure out details at least until lunch hour.

Most words and names will appear repeatedly in books; I presume you want the first time each is used? Or?

Date: 2012-03-29 06:56 pm (UTC)
From: [identity profile] ebartley.livejournal.com
So ... I spent my lunch downloading your books and a Unix dictionary and throwing together a Perl script. I now have a huge file which should be a good place to get started ... but how do I get it to you?

Author Needs Help

Date: 2012-03-29 02:28 pm (UTC)
From: [identity profile] spudsmom.livejournal.com
I think you'll get lots of volunteers, you and Steve being well beloved and all that but the real question is, "Who is going to herd the cats?". You'll want to get that established first, probably, because that person should choose the tools with which they are the most comfortable etc.

Just once? or every appearance

Date: 2012-03-29 02:30 pm (UTC)
From: [identity profile] psw456.livejournal.com
The magnitude of the task will depend on answering Jessica's question above - are you wanting to capture every appearance of each WW or only its first appearance?

Since I also agree that using volunteers to spread the work is the way to go, the next complication will be to determine the "format" of each book so that page numbers make sense. That means, to me, that either you supply them, or there is another task to determine whether various electronic versions, e.g. ePub and Kindle editions, match on pagination.

I also strongly recommend a trial run, after we THINK we have all the bases covered. That is: set up the spreadsheet a volunteer will work with and the final repository, and try it with a handful of folks for ONE book, before letting loose the hordes of volunteers.

I, also, would be happy to be one of those - volunteers, that is...

Weird Word Coordination

Date: 2012-03-29 02:33 pm (UTC)
From: [identity profile] sydnie montgomery (from livejournal.com)
Have you heard of PBworks Wiki? We use it in my graduate studies program to collaborate with multiple people. Each person gets a login, and the person in charge gets to assign the roles of the people who are collaborating.

You could give each book a page and assign two or three or however many people you like to each page. They in turn list the words and such on that page.

I would recommend this as I've used PB works wiki and it works well. It's not perfect but it does well with this sort of collaboration with people who aren't in the same geographic location.

Here's a link to the wiki of my class from last Spring: http://digitaltextuality.pbworks.com/w/page/35113284/Syd%20loves%20Learning

Date: 2012-03-29 03:24 pm (UTC)
From: [identity profile] sb-moof.livejournal.com
I am not familiar with it, but there must be software out there to create an index. Seems like if we can run the novels through that, then we just need to discard all the normal words. I will test out my google foo later and see if I can find indexing software.
Edited Date: 2012-03-29 03:25 pm (UTC)

Date: 2012-03-29 04:58 pm (UTC)
From: [identity profile] sb-moof.livejournal.com
OK, I've done a bit of Googling. For Windows, there is the shareware Index Author program, http://www.sttmedia.com/indexauthor-download . Lucene seems to be a Java library that a programmer could use for this, http://lucene.apache.org/core/ . Cindex is for both Mac and Windows, http://www.indexres.com/home.php, but it is very expensive ($500). OTOH, they offer an indexing service, but they must be contacted for pricing. They do have a free demo that is limited to 100 words, so maybe if enough of us get together... (jk)

http://www.anindexer.com/about/sw/swindex.html has a good overview of other software options. An interesting one is HTML Indexer at http://www.html-indexer.com/

Now, I'm off to a meeting...

Date: 2012-03-29 04:57 pm (UTC)
From: [identity profile] rnjtolch.livejournal.com
1. Your requirements need to be refined. For example, the phrase "Page Number": hardcover, paperback, ebook, or what? And if ebook, which ereader format? Make sure you understand and are able to convey to others the exact meaning of each, call it datum, in your, call it database.
2. And so, we called it a database, and that is what we should use for this task. It seems a perfect relational database application. Something like Microsoft Access (is there still a Microsoft Access?) will be able to do the job very handily. Some programming will be required.

I am not volunteering because I have been retired for way too long and I are obsolete and too old to play anymore. And "way too long" in this business could be a matter of months, not years.

So what is needed, aside from humans to implement this, is a computer accessible to all those humans and on which this database (LiadenWord) resides.

Good Luck, and as Groucho said, "Stay Warm."

Date: 2012-03-29 09:27 pm (UTC)
From: [identity profile] silverdragonma.livejournal.com
I agree a database is the way to go. Taking the information in, in spreadsheet form, should be possible.

brainstorming

Date: 2012-03-29 11:36 pm (UTC)
From: [identity profile] ednaemode.livejournal.com
I agree on electronically as much as possible, and I would do it "book" by "book", cross-referencing previous books as necessary so that you only define/pronounce a word ONCE. For example, Korval is going to be evaluated in the first "book", made known to whatever level of detail you wish (which is why you want a DATABASE, not a list). Detail: the Word, what books it appears in, where, how it is pronounced, what language it is, plural forms, etc. The thing is, you don't know what level you will need in the future... a database allows you to add to this. Might you want to know publisher? What about ???

Sounds scary now, doesn't it? Start small. Think first about what your readers want (pronunciations and definitions) and what you and Steve want/need (hey, why shouldn't you benefit from this adventure) in terms of what you NEED right now and a Wishlist for the future. Start with a chapbook, learn a bit about it, and get someone to design the database and entry forms (volunteers). Then populate the database with the results of a script that reads the electronic book. If you play your cards right, all you (the authors) will have to do is define and pronounce each unique item not in English and enter it into the database.

Once you have the database (such as MS Access as mentioned above), you can ask it to print out the lists Any Way You Want. The good news? I see you asking your readers for help on remembering when characters showed up and what you said about them. If this gets done properly, you'll be able to have the answers immediately because you will know what page and what book.

beth

Date: 2012-03-30 12:09 am (UTC)
From: [identity profile] enleve.livejournal.com
I would suggest creating a wiki and making it available to your readers on the web, and then they can contribute to it and edit it.

The same software that runs Wikipedia is open source. The only downsides would be the initial work needed to set it up, and dealing with spam (there might be a way to reduce this.)

For the list of words do you want every time it appears in the book, or just the first time?

Date: 2012-03-30 02:09 am (UTC)
From: [identity profile] ebartley.livejournal.com
Heh. I see the update, and sorry for having jumped the gun. That said, having written something over my lunch hour, here's what jumps out as utterly trivial to use a Perl script to get from text copies of each ebook:

1) A nonstandard word list -- lots of false positives, but the stuff that isn't in a list of standard English words
2) The paragraph each word is in.
3) The word count for each word (i.e. word #49473 in a book.)

The advantage a Perl script has over other computerized means of accumulating a list is that it's more flexible and easier to tweak for exactly what you want -- if you wanted the first three times a word appeared, for example, that's very doable.
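To make the outputs described above concrete, here is a comparable sketch in Python rather than Perl. The function name, the case-folding, and the output shape are all illustrative inventions, not ebartley's actual script:

```python
import re

def first_occurrences(text, known_words, limit=3):
    """Map each word not in `known_words` (case-folded) to the word-number
    positions of its first `limit` appearances in `text`."""
    hits = {}
    for position, word in enumerate(re.findall(r"[A-Za-z']+", text), start=1):
        key = word.lower()
        if key in known_words:
            continue
        positions = hits.setdefault(key, [])
        if len(positions) < limit:
            positions.append(position)
    return hits
```

Changing `limit` is exactly the sort of one-line tweak the comment alludes to: every appearance, the first only, or the first three.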

Very Large Project

Date: 2012-03-30 03:58 am (UTC)
From: [identity profile] catherine ives (from livejournal.com)
I volunteer to read Ghost Ship and Dragon Ship and keep a list of the Liaden words and names and other weird words and names if any in those books. I plan to read Ghost Ship again before reading Dragon Ship. I don't have an e-reader though. I would have to e-mail you my list.

As for the other books....that would be too much like work. I will leave those books to your expert commenters who no doubt have your books on their e readers ....

Date: 2012-03-30 03:58 pm (UTC)
From: [identity profile] zola.livejournal.com
I personally suggest this gets done programmatically to make sure it's done in an orderly way.

Do a list of the books and a page count on each; that's your first database entry (Ghost Ship, paperback, X pages; Ghost Ship, hardcover, X pages).

Take that and generate a list of links for each book, one link per page.

Phase 1--the users go to the links and click and get a form, and they put in the word "None" or write a comma separated list of the words and hit submit.

As pages are examined, we show ONLY pages that haven't yet been checked, so we ensure that every page has been read by someone.

When there are no links left on any books, we go into Phase 2. Again, you see the list, but this time when you click the link, it shows you what was put in the first time. You click "yes, this is correct" or "no, this is incorrect". If it's incorrect, the link will show up on another page needing review. If it's correct, once two or three people verify the page is correct, the link is finished.

Once all links are checked and the ones that need reviewing get reviewed, we just ask the database to generate the list, and it can go into a spreadsheet or a word doc or whatever.

This would be about six hours of programming, it is, believe it or not, a fairly simple job and would let the maximum number of people help. And I have time this weekend... :D
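Under the assumptions stated above (a link per page, and two or three confirmations to finish a page), the bookkeeping for the two phases might look something like this in Python with SQLite. Every table and column name here is invented for illustration, not taken from any actual implementation:

```python
import sqlite3

# Each row is one page of one edition; status moves from 'open'
# through 'entered' to 'verified' as volunteers work.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE pages (
    book TEXT, page INTEGER, words TEXT,
    status TEXT DEFAULT 'open',
    confirmations INTEGER DEFAULT 0)""")

def seed(book, page_count):
    """Create one 'open' row per page of a given edition."""
    conn.executemany("INSERT INTO pages (book, page) VALUES (?, ?)",
                     [(book, p) for p in range(1, page_count + 1)])

def next_open_page(book):
    """Phase 1: hand a volunteer a page nobody has entered yet."""
    row = conn.execute("SELECT page FROM pages WHERE book=? AND status='open' "
                       "ORDER BY page LIMIT 1", (book,)).fetchone()
    return row[0] if row else None

def submit(book, page, words):
    """Record a volunteer's comma-separated word list for a page."""
    conn.execute("UPDATE pages SET words=?, status='entered' "
                 "WHERE book=? AND page=?", (words, book, page))

def confirm(book, page, correct, needed=2):
    """Phase 2: reviewers confirm a page, or reopen it if it's wrong."""
    if not correct:
        conn.execute("UPDATE pages SET status='open', confirmations=0 "
                     "WHERE book=? AND page=?", (book, page))
        return
    conn.execute("UPDATE pages SET confirmations=confirmations+1 "
                 "WHERE book=? AND page=?", (book, page))
    conn.execute("UPDATE pages SET status='verified' "
                 "WHERE book=? AND page=? AND confirmations>=?",
                 (book, page, needed))
```

Once every row reaches 'verified', a single query over `pages` yields the finished list, ready to export to a spreadsheet or word processor.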


Date: 2012-03-30 10:10 pm (UTC)
From: [identity profile] vythe.livejournal.com
I know, it's all a trick to philter out people who don't know simple words like "advertence".

And "philter", too. :-)

Date: 2012-03-30 10:18 pm (UTC)
From: [identity profile] vythe.livejournal.com
On the practical side: LJ lets you edit your posts. So, you make a post for each novel with an alphabetical list of its Weird Words and call on everybody to leave comments with the word reports. Once a day or so you visit each post and transfer the catch into the post bodies. That's all.

It will help if you also respectfully remove all processed comments to make it obvious where there is something new.

Whereas using overly-smart software will leave you forever worrying that you didn't do it quite right.

Volunteering, anytime, anywhen

Date: 2012-03-31 02:08 am (UTC)
From: [identity profile] kat ayers mannix (from livejournal.com)
I am definitely in to help collate.

Project Suggestions

Date: 2012-03-31 11:04 pm (UTC)
From: [identity profile] elaine bushore fisher (from livejournal.com)
Word may not have reliable pagination, but electronic forms of the book can be fed into FrameMaker, which has a simple command to mark a word for indexing (also for making a glossary, should you wish to do that also). You can then use the spell-checker to find WWs and index them. Also, the resulting index has hypertext links, so you can easily jump from the index entry to the referenced passage.

This would be in addition to actual readers. You would still need people to manually/optically search for the obsolete and unusual English words, which may not be picked up by spell-checker.

In this instance, the Hard Work would be the initial (manual) pagination. I know this because, once upon a time, this was a project of mine, intended as a surprise to the both of you. Then you asked the FoL to start working on a wiki, and I figured anything I could do would be duplication of that effort.

In any case, I agree with those suggesting Google Docs for the submissions. Rather than trying to keep the records separate, I further suggest you have somebody to compile all of the submissions into the same edition of the book, so you have a single mark-up that includes all of the submissions.

Authors Need Help: Brainstorming Session

Date: 2012-04-03 12:54 am (UTC)
From: [identity profile] stillsorting.livejournal.com
How about building a concordance? There are free concordance programs, though I haven't looked recently. I can look into it if you decide it's worth the time. A concordance program builds an alphabetical list of all the words in the book with page numbers/locators. You should be able to get a text file out of it, which you can run through Word's spell checker.

This gives you both the list and where the words/names appear.

There are concordance programs that handle multiple books, but last time I checked, those were pay software. Might have changed in the last 5 or so years.

Anybody out there know more than me about this?

Thank you very much for all the enjoyment.

Judy
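As a rough illustration of what such a concordance program produces, here is a toy version in Python; word positions stand in for the page locators a real indexer would use, and the tokenizing regex is an assumption:

```python
import re
from collections import defaultdict

def build_concordance(text):
    """Map each word (case-folded) to the list of positions where it appears,
    sorted alphabetically by word."""
    concordance = defaultdict(list)
    for position, word in enumerate(re.findall(r"[A-Za-z']+", text), start=1):
        concordance[word.lower()].append(position)
    return dict(sorted(concordance.items()))
```

As Judy notes, the output still needs a spell-check pass (or a dictionary filter) to discard the ordinary English entries and leave only the Weird Words.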
