I have a list which is currently one big file, divided by book. (It's trivial to split it into separate files for separate books.)
I also have a list of all words, sorted alphabetically, which I've used to derive a list of words to be suppressed -- things that are utterly typical in-genre (e.g. hyperdrive, empath), clear typos (e.g. imaginatiion), apparent dialect (e.g. checkin', cutesey, damnfool), apparent omissions from my wordlist (e.g. fatcats, flowerbed), sound-representations (e.g. ahhh, fwuummps), or proper nouns not within the story proper (e.g. baen, bujold). Given what you say here, the list probably isn't inclusive enough; I should probably have included all compound English words (e.g. betold, bloodprice, blueglow).
Do you want me to make a second go-round adding to the words to be considered English? Or email you the list of additional English words? Or email you the list of all words and you can make your own selections from there? Or, heck, email you the whole shebang so that if a bus hits me tomorrow you have what I've done to date?
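For reference, the filtering described above comes down to something like the following. This is only a minimal sketch, assuming each list is a plain collection of strings; the function name `candidate_words` and the sample data are hypothetical and not taken from the actual files.

```python
def candidate_words(book_words, english_words, suppressed_words):
    """Return the sorted words from one book that are neither in the
    English wordlist nor in the suppression list (case-insensitive)."""
    known = {w.lower() for w in english_words} | {w.lower() for w in suppressed_words}
    return sorted({w.lower() for w in book_words} - known)


# Illustrative data only: the words are borrowed from the examples above,
# but which list each one belongs to here is an assumption.
english = ["flower", "bed", "blood", "price"]
suppressed = ["hyperdrive", "empath", "imaginatiion", "checkin'", "ahhh", "baen"]
book = ["Hyperdrive", "flower", "bloodprice", "ahhh", "blueglow"]

print(candidate_words(book, english, suppressed))
```

Adding compounds such as bloodprice to the English list, or to the suppression list, would drop them from the result in the same way.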
Automagician
Date: 2012-04-12 06:16 pm (UTC)