I was considering a similar problem of my own, just the other day.
1) Get a dictionary word list. These are easily available, but note that lists intended for password cracking (brute-forcing, rainbow tables, etc.) will contain things like "abc123" and "qwertyuiop". You can use a list with all of these, I suppose, because any non-words that survive the later stages can be spotted (they're no longer lost in the midst of half a million other words, some of which may only look invalid because you don't know them), but maybe you want to look for a "scrabble" list, if available, because that starts off with only valid words (including "aa", "qi", "re" and other such game-winners). Either way, you can still normalise to lower case (if necessary) or remove words whose capitalisation indicates a proper noun (if you don't want them).
2) For each word in the dictionary list, blindly apply the length*25 single-letter substitutions, (length+1)*26 letter additions and (length) letter removals to see whether any/many of the results match another word in the list. You can discard any word that has no matches.
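As a rough sketch of that generate-and-check step (minimal Python, assuming the dictionary is already loaded into a set; the helper names here are mine, not anything standard):

    import string

    def one_edit_variants(word):
        # yield every single-letter substitution, insertion and deletion of the word
        letters = string.ascii_lowercase
        for i, c in enumerate(word):
            for x in letters:
                if x != c:                      # length*25 substitutions
                    yield word[:i] + x + word[i+1:]
        for i in range(len(word) + 1):          # (length+1)*26 additions
            for x in letters:
                yield word[:i] + x + word[i:]
        for i in range(len(word)):              # length removals
            yield word[:i] + word[i+1:]

    def first_order_matches(word, dictionary):
        # all dictionary words exactly one edit away from the given word
        return {v for v in one_edit_variants(word) if v in dictionary}

    # dictionary = set of lower-cased words from your list file
    # keep = {w for w in dictionary if first_order_matches(w, dictionary)}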
3) For any word that does have matches, extract it and all of its first-order matches from the initial dictionary list and store them in a mini-list of their own (within a list of such lists, or in a keyed index (Perl hash, Python dictionary, or whatever your system calls it) of linked lists; if the latter, I suggest using the alphabetically-first word as the key, but other schemes can work just as well). Also bring those dictionary-ripped words to the front of the testing queue (this depends on how you're shuffling your data).
4) Repeat, but once you've started putting word-groups into sublists, also search all the sublists (except the one the current word is already in) for words not yet tested. (Suggestion: postfix an asterisk or other meta-character to the words that caused a grab of others, then skip over them when seeking matches while still allowing them to be sought; or asterisk only the grabbed words instead, if you prefer, with the reverse intent.) If a sublist item finds a word in another sublist, merge the two sublists, as well as dragging in any further words still in the main input stack.
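That merge-on-contact behaviour is essentially the classic disjoint-set (union-find) structure, so if the asterisk bookkeeping gets fiddly you could lean on that instead; a minimal sketch of the idea (names are my own, not part of the method above):

    def find(parent, w):
        # follow parent pointers to the sublist's representative, compressing the path
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    def union(parent, a, b):
        # merge the sublists containing a and b
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra

    # parent = {w: w for w in dictionary}
    # for every pair (w, m) found in step 2: union(parent, w, m)
    # words sharing the same representative then form one bunch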
5) Once you've tested every word (nothing left in the input stack, no sublist has any word still untried) you now have bunches of related words. "Quick" and "quack" will be (the only two?) words in one of them. You can then work with each smaller-than-whole-dictionary bunch to make sure you have the full net of relationships, unless, during the above processing, you already stored all direct neighbourships (backwards and forwards) alongside each word entry; that depends on whether you're happy to deal with lists of lists of word+list structures.
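If you didn't store the neighbourships as you went, the net for a single bunch can be rebuilt afterwards along these lines (reusing the hypothetical first_order_matches helper sketched above):

    def build_net(bunch, dictionary):
        # adjacency map: each word in the bunch -> its direct neighbours within that bunch
        members = set(bunch)
        return {w: {n for n in first_order_matches(w, dictionary) if n in members}
                for w in bunch}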
6) You can then analyse the bunches for various metrics: the number of words in each bag (related, though most probably not neighbouringly so), the longest non-intersecting chains (related to the travelling-salesman problem, if you wish to weight by the particular inter-word transform used), proving or quickly ruling out the possibility of an Euler path (the Königsberg bridge problem: every relationship travelled just once, with words possibly revisited multiple times), and even finding a Hamiltonian path for that subset alone.
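For the Euler-path question in particular, the classic shortcut applies: a connected bunch admits such a path only if zero or exactly two of its words have an odd number of neighbours. A quick check over the net sketched above (again, assuming that hypothetical structure):

    def euler_path_possible(net):
        # net: word -> set of neighbouring words within the bunch (assumed connected)
        odd = sum(1 for neighbours in net.values() if len(neighbours) % 2 == 1)
        return odd in (0, 2)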
If you decide to remove a nonsense word from a bunch (after discovering it), you'd have to reprocess that bunch in case it splits into two or more now-unlinked lists. If you find more words to add, just push them into the (emptied) input dictionary so they can be checked for mergers. It's fairly resilient. The initial sorting could be made more intelligent to save some of the brute-forcing and spare processor cycles, but cycles are relatively cheap (as is storage space, these days, so you don't even have to worry about that overhead), and how you work with each bunch (or subset of a bunch) comes down to what you then want to derive, and how.
Depending on how you store the listed data in memory structures, I'd suggest exporting the bunches to files. Apart from redoing the process when words are added or removed, you are then at leisure to use a subsequent script to import each of those bunches and do the bunch-analysis. Practise on the small ones (which you can visually and manually check, to see that your netting and net-reading gives what you want), then once you're happy, target the largest one. It'll either break things (too much time or too much memory, because you still aren't being smart enough, in which case you know you need to improve your methods) or it quickly churns out the most likely maximal solution to your problem, which can then be used as the bound for "how low you can go" as you work back down the list, ruling out any further solutions that couldn't beat it.
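For the export itself, something as simple as one file per bunch will do; a throwaway sketch (the directory and naming scheme are just illustrative):

    import json, os

    def export_bunches(bunches, out_dir="bunches"):
        # one JSON file per bunch, named after its alphabetically-first word
        os.makedirs(out_dir, exist_ok=True)
        for bunch in bunches:
            words = sorted(bunch)
            with open(os.path.join(out_dir, words[0] + ".json"), "w") as f:
                json.dump(words, f)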
(Adapted and adopted from a thought-project of similar scope, considered on a Boxing Day walk. Not yet put into practice, so obviously there are optimisations and time-saving methods I haven't yet discovered, and I only have a basic idea of the order of time/space complexity I'd encounter. It should all work without issue, barring coding errors, though.)