<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.0.2" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Atoms of English</title>
	<link>http://kybernetikos.com/2007/12/03/atoms-of-english/</link>
	<description></description>
	<pubDate>Tue, 07 Feb 2012 23:28:52 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.2</generator>

	<item>
		<title>by: kyb</title>
		<link>http://kybernetikos.com/2007/12/03/atoms-of-english/#comment-6114</link>
		<pubDate>Sun, 09 Dec 2007 20:03:30 +0000</pubDate>
		<guid>http://kybernetikos.com/2007/12/03/atoms-of-english/#comment-6114</guid>
					<description>Update: I noticed that some of my parsing was off - most of the words were fine, but some were a bit messed up.  I'm not going to bother rerunning it just now, since it takes ages.  Anyway, I'm more interested in visualising the graph...

John:  Do you want the list for programming with, or for looking at?  And how is sorting it according to Java hashcode not useful?  :-)  Actually, you've made me want to publish a dictionary where each word is on the page given by its hashcode, computed by a simple human-computable method.  I ran a couple of tests using features humans could easily spot (number of descenders, number of vowels, etc), then modding the result by the intended number of pages (500 seems reasonable), but I'm still not down to a reasonable maximum number of words per page.  If anyone can help me, we can share the enormous profits from the publishing (lulu).

Fantastic, a dictionary you can only use if you already know how to spell the word and can do mental arithmetic.  Could make it even more evil by working aspects of the meaning into the hash code; then you could only use it if you know what the word means and how to spell it.  You know it'd be a hit with elitist geeks.</description>
		<content:encoded><![CDATA[<p>Update: I noticed that some of my parsing was off - most of the words were fine, but some were a bit messed up.  I&#8217;m not going to bother rerunning it just now, since it takes ages.  Anyway, I&#8217;m more interested in visualising the graph&#8230;</p>
<p>John:  Do you want the list for programming with, or for looking at?  And how is sorting it according to Java hashcode not useful?  :-)  Actually, you&#8217;ve made me want to publish a dictionary where each word is on the page given by its hashcode, computed by a simple human-computable method.  I ran a couple of tests using features humans could easily spot (number of descenders, number of vowels, etc), then modding the result by the intended number of pages (500 seems reasonable), but I&#8217;m still not down to a reasonable maximum number of words per page.  If anyone can help me, we can share the enormous profits from the publishing (lulu).</p>
<p>Fantastic, a dictionary you can only use if you already know how to spell the word and can do mental arithmetic.  Could make it even more evil by working aspects of the meaning into the hash code; then you could only use it if you know what the word means and how to spell it.  You know it&#8217;d be a hit with elitist geeks.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: John</title>
		<link>http://kybernetikos.com/2007/12/03/atoms-of-english/#comment-5985</link>
		<pubDate>Wed, 05 Dec 2007 14:03:44 +0000</pubDate>
		<guid>http://kybernetikos.com/2007/12/03/atoms-of-english/#comment-5985</guid>
					<description>Would it be possible to sort the list alphabetically?</description>
		<content:encoded><![CDATA[<p>Would it be possible to sort the list alphabetically?
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: kyb</title>
		<link>http://kybernetikos.com/2007/12/03/atoms-of-english/#comment-5942</link>
		<pubDate>Tue, 04 Dec 2007 11:13:28 +0000</pubDate>
		<guid>http://kybernetikos.com/2007/12/03/atoms-of-english/#comment-5942</guid>
					<description>I would have liked a dictionary that gave me word trunks, but ideally in English.  You can have the Java source if you fancy trying it on German.  However, I'm not totally sure that all derived versions of words are necessarily redundant.  You can never predict 100% the way a derived word will be used just from the derivation, so there is some level of distinctiveness in each derived word, and that distinctiveness may include an atomic concept.

However I'm not claiming this technique is even reasonable, just the best I could come up with in a few hours the other night...

I'm planning to do a directed graph visualisation of all the words needed to define other words at some point, and I'll post it here when I do.  Have to finish detecting edges first though.</description>
		<content:encoded><![CDATA[<p>I would have liked a dictionary that gave me word trunks, but ideally in English.  You can have the Java source if you fancy trying it on German.  However, I&#8217;m not totally sure that all derived versions of words are necessarily redundant.  You can never predict 100% the way a derived word will be used just from the derivation, so there is some level of distinctiveness in each derived word, and that distinctiveness may include an atomic concept.</p>
<p>However I&#8217;m not claiming this technique is even reasonable, just the best I could come up with in a few hours the other night&#8230;</p>
<p>I&#8217;m planning to do a directed graph visualisation of all the words needed to define other words at some point, and I&#8217;ll post it here when I do.  Have to finish detecting edges first though.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: woodly</title>
		<link>http://kybernetikos.com/2007/12/03/atoms-of-english/#comment-5930</link>
		<pubDate>Tue, 04 Dec 2007 07:25:40 +0000</pubDate>
		<guid>http://kybernetikos.com/2007/12/03/atoms-of-english/#comment-5930</guid>
					<description>What school of thought are you referencing exactly?
&quot;blow&quot; and &quot;blown&quot; are redundant, for example; you should work on word trunks like canoo.net :-)</description>
		<content:encoded><![CDATA[<p>What school of thought are you referencing exactly?<br />
&#8220;blow&#8221; and &#8220;blown&#8221; are redundant, for example; you should work on word trunks like canoo.net :-)
</p>
]]></content:encoded>
				</item>
</channel>
</rss>

