<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Andi&#039;s Blog</title>
	<atom:link href="http://sahits.ch/blog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://sahits.ch/blog</link>
	<description></description>
	<lastBuildDate>Thu, 10 May 2012 18:39:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>The Truth by Clell65619</title>
		<link>http://sahits.ch/blog/?p=1149</link>
		<comments>http://sahits.ch/blog/?p=1149#comments</comments>
		<pubDate>Thu, 10 May 2012 18:39:47 +0000</pubDate>
		<dc:creator>Andi</dc:creator>
				<category><![CDATA[Harry Potter]]></category>
		<category><![CDATA[Cell65619]]></category>
		<category><![CDATA[en]]></category>
		<category><![CDATA[Plausibilisierung]]></category>

		<guid isPermaLink="false">http://sahits.ch/blog/?p=1149</guid>
		<description><![CDATA[On his 16th birthday Harry is rescued from starvation by Dumbledore, only to sign some documents that are supposed to solve all problems. The solution is a group marriage to Romilda Vane, Susan Bones, Marietta Edgecombe and Millicent Bulstrode. This group marriage heightens Harry's power as well as that of his spouses. Over the next year Dumbledore orchestrates their [...]]]></description>
			<content:encoded><![CDATA[<p>On his 16th birthday Harry is rescued from starvation by Dumbledore, only to sign some documents that are supposed to solve all problems. The solution is a group marriage to Romilda Vane, Susan Bones, Marietta Edgecombe and Millicent Bulstrode. This group marriage heightens Harry's power as well as that of his spouses. Over the next year Dumbledore orchestrates their lives so that everyone thinks Harry has become Dark when he steals Voldemort's dead body and declares revenge.</p>
<p>As it turns out, Dumbledore is in reality Nicolas Flamel, who creates Dark wizards like Grindelwald, Voldemort and Harry so that he can defeat them. The story is available as <a href="http://sahits.ch/blog/wp-content/uploads/2012/05/The-Truth-by-Clell65619.pdf">PDF</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://sahits.ch/blog/?feed=rss2&#038;p=1149</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>String comparison in Java</title>
		<link>http://sahits.ch/blog/?p=1140</link>
		<comments>http://sahits.ch/blog/?p=1140#comments</comments>
		<pubDate>Sun, 25 Mar 2012 10:02:49 +0000</pubDate>
		<dc:creator>Andi</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Programmieren]]></category>
		<category><![CDATA[compare]]></category>
		<category><![CDATA[measure]]></category>
		<category><![CDATA[similarity]]></category>
		<category><![CDATA[string]]></category>

		<guid isPermaLink="false">http://sahits.ch/blog/?p=1140</guid>
		<description><![CDATA[I have tons of files lying around, most of them manually named in a jiffy. Naturally this results in some misspellings. If you want to use the file name as an input for further automated processing, like categorizing images, this greatly diminishes the value of your collection. Therefore I looked into methods of adaptive [...]]]></description>
			<content:encoded><![CDATA[<p>I have tons of files lying around, most of them manually named in a jiffy. Naturally this results in some misspellings. If you want to use the file name as an input for further automated processing, like categorizing images, this greatly diminishes the value of your collection. Therefore I looked into methods of adaptive correction. I chose an adaptive approach because I do not want to use a prefabricated dictionary: certain words might be missing from it, and in my case the mix of different languages makes a fixed dictionary useless. Of course these techniques are not limited to file names but can also be applied to ordinary text.<br />
<span id="more-1140"></span></p>
<h3>Building an adaptive dictionary</h3>
<p>This first step is quite easy and is the basis for the later operations. Parse all file names for words and build a dictionary. The important part is that you keep track of the number of occurrences: the more often a word occurs, the more probable it is that the word is correct. The basic assumption is that errors occur only occasionally. For example, I sometimes mistype &#8216;the&#8217; as &#8216;teh&#8217;, but if I were to mistype it every time, the error could not be recognized. In the end you will have a bunch of words that occurred only once; these might be misspelled or might genuinely be used only once. There is no way to distinguish the two cases automatically, so these are the words you have to look at. Another group of misspelled words to look out for consists of words that occur multiple times but are very similar to another word with an even higher occurrence count. For example, you mistyped &#8216;teh&#8217; four times but typed it correctly ten times.</p>
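<p>A minimal sketch of such a dictionary could look like the following. The class name <code>AdaptiveDictionary</code> and its methods are made up for this illustration and are not taken from the utilities project; the sketch also assumes Java 8 or later, that words are runs of letters, and that case can be ignored.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: counts how often each word occurs in a set of file names. */
final class AdaptiveDictionary {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Splits a file name on everything that is not a letter and counts the words. */
    void addFileName(String fileName) {
        // Drop the extension, if there is one.
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        for (String word : base.split("[^\\p{L}]+")) {
            // Assumption: fragments shorter than two characters are not words.
            if (word.length() >= 2) {
                counts.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
    }

    /** Number of times the word was seen so far (0 if never). */
    int count(String word) {
        return counts.getOrDefault(word.toLowerCase(Locale.ROOT), 0);
    }

    /** Words seen exactly once: either misspelled or genuinely rare. */
    List<String> singletons() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        AdaptiveDictionary dictionary = new AdaptiveDictionary();
        dictionary.addFileName("the_beach_2011.jpg");
        dictionary.addFileName("The-Beach-house.png");
        dictionary.addFileName("teh_beach.jpg");
        System.out.println("the: " + dictionary.count("the"));
        System.out.println("candidates: " + dictionary.singletons());
    }
}
```

<p>The singletons are only candidates: as said above, a word that occurs once may just as well be a correct but rare word, so the list still needs a manual look or a similarity check against more frequent words.</p>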
<h3>String similarity</h3>
<p>While investigating methods to find similar strings, I stumbled on Ralph Allan Rice's <a href="http://coderz4life.wordpress.com/2011/01/30/string-similarity-library/">string-similarity library</a>. He uses the <a href="http://en.wikipedia.org/wiki/Dice%27s_coefficient">Dice</a> and <a href="http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance">Jaro-Winkler</a> algorithms to compute the similarity. I checked both out and found only Jaro-Winkler to be of any use. This algorithm has some deficits, most notably its focus on matching beginnings of words: &#8216;Fed&#8217; is not rated as very similar to &#8216;Ned&#8217;.</p>
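<p>To make the prefix bias concrete, here is a compact sketch of the textbook Jaro-Winkler formula. It is written from the published definition and is not the code of the library mentioned above; the class name <code>JaroWinkler</code> is hypothetical, and the prefix weight 0.1 and maximum prefix length 4 are the commonly used values, taken here as assumptions.</p>

```java
/** Hypothetical stand-alone sketch of the Jaro-Winkler similarity. */
final class JaroWinkler {

    /** Jaro similarity in [0, 1]; 1 means the strings are identical. */
    static double jaro(String a, String b) {
        if (a.equals(b)) return 1.0;
        int la = a.length();
        int lb = b.length();
        if (la == 0 || lb == 0) return 0.0;

        // Characters only match if they are at most 'window' positions apart.
        int window = Math.max(0, Math.max(la, lb) / 2 - 1);
        boolean[] matchedA = new boolean[la];
        boolean[] matchedB = new boolean[lb];
        int matches = 0;
        for (int i = 0; i < la; i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(lb - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (!matchedB[j] && a.charAt(i) == b.charAt(j)) {
                    matchedA[i] = true;
                    matchedB[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;

        // Count matched characters that appear in a different order.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < la; i++) {
            if (!matchedA[i]) continue;
            while (!matchedB[k]) k++;
            if (a.charAt(i) != b.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        double transpositions = outOfOrder / 2.0;
        return (m / la + m / lb + (m - transpositions) / m) / 3.0;
    }

    /** Jaro similarity with a bonus for a common prefix of up to four characters. */
    static double jaroWinkler(String a, String b) {
        double jaro = jaro(a, b);
        int maxPrefix = Math.min(4, Math.min(a.length(), b.length()));
        int prefix = 0;
        while (prefix < maxPrefix && a.charAt(prefix) == b.charAt(prefix)) {
            prefix++;
        }
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }

    public static void main(String[] args) {
        // Same number of differing characters, but only one pair shares a prefix.
        System.out.println("Fed/Ned: " + jaroWinkler("Fed", "Ned"));
        System.out.println("Fed/Fes: " + jaroWinkler("Fed", "Fes"));
    }
}
```

<p>Both pairs in <code>main</code> differ in exactly one character, yet the pair that differs in its first letter receives no prefix bonus and therefore scores lower. That is the deficit described above.</p>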
<h3>Eliminating related words</h3>
<p>I did not investigate this aspect further. The gist is that you want to group words that are related, like &#8216;see&#8217; and &#8216;sees&#8217; or &#8216;get&#8217; and &#8216;got&#8217;. This would best be done with <a href="http://lyle.smu.edu/~tspell/jaws/index.html">JAWS</a> and <a href="http://wordnet.princeton.edu/wordnet/">WordNet</a>.</p>
<h3>Some further theoretical thoughts</h3>
<p>These are things that I did not fully implement. The theory seems sound; however, there are probably some problems with finding suitable abort criteria for efficient computation. First some definitions:</p>
<ul>
<li>Operation: An operation transforms string A into string B. Operations can be queued/concatenated (°). An operation is a simple action. Each operation has its complexity. Concatenated operations are not necessarily commutative, meaning that executing a°b on A does not always result in the same string as b°a.</li>
</ul>
<ul>
<li>Complexity: The operation with the complexity 0 is the Identity. The complexity is a floating point value. The complexity is measured against the length of the string.</li>
</ul>
<ul>
<li>Minimal length: The only operation executable on a string of length 2 is a Transposition. Therefore the minimal length of a String is considered 2 characters.</li>
</ul>
<ul>
<li>Similarity of order: Strings A and B are similar if there exists a sequence of operations that transforms string A into string B and whose combined (added up) complexity is below the specified order.</li>
</ul>
<ul>
<li>Success: To decide whether an operation leads to a more similar result, the similarity computed with one of the above algorithms must be greater after the operation than before.</li>
</ul>
<ul>
<li>Upper-Lower-Case: Most languages do not distinguish between upper and lower case words. To my knowledge upper and lower case are only linked in the grammatical sense, not in the semantic one. Therefore this can be ignored as a non-issue.</li>
</ul>
<p>Many of the operations will use common subsequences to limit their range (sequences that are already common are left untouched). Identifying the common causes of errors helps to define the operators:</p>
<ul>
<li>Transposition: Swapping two characters in the string. The closer the characters, the lower the complexity.</li>
</ul>
<ul>
<li>Replacement: change a single character with another.</li>
</ul>
<ul>
<li>Mistyped: While typing the word, a key next to the intended one was hit. The complexity differs depending on the direction between the two keys.</li>
</ul>
<ul>
<li>MissingCharacter: A character is missing. The complexity is higher within the word than at its edges (when typing rapidly, it may happen that the first letter of the second word ends up as the last letter of the first; this would be a transposition of the space and the letter, but spaces were removed during parsing).</li>
</ul>
<ul>
<li>Duplication: Duplication of an existing character</li>
</ul>
<ul>
<li>Reducing: Reduce double characters</li>
</ul>
<ul>
<li>Removal: Remove a character. Higher complexity within the word. Reverse of MissingCharacter.</li>
</ul>
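<p>The definitions and operators above could be modelled roughly as follows. This is only a sketch of the idea and not the code from the utilities project: the names <code>Operation</code>, <code>Transposition</code>, <code>Replacement</code> and <code>Operations</code> as well as the concrete complexity values are assumptions chosen for illustration, and the search for a suitable sequence of operations (the hard part) is left out.</p>

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model: an operation transforms a string and has a cost. */
interface Operation {
    String apply(String input);
    double complexity();
}

/** Swaps two characters; assumed cost: the distance between the two positions. */
final class Transposition implements Operation {
    private final int first;
    private final int second;

    Transposition(int first, int second) {
        this.first = first;
        this.second = second;
    }

    public String apply(String input) {
        char[] chars = input.toCharArray();
        char tmp = chars[first];
        chars[first] = chars[second];
        chars[second] = tmp;
        return new String(chars);
    }

    public double complexity() {
        return Math.abs(first - second);
    }
}

/** Replaces one character; the fixed cost of 1.5 is an arbitrary assumption. */
final class Replacement implements Operation {
    private final int position;
    private final char replacement;

    Replacement(int position, char replacement) {
        this.position = position;
        this.replacement = replacement;
    }

    public String apply(String input) {
        char[] chars = input.toCharArray();
        chars[position] = replacement;
        return new String(chars);
    }

    public double complexity() {
        return 1.5;
    }
}

final class Operations {
    /** Applies the operations from left to right. */
    static String applyAll(String input, List<Operation> operations) {
        String current = input;
        for (Operation operation : operations) {
            current = operation.apply(current);
        }
        return current;
    }

    /** Checks whether the given sequence turns a into b with a summed cost below the order. */
    static boolean similarOfOrder(String a, String b, List<Operation> operations, double order) {
        double total = 0.0;
        for (Operation operation : operations) {
            total += operation.complexity();
        }
        return applyAll(a, operations).equals(b) && total < order;
    }

    public static void main(String[] args) {
        List<Operation> fix = Arrays.<Operation>asList(new Transposition(1, 2));
        System.out.println(applyAll("teh", fix));
        System.out.println(similarOfOrder("teh", "the", fix, 2.0));
    }
}
```

<p>Applying a <code>Transposition</code> followed by a <code>Replacement</code> on overlapping positions gives a different string than the reverse order, which illustrates why concatenated operations are not commutative in general.</p>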
<p>The code is part of my <a href="http://sourceforge.net/projects/sahitsutil/">utilities project</a>, but it is not yet included in the latest released version (1.2.3).</p>
]]></content:encoded>
			<wfw:commentRss>http://sahits.ch/blog/?feed=rss2&#038;p=1140</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>End of Pure England</title>
		<link>http://sahits.ch/blog/?p=1133</link>
		<comments>http://sahits.ch/blog/?p=1133#comments</comments>
		<pubDate>Mon, 27 Feb 2012 09:57:27 +0000</pubDate>
		<dc:creator>Andi</dc:creator>
				<category><![CDATA[Harry Potter]]></category>
		<category><![CDATA[DrT]]></category>
		<category><![CDATA[en]]></category>
		<category><![CDATA[short]]></category>

		<guid isPermaLink="false">http://sahits.ch/blog/?p=1133</guid>
		<description><![CDATA[This short story shows what happens if the pure-bloods of England act on their beliefs by cutting themselves off from the rest of the magical population: separation of part of England from the British magical community, a decrease in economic power and a decrease in population. All this ends 50 years after it starts, when they [...]]]></description>
			<content:encoded><![CDATA[<p>This short story shows what happens if the pure-bloods of England act on their beliefs by cutting themselves off from the rest of the magical population: separation of part of England from the British magical community, a decrease in economic power and a decrease in population. All this ends 50 years after it starts, when they have all sold themselves out to the Goblins.</p>
<p>Available as <a href="http://sahits.ch/blog/wp-content/uploads/2012/02/The-End-of-Pure-England-by-DrT.pdf">PDF</a> or <a href="http://www.ficwad.com/story/168414">online</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://sahits.ch/blog/?feed=rss2&#038;p=1133</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When Vernon didn&#8217;t miss by DrT</title>
		<link>http://sahits.ch/blog/?p=1128</link>
		<comments>http://sahits.ch/blog/?p=1128#comments</comments>
		<pubDate>Thu, 23 Feb 2012 14:27:35 +0000</pubDate>
		<dc:creator>Andi</dc:creator>
				<category><![CDATA[Harry Potter]]></category>
		<category><![CDATA[Alternate Universe]]></category>
		<category><![CDATA[DrT]]></category>
		<category><![CDATA[en]]></category>
		<category><![CDATA[full feature]]></category>

		<guid isPermaLink="false">http://sahits.ch/blog/?p=1128</guid>
		<description><![CDATA[This fic starts after Harry&#8217;s first year. The starting event is Vernon&#8217;s physical violence against Harry, which this time hits its mark and almost kills him. He is then rescued by an ancient Brotherhood and nursed back to strength. While the basic events are the same, Harry manages to achieve the goals more easily: Pettigrew is caught [...]]]></description>
			<content:encoded><![CDATA[<p>This fic starts after Harry&#8217;s first year. The starting event is Vernon&#8217;s physical violence against Harry, which this time hits its mark and almost kills him. He is then rescued by an ancient Brotherhood and nursed back to strength. While the basic events are the same, Harry manages to achieve the goals more easily:</p>
<ul>
<li>Pettigrew is caught during the summer, freeing Sirius</li>
<li>Riddle is defeated more easily, though with a greater trauma to Ginny, who repeats her first year and is sorted into Ravenclaw</li>
<li>Crouch Jr. escapes prematurely and helps Voldemort embody himself a year earlier</li>
<li>Voldemort is killed at the end of the Triwizard Tournament</li>
</ul>
<p>On the sidelines there are other changes. Harry and Hermione get together and include Luna. Pansy strikes Hermione with a poisoned dagger, almost killing her. As the members of the Brotherhood all have Horcruxes, Harry and Hermione join them.</p>
<p>There is a <a href="http://sahits.ch/blog/wp-content/uploads/2012/02/When-Vernon-didnt-miss.pdf">PDF</a> available.</p>
]]></content:encoded>
			<wfw:commentRss>http://sahits.ch/blog/?feed=rss2&#038;p=1128</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Publishing Artifact to maven repository</title>
		<link>http://sahits.ch/blog/?p=1121</link>
		<comments>http://sahits.ch/blog/?p=1121#comments</comments>
		<pubDate>Fri, 30 Sep 2011 10:53:59 +0000</pubDate>
		<dc:creator>Andi</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Programmieren]]></category>
		<category><![CDATA[en]]></category>
		<category><![CDATA[maven]]></category>
		<category><![CDATA[publish]]></category>
		<category><![CDATA[release]]></category>
		<category><![CDATA[repository]]></category>

		<guid isPermaLink="false">http://sahits.ch/blog/?p=1121</guid>
		<description><![CDATA[For some time now I have been setting up new Java projects with Maven. Additionally I migrated some old ones to Maven. The main motivation was dependency handling and building. No more manual writing/copying of Ant build.xml files. This works well as long as the libraries/artefacts you need can be found in a Maven repository and/or the ones not available are [...]]]></description>
			<content:encoded><![CDATA[<p>For some time now I have been setting up new Java projects with <a href="http://maven.apache.org/index.html">Maven</a>. Additionally I migrated some old ones to Maven. The main motivation was dependency handling and building. No more manual writing/copying of Ant build.xml files. This works well as long as the libraries/artefacts you need can be found in a Maven repository and/or the ones not available are your own. In the latter case they reside in your local repository and are accessible to other projects. Up until now this worked well, since such dependencies only existed between my own projects and could be resolved locally. However, in the OpenSource world there comes a time when this is no longer possible. This happened when I wanted to use <a href="http://sourceforge.net/projects/sahitsutil/develop">ch.sahits.sahitsUtils</a> in my game project <a href="http://openpatrician.sourceforge.net/">OpenPatrician</a>. The logical conclusion was to publish the artefact to a Maven repository. This, however, proved a little more difficult.<span id="more-1121"></span></p>
<p>The first step in such matters is always to search the net, where I found a few hints, some of which brought me a step further. My first realisation was that although the software is OpenSource, the way into a repository is not. But it is not that complicated. You have to decide on a repository into which you want to deploy your release. Most repositories are interconnected, meaning they synchronize new artefacts, so your release should propagate through the whole Maven ecosystem within a few days. For OpenSource software the obvious choice seems to be the <a href="https://oss.sonatype.org">OSS repository of Sonatype</a>. Whatever repository you choose, the first step is an administrative one: create an account and state your intention to add a project.</p>
<p>The following links helped enormously in the process to get my artefact into the repository:</p>
<ul>
<li><a href="http://blog.soebes.de/index.php?/archives/332-The-unknown-creature-The-Maven-Release-Cycle.html">Description of the Maven Release Cycle</a></li>
<li><a title="How To Generate PGP Signatures With Maven" href="https://docs.sonatype.org/display/Repository/How+To+Generate+PGP+Signatures+With+Maven">How To Generate PGP Signatures With Maven</a></li>
<li><a href="https://docs.sonatype.org/display/Repository/Sonatype+OSS+Maven+Repository+Usage+Guide">Sonatype OSS Repository User Guide</a></li>
</ul>
<p>Normally the repository has some policies regarding the minimal requirements for your pom file. I find the ones of Sonatype reasonable. In the case of the sahitsUtils project this resulted in the following pom.xml:</p>
<pre>&lt;project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"&gt;
	&lt;modelVersion&gt;4.0.0&lt;/modelVersion&gt;
	&lt;groupId&gt;ch.sahits&lt;/groupId&gt;
	&lt;artifactId&gt;sahitsUtil&lt;/artifactId&gt;
	&lt;version&gt;1.2.4-SNAPSHOT&lt;/version&gt;
	&lt;name&gt;sahitsUtil&lt;/name&gt;
	&lt;description&gt;Utilities used in different projects of Sahits GmbH&lt;/description&gt;
	&lt;parent&gt;
		&lt;groupId&gt;org.sonatype.oss&lt;/groupId&gt;
		&lt;artifactId&gt;oss-parent&lt;/artifactId&gt;
		&lt;version&gt;7&lt;/version&gt;
	&lt;/parent&gt;
	&lt;developers&gt;
		&lt;developer&gt;
			&lt;id&gt;hotzst&lt;/id&gt;
			&lt;name&gt;Andi Hotz&lt;/name&gt;
			&lt;organization&gt;Sahits GmbH&lt;/organization&gt;
		&lt;/developer&gt;
	&lt;/developers&gt;
	&lt;licenses&gt;
		&lt;license&gt;
			&lt;name&gt;GNU General Public License (GPL), Version 2.0&lt;/name&gt;
			&lt;url&gt;http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt&lt;/url&gt;
			&lt;distribution&gt;repo&lt;/distribution&gt;
		&lt;/license&gt;
		&lt;license&gt;
			&lt;name&gt;GNU Library General Public License (LGPL), Version 2.0&lt;/name&gt;
			&lt;url&gt;http://www.gnu.org/licenses/old-licenses/lgpl-2.0.txt&lt;/url&gt;
			&lt;distribution&gt;repo&lt;/distribution&gt;
		&lt;/license&gt;
	&lt;/licenses&gt;
	&lt;build&gt;
		&lt;plugins&gt;
			&lt;plugin&gt;
				&lt;groupId&gt;org.apache.maven.plugins&lt;/groupId&gt;
				&lt;artifactId&gt;maven-compiler-plugin&lt;/artifactId&gt;
				&lt;configuration&gt;
					&lt;source&gt;1.5&lt;/source&gt;
					&lt;target&gt;1.5&lt;/target&gt;
				&lt;/configuration&gt;
			&lt;/plugin&gt;
			&lt;plugin&gt;
				&lt;groupId&gt;org.apache.maven.plugins&lt;/groupId&gt;
				&lt;artifactId&gt;maven-javadoc-plugin&lt;/artifactId&gt;
				&lt;configuration&gt;
					&lt;show&gt;private&lt;/show&gt;
					&lt;nohelp&gt;true&lt;/nohelp&gt;
				&lt;/configuration&gt;
			&lt;/plugin&gt;
			&lt;plugin&gt;
				&lt;groupId&gt;org.apache.maven.plugins&lt;/groupId&gt;
				&lt;artifactId&gt;maven-release-plugin&lt;/artifactId&gt;
				&lt;configuration&gt;
					&lt;tagBase&gt;https://sahitsutil.svn.sourceforge.net/svnroot/sahitsutil/tags&lt;/tagBase&gt;
				&lt;/configuration&gt;
			&lt;/plugin&gt;
			&lt;plugin&gt;
				&lt;groupId&gt;org.apache.maven.plugins&lt;/groupId&gt;
				&lt;artifactId&gt;maven-gpg-plugin&lt;/artifactId&gt;
				&lt;executions&gt;
					&lt;execution&gt;
						&lt;id&gt;sign-artifacts&lt;/id&gt;
						&lt;phase&gt;verify&lt;/phase&gt;
						&lt;goals&gt;
							&lt;goal&gt;sign&lt;/goal&gt;
						&lt;/goals&gt;
					&lt;/execution&gt;
				&lt;/executions&gt;
			&lt;/plugin&gt;

		&lt;/plugins&gt;
	&lt;/build&gt;
	&lt;dependencies&gt;
		&lt;dependency&gt;
			&lt;groupId&gt;junit&lt;/groupId&gt;
			&lt;artifactId&gt;junit&lt;/artifactId&gt;
			&lt;version&gt;4.8.1&lt;/version&gt;
		&lt;/dependency&gt;
	&lt;/dependencies&gt;
	&lt;organization&gt;
		&lt;name&gt;Sahits GmbH&lt;/name&gt;
	&lt;/organization&gt;
	&lt;scm&gt;
		&lt;url&gt;https://sahitsutil.svn.sourceforge.net/svnroot/sahitsutil/trunk&lt;/url&gt;
		&lt;connection&gt;scm:svn:https://sahitsutil.svn.sourceforge.net/svnroot/sahitsutil/trunk&lt;/connection&gt;
		&lt;developerConnection&gt;scm:svn:https://sahitsutil.svn.sourceforge.net/svnroot/sahitsutil/trunk&lt;/developerConnection&gt;
	&lt;/scm&gt;
	&lt;distributionManagement&gt;
		&lt;repository&gt;
			&lt;id&gt;sonatype-nexus-staging&lt;/id&gt;
			&lt;url&gt;https://oss.sonatype.org/service/local/staging/deploy/maven2&lt;/url&gt;
		&lt;/repository&gt;
		&lt;snapshotRepository&gt;
			&lt;id&gt;sonatype-nexus-snapshots&lt;/id&gt;
			&lt;url&gt;https://oss.sonatype.org/content/repositories/snapshots&lt;/url&gt;
		&lt;/snapshotRepository&gt;
	&lt;/distributionManagement&gt;
&lt;/project&gt;</pre>
<p>For details on the working of the <a href="http://maven.apache.org/plugins/maven-gpg-plugin/">maven-gpg-plugin</a> see <a title="How To Generate PGP Signatures With Maven" href="https://docs.sonatype.org/display/Repository/How+To+Generate+PGP+Signatures+With+Maven">How To Generate PGP Signatures With Maven</a>.</p>
<p>The standard procedure for releasing an artefact would be:</p>
<pre class="brush:shell">mvn release:clean release:prepare release:perform</pre>
<p>The prepare goal checks your project and prepares it for releasing:</p>
<ul>
<li>Check that all the local resources are committed</li>
<li>Check that all dependencies have a defined version (no SNAPSHOTs allowed)</li>
<li>If your version is a SNAPSHOT version, remove the SNAPSHOT part</li>
<li>Execute the tests against the changed pom.</li>
<li>Commit the modified POM</li>
<li>Create a tag using the configuration specified in scm (Note that tagBase was supplied in the configuration of maven-release-plugin)</li>
<li>Increase the version</li>
<li>Commit the modified POM</li>
</ul>
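<p>The version handling in this list (remove the SNAPSHOT part, later increase the version) can be sketched in a few lines of Java. This is a simplified illustration and not the code of the maven-release-plugin; the class <code>ReleaseVersions</code> and its methods are hypothetical, and the sketch assumes a purely numeric version such as the <code>1.2.4-SNAPSHOT</code> from the pom above, where only the last number is incremented.</p>

```java
/** Hypothetical sketch of the version arithmetic done while preparing a release. */
final class ReleaseVersions {
    private static final String SNAPSHOT = "-SNAPSHOT";

    /** Removes a trailing "-SNAPSHOT", if present. */
    static String releaseVersion(String currentVersion) {
        if (currentVersion.endsWith(SNAPSHOT)) {
            return currentVersion.substring(0, currentVersion.length() - SNAPSHOT.length());
        }
        return currentVersion;
    }

    /** Increments the last numeric part and marks the result as a snapshot again. */
    static String nextDevelopmentVersion(String releaseVersion) {
        int dot = releaseVersion.lastIndexOf('.');
        String head = releaseVersion.substring(0, dot + 1);
        int last = Integer.parseInt(releaseVersion.substring(dot + 1));
        return head + (last + 1) + SNAPSHOT;
    }

    public static void main(String[] args) {
        String release = releaseVersion("1.2.4-SNAPSHOT");
        System.out.println("release version: " + release);
        System.out.println("next development version: " + nextDevelopmentVersion(release));
    }
}
```

<p>For the pom shown above this yields the release version 1.2.4 and the next development version 1.2.5-SNAPSHOT.</p>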
<p>If you do not run <code>mvn release:prepare</code> with the batch option -B, you are asked for:</p>
<ul>
<li>release version</li>
<li>SCM release tag or label</li>
<li>new development version</li>
</ul>
<p>The proposed values are usually good defaults, so you can run this in batch mode. However, if you have specified a passphrase for your private key, you cannot use batch mode unless you pass the passphrase with <code>-Dgpg.passphrase=thephrase</code>. The prepare step produces the artefacts for the pom, binary, source and javadoc along with the signature files.</p>
<p>Since this project was converted to Maven, the checkout in <code>mvn release:perform</code> produces a non-default layout, so this step fails. I therefore found it easier to bundle the artefacts in a jar and upload them manually to the repository. This worked perfectly as <a href="https://docs.sonatype.org/display/Repository/Sonatype+OSS+Maven+Repository+Usage+Guide#SonatypeOSSMavenRepositoryUsageGuide-7b.StageExistingArtifacts">described</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://sahits.ch/blog/?feed=rss2&#038;p=1121</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

