<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 11 (filtered medium)">
<!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style>
<![endif]--><o:SmartTagType
namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="PostalCode"/>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags"
name="State"/>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags"
name="City"/>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags"
name="place"/>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags"
name="PlaceName"/>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags"
name="PlaceType"/>
<!--[if !mso]>
<style>
st1\:*{behavior:url(#default#ieooui) }
</style>
<![endif]-->
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:"Gill Sans MT";
        panose-1:2 11 5 2 2 1 4 2 2 3;}
@font-face
        {font-family:"Book Antiqua";
        panose-1:2 4 6 2 5 3 5 3 3 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman";}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {color:purple;
        text-decoration:underline;}
p.MsoAutoSig, li.MsoAutoSig, div.MsoAutoSig
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman";}
span.EmailStyle18
        {mso-style-type:personal;
        font-family:"Gill Sans MT";
        color:windowtext;}
span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Gill Sans MT";
        color:navy;}
@page Section1
        {size:8.5in 11.0in;
        margin:1.0in 1.25in 1.0in 1.25in;}
div.Section1
        {page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><font size=2 color=navy face="Gill Sans MT"><span
style='font-size:10.0pt;font-family:"Gill Sans MT";color:navy'>Hi all,<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=navy face="Gill Sans MT"><span
style='font-size:10.0pt;font-family:"Gill Sans MT";color:navy'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=navy face="Gill Sans MT"><span
style='font-size:10.0pt;font-family:"Gill Sans MT";color:navy'>Here’s my
second attempt to post my response to the IVC site on the listserv…<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=navy face="Gill Sans MT"><span
style='font-size:10.0pt;font-family:"Gill Sans MT";color:navy'><o:p> </o:p></span></font></p>
<div>
<div class=MsoNormal align=center style='text-align:center'><font size=3
face="Times New Roman"><span style='font-size:12.0pt'>
<hr size=2 width="100%" align=center tabindex=-1>
</span></font></div>
<p class=MsoNormal><b><font size=2 face=Tahoma><span style='font-size:10.0pt;
font-family:Tahoma;font-weight:bold'>From:</span></font></b><font size=2
face=Tahoma><span style='font-size:10.0pt;font-family:Tahoma'> Stacy Rebich
[mailto:stacy@geog.ucsb.edu] <br>
<b><span style='font-weight:bold'>Sent:</span></b> Wednesday, January 25, 2006
11:44 PM<br>
<b><span style='font-weight:bold'>To:</span></b>
'visinfo-bounces@zydeco.mat.ucsb.edu'<br>
<b><span style='font-weight:bold'>Subject:</span></b> response to week 2
reading</span></font><o:p></o:p></p>
</div>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'><o:p> </o:p></span></font></p>
<p class=MsoNormal><a name="OLE_LINK1"></a><a name="OLE_LINK2"></a><font
size=3 face="Times New Roman"><span style='font-size:12.0pt'>Response to </span></font><a
href="http://iv.slis.indiana.edu/sw/" target="_blank">Information Visualization
CyberInfrastructure website</a>:<o:p></o:p></p>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>Going through much of the content on this website gave me a clearer
picture of the general steps I’ll need to take to create
information/knowledge from my data. I can’t say I found definitive answers
to many of my questions, but it did help me develop more specific questions
about how to approach my data filtering and organization issues. I’ll
include a few of those questions here; if anyone has insight or advice,
I’d be happy to hear it.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>Once I have my marked-up data, the first step is to create a
list/matrix of topics that I can use as input for the spatialization
algorithm(s) I choose. Can anyone recommend a program that does stop word
removal and stemming well? TMG (linked below) appears to do both and to
construct a matrix as well. Any thoughts about this package? <a
href="http://scgroup.hpclab.ceid.upatras.gr/scgroup/Projects/TMG/">http://scgroup.hpclab.ceid.upatras.gr/scgroup/Projects/TMG/</a>
<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>(Jenn, is this the same one you’re using?) I ask about stop
word removal and stemming because there is another type of topic extractor
I’d like to try as well (described below), but I think it may require
some preprocessing of this sort.<o:p></o:p></span></font></p>
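<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'><o:p>&nbsp;</o:p></span></font></p>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>To make the preprocessing concrete, here is a minimal Python sketch of
stop word removal, stemming, and building a term-by-document matrix. It is not
how TMG works internally. The stop list, the suffix-stripping rule (a crude
stand-in for a real stemmer such as Porter’s), and all function names are
my own assumptions for illustration.<o:p></o:p></span></font></p>
<pre>
```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption); a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}

# Crude suffix stripping, standing in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def term_document_matrix(docs):
    """Return (vocab, matrix): one row per term, one column per document."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


docs = [
    "The maps are showing the clustered topics.",
    "Clustering of topics in a map is shown.",
]
vocab, matrix = term_document_matrix(docs)
for term, row in zip(vocab, matrix):
    print(term, row)
```
</pre>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>In this toy example "showing" and "shown" end up as different terms,
which is the kind of miss that a proper stemmer or lemmatizer is meant to
reduce.<o:p></o:p></span></font></p>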
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>The thing I looked at in more detail is the <a
href="http://cog.brown.edu/~gruffydd/papers/SteyversGriffiths.pdf">Griffiths
and Steyvers Topics Model</a> listed on the IVC page. This model uses <a
href="http://www.cs.berkeley.edu/~blei/lda-c/">latent Dirichlet allocation</a>
(see the <a href="http://www.cs.berkeley.edu/~blei/papers/blei03a.pdf">Blei et al.</a>
paper). Another application that seems to do basically the same thing is
this one at <a href="http://www.arbylon.net/projects/">knowceans.org</a>.
From what I understand of this approach, it differs from other topic
identification approaches in that it’s based on a Bayesian probability
model. Each topic is treated as a probability distribution over words, and
each document as a mixture of topics. The algorithm estimates both sets of
probabilities from the word counts in a collection of documents, so the
output is a set of topics plus, for each document, the proportion of its
words attributed to each topic. It seems that this approach could be a good
way to help identify the distinguishing features of each document
(distinctiveness being relative to the collection of documents used to fit
the model).<o:p></o:p></span></font></p>
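<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'><o:p>&nbsp;</o:p></span></font></p>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>For anyone who wants to see the mechanics, here is a minimal Python
sketch of latent Dirichlet allocation fit by collapsed Gibbs sampling, the
general style of inference used in the Griffiths and Steyvers work. It is a
toy under my own assumptions, not their code or the lda-c package: the
function name, the hyperparameter values (alpha, beta), the iteration count,
and the tiny corpus are all illustrative.<o:p></o:p></span></font></p>
<pre>
```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (doc_topic, topic_word, vocab),
    where doc_topic[d][k] counts tokens of document d assigned to topic k
    and topic_word[k][v] counts assignments of vocabulary word v to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # P(topic | all other assignments) is proportional to
                # (doc-topic count + alpha) * (topic-word count + beta)
                #   / (topic total + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, vocab


# Made-up corpus with two obvious themes.
docs = [
    "map region city road map".split(),
    "road city region map road".split(),
    "gene protein cell gene dna".split(),
    "cell dna protein gene cell".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top = sorted(zip(counts, vocab), reverse=True)[:3]
    print("topic", k, [w for _, w in top])
for d, counts in enumerate(doc_topic):
    print("doc", d, counts)
```
</pre>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>The per-document counts at the end are the raw material for the
document-by-topic proportions; adding alpha to each count and normalizing
each row would give smoothed estimates.<o:p></o:p></span></font></p>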
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>The reading also discussed the need to decide how many topics to
extract when using this method. If you ask for too few topics, the
categories turn out to be too general. If you ask for too many, topic
groups start to include collections of words that have little obvious
connection. Does anyone know of guidelines for choosing a reasonable
number of topics to start with?<o:p></o:p></span></font></p>
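<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'><o:p>&nbsp;</o:p></span></font></p>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>One starting point I can imagine, sketched here in Python under my own
assumptions rather than taken from the reading, is to fit the model for
several candidate numbers of topics and compare a score such as log P(w | z),
the log probability of the words given the topic assignments. The function
names, hyperparameters, candidate values, and toy corpus are all hypothetical,
and a score from a single Gibbs sample is noisy; averaging over several
samples or scoring held-out documents would probably be more
trustworthy.<o:p></o:p></span></font></p>
<pre>
```python
import math
import random


def fit_counts(docs, num_topics, alpha, beta, iters, seed):
    """Collapsed Gibbs sampling for LDA; returns (topic-word counts, V)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                       # tokens per topic
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out, resample its topic, put it back.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta)
                    / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return nkw, V


def log_p_words_given_topics(nkw, V, beta):
    """log P(w | z) with a symmetric Dirichlet(beta) prior on each topic."""
    total = len(nkw) * (math.lgamma(V * beta) - V * math.lgamma(beta))
    for counts in nkw:
        total += sum(math.lgamma(n + beta) for n in counts)
        total -= math.lgamma(sum(counts) + V * beta)
    return total


# Made-up corpus; the candidate topic counts are arbitrary.
docs = [
    "map region city road map".split(),
    "road city region map road".split(),
    "gene protein cell gene dna".split(),
    "cell dna protein gene cell".split(),
]
scores = {}
for K in (1, 2, 4):
    nkw, V = fit_counts(docs, K, alpha=0.1, beta=0.01, iters=100, seed=0)
    scores[K] = log_p_words_given_topics(nkw, V, beta=0.01)
    print(K, round(scores[K], 2))
```
</pre>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>Higher (less negative) scores suggest a better fit, so one would look
for the number of topics where the score peaks or levels off, and then check
by eye whether the resulting topics are interpretable.<o:p></o:p></span></font></p>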
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>Did anyone else try to download and install the IVC software? I
tried, but when I unzipped the download folder, I couldn’t find the files
discussed in the installation instructions. I think the instructions were
actually written for an earlier version, but it wasn’t obvious to me how
to do the install with the latest release.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face="Gill Sans MT"><span style='font-size:
10.0pt;font-family:"Gill Sans MT"'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face="Gill Sans MT"><span style='font-size:
10.0pt;font-family:"Gill Sans MT"'><o:p> </o:p></span></font></p>
<p class=MsoAutoSig><font size=2 face="Book Antiqua"><span style='font-size:
10.0pt;font-family:"Book Antiqua"'>~~~~~~~~~~~~~~~~~~<o:p></o:p></span></font></p>
<p class=MsoAutoSig><font size=2 face="Book Antiqua"><span style='font-size:
10.0pt;font-family:"Book Antiqua"'>Stacy Rebich<o:p></o:p></span></font></p>
<p class=MsoAutoSig><font size=2 face="Book Antiqua"><span style='font-size:
10.0pt;font-family:"Book Antiqua"'>Graduate Student<o:p></o:p></span></font></p>
<p class=MsoAutoSig><font size=2 face="Book Antiqua"><span style='font-size:
10.0pt;font-family:"Book Antiqua"'>Department of Geography<o:p></o:p></span></font></p>
<p class=MsoAutoSig><st1:place w:st="on"><st1:PlaceType w:st="on"><font size=2
face="Book Antiqua"><span style='font-size:10.0pt;font-family:"Book Antiqua"'>University</span></font></st1:PlaceType><font
size=2 face="Book Antiqua"><span style='font-size:10.0pt;font-family:"Book Antiqua"'>
of <st1:PlaceName w:st="on">California</st1:PlaceName></span></font></st1:place><font
size=2 face="Book Antiqua"><span style='font-size:10.0pt;font-family:"Book Antiqua"'><o:p></o:p></span></font></p>
<p class=MsoAutoSig><st1:place w:st="on"><st1:City w:st="on"><font size=2
face="Book Antiqua"><span style='font-size:10.0pt;font-family:"Book Antiqua"'>Santa
Barbara</span></font></st1:City><font size=2 face="Book Antiqua"><span
style='font-size:10.0pt;font-family:"Book Antiqua"'>, <st1:State w:st="on">CA</st1:State>
<st1:PostalCode w:st="on">93106</st1:PostalCode></span></font></st1:place><font
size=2 face="Book Antiqua"><span style='font-size:10.0pt;font-family:"Book Antiqua"'><o:p></o:p></span></font></p>
<p class=MsoAutoSig><font size=2 face="Book Antiqua"><span style='font-size:
10.0pt;font-family:"Book Antiqua"'>~~~~~~~~~~~~~~~~~~<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'><o:p> </o:p></span></font></p>
</div>
</body>
</html>