You are viewing mtmonacelli's journal

Miscellaneous Programming thoughts
Below are the 10 most recent journal entries recorded in the "mtmonacelli" journal:

February 1st, 2005
02:56 pm

Google supports blogs
OK, call me behind schedule, but I had no idea that Google provides a blogging environment. 

OK, call me brash and too quick to judge, but I'm moving my blogging off of LiveJournal and onto Blogger.com.  Why?  First, simply because I believe in Google (so far) and second because I want to see what other features Blogger.com offers.

One feature that isn't available on LiveJournal is search.  Yes, I know Feedster provides search for LiveJournalers, but I've tried multiple times to get Feedster to scan my blog with no success.  I'm hoping Blogger.com can give me search.

I really like how Blogger.com provides many (32 to be exact at the time of this writing) premade templates that I can use to present my data.  LiveJournal only provides 11 templated layouts.

Blogger.com makes me save my data on my own web space.  Currently I'm storing my data on Comcast.net. 

Google indexes blogs of its own accord, as it does all other web pages, but if you want to speed up getting indexed this link might help.

Here's my new home at least for the time being.

12:35 am

Using TagSoup
I'm parsing HTML using TagSoup just fine, but a problem I'm running into is that a web page I'm parsing has bad HTML that HTML Tidy is not fixing. 

I'm very surprised that TagSoup is not able to deal with carriage returns embedded within HTML comment tags. 

I'm sure that the carriage returns are causing the problem because I've run dos2unix on the page and then run the TagSoup library on the output, and it works just fine.  I've also used Emacs hexl-mode to remove the offending carriage return by hand, and again TagSoup runs fine after that.
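
The dos2unix workaround could also be applied in code, before the bytes ever reach the parser. A minimal sketch in Python, assuming the page fits in memory; normalize_newlines is a hypothetical helper for illustration, not part of TagSoup or Tidy:

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF.

    Hypothetical helper. dos2unix by default targets CRLF pairs; lone CRs
    are handled here as well, on the assumption that a stray one inside a
    comment can also upset the parser.
    """
    # Collapse CRLF first so the second pass only sees lone CRs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    raw = b"<!-- first line\r\nsecond line -->\r\n<p>ok</p>\r"
    print(normalize_newlines(raw))
```

The cleaned bytes can then be written to a temporary file or passed straight to whatever parser comes next.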

My next goal is to play around with using fix-bad-comments and/or hide-comments options in Tidy.  Check out the newline parameter too.

Don't forget that you can add --tidy-mark true, which stamps the output with a generator meta tag, to save yourself from having to rename the output file to something indicating that Tidy has performed its magic.

January 28th, 2005
07:19 pm

XMLUnit
Continue where I left off testing XMLUnit.  See the eclipse/workspace/XMLUnit1 project.

January 26th, 2005
03:45 pm

Modifying Extensions in Firefox
I've got a great bunch of extensions that I've installed into Firefox.  One in particular though needs some modification.
Q: How can I make an extension modification?
A: Duffblog: Writing an Extension for Firefox and Creating New Packages for Mozilla

Interesting things I've gleaned from these links:
  • The content of the em:id element is a globally unique identifier (GUID). This is used to distinguish your extension from every other extension. When writing your own extensions, you should always generate a new GUID for each distinct extension you write. Andy Hoskinson provides a GUID generator web service you can use for this.
  • Our extension is now complete. But to make it easy for users to install the extension, we should package it up in such a way that Firefox can install it easily. To do this, we bundle our files up into an XPI (Cross platform installer) file. An XPI file is just a normal zip file with files organized in a special way. Our XPI file should contain the following structure...(more here)
The answer to my question is in the second bullet point.  I want to modify the extension, and the extension comes in the form of an XPI file.  The solution is simply to open the .xpi file in WinZip, modify the source, and then use the Firefox interface to install the modified extension.  That was a breeze.
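
On the GUID advice in the first bullet: a fresh ID does not strictly need a web service. A minimal sketch using Python's standard uuid module; new_extension_id is a hypothetical name, and the brace-wrapped form is the shape em:id values conventionally take in install.rdf:

```python
import uuid


def new_extension_id() -> str:
    """Return a random GUID wrapped in braces, e.g. for an em:id element.

    Hypothetical helper: uuid4() yields a random (version 4) UUID, so every
    call should give a different value.
    """
    return "{%s}" % uuid.uuid4()


if __name__ == "__main__":
    print(new_extension_id())
```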

The structure of the xpi file is pretty simple.  See manual for how to deal with Jar files.
t@TaylorLaptop tmp$ unzip jumplink-v1.3.1.xpi
Archive:  jumplink-v1.3.1.xpi
  inflating: install.js
  inflating: install.rdf
 extracting: TODO
  inflating: CHANGELOG
   creating: chrome/
  inflating: chrome/jumplink.jar
t@TaylorLaptop tmp$ find .
.
./CHANGELOG
./chrome
./chrome/jumplink.jar
./install.js
./install.rdf
./jumplink-v1.3.1.xpi
./TODO
t@TaylorLaptop tmp$
Then, to deal with the jar files: you create a jar file like this: "jar cf target.jar file1 [file2 ...]" (the file names are separated by spaces, and * works too).  You expand a jar file like this: "jar xf target.jar"
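
Since both the .xpi and the .jar inside it are ordinary zip archives, the unpack/modify/repack cycle can also be scripted. A rough sketch using Python's standard zipfile module; list_members, replace_member, and the file names in the usage note are illustrative assumptions, not anything shipped with JumpLink or Firefox:

```python
import zipfile


def list_members(archive_path):
    """Return the entry names stored in a zip-format archive (.xpi, .jar, .zip)."""
    with zipfile.ZipFile(archive_path) as zf:
        return zf.namelist()


def replace_member(src_path, dst_path, member, new_data):
    """Copy src_path to dst_path, swapping in new_data (bytes) for one entry.

    Every other entry is copied unchanged, in the original order. Reusing
    each entry's ZipInfo keeps its name, timestamp, and compression method.
    """
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            if info.filename == member:
                data = new_data
            else:
                data = src.read(info.filename)
            dst.writestr(info, data)
```

For a nested archive such as chrome/jumplink.jar, the same function would be applied twice: once to rebuild the jar with the edited source file, then once to the .xpi with the rebuilt jar's bytes as new_data.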

My next goal is to combine the functionality of the JumpLink package with the Linky package.

January 24th, 2005
07:07 pm

XQuery with nux
Downloading nux-1.0rc3.zip to test out XQuery with Java.  This package has quite a few dependencies that are bleeding-edge stuff:
  • xom 1.0
  • saxon 8.2
  • dom3 api
  • jaxme api 0.3.1
  • gnu getopt 1.0.7
Here is what I got:
  • xom 1.0
  • saxon 8.3
  • dom3 api
    • is this Xerces-J?  Assuming so...downloading xerces-2_6_2
    • here
  • ws-jaxme-current-bin.tar.gz
    • version 0.3.1
    • here
  • java-getopt-1.0.10.jar
Here is what I've been looking for: how to query nasty HTML.


January 20th, 2005
04:42 pm

Web Scraping Proxy
Update Post!

I wrote the author to ask how I can download the WSP and got an immediate response.  The username and password are available after you've read the license agreement, which is here: http://www.research.att.com/~hpk/wsp/CPL.html  If that link goes away, then when prompted for license agreement authorization use this User Name: "I accept www.opensource.org/licenses/cpl" and this Password: "." (one period, both w/o quotes).

I downloaded and ran wsp.pl on the Windows box, but WSP complained that Net::SSLeay was not installed.  I tried using ppm to install it, but it seems that ActiveState doesn't support this module.  I made sure that openssl was installed in Cygwin, but I simply can't find Net-SSLeay.

Original Post:

AT&T Labs Research - Web Scraping Proxy
I tried installing this tool
The Web Scraping Proxy (WSP) solves this problem by monitoring the flow of information between the browser and the Web site and emitting Perl LWP code fragments that can be used to write the Web Scraping program. A developer would use the WSP by browsing the site once with a browser that accesses the WSP as a proxy server. He then uses the emitted code as a template to build a Perl program that accesses the site.
But after installing the dependency modules I tried to download the wsp.pl program, and it's password-protected.  Strange!

01:25 pm

AT&T Research Projects
AT&T Research Projects:
I can never remember the name of this site.  They have some cool projects. One I particularly liked was graphviz.  Keywords: graph, svg, dot, dotviz, dotty, visualize graphs

12:06 am

Syntax Highlighting HTML
I've been looking for quite some time for a tool that will enable me to document code.  The trouble is that I was not using the right Google keywords.  I kept getting links to IDEs that edit HTML.  Here are two that I just found:
  • Syntax Highlighting HTML
  • CopySourceAsHtml
    • I've just spent the last hour trying out CopySourceAsHtml and I'm thoroughly impressed!  It's awesome!  It will color-code HTML code too.  See some output on the author's page here.
Tidy reminder.  I can't ever seem to remember the arguments to Tidy that make looking at the structure of XML/HTML easy.  Here they are:
tidy \
-i \
-w 100 \
-q \
--error-file test10/error.log \
--output-file test10/1.html \
--indent-attributes true \
--vertical-space true \
--break-before-br true \
--quote-marks false \
--quote-ampersand false test10/1.txt

January 19th, 2005
10:46 pm

OK, so it looks like the MSDE days are over!  It seems that SQL Server 2005 Express is taking its place.  SQL Server Express supports XQuery.  I just downloaded and ran Microsoft SQL Server 2005 Beta 2 Setup, but it didn't install properly:
SQL Server 2005 Beta 2 Setup has detected incompatible beta components from Visual Studio or SQL Server.  To proceed, use Windows Add or Remove Programs to remove previous SQL Server Yukon components, SQL Server Support Files, and Common Language Runtime (CLR) components, and then run SQL Server 2005 Beta 2 Setup again.  For detailed instructions on uninstalling SQL Server builds, see the SQL Server 2005 Beta 2 readme file. For help, click here.
This might be due to the fact that I have Visual Studio.NET 2005 Beta 1 installed.  Oops.

10:23 pm

XQuery FLWR Expressions
So that's what FLWR means...
FLWR Expressions
While simple XPath expressions are fine and good, the real power of XQuery shines through with FLWR expressions. FLWR stands for For-Let-Where-Return, and is pronounced "flower". The FLWR expression is akin to SQL's SELECT query; it allows for XML data to be queried with conditional statements, and then returns a set of XML elements as a result.
Quoted from here.
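
To make the For-Let-Where-Return shape concrete, here is a rough analogy in Python rather than real XQuery. The catalog document, the element names, and the expensive_titles function are all made up for illustration; the comments show the XQuery clause each line loosely corresponds to:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the quoted article.
CATALOG = """
<catalog>
  <book><title>XQuery Basics</title><price>30</price></book>
  <book><title>Perl LWP</title><price>45</price></book>
  <book><title>TagSoup Notes</title><price>12</price></book>
</catalog>
"""


def expensive_titles(xml_text, min_price):
    """Return titles of books priced at or above min_price, in document order."""
    root = ET.fromstring(xml_text)
    results = []
    for book in root.findall("book"):               # for $b in /catalog/book
        price = float(book.findtext("price"))       # let $p := $b/price
        if price >= min_price:                      # where $p >= $min
            results.append(book.findtext("title"))  # return $b/title
    return results


if __name__ == "__main__":
    print(expensive_titles(CATALOG, 30))
```

The loop is only meant to mirror the clause order; an actual XQuery engine would evaluate the FLWR expression directly against the XML.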
