02:56 pm
Google supports blogs
OK, call me behind schedule, but I had no idea that Google provides a blogging environment.
OK, call me brash and too quick to judge, but I'm moving my blogging off of LiveJournal and onto Blogger.com. Why? First, simply because I believe in Google (so far) and second because I want to see what other features Blogger.com offers.
One feature that isn't available on LiveJournal is searching functionality. Yes, I know FeedSter does provide searching functionality for LiveJournalers, but I've tried multiple times to get FeedSter to scan my blogs with no success. I'm hoping Blogger.com can provide me search functionality.
I really like how Blogger.com provides many (32 to be exact at the time of this writing) premade templates that I can use to present my data. LiveJournal only provides 11 templated layouts.
Blogger.com makes me save my data on my own web space. Currently I'm storing my data on Comcast.net.
Google does search blogs of its own accord, as it does all other web pages, but if you want to speed up the process this link might help.
Here's my new home, at least for the time being.
12:35 am
Using TagSoup
I'm parsing HTML with TagSoup just fine, but one web page I'm parsing has bad HTML that HTML Tidy is not fixing.
I'm very surprised that TagSoup is not able to deal with carriage returns embedded within HTML comment tags.
I'm sure that the carriage returns are causing the problem because I've used dos2unix and then run the TagSoup library on the output and it works just fine. I've also used Emacs hexl-mode and found that I could remove the offending carriage return, and again TagSoup runs fine after that.
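The dos2unix workaround can be sketched in a few lines of Python. This is only a minimal illustration, not what dos2unix or TagSoup do internally: the function name and the sample bytes are made up, and it also folds lone carriage returns, which goes slightly further than plain dos2unix.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Hypothetical sample: an HTML comment with carriage returns embedded in it.
sample = b"<html><!-- first line\r\nsecond line\r --><body>ok</body></html>"
cleaned = normalize_newlines(sample)
print(cleaned)
```

Running the page through something like this before handing it to the parser should have the same effect as the dos2unix step.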
My next goal is to play around with using fix-bad-comments and/or hide-comments options in Tidy. Check out the newline parameter too.
Don't forget that you can add --tidy-mark true to prevent yourself from having to rename the output file to something indicating Tidy has performed its magic.
07:19 pm
XMLUnit
Continue where I left off testing XMLUnit. See the eclipse/workspace/XMLUnit1 project.
03:45 pm
Modifying Extensions in Firefox
I've got a great bunch of extensions that I've installed into Firefox. One in particular, though, needs some modification.
Q: How can I make an extension modification?
A: Duffblog: Writing an Extension for Firefox and Creating New Packages for Mozilla
Interesting things I've gleaned from these links:
- The content of the em:id element is a globally unique identifier (GUID). This is used to distinguish your extension from every other extension. When writing your own extensions, you should always generate a new GUID for each distinct extension you write. Andy Hoskinson provides a GUID generator web service you can use for this.
- Our extension is now complete. But to make it easy for users to install the extension, we should package it up in such a way that Firefox can install it easily. To do this, we bundle our files up into an XPI (Cross platform installer) file. An XPI file is just a normal zip file with files organized in a special way. Our XPI file should contain the following structure...(more here)
The answer to my question is in the second bullet point. I want to modify the extension, and the extension is in the form of an XPI file. The solution is then simply to open the .xpi file in WinZip, modify the source, and use the Firefox interface to install the extension. That was a breeze.
The structure of the xpi file is pretty simple. See the manual for how to deal with jar files.
t@TaylorLaptop tmp$ unzip jumplink-v1.3.1.xpi
Archive: jumplink-v1.3.1.xpi
inflating: install.js
inflating: install.rdf
extracting: TODO
inflating: CHANGELOG
creating: chrome/
inflating: chrome/jumplink.jar
t@TaylorLaptop tmp$ find .
.
./CHANGELOG
./chrome
./chrome/jumplink.jar
./install.js
./install.rdf
./jumplink-v1.3.1.xpi
./TODO
t@TaylorLaptop tmp$
Then, to deal with the jar files: you create a jar file like this: "jar cf target.jar file1 [file2 ...]" (or "*" for everything in the current directory). You expand the jar file like this: "jar xf target.jar".
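Since both the .xpi and the .jar inside it are ordinary zip files, the same inspection can be done from a script. Here is a minimal sketch using Python's standard zipfile module. The archive is built in memory as a stand-in, so the file contents are made up; only the file names echo the listing above.

```python
import io
import zipfile

# Build a tiny stand-in archive in memory; the file names mirror the listing
# above, but the contents are placeholders for illustration only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as xpi:
    xpi.writestr("install.rdf", "<RDF/>")
    xpi.writestr("install.js", "// installer script")
    xpi.writestr("chrome/jumplink.jar", b"placeholder")

# Reading it back works the same way for a real .xpi or .jar on disk,
# e.g. zipfile.ZipFile("jumplink-v1.3.1.xpi").
with zipfile.ZipFile(buffer) as xpi:
    names = sorted(xpi.namelist())
    rdf_text = xpi.read("install.rdf").decode("utf-8")

print(names)
```

For a real extension, the modified files would be written into a new archive the same way and then installed through the Firefox interface.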
My next goal is to combine the functionality of the JumpLink package with the Linky package.
07:07 pm
XQuery with nux
Downloading nux-1.0rc3.zip to test out XQueries with Java. This package has quite a few dependencies that are bleeding-edge stuff:
- xom 1.0
- saxon 8.2
- dom3 api
- jaxme api 0.3.1
- gnu getopt 1.0.7
Here is what I got:
- xom 1.0
- saxon 8.3
- dom3 api
  - is this Xerces-J? Assuming so...downloading xerces-2_6_2
  - here
- ws-jaxme-current-bin.tar.gz
- java-getopt-1.0.10.jar
Here is what I've been looking for: how to query nasty HTML.
04:42 pm
Web Scraping Proxy
Update Post!
I wrote to the author to ask how I can download the WSP and got an immediate response. The username and password are available after you've read the license agreement. Here is where to read the license agreement: http://www.research.att.com/~hpk/wsp/CPL.html If that link goes away then use this: When prompted for license agreement authorization use this User Name:"I accept www.opensource.org/licenses/cpl" and this Password:"." (one period, both w/o quotes).
I downloaded and ran wsp.pl on the Windows box, but wsp complained that Net::SSLeay was not installed. I tried using ppm to install it, but it seems that ActiveState doesn't support this module. I made sure that openssl was installed in Cygwin, but I simply can't find Net-SSLeay.
Original Post:
AT&T Labs Research - Web Scraping Proxy
I tried installing this tool:
"The Web Scraping Proxy (WSP) solves this problem by monitoring the flow of information between the browser and the Web site and emitting Perl LWP code fragments that can be used to write the Web Scraping program. A developer would use the WSP by browsing the site once with a browser that accesses the WSP as a proxy server. He then uses the emitted code as a template to build a Perl program that accesses the site."
But after installing the dependency modules, I tried to download the wsp.pl program and it's password protected. Strange!
01:25 pm
AT&T Research Projects
AT&T Research Projects: I can never remember the name of this site. They have some cool projects. One I particularly liked was Graphviz. Keywords: graph, svg, dot, dotviz, dotty, visualize graphs
12:06 am
Syntax Highlighting HTML
I've been looking for a tool for quite some time that will enable me to document code. The trouble is that I was not using the right Google keywords. I kept getting links to IDEs that edited HTML. Here are two that I just found
Tidy reminder. I can't ever seem to remember the arguments to Tidy that make looking at the structure of XML/HTML easy. Here they are:
tidy \
-i \
-w 100 \
-q \
--error-file test10/error.log \
--output-file test10/1.html \
--indent-attributes true \
--vertical-space true \
--break-before-br true \
--quote-marks false \
--quote-ampersand false test10/1.txt
10:46 pm
OK, so it looks like the MSDE days are over! It seems that SQL Server 2005 Express is taking its place. SQL Server Express supports XQuery. I just downloaded and ran Microsoft SQL Server 2005 Beta 2 Setup, but it didn't install properly:
"SQL Server 2005 Beta 2 Setup has detected incompatible beta components from Visual Studio or SQL Server. To proceed, use Windows Add or Remove Programs to remove previous SQL Server Yukon components, SQL Server Support Files, and Common Language Runtime (CLR) components, and then run SQL Server 2005 Beta 2 Setup again. For detailed instructions on uninstalling SQL Server builds, see the SQL Server 2005 Beta 2 readme file. For help, click here."
This might be due to the fact that I have Visual Studio.NET 2005 Beta 1 installed. Oops.
10:23 pm
XQuery FLWR Expressions
So that's what FLWR means...
"FLWR Expressions: While simple XPath expressions are fine and good, the real power of XQuery shines through with FLWR expressions. FLWR stands for For-Let-Where-Return, and is pronounced "flower". The FLWR expression is akin to SQL's SELECT query; it allows for XML data to be queried with conditional statements, and then returns a set of XML elements as a result."
Quoted from here.
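To get a feel for the For-Let-Where-Return shape, here is a rough analogy in Python. It is not XQuery, only a sketch of the same four steps over a small tree; the sample XML, the price threshold, and the "expensive" element name are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample data for illustration.
doc = ET.fromstring(
    "<books>"
    "<book><title>A</title><price>10</price></book>"
    "<book><title>B</title><price>45</price></book>"
    "<book><title>C</title><price>30</price></book>"
    "</books>"
)

results = []
for book in doc.findall("book"):            # For: iterate over a node sequence
    price = float(book.findtext("price"))   # Let: bind a value to a name
    if price > 20:                          # Where: filter on a condition
        item = ET.Element("expensive")      # Return: build the result element
        item.text = book.findtext("title")
        results.append(item)

titles = [item.text for item in results]
print(titles)
```

The point of the comparison is only that a FLWR expression packs the loop, the binding, the filter, and the result construction into one query.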
11:22 am
.NET doing XQueries
Querying XML Data with XQuery has some information about using XQuery with .NET 2.0. I found the article from here.
I spent the morning working with the contents of XQueryStuff.zip. This seems to be what MS was working on with XQuery, but then abandoned...or something. I'm not clear on that, but here is talk of it. Anyways, it doesn't look good. It seems that a lot of functionality is left out.
I downloaded, unpacked and ran some of the XQueryStuff. Bad news. Most of the functions throw exceptions reporting that they're not yet implemented. Why waste my time on the .NET stuff right now when it's not fully implemented? I'm going to look into the other implementations.
Here's a useful list of implementations of XPath 2.0 and XQuery. I don't see .NET listed despite talk of it on Scott's page. Wow! I just found that Saxon is being ported to .NET! The project is not listed under the above implementations link, but is called Saxon.NET. Saxon.NET is still in its pre-alpha stage.
Can you use XQuery in .NET to modify this XML? If so, how?
<?xml version="1.0" encoding="utf-8" ?>
<config>
  <engine>
    <SID>1</SID>
    <Sname>Adnan</Sname>
    <Sdefault>0</Sdefault>
    <Schecked>1</Schecked>
  </engine>
  <engine>
    <SID>2</SID>
    <Sname>Google</Sname>
    <Sdefault>1</Sdefault>
    <Schecked>0</Schecked>
  </engine>
</config>
This is of course a two-part question. One, does .NET support XQuery? Two, what is the XQuery syntax that would allow you to modify the node with SID equal to 1? As of today, .NET 2.0 doesn't yet support XQuery, so you will have to rely on another language like Java or Python, or you could stick with .NET but use XPath and XSL transforms to find and transform, respectively. .NET has support for XPath 1 and XSL 1. The current W3C recommendations are out for versions 2.0 of each.
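As one possible take on the "another language" route, here is a minimal sketch in Python using the standard library's ElementTree, which supports a limited XPath subset rather than XQuery. The XML is the config from above without the declaration line, and the new name "Renamed" is an arbitrary example value.

```python
import xml.etree.ElementTree as ET

CONFIG = """<config>
  <engine><SID>1</SID><Sname>Adnan</Sname><Sdefault>0</Sdefault><Schecked>1</Schecked></engine>
  <engine><SID>2</SID><Sname>Google</Sname><Sdefault>1</Sdefault><Schecked>0</Schecked></engine>
</config>"""

root = ET.fromstring(CONFIG)

# ElementTree's limited XPath supports the [child='text'] predicate,
# which is enough to pick the engine whose SID is 1.
engine = root.find("engine[SID='1']")
engine.find("Sname").text = "Renamed"  # arbitrary example value

updated = ET.tostring(root, encoding="unicode")
print(updated)
```

The find step is plain XPath; the modification itself is done through the tree API, which is the part XQuery or an XSL transform would otherwise handle.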
I hear that a big benefit of XQuery is that it will make transforms a whole lot easier than XSL.
10:48 am
.NET XSL Transforms
I played around with .NET running XSL Transforms based on an article from XML.com today.
The article, though well written, is based on deprecated code. I abandoned the article pretty quickly, but I do plan to continue working with .NET XSL transformations. I'll have to search for updated articles on XML.com.
Here's the output from running the code in the article:
------ Rebuild All started: Project: CSharpExamples, Configuration: Debug .NET ------
Preparing resources...
Updating references...
Performing main compilation...
c:\csharpexamples\streamsandxslt.cs(133,4): warning CS0618:
'System.Xml.Xsl.XslTransform.Transform(System.Xml.XPath.IXPathNavigable,
System.Xml.Xsl.XsltArgumentList, System.Xml.XmlWriter)'
is obsolete: 'You should pass XmlResolver to Transform() method'
c:\csharpexamples\streamsandxslt.cs(161,4): warning CS0618:
'System.Xml.Xsl.XslTransform.Transform(System.Xml.XPath.IXPathNavigable,
System.Xml.Xsl.XsltArgumentList, System.IO.Stream)'
is obsolete: 'You should pass XmlResolver to Transform() method'
c:\csharpexamples\streamsandxslt.cs(199,4): warning CS0618:
'System.Xml.Xsl.XslTransform.Transform(System.Xml.XPath.IXPathNavigable,
System.Xml.Xsl.XsltArgumentList, System.Xml.XmlWriter)'
is obsolete: 'You should pass XmlResolver to Transform() method'
Build complete -- 0 errors, 3 warnings
Building satellite assemblies...
---------------------- Done ----------------------
Rebuild All: 1 succeeded, 0 failed, 0 skipped
Being new to XSL transformations, I didn't want to play around with outdated code.
06:40 pm
xmldiff
Logilab.org - Quick start provides a Python XML differencing tool. This might be a good tool to quickly check. This tool provides XUpdate output. Cool!
diffxml - XML Diff and Patch Utilities has another tool for differencing xml, but it doesn't look well supported. Bagit!
XmlDiff posted in GotDotNet will be a helpful tool to double check my work with XMLUnit. The output of this tool is in Microsoft XML Diff Language v.1.0 Beta. Don't forget to check out other XML Tools from GotDotNet.
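For a quick sanity check without any of the tools above, a structural comparison can be sketched with Python's standard library. This is a hypothetical helper, not how xmldiff, XmlDiff, or XMLUnit work: it only answers same or different, ignores whitespace-only differences, and produces no diff output.

```python
import xml.etree.ElementTree as ET


def same_tree(a: ET.Element, b: ET.Element) -> bool:
    """Return True if two elements match in tag, attributes, text, tail, and children."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    # Treat missing text and whitespace-only text as equal.
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if (a.tail or "").strip() != (b.tail or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(same_tree(x, y) for x, y in zip(a, b))


left = ET.fromstring("<a x='1'><b>hi</b><c/></a>")
right = ET.fromstring("<a x='1'>\n  <b>hi</b>\n  <c/>\n</a>")
changed = ET.fromstring("<a x='1'><b>bye</b><c/></a>")

print(same_tree(left, right), same_tree(left, changed))
```

Child order matters in this sketch, so two documents with the same elements in a different order count as different.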
TODO: Continue where I left off on laptop in c:/Documents and Settings/t/My Documents/eclipse/workspace/UseXMLUnit1/data/test1/
12:18 pm
Building a kernel...first time
Fedora Quick links
I'm always forgetting how to check gpg signatures...The Linux Kernel Archives PGP Signature
I'm stuck building the kernel in preparation for Enterprise Volume Management. Part of the problem is that I'm working over an rdp connection and linux is running in Vmware. I'm thinking that I should simply install a fresh copy of linux on a spare box I have at home and start fresh.
I've already spent some time downloading all the packages (EVMS: Installation: Downloading Packages), though, so be sure to fire up Samba in an X session and copy all the *.tar.gz files to my Windows box so I don't have to redownload them.
When I'm ready, continue where I left off with:
"Switch to using EVMS for ALL your volumes and partitions. If none of the kernel's built-in partitions are mounted, then there won't be any conflicts when DM tries to claim the disks. This is, of course, the preferred solution, but also requires some extra work on your part to convert to mounting your root filesystem using an EVMS volume. Please see the Root-Volume section of this install guide as well as the Converting-To-EVMS guide for more details on this option."
here.
02:33 am
Being new to Eclipse, it took me quite a while to create an Eclipse project that made use of JUnit to test XMLUnit. I've gotten the JUnit tests to all pass. See eclipse/workspace/TestXMLUnit for the project.
Now that I've verified that XMLUnit runs as expected, I want to create my own project that will use XMLUnit's methods.
12:57 am
I'd forgotten all about the need for an LVM GUI. Try installing this in VMware: Enterprise Volume Management System. LVM HowTo
In order to get EVMS up I'll need to be familiar with building a kernel. Digital Hermit - Kernel-Build-HOWTO
08:14 pm
TagSoup parses HTML
"Setting SAX Properties
If you need to control the specific parser class used, you can create a SAX XMLReader in the usual way, and then pass it to the Builder constructor. For instance, this would allow you to use John Cowan’s TagSoup to parse an HTML document into XOM:" http://www.cafeconleche.org/XOM/tutorial.xhtml
TagSoup is here.
07:51 pm
RDP, VMware, Fedora Core 3, vncserver
Displaying a Fedora X session running in VMware over RDP is painfully slow. I've found that running a vncserver in Fedora and connecting with a VNC client is way faster.
The router must be told to forward ports 5800, 5801, 5900, and 5901 to the Fedora guest OS running in VMware.
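Those four port numbers follow the usual VNC convention: display :N listens on 5900+N for the native protocol and 5800+N for the browser-based viewer. A small sketch of that arithmetic, with a made-up helper name, assuming displays :0 and :1 are the ones in use:

```python
def vnc_ports(display: int) -> dict:
    """Return the conventional TCP ports for a VNC display number."""
    return {
        "http": 5800 + display,  # browser-based (Java applet) viewer
        "rfb": 5900 + display,   # native VNC client protocol
    }


# Ports to forward for displays :0 and :1.
ports = sorted(p for d in (0, 1) for p in vnc_ports(d).values())
print(ports)
```

If only a native VNC client is used, forwarding the 59xx ports alone is normally enough.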
Next maybe we can speed things up even more using Vnc compression with FreeNX
03:29 am
TODO
- Check Plone for a blogging/cms solution http://plone.org/newsitems
- Inspect Bricolage
  - The INSTALL doc for this LAMP project is very thorough
03:10 am
XOM Sample Code
Why can't everyone make it this easy to understand APIs? Taken from XOM - a tree-based API for processing XML with Java that strives for correctness, simplicity, and performance, in that order.
There is nothing better than an example. Remember that all this XML parsing can be done on HTML files to find data. Tidy can be used to convert a potentially bad html file to well-formed xml.
Check out this Java that extracts data from XML: ExampleExtractor. << from here << from here.
It doesn't seem right, however, that we're wrapping up all this Java around some random XML schema. Saxon has implemented XQuery. Can't we use XQuery and get rid of all that Java code?
I've been interested in finding functions that could compare XML nodes. I've run across XMLUnit and EXSLT but have not yet had a chance to implement tests. Check out "has-same-nodes(node-set-1, node-set-2)" and the EXSLT set functions. XPath 2.0 does in fact provide set difference and intersection operators:
Set difference and intersection
These operators are new in XPath 2.0.
The expression E1 except E2 selects all nodes that are in E1 unless they are also in E2. Both expressions must return sequences of nodes. The results are returned in document order. For example, @* except @note returns all attributes except the note attribute.
The expression E1 intersect E2 selects all nodes that are in both E1 and E2. Both expressions must return sequences of nodes. The results are returned in document order. For example, preceding::fig intersect ancestor::chapter//fig returns all preceding fig elements within the current chapter.
as stated here.
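To make the except and intersect semantics concrete, here is a small Python sketch that mimics them over ElementTree nodes. It is an emulation for illustration, not an XPath 2.0 engine; the helper names and the four-element sample document are made up.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<r><a/><b/><c/><d/></r>")
# Position of every node in document order, used to sort the results.
order = {node: i for i, node in enumerate(doc.iter())}


def node_except(e1, e2):
    """Nodes in e1 that are not in e2, in document order."""
    exclude = set(e2)
    return sorted((n for n in set(e1) if n not in exclude), key=order.get)


def node_intersect(e1, e2):
    """Nodes in both e1 and e2, in document order."""
    return sorted(set(e1) & set(e2), key=order.get)


a, b, c, d = list(doc)
e1 = [c, a, b]   # deliberately out of document order
e2 = [b, d, c]

print([n.tag for n in node_except(e1, e2)])
print([n.tag for n in node_intersect(e1, e2)])
```

Nodes are compared by identity here, which matches the quoted description: the operators work on the nodes themselves, not on their string values.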
Question: how well do these work? Answer: ...
Reading the XPath spec is like reading Chinese. Reading similar XPath documentation is quite pleasant. Saxon has implemented XPath 2. Microsoft's System.Xml will not until Whidbey arrives (or is it .NET 2.0?).