ufXtract is a new microformats parser. It has been built from the ground up to use configuration objects, which allow it to parse different microformats or POSH (Plain Old Semantic HTML) patterns. The component also has an extensible output format option.
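The idea of driving a parser from configuration objects can be illustrated with a minimal Python sketch. The `HCARD_CONFIG` dictionary and `parse_with_config` function are hypothetical and are not ufXtract's actual configuration format. The sketch also assumes well-formed XHTML, so that the standard library's `xml.etree.ElementTree` can read the page.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration object (not ufXtract's real format):
# the root class name, plus each property class name and where to read its value.
HCARD_CONFIG = {
    "root": "vcard",
    "properties": {"fn": "text", "given-name": "text",
                   "family-name": "text", "url": "href"},
}

def classes(elem):
    """Return the element's class names as a list."""
    return elem.get("class", "").split()

def parse_with_config(xhtml, config):
    """Return one dict per root element found, keyed by property name."""
    tree = ET.fromstring(xhtml)
    results = []
    for root in tree.iter():
        if config["root"] not in classes(root):
            continue
        item = {}
        for elem in root.iter():
            for prop, source in config["properties"].items():
                # Keep only the first value found for each property.
                if prop in classes(elem) and prop not in item:
                    if source == "href":
                        item[prop] = elem.get("href", "")
                    else:
                        item[prop] = "".join(elem.itertext()).strip()
        results.append(item)
    return results

page = """<div>
  <div class="vcard">
    <a class="url fn" href="http://morethanseven.net/">Gareth Rushgrove</a>
    <span class="n"><span class="given-name">Gareth</span>
    <span class="family-name">Rushgrove</span></span>
  </div>
</div>"""

print(parse_with_config(page, HCARD_CONFIG))
```

In this design, supporting another microformat or POSH pattern means writing a new configuration dictionary rather than new parsing code.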
I am building a test suite to fine-tune the component's compliance. Most of the compound microformats, apart from hCard, have some issues at the moment. If you have any comments or want to point out issues, please email: info.backnetwork.com
Example URLs
Updated
29-Nov-07
Added support for pages encoded with ISO-8859-1.
24-Nov-07
Internal changes that fix given-name parsing in hCards, and incorrect URL paths from pages that have been redirected with an HTTP 302.
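Assuming the 302 problem concerns resolving relative links, the underlying issue is the choice of base URL: after a redirect, relative paths must be resolved against the final URL, not the one originally requested. A minimal Python sketch using `urllib.parse.urljoin`; the example URLs are hypothetical.

```python
from urllib.parse import urljoin

requested = "http://example.com/old/profile.htm"     # URL originally requested
final = "http://example.com/people/gareth/index.htm"  # URL after the 302 redirect
relative = "photo.jpg"                                # relative href found in the page

wrong = urljoin(requested, relative)  # resolved against the pre-redirect URL
right = urljoin(final, relative)      # resolved against the post-redirect URL

print(wrong)  # http://example.com/old/photo.jpg
print(right)  # http://example.com/people/gareth/photo.jpg
```

With Python's `urllib.request`, the response object's `geturl()` method returns the URL after any redirects have been followed, which is the value to use as the base.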
16-Nov-07
I have added error handling and a number of other small improvements, such as the reporting feature. The major fix is that the component now correctly handles multiple header includes, which are often used in hCalendar. The hAtom output has been addressed but not yet fully tested.
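Assuming "header includes" refers to the table-cell pattern, hCalendar pages often lay events out in a table and use the HTML `headers` attribute so that a header cell (for example a shared location or date) applies to several events. A cell's `headers` value can list several IDs separated by spaces, which is the "multiple" case. Below is a minimal Python sketch of resolving them; the `resolve_headers` function and the sample table are hypothetical, and the sketch assumes well-formed XHTML.

```python
import xml.etree.ElementTree as ET

table = """<table>
  <tr>
    <th id="loc" class="location">Brighton</th>
    <th id="day" class="dtstart">2007-11-16</th>
  </tr>
  <tr>
    <td class="vevent" headers="loc day"><span class="summary">Barcamp</span></td>
  </tr>
</table>"""

def resolve_headers(xhtml):
    """Pair each table cell with the header cells its headers attribute points to."""
    root = ET.fromstring(xhtml)
    # Index every element that carries an id so header IDs can be looked up.
    by_id = {e.get("id"): e for e in root.iter() if e.get("id")}
    result = []
    for cell in root.iter("td"):
        ids = cell.get("headers", "").split()  # space-separated list of IDs
        included = [by_id[i] for i in ids if i in by_id]
        result.append((cell, included))
    return result

for cell, headers in resolve_headers(table):
    summary = cell.find(".//span").text
    props = {h.get("class"): h.text for h in headers}
    print(summary, props)
```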
Example XML output
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<ufxtract>
<vcard>
<fn>Gareth Rushgrove</fn>
<n>
<given-name>Gareth</given-name>
<family-name>Rushgrove</family-name>
</n>
<url>http://morethanseven.net/</url>
</vcard>
<report>
<url status="200" millisec="109">http://lab.backnetwork.com/examples/1/page1.htm</url>
<found>1</found>
</report>
</ufxtract>
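A client can read this output with any XML parser. The following is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `read_output` function is hypothetical, and it relies only on the element names shown in the example above, with the closing `fn` tag well-formed.

```python
import xml.etree.ElementTree as ET

output = """<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<ufxtract>
<vcard>
<fn>Gareth Rushgrove</fn>
<n>
<given-name>Gareth</given-name>
<family-name>Rushgrove</family-name>
</n>
<url>http://morethanseven.net/</url>
</vcard>
<report>
<url status="200" millisec="109">http://lab.backnetwork.com/examples/1/page1.htm</url>
<found>1</found>
</report>
</ufxtract>"""

def read_output(xml_text):
    """Return (list of vcard dicts, report dict) from the example response format."""
    # Parse as bytes so the XML encoding declaration is handled by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    cards = []
    for vcard in root.findall("vcard"):
        cards.append({
            "fn": vcard.findtext("fn"),
            "given-name": vcard.findtext("n/given-name"),
            "family-name": vcard.findtext("n/family-name"),
            "url": vcard.findtext("url"),
        })
    report_url = root.find("report/url")
    report = {
        "url": report_url.text,
        "status": int(report_url.get("status")),
        "millisec": int(report_url.get("millisec")),
        "found": int(root.findtext("report/found")),
    }
    return cards, report

cards, report = read_output(output)
print(cards[0]["fn"], report["found"])
```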
Example XML error
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<ufxtract>
<errors>
<error>
<msg>The remote name could not be resolved: 'htp'</msg>
<url>http://htp://localhost/BacknetworkLab/examples/1/page1.htm</url>
</error>
</errors>
</ufxtract>
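Judging from the example above, a failed request returns an `errors` element instead of results, so a client should check for it before reading any microformat data. A minimal Python sketch follows; the `extract_errors` function is hypothetical and assumes only the element names shown in the example.

```python
import xml.etree.ElementTree as ET

error_output = """<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<ufxtract>
<errors>
<error>
<msg>The remote name could not be resolved: 'htp'</msg>
<url>http://htp://localhost/BacknetworkLab/examples/1/page1.htm</url>
</error>
</errors>
</ufxtract>"""

def extract_errors(xml_text):
    """Return a list of (message, url) pairs; empty if the response has no errors."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [(e.findtext("msg"), e.findtext("url"))
            for e in root.findall("errors/error")]

for msg, url in extract_errors(error_output):
    print(f"ufXtract error for {url}: {msg}")
```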