Wednesday, 15 October 2014

What's this?

This is an API you can use to resolve the category of a web page. It uses the Dmoz data to provide this information.
  • Number of categorized web pages: 4439365
  • Number of categories: 24566
  • Dmoz data from: 19.11.2009

How can I use it?

Send a request to the following URL, with the address of the web page as the value of the url parameter:
http://dmoz-api.appspot.com/category?url=
You will get the category back as text. No JSON, no XML: just plain text. You might also get an empty response, which means the web page has not been categorized yet. It doesn't matter whether you include the http:// prefix or not: sme.sk gives the same result as http://sme.sk.
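A minimal client sketch using only the Python standard library. It assumes the endpoint above still answers, that the response body is UTF-8 text, and that an empty body means "no category"; the function names are illustrative, not part of the service.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Endpoint as given in the post; the service may no longer be online.
API_ENDPOINT = "http://dmoz-api.appspot.com/category"


def build_request_url(page: str) -> str:
    """Return the API URL for looking up the category of `page`."""
    # urlencode percent-escapes ':' and '/' so that a page address such as
    # "http://sme.sk" travels safely as a single query parameter.
    return API_ENDPOINT + "?" + urllib.parse.urlencode({"url": page})


def parse_category(body: bytes) -> Optional[str]:
    """Turn the plain-text response into a category, or None if it is empty."""
    # Assumption: the body is UTF-8 encoded plain text.
    text = body.decode("utf-8").strip()
    return text or None


def fetch_category(page: str, timeout: float = 10.0) -> Optional[str]:
    """Query the API; None means the page has not been categorized."""
    with urllib.request.urlopen(build_request_url(page), timeout=timeout) as resp:
        return parse_category(resp.read())
```

With the two examples below, `fetch_category("www.sme.sk")` should return a category string, while `fetch_category("www.nocategory.bbq")` should return `None`.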

Example: page with a category

http://dmoz-api.appspot.com/category?url=www.sme.sk

Example: page without a category

http://dmoz-api.appspot.com/category?url=www.nocategory.bbq

What is Dmoz?

Dmoz is an open directory project that lists various web pages and groups them into categories. Its primary purpose is to let you find web pages based on your interests or needs: you just browse the category you are interested in. It might also occur to you that it would be useful to take a web page, query some Dmoz API and get the page's category back. Unfortunately, this is not possible. Oh wait, it is!
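Dmoz answers "which pages are in this category?", while this API answers the reverse question, so at its core it needs a reverse index from page to category. A minimal sketch, assuming the directory is already loaded as a hypothetical dict of category paths to page URLs; `normalize`, `build_page_index` and `lookup` are illustrative names, not the service's actual code. Stripping the scheme and trailing slash is one way to make sme.sk and http://sme.sk resolve to the same entry.

```python
from typing import Dict, List, Optional


def normalize(page: str) -> str:
    """Drop an http:// or https:// prefix and any trailing slash."""
    for prefix in ("http://", "https://"):
        if page.startswith(prefix):
            page = page[len(prefix):]
            break
    return page.rstrip("/")


def build_page_index(directory: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert a category -> pages listing into a page -> category table."""
    index: Dict[str, str] = {}
    for category, pages in directory.items():
        for page in pages:
            # If a page is listed in several categories, keep the first one seen.
            index.setdefault(normalize(page), category)
    return index


def lookup(index: Dict[str, str], page: str) -> Optional[str]:
    """Return the category of `page`, or None if it is not in the index."""
    return index.get(normalize(page))
```

A plain dict gives constant-time lookups; the real difficulty, as the next section explains, is the size of the data rather than the lookup itself.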

Why?

The Dmoz data is publicly available as an RDF file: a long, funny, obscure, duplicate-ridden XML file about 2 gigabytes in size. Even after parsing it and cherry-picking the actually interesting content, it is still several hundred megabytes (around 500, actually), so bundling it with your application is really impractical.
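The parsing and cherry-picking step can be done by streaming the file instead of loading all 2 gigabytes into memory. This sketch assumes each page appears as an ExternalPage element whose about attribute holds the page URL and whose topic child holds the category path, with ExternalPage elements sitting directly under the root; check this against your copy of the dump. Tags and attributes are matched by local name so namespace prefixes don't matter. `iter_page_topics` and `topics_from_bytes` are hypothetical helpers, not part of any Dmoz tooling.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element or attribute name."""
    return tag.rsplit("}", 1)[-1]


def iter_page_topics(source) -> Iterator[Tuple[str, str]]:
    """Yield (page URL, category) pairs from a Dmoz-style content RDF stream."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "ExternalPage":
            continue
        # Assumption: the page URL lives in an attribute named "about".
        url: Optional[str] = next(
            (v for k, v in elem.attrib.items() if local_name(k) == "about"), None
        )
        topic: Optional[str] = None
        for child in elem:
            if local_name(child.tag) == "topic":
                topic = (child.text or "").strip()
        if url and topic:
            yield url, topic
        # Drop everything parsed so far so memory does not grow with the file.
        root.clear()


def topics_from_bytes(data: bytes) -> List[Tuple[str, str]]:
    """Convenience wrapper for trying the parser on a small in-memory excerpt."""
    return list(iter_page_topics(io.BytesIO(data)))
```

Passing a file path or an open binary file to `iter_page_topics` works the same way, since `ET.iterparse` accepts either; writing each pair out as one tab-separated line is one simple way to produce the compact, cherry-picked form.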
