Friday, April 11, 2014

Week 13 Reading

Ch.13-14
To capture the generality and scope of the problem space to which standing queries belong, Chapter 13 introduces the general notion of a classification problem: given a set of classes, we seek to determine which class(es) a given object belongs to. Classification using standing queries, also called routing or filtering, was discussed as well. Often, a class is a more general subject area. Such more general classes are usually referred to as topics, and the classification task is then called text classification, text categorization, topic classification, or topic spotting.
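A standing query can be viewed as a two-class classifier: each incoming document either matches the query and is routed to the user, or it is filtered out. The sketch below assumes the standing query is a simple Boolean AND of required terms and uses a deliberately naive whitespace tokenizer; the function names `tokenize` and `route` and the example documents are hypothetical, chosen only for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def route(documents, required_terms):
    """Return the documents that match a Boolean AND standing query.

    Each document falls into one of two classes: matching (routed to the
    user) or not matching (filtered out).
    """
    required = {term.lower() for term in required_terms}
    return [doc for doc in documents if required <= set(tokenize(doc))]


stream = [
    "Multicore computer chips announced",
    "New chips for multicore servers",
    "Weather forecast for the weekend",
]
# Hypothetical standing query: multicore AND chips
print(route(stream, ["multicore", "chips"]))
```

Running the query over a stream of new documents, rather than once over a fixed collection, is what makes it "standing".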
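Several of the classification algorithms in these chapters are closely related to the methods we use for computing document similarity: they compare a new document with training documents or class representatives in the same vector space.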
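The shared building block is a similarity measure between document vectors. Below is a minimal sketch of cosine similarity over sparse term-frequency vectors; using raw term counts instead of tf-idf weights is a simplifying assumption, and the helper names are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Map a text to a sparse term-frequency vector (term -> count)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


a = tf_vector("chinese beijing chinese")
b = tf_vector("chinese shanghai")
print(round(cosine(a, b), 3))  # 2 / sqrt(10)
```

The same function can rank documents against a query or compare a new document against labeled training documents, which is why the two tasks look so alike.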
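There are many classification tasks; these chapters focus on text classification. Chapter 14 represents documents as vectors and relies on the contiguity hypothesis: documents in the same class form a contiguous region, and regions of different classes do not overlap. Documents of two classes therefore form distinct contiguous regions, so we can draw boundaries that separate them and use those boundaries to classify new documents. How exactly this is done is the topic of Chapter 14.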
Two vector space classification methods, Rocchio and k nearest neighbors (kNN), were introduced as well.
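A minimal sketch of both methods on small dense vectors, assuming Euclidean distance throughout: Rocchio computes one centroid per class and assigns a new document to the class with the nearest centroid, while kNN assigns it to the majority class among its k nearest training documents. The toy vectors, class labels, and function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rocchio_train(vectors, labels):
    """Compute one centroid per class: the mean of that class's vectors."""
    sums = {}
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def rocchio_classify(centroids, doc):
    """Assign doc to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: euclidean(centroids[label], doc))


def knn_classify(vectors, labels, doc, k=3):
    """Assign doc to the majority class among its k nearest training vectors."""
    ranked = sorted(zip(vectors, labels), key=lambda pair: euclidean(pair[0], doc))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.0], [0.1, 0.9], [0.2, 1.0], [0.0, 0.8]]
labels = ["china", "china", "china", "uk", "uk", "uk"]
centroids = rocchio_train(train, labels)
print(rocchio_classify(centroids, [0.7, 0.3]), knn_classify(train, labels, [0.7, 0.3]))
```

Rocchio does its work at training time (one centroid per class), whereas kNN defers all work to classification time by scanning the training set.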
A large number of text classifiers can be viewed as linear classifiers: classifiers that decide class membership based on a simple linear combination of the features. Such classifiers partition the feature space into regions separated by linear decision hyperplanes.
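In the two-class case, a linear classifier computes a weighted sum of the features and compares it with a threshold: a document x is assigned to class c if w · x > b, so the decision hyperplane is w · x = b. The sketch below assumes hand-picked, hypothetical weights and bias; a real classifier would learn them from training data.

```python
def linear_classify(weights, bias, x):
    """Two-class linear classifier: return True (in class c) if w . x > b.

    The decision boundary is the hyperplane w . x = b.
    """
    score = sum(w_i * x_i for w_i, x_i in zip(weights, x))
    return score > bias


# Hypothetical weights for three features and a hypothetical threshold.
weights = [0.7, -0.4, 0.2]
bias = 0.1
print(linear_classify(weights, bias, [1, 0, 1]))  # score 0.9
print(linear_classify(weights, bias, [0, 1, 1]))  # score -0.2
```

Because the score is linear in the features, every point on one side of the hyperplane receives the same class, which is exactly the partition into regions described above.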

Ch.16-17
Clustering algorithms group a set of documents into subsets or clusters. The algorithms' goal is to create clusters that are internally coherent but clearly different from each other. In other words, documents within a cluster should be as similar as possible, and documents in one cluster should be as dissimilar as possible from documents in other clusters.
Clustering is the most common form of unsupervised learning. No supervision means that there is no human expert who has assigned documents to classes. In clustering, it is the distribution and makeup of the data that will determine cluster membership.

Flat clustering creates a flat set of clusters without any explicit structure that would relate clusters to each other. Hierarchical clustering creates a hierarchy of clusters.
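A minimal sketch contrasting the two kinds of output: K-means, a standard flat clustering algorithm, returns one cluster index per document, while naive agglomerative single-link clustering, one of several hierarchical methods, returns a sequence of merges that encodes the hierarchy. Two simplifying assumptions: K-means is seeded with the first k points instead of random seeds, and single-link is used although complete-link or group-average linkage are equally valid choices. The points and function names are hypothetical.

```python
import math


def dist(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(points, k, iterations=10):
    """Flat clustering: K-means seeded with the first k points as centroids.

    Returns a list of cluster indices, one per point.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Reassignment: each point joins the cluster with the nearest centroid.
        assignment = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Recomputation: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def single_link_hac(points):
    """Hierarchical clustering: naive agglomerative single-link clustering.

    Starts with one cluster per point and repeatedly merges the two clusters
    whose closest members are nearest. Returns the merges as
    (cluster_a, cluster_b, distance) tuples, which encode the hierarchy.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


points = [[0, 0], [0, 1], [5, 5], [5, 6]]
print(kmeans(points, 2))
print(single_link_hac(points))
```

The flat result says only which group each document is in; the merge sequence additionally records that the two tight pairs join each other last, at a much larger distance.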
