Brown CS Dept. - Inference of Patterns and Associations Using Dictionary Models

4pm Wednesday, April 23, 2008
Swig Boardroom
115 Waterman Street, Room 241
Providence, RI 02912

Jun Liu, Department of Statistics, Harvard University

Pattern discovery is a ubiquitous problem in many disciplines. It has become especially prominent in recent years because of greatly improved data-generation capabilities in science and technology. The method I present here is motivated by the "motif-finding" and "module-finding" problems in biology: finding sequence patterns ("words") that appear more frequently than expected in a given set of text sequences ("sentences"), and finding which of these "words" tend to co-occur in a sentence. A challenge in the motif-finding problem is that there are no spaces or punctuation between the words, and the dictionary of "words" is unknown to us. Existing methods are mostly "bottom-up" approaches: they build up the dictionary starting with single-letter words and then concatenate existing words that occur next to each other in sentences more frequently than expected by chance. Our new approach is a top-down strategy, which uses a tree structure to represent the relationships among all possible words and uses the EM algorithm to estimate the usage frequency of each word. It automatically trims down most of the incorrect "words" by letting their usage frequencies converge to zero...
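The idea of estimating word usage frequencies with EM can be illustrated with a small Python sketch. This is not the speaker's tree-structured method. It is a simplified illustration that assumes each sentence is a concatenation of words drawn independently from the dictionary, takes every substring up to a maximum length as a candidate word, and sums over all segmentations with a forward-backward pass. The function names (candidate_words, em_step, fit_dictionary) and the parameters max_len and prune_below are hypothetical choices made for this example, and the code assumes short sequences so that the probabilities do not underflow.

```python
from collections import defaultdict


def candidate_words(sequences, max_len):
    """Collect every substring of length <= max_len as a candidate word."""
    words = set()
    for s in sequences:
        for i in range(len(s)):
            for j in range(i + 1, min(i + max_len, len(s)) + 1):
                words.add(s[i:j])
    return words


def em_step(sequences, theta, max_len):
    """One EM iteration: expected word counts over all segmentations,
    then renormalization into new usage frequencies."""
    counts = defaultdict(float)
    for s in sequences:
        n = len(s)
        # fwd[j]: total probability of all segmentations of s[:j]
        fwd = [0.0] * (n + 1)
        fwd[0] = 1.0
        for j in range(1, n + 1):
            for i in range(max(0, j - max_len), j):
                fwd[j] += fwd[i] * theta.get(s[i:j], 0.0)
        # bwd[i]: total probability of all segmentations of s[i:]
        bwd = [0.0] * (n + 1)
        bwd[n] = 1.0
        for i in range(n - 1, -1, -1):
            for j in range(i + 1, min(i + max_len, n) + 1):
                bwd[i] += theta.get(s[i:j], 0.0) * bwd[j]
        total = fwd[n]
        if total == 0.0:
            continue  # no segmentation is possible under the current dictionary
        # E-step: posterior probability that s[i:j] is used as a word
        for i in range(n):
            for j in range(i + 1, min(i + max_len, n) + 1):
                p = theta.get(s[i:j], 0.0)
                if p > 0.0:
                    counts[s[i:j]] += fwd[i] * p * bwd[j] / total
    z = sum(counts.values())
    if z == 0.0:
        return dict(theta)  # nothing could be segmented; leave theta unchanged
    # M-step: usage frequency is proportional to expected count
    return {w: c / z for w, c in counts.items()}


def fit_dictionary(sequences, max_len=4, iterations=50, prune_below=1e-6):
    """Start from a uniform dictionary of all candidates and run EM,
    dropping words whose usage frequency has shrunk below prune_below."""
    words = candidate_words(sequences, max_len)
    theta = {w: 1.0 / len(words) for w in words}
    for _ in range(iterations):
        theta = em_step(sequences, theta, max_len)
        kept = {w: p for w, p in theta.items() if p > prune_below}
        if kept:
            z = sum(kept.values())
            theta = {w: p / z for w, p in kept.items()}
    return theta
```

For the single sentence "ab" with max_len=2, the candidates are "a", "b", and "ab". In this toy model the frequencies of "a" and "b" shrink toward zero over the iterations and only "ab" survives, which mirrors the trimming behavior described in the abstract.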
