InSTEDD Summer 2008
Developers
- Juan Pablo Mendoza
- Qianqian Lin
About InSTEDD
Innovative Support to Emergencies, Diseases and Disasters (InSTEDD) is an organization that brings technologies together to help with the early detection of diseases and disasters. InSTEDD reaches out to various venues for ideas, then brings these ideas to those who need them. Where the necessary technologies do not exist, InSTEDD works to create them, sharing them with the public once they are stable and usable. InSTEDD takes part in the open-source movement and releases its technologies as free, open-source code.
Objectives
Our main goal is to work on InSTEDD's Machine Learning project. We want to help teach machines to detect the spread of disease as early as possible. Computers can analyze large volumes of data much faster than humans, and could therefore significantly improve the early detection of disease. In practice, we do whatever is needed to make the jobs of InSTEDD's professional programmers easier, while learning about Artificial Intelligence and Machine Learning.
Progress
Week 2
- Learned about Support Vector Machines (SVMs)
- Worked through a tutorial on SVMs written by Dr. Morelli to get comfortable working with lib-svm
- Had a conference call with Nicolas from InSTEDD to get direction on what to do, along with the necessary materials
Week 3
- Wrote the code needed to test the Reuters data:
  - Parser to parse the .sgm files containing the articles
  - Dictionary class to list all words and the frequency of each
  - InvFreq class to find the inverse frequency of each word
  - Vectorizer class to create a vector (using the above classes) for each article
- Started testing the Reuters data:
  - Training lib-svm with 1000 articles
  - Testing lib-svm with 1000 different articles
  - Analyzing test results and finding ways to refine the code and tests
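
The Week 3 classes together turn raw article text into numeric vectors. The sketch below illustrates that idea in Python; it is not our actual code. The function names, the regular-expression tokenizer, the sample sentences, and the specific weighting (term frequency times log(N / document frequency)) are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_dictionary(articles):
    """Map each distinct word to a 1-based feature index (sorted for stability)."""
    words = sorted({w for a in articles for w in tokenize(a)})
    return {w: i for i, w in enumerate(words, start=1)}


def inverse_frequencies(articles, dictionary):
    """Assumed weighting: idf(w) = log(N / df(w)), df = articles containing w."""
    n = len(articles)
    df = Counter()
    for a in articles:
        df.update(set(tokenize(a)))  # count each word once per article
    return {w: math.log(n / df[w]) for w in dictionary}


def vectorize(article, dictionary, idf):
    """Return a sparse {feature_index: tf * idf} vector for one article."""
    counts = Counter(w for w in tokenize(article) if w in dictionary)
    total = sum(counts.values())
    if total == 0:
        return {}  # no known words, so the vector is empty
    return {dictionary[w]: (c / total) * idf[w] for w, c in counts.items()}


# Made-up sample sentences standing in for parsed article bodies.
articles = [
    "wheat prices rise as wheat exports grow",
    "oil prices fall",
    "wheat harvest delayed",
]
dictionary = build_dictionary(articles)
idf = inverse_frequencies(articles, dictionary)
vectors = [vectorize(a, dictionary, idf) for a in articles]
```

With this weighting, a word that occurs in every article gets an inverse frequency of zero and so contributes nothing to any vector, while rarer words are weighted more heavily. Words that were not seen when the dictionary was built are simply ignored at test time.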
Week 4
- Continuing to test lib-svm with Reuters data
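
For reference, the lib-svm (LIBSVM) command-line tools read training and test data as plain text, one example per line, in the form `<label> <index>:<value> ...` with feature indices in ascending order. The helper below is a minimal Python sketch of writing one sparse article vector in that form. The function name, the numeric labels, and the six-significant-digit rounding are assumptions made for illustration, not part of our code.

```python
def to_libsvm_line(label, vector):
    """Format a sparse {index: value} vector as '<label> <index>:<value> ...'.

    Indices are written in ascending order and zero-valued features are
    left out, since the format is sparse.
    """
    features = " ".join(
        f"{index}:{value:.6g}"
        for index, value in sorted(vector.items())
        if value != 0
    )
    return f"{label} {features}".rstrip()


# Hypothetical labels: 1 for articles in the target topic, -1 otherwise.
lines = [
    to_libsvm_line(1, {3: 0.5, 1: 0.25}),   # -> "1 1:0.25 3:0.5"
    to_libsvm_line(-1, {2: 0.0, 5: 1.5}),   # -> "-1 5:1.5"
]
file_text = "\n".join(lines) + "\n"
```

Writing `file_text` to a file gives input that can be passed to the training tool, and a file built the same way from the held-out articles can be used for testing.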
