<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'><id>tag:blogger.com,1999:blog-1107147718367558732.post6734457831173243809..comments</id><updated>2011-07-23T12:01:58.005-07:00</updated><category term='parallel computing'/><category term='courses'/><category term='ai'/><category term='causality'/><category term='basketball'/><category term='dannys_predictions'/><category term='books'/><category term='data structure'/><category term='challenge problem'/><category term='lawyers'/><category term='toronto'/><category term='methodology'/><category term='art'/><category term='analytics'/><category term='ranking'/><category term='algorithms'/><category term='uncertainty'/><category term='memorization'/><category term='hadoop'/><category term='classification'/><category term='linear_programming'/><category term='psychology'/><category term='online marketing'/><category term='pain machine'/><category term='taxes'/><category term='netflix'/><category term='data analysis'/><category term='scipy'/><category term='schools'/><category term='sports'/><category term='scrabble'/><category term='probability'/><category term='c++'/><category term='talent'/><category term='dynamic algorithms'/><category term='computation'/><category term='displaying code'/><category term='san francisco'/><category term='career choice'/><category term='success'/><category term='APIs'/><category term='incentives'/><category term='public_relations'/><category term='rationality'/><category term='controversies'/><category term='social networks'/><category term='summer school'/><category term='nearest neighbors'/><category term='buildings'/><category term='theoretical computer science'/><category term='march_madness'/><category term='conversation starters'/><category term='the_webs'/><category term='statistics'/><category 
term='mcmc'/><category term='chess'/><category term='conferences'/><category term='MAP_inference'/><category term='google'/><category term='randomness'/><category term='auctions'/><category term='nutsandbolts'/><category term='structured prediction'/><category term='advertising'/><category term='military'/><category term='the real world'/><category term='beginners'/><category term='data visualization'/><category term='image_processing'/><category term='python'/><category term='biology'/><category term='lake oswego rental'/><category term='public transportation'/><category term='scott turner'/><category term='max_product_belief_propagation'/><category term='belief propagation'/><category term='code'/><category term='football'/><category term='horse racing'/><category term='learning'/><category term='artificial intelligence'/><category term='science'/><category term='linux'/><category term='computational complexity'/><category term='logistic regression'/><category term='protocol_buffers'/><category term='math'/><category term='recommendation systems'/><category term='emacs'/><category term='research'/><category term='robotics'/><category term='bayesian models'/><category term='programming'/><category term='politics'/><category term='sympy'/><category term='videos'/><category term='graduate school'/><category term='slice sampling'/><category term='matrix factorization'/><category term='graphical_models'/><category term='distributed computing'/><category term='databases'/><category term='seo'/><category term='economics'/><category term='blogger'/><category term='computer vision'/><category term='web2.0'/><category term='constraint_satisfaction'/><category term='george'/><category term='web_security'/><category term='twitter'/><category term='regularization'/><category term='history'/><category term='gambling'/><category term='machine learning'/><category term='data'/><category term='markets'/><category term='sociology'/><category term='energy use'/><category 
term='medicine'/><title type='text'>Comments on This Number Crunching Life: Python nearest neighbors binary classifier</title><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://blog.smellthedata.com/feeds/6734457831173243809/comments/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1107147718367558732/6734457831173243809/comments/default'/><link rel='alternate' type='text/html' href='http://blog.smellthedata.com/2009/06/python-nearest-neighbors-binary.html'/><author><name>Danny Tarlow</name><uri>http://www.blogger.com/profile/14670021337844708633</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='25' src='http://1.bp.blogspot.com/_cFAlw8-Y0gE/TRrm8pdSK1I/AAAAAAAAA5o/S8w-VVzdc1A/S220/mehak.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>6</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1107147718367558732.post-6780960049479976066</id><published>2011-07-23T12:01:58.005-07:00</published><updated>2011-07-23T12:01:58.005-07:00</updated><title type='text'>Mount and Arya [1] is another popular library for ...</title><summary type='text'>Mount and Arya [1] is another popular library for doing kNN queries in high dimensions. A common, simple approach when finding the nearest feature under L2 distance is to use PCA to reduce the dimensionality of the features (one sometimes keeps X%, e.g. 95%, of the variance and chooses the number of dimensions automatically), so that at most (100-X)% error is introduced in the kNN query. 
</summary><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1107147718367558732/6734457831173243809/comments/default/6780960049479976066'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1107147718367558732/6734457831173243809/comments/default/6780960049479976066'/><link rel='alternate' type='text/html' href='http://blog.smellthedata.com/2009/06/python-nearest-neighbors-binary.html?showComment=1311447718005#c6780960049479976066' title=''/><author><name>Connelly Barnes</name><uri>http://www.blogger.com/profile/02568908952592933174</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://blog.smellthedata.com/2009/06/python-nearest-neighbors-binary.html' ref='tag:blogger.com,1999:blog-1107147718367558732.post-6734457831173243809' source='http://www.blogger.com/feeds/1107147718367558732/posts/default/6734457831173243809' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-92922793'/></entry><entry><id>tag:blogger.com,1999:blog-1107147718367558732.post-8211979627290518028</id><published>2009-09-13T21:42:06.039-07:00</published><updated>2009-09-13T21:42:06.039-07:00</updated><title type='text'>FLANN (Fast Library for Approximate Nearest Neighbors)...</title><summary type='text'>FLANN (Fast Library for Approximate Nearest Neighbors, http://people.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN) by Marius Muja at UBC has Python bindings. 
It can automatically pick hyperparameters and choose between KD-tree and K-means in a data-dependent way.</summary><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1107147718367558732/6734457831173243809/comments/default/8211979627290518028'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1107147718367558732/6734457831173243809/comments/default/8211979627290518028'/><link rel='alternate' type='text/html' href='http://blog.smellthedata.com/2009/06/python-nearest-neighbors-binary.html?showComment=1252903326039#c8211979627290518028' title=''/><author><name>Joseph Turian</name><uri>http://www.blogger.com/profile/06249878639857416906</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_Hao3ATKBybM/Sq3DboObu2I/AAAAAAAAAFA/DrFqsLot1oI/S220/turian-smoking.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://blog.smellthedata.com/2009/06/python-nearest-neighbors-binary.html' ref='tag:blogger.com,1999:blog-1107147718367558732.post-6734457831173243809' source='http://www.blogger.com/feeds/1107147718367558732/posts/default/6734457831173243809' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-596211626'/></entry><entry><id>tag:blogger.com,1999:blog-1107147718367558732.post-764527148199897562</id><published>2009-06-08T09:24:21.288-07:00</published><updated>2009-06-08T09:24:21.288-07:00</updated><title type='text'>All are good points, Will.  Thanks. 

I will defin...</title><summary type='text'>All are good points, Will.  Thanks.

I will definitely address #1 at some point in the near future -- talking about cross-validation seems pretty important to round these two posts out.

#3 could be interesting too -- perhaps boosting or L1 regularization would be worth mentioning at some point.</summary><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1107147718367558732/6734457831173243809/comments/default/764527148199897562'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1107147718367558732/6734457831173243809/comments/default/764527148199897562'/><link rel='alternate' type='text/html' href='http://blog.smellthedata.com/2009/06/python-nearest-neighbors-binary.html?showComment=1244478261288#c764527148199897562' title=''/><author><name>Danny Tarlow</name><uri>http://www.blogger.com/profile/14670021337844708633</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='14202909819301920933'/><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/_cFAlw8-Y0gE/SWMB_zIPlRI/AAAAAAAAAVA/jNuwRPrtAW0/S220/Photo+28.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://blog.smellthedata.com/2009/06/python-nearest-neighbors-binary.html' ref='tag:blogger.com,1999:blog-1107147718367558732.post-6734457831173243809' source='http://www.blogger.com/feeds/1107147718367558732/posts/default/6734457831173243809' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' 
value='pid-675792927'/></entry><entry><id>tag:blogger.com,1999:blog-1107147718367558732.post-8958760544179668938</id><published>2009-06-08T05:58:02.046-07:00</published><updated>2009-06-08T05:58:02.046-07:00</updated><title type='text'>Yes, k-NN can be very useful.  I'd make the fo...</title><summary type='text'>Yes, k-NN can be very useful.  I'd make the following points:

1. Choosing an appropriate value for k is critical.  Fortunately, this is easily accomplished by checking a variety of values and selecting the one that performs best on validation data.

2. Implementation can be sticky.  Many systems easily handle equations and logic for things like discriminant </summary><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1107147718367558732/6734457831173243809/comments/default/8958760544179668938'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1107147718367558732/6734457831173243809/comments/default/8958760544179668938'/><link rel='alternate' type='text/html' href='http://blog.smellthedata.com/2009/06/python-nearest-neighbors-binary.html?showComment=1244465882046#c8958760544179668938' title=''/><author><name>Will Dwinnell</name><uri>http://www.blogger.com/profile/03379859054257561952</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_aTiM0lwqgJ4/SSJeGiSd4VI/AAAAAAAAAB4/i588ZVHvEz4/s1600-R/n509614243_406121_6479.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://blog.smellthedata.com/2009/06/python-nearest-neighbors-binary.html' ref='tag:blogger.com,1999:blog-1107147718367558732.post-6734457831173243809' source='http://www.blogger.com/feeds/1107147718367558732/posts/default/6734457831173243809' type='text/html'/><gd:extendedProperty 
xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-8998867'/></entry><entry><id>tag:blogger.com,1999:blog-1107147718367558732.post-2873318805496846742</id><published>2009-06-08T02:25:50.490-07:00</published><updated>2009-06-08T02:25:50.490-07:00</updated><title type='text'>If you're working with data that has high dime...</title><summary type='text'>If you're working with data that has high dimension but not necessarily high intrinsic dimension (e.g., it lies on some relatively low-dimensional manifold), I find this to be a nice line of attack (I saw the poster a while back but have never done anything with it myself):
http://books.nips.cc/papers/files/nips20/NIPS2007_0133.pdf</summary><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1107147718367558732/6734457831173243809/comments/default/2873318805496846742'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1107147718367558732/6734457831173243809/comments/default/2873318805496846742'/><link rel='alternate' type='text/html' href='http://blog.smellthedata.com/2009/06/python-nearest-neighbors-binary.html?showComment=1244453150490#c2873318805496846742' title=''/><author><name>Danny Tarlow</name><uri>http://www.blogger.com/profile/14670021337844708633</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='14202909819301920933'/><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/_cFAlw8-Y0gE/SWMB_zIPlRI/AAAAAAAAAVA/jNuwRPrtAW0/S220/Photo+28.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://blog.smellthedata.com/2009/06/python-nearest-neighbors-binary.html' 
ref='tag:blogger.com,1999:blog-1107147718367558732.post-6734457831173243809' source='http://www.blogger.com/feeds/1107147718367558732/posts/default/6734457831173243809' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-675792927'/></entry><entry><id>tag:blogger.com,1999:blog-1107147718367558732.post-8843492282927967455</id><published>2009-06-07T23:50:47.823-07:00</published><updated>2009-06-07T23:50:47.823-07:00</updated><title type='text'>Wow, I had no idea scipy had a kd tree in it.  Awe...</title><summary type='text'>Wow, I had no idea scipy had a kd tree in it.  Awesome!  Too bad kd trees aren't really useful in high dimensions.  And yes, I am quite happy you used a few of my design suggestions.

My own nearest neighbor code isn't too useful at the moment because it doesn't do prediction; it just finds nearest neighbors.  I would have to make a KNN classifier module that used my core </summary><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1107147718367558732/6734457831173243809/comments/default/8843492282927967455'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1107147718367558732/6734457831173243809/comments/default/8843492282927967455'/><link rel='alternate' type='text/html' href='http://blog.smellthedata.com/2009/06/python-nearest-neighbors-binary.html?showComment=1244443847823#c8843492282927967455' title=''/><author><name>George</name><uri>http://www.blogger.com/profile/12790096318551866567</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://blog.smellthedata.com/2009/06/python-nearest-neighbors-binary.html' 
ref='tag:blogger.com,1999:blog-1107147718367558732.post-6734457831173243809' source='http://www.blogger.com/feeds/1107147718367558732/posts/default/6734457831173243809' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-899028934'/></entry></feed>
