{"id":563,"date":"2019-04-12T14:36:34","date_gmt":"2019-04-12T06:36:34","guid":{"rendered":"http:\/\/localhost\/?p=563"},"modified":"2019-06-17T19:53:35","modified_gmt":"2019-06-17T11:53:35","slug":"exploration-on-airbnb-boston-data","status":"publish","type":"post","link":"http:\/\/www.ahomer.cn\/?p=563","title":{"rendered":"Exploration on Airbnb Boston data"},"content":{"rendered":"<p>Project details:<\/p>\n<pre><code>https:\/\/github.com\/ahomer\/airbnb_bst<\/code><\/pre>\n<h3>Business and Data Understanding<\/h3>\n<p>As described on the Airbnb dataset page on Kaggle, the following Airbnb activity is included in this Boston dataset:<\/p>\n<ul>\n<li>Calendar, including listing id and the price and availability for that day<\/li>\n<li>Listings, including full descriptions and average review score<\/li>\n<li>Reviews, including unique id for each reviewer and detailed comments<\/li>\n<\/ul>\n<p>Let us take a look at these three CSV files.<\/p>\n<h4>Calendar<\/h4>\n<p>It shows that the hosts are not available every day and that prices may change during the busiest seasons.<\/p>\n<ul>\n<li>What is the most expensive season in Boston?<\/li>\n<li>Which hosts are the most popular?<\/li>\n<\/ul>\n<h4>Listings<\/h4>\n<p>Summary information on listings in Boston. It contains location, host information, cleaning and guest fees, amenities, and so on.<br \/>\nWe may find some important factors that affect price.<\/p>\n<ul>\n<li>Which factors are most strongly related to price?<\/li>\n<li>How can we predict the price?<\/li>\n<\/ul>\n<h4>Reviews<\/h4>\n<p>We can find many interesting opinions, such as:<\/p>\n<ul>\n<li>What are the most attractive facilities? 
Is it a big bed, a large room, or the location?<\/li>\n<li>What leads to a bad impression?<\/li>\n<\/ul>\n<h3>Data preparation<\/h3>\n<h4>Clean Calendar<\/h4>\n<p><img decoding=\"async\" src=\"http:\/\/101.32.245.93\/wp-content\/uploads\/2019\/04\/output_16_1.png\" alt=\"png\" \/><\/p>\n<ul>\n<li>\n<p>So we can see that the most expensive season runs from August to November, especially September and October.<\/p>\n<\/li>\n<li>\n<p>You can get the lowest price if you go to Boston in February.<\/p>\n<\/li>\n<li>\n<p>The most expensive listing has listing_id 447826. Go to Boston and experience a night there.<\/p>\n<\/li>\n<\/ul>\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n    .dataframe thead th {\n        text-align: right;\n    }\n<\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th><\/th>\n<th>301<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>id<\/th>\n<td>447826<\/td>\n<\/tr>\n<tr>\n<th>listing_url<\/th>\n<td>https:\/\/www.airbnb.com\/rooms\/447826<\/td>\n<\/tr>\n<tr>\n<th>scrape_id<\/th>\n<td>20160906204935<\/td>\n<\/tr>\n<tr>\n<th>host_url<\/th>\n<td>https:\/\/www.airbnb.com\/users\/show\/2053557<\/td>\n<\/tr>\n<tr>\n<th>name<\/th>\n<td>Sweet Little House in JP, Boston<\/td>\n<\/tr>\n<tr>\n<th>bedrooms<\/th>\n<td>1<\/td>\n<\/tr>\n<tr>\n<th>accommodates<\/th>\n<td>2<\/td>\n<\/tr>\n<tr>\n<th>bathrooms<\/th>\n<td>1<\/td>\n<\/tr>\n<tr>\n<th>amenities<\/th>\n<td>{TV,\"Cable TV\",Internet,\"Wireless Internet\",Ki...<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p><img decoding=\"async\" src=\"http:\/\/101.32.245.93\/wp-content\/uploads\/2019\/04\/447826.png\" alt=\"Sweet Little House in JP, Boston\" \/><\/p>\n<h4>Clean Listings<\/h4>\n<p>Let us calculate the mean and standard deviation of 'Price'.<\/p>\n<ul>\n<li>Assume that prices follow a normal distribution<\/li>\n<li>The price should be between 
<code>mean - 2*std<\/code> and <code>mean + 2*std<\/code><\/li>\n<\/ul>\n<p><img decoding=\"async\" src=\"http:\/\/101.32.245.93\/wp-content\/uploads\/2019\/04\/output_27_1.png\" alt=\"png\" \/><\/p>\n<h4>Clean Reviews<\/h4>\n<p>Looking through the reviews.csv file, you will find comments in different languages. We only need to keep the English comments.<br \/>\nFor that we need the 'langdetect' library.<\/p>\n<h3>Modeling and evaluation<\/h3>\n<p>Let's try to predict the price based on the columns we selected from the listings.<\/p>\n<ul>\n<li>Which factors are most strongly related to price?<\/li>\n<\/ul>\n<p><img decoding=\"async\" src=\"http:\/\/101.32.245.93\/wp-content\/uploads\/2019\/04\/output_39_1.png\" alt=\"png\" \/><\/p>\n<p>The top 6 factors most strongly related to price:<\/p>\n<ul>\n<li>bedrooms<\/li>\n<li>room type: Private room<\/li>\n<li>number of reviews<\/li>\n<li>accommodates<\/li>\n<li>bathrooms<\/li>\n<li>review scores rating<\/li>\n<\/ul>\n<h3>Deployment<\/h3>\n<p>Typically, the model is deployed to a production environment behind an RPC server or an HTTP server.<br \/>\nYou can deploy the model with Tornado (a Python web framework).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Project details: https:\/\/github.com\/ahomer\/airbnb_bst Business and 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-563","post","type-post","status-publish","format-standard","hentry","category-program"],"_links":{"self":[{"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=\/wp\/v2\/posts\/563","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=563"}],"version-history":[{"count":7,"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=\/wp\/v2\/posts\/563\/revisions"}],"predecessor-version":[{"id":611,"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=\/wp\/v2\/posts\/563\/revisions\/611"}],"wp:attachment":[{"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=563"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=563"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=563"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}