{"id":575,"date":"2019-04-18T18:00:28","date_gmt":"2019-04-18T10:00:28","guid":{"rendered":"http:\/\/localhost\/?p=575"},"modified":"2019-07-08T21:02:22","modified_gmt":"2019-07-08T13:02:22","slug":"python-pandas-key-words-for-etl","status":"publish","type":"post","link":"http:\/\/www.ahomer.cn\/?p=575","title":{"rendered":"python pandas key words for etl"},"content":{"rendered":"<p>ETL<br \/>\nExtract, Transform, Load<\/p>\n<h3>Extract data from different sources<\/h3>\n<pre><code>* csv files\n* json files\n* APIs\nimport pandas as pd\ndf = pd.read_csv('..\/data\/population_data.csv', skiprows=4)  # skip 4 metadata rows at the top\ndf.isnull().sum()  # missing values per column\ndf = df.drop(['col'], axis=1)  # drop() returns a new DataFrame\ndf = pd.read_json('population_data.json', orient='records')<\/code><\/pre>\n<h3>Transform data<\/h3>\n<pre><code>* combining data from different sources\npd.concat([df1, df2])\n\n* data cleaning\ndf.drop_duplicates()\ndf.apply()\n\n* data types\ndf['totalamt'] = pd.to_numeric(df['totalamt'])  # convert strings to numbers before summing\ndf['totalamt'].sum()\n\n* parsing dates\npd.to_datetime()\n\n* file encodings\npip install chardet  # run in a shell\nchardet.detect()  # takes bytes, e.g. a file opened in 'rb' mode\n\n* missing data\ndf.dropna()\ndf.fillna()\ndf.apply()\n\n* duplicate data\ndf['col'].nunique()\ndf[df['col'].str.contains('xx')]\n\n* dummy variables\npd.get_dummies()\n\n* remove outliers\ndf['col'].quantile()\ndf.boxplot()\nLinearRegression  # scikit-learn\n\n* scaling features\nnormalization\n\n* engineering features\ndf['col1'] * df['col2']  # likewise +, -, \/, **<\/code><\/pre>\n<h3>Load<\/h3>\n<pre><code>* send the transformed data to a database\n* code an ETL pipeline that chains extract, transform and load\ndf.to_json()\ndf.to_csv()\ndf.to_sql()  # needs a table name and a SQLAlchemy engine or sqlite3 connection<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>ETL Extract, Transform, Load Extract data from differe 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[35,34],"class_list":["post-575","post","type-post","status-publish","format-standard","hentry","category-program","tag-airbnb","tag-pandas"],"_links":{"self":[{"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=\/wp\/v2\/posts\/575","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=575"}],"version-history":[{"count":4,"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=\/wp\/v2\/posts\/575\/revisions"}],"predecessor-version":[{"id":5376,"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=\/wp\/v2\/posts\/575\/revisions\/5376"}],"wp:attachment":[{"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=575"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=575"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.ahomer.cn\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=575"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}