Data mining is a process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal to extract information from a data set and transform the information into a comprehensible structure for further use.
Data mining is the analysis step of the “knowledge discovery in databases” process or KDD. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. The web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser.
While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler.
It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.
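As a minimal sketch of that gather-and-copy step, the snippet below pulls link text and URLs out of an HTML string and writes them out as CSV rows, the kind of spreadsheet-style output described above. It uses only the Python standard library. The sample HTML, the `LinkCollector` class, and the `links_to_csv` helper are made up for illustration, and a real scraper would first download the page.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML first.
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/a">First listing</a>
  <a href="https://example.com/b">Second listing</a>
  <p>No link here</p>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def links_to_csv(html):
    """Return CSV text with a header row and one row per link in the HTML."""
    parser = LinkCollector()
    parser.feed(html)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "url"])
    writer.writerows(parser.links)
    return buffer.getvalue()


print(links_to_csv(SAMPLE_HTML))
```

Writing to an in-memory buffer keeps the example self-contained; swapping `io.StringIO()` for an opened file would produce a CSV file that a spreadsheet program can open.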
Web scraping is used for contact scraping, and as a component of applications used for web indexing, web mining and data mining, online price change monitoring and price comparison, product review scraping (to watch the competition), gathering real estate listings, weather data monitoring, website change detection, research, tracking online presence and reputation, web mashups, and web data integration.
For this article, we are going to create a web scraping Python script using the Beautiful Soup 4 library.
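To give a feel for the library, here is a small sketch that parses an inline HTML snippet with Beautiful Soup 4 (installable with `pip install beautifulsoup4`). The markup, the `job` and `city` class names, and the `extract_jobs` helper are invented for this example and are not taken from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li class="job"><a href="/jobs/1">Data Scientist</a> <span class="city">Berlin</span></li>
  <li class="job"><a href="/jobs/2">ML Engineer</a> <span class="city">Remote</span></li>
  <li class="ad">Sponsored content</li>
</ul>
"""


def extract_jobs(html):
    """Return a list of dicts with title, link, and city for each job item."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    # Only <li> elements with class "job" are kept; the "ad" item is skipped.
    for item in soup.find_all("li", class_="job"):
        link = item.find("a")
        city = item.find("span", class_="city")
        jobs.append({
            "title": link.get_text(strip=True),
            "link": link["href"],
            "city": city.get_text(strip=True),
        })
    return jobs


for job in extract_jobs(SAMPLE_HTML):
    print(job)
```

The `"html.parser"` argument selects Python's built-in parser, so no extra parsing library is needed for this sketch.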
Web scraping is an important technique for every data scientist or machine learning engineer, since it is a very efficient way to mine data. Data mining matters because the quality of your data translates directly into the quality of your model.
It is very important to know how to mine good data, because in real-world problems you will almost never be handed a well-organized dataset; you have to create that dataset yourself.
Web Scraping algorithm that finds the perfect job for you
The algorithm we’ve implemented searches Google to find the perfect job based on the terms you provide. If you provide different terms, it will search and scrape different content, so in order to scrape for jobs, you need to include the word “jobs” among your terms.
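As a rough sketch of that idea (not the script itself), the snippet below turns a list of terms into a Google search URL. The `build_search_url` function and the rule of appending the word "jobs" when it is missing are assumptions made for illustration; fetching and parsing the result page is left out.

```python
from urllib.parse import urlencode

# Google's public search endpoint; the query goes in the "q" parameter.
SEARCH_ENDPOINT = "https://www.google.com/search"


def build_search_url(terms):
    """Build a search URL from a list of terms, making sure 'jobs' is included.

    This helper is hypothetical and only illustrates the idea in the text.
    """
    # Drop empty entries and surrounding whitespace.
    words = [t.strip() for t in terms if t.strip()]
    # Add the word "jobs" unless the caller already supplied it.
    if "jobs" not in (w.lower() for w in words):
        words.append("jobs")
    return SEARCH_ENDPOINT + "?" + urlencode({"q": " ".join(words)})


print(build_search_url(["python", "developer", "remote"]))
print(build_search_url(["Data Scientist", "jobs"]))
```

Using `urlencode` rather than string concatenation means spaces and special characters in the terms are escaped correctly.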
Here is the result:
Conclusion
If you add different parameters, you can make it search for job positions with a certain salary, experience range, education level, and so on. We left those out because we wanted to keep the script short and fast.
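One way to picture such parameters is as optional filters that are folded into the search terms. The sketch below is an assumption about how that could look, not part of the script: the `build_terms` function and its `salary`, `experience`, and `degree` arguments are hypothetical names.

```python
def build_terms(role, salary=None, experience=None, degree=None):
    """Combine a job role with optional filters into a list of search terms.

    All parameters are hypothetical; filters left as None are skipped.
    """
    terms = [role, "jobs"]
    if salary is not None:
        terms.append(f"salary {salary}")
    if experience is not None:
        terms.append(f"{experience} years experience")
    if degree is not None:
        terms.append(f"{degree} degree")
    return terms


print(build_terms("data scientist"))
print(build_terms("data scientist", salary="100k", experience=3, degree="master's"))
```

Keeping every filter optional means the short version of the search still works when no extra parameters are given.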
As we’ve said, web scraping is an important technique for every data scientist or machine learning engineer because it is a very efficient way to mine data, and the quality of your data translates into the quality of your model.
Real-world problems almost never come with a well-organized dataset, so knowing how to build that dataset yourself is an essential skill.
We plan to publish more data mining articles in the future, because data mining is very important for everything related to data analytics.
There are also tons of good books that you can check out like:
1. Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems) 3rd Edition
2. Introduction to Data Mining 1st Edition
3. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
4. The Hundred-Page Machine Learning Book
5. Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Instagram, GitHub, and More 3rd Edition
As with every post we do, we encourage you to keep learning, trying, and creating.