Mastering the Data Science Workflow


The collection stage involves acquiring the data needed to perform a meaningful analysis based on accurate information.

Techniques

Data Requirements
Define which data is needed to properly approach the problem (e.g. format, variables, range, granularity)
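One way to make such requirements concrete is to write them down as a small, checkable specification. The sketch below is only an illustration: the FieldRequirement class, the check_record helper, and the temperature fields are hypothetical names invented for this example, not part of any library.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FieldRequirement:
    """One required variable: its name, expected type, and allowed range."""
    name: str
    dtype: type
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def check_record(record: dict[str, Any], requirements: list[FieldRequirement]) -> list[str]:
    """Return a list of problems; an empty list means the record fits the spec."""
    problems = []
    for req in requirements:
        if req.name not in record:
            problems.append(f"missing variable: {req.name}")
            continue
        value = record[req.name]
        if not isinstance(value, req.dtype):
            problems.append(
                f"{req.name}: expected {req.dtype.__name__}, got {type(value).__name__}"
            )
            continue
        if req.minimum is not None and value < req.minimum:
            problems.append(f"{req.name}: {value} is below {req.minimum}")
        if req.maximum is not None and value > req.maximum:
            problems.append(f"{req.name}: {value} is above {req.maximum}")
    return problems


# Hypothetical requirements for a temperature study (assumed for illustration).
REQUIREMENTS = [
    FieldRequirement("station_id", str),
    FieldRequirement("temperature_c", float, minimum=-90.0, maximum=60.0),
]

print(check_record({"station_id": "A1", "temperature_c": 75.0}, REQUIREMENTS))
```

Writing the requirements as data rather than prose means the same list can later be reused to validate whatever is actually collected.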

Data Sources
Find reliable and relevant data sources (e.g. databases, APIs, files, readings)


Data Access
Secure the necessary access to the data (e.g. email/password, OAuth, API key, robots.txt)
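For the robots.txt case, Python's standard library includes urllib.robotparser, which can tell whether a given user agent is allowed to fetch a URL. The sketch below parses an inline, made-up robots.txt so it runs offline; the rules, the "my-collector" user agent, and the example.com URLs are assumptions for illustration. Against a live site one would typically call set_url() and read() on the parser instead.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, used here so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether user_agent may fetch url under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "my-collector", "https://example.com/data/page.html"))
print(is_allowed(ROBOTS_TXT, "my-collector", "https://example.com/private/report.html"))
```

A robots.txt file expresses the site owner's crawling preferences; it does not replace credentials or terms of use, which still need to be checked separately.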

Collection
Acquire the data using appropriate methods (e.g. SQL queries, API calls, web scraping, manual data entry)
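As a minimal sketch of the SQL route, the example below uses Python's built-in sqlite3 module with an in-memory database. The readings table, its columns, and the sample rows are hypothetical and exist only so the query has something to run against; a real project would connect to its own database instead.

```python
import sqlite3


def fetch_readings(conn: sqlite3.Connection, min_temp: float) -> list[tuple]:
    """Return (station_id, temperature_c) rows at or above min_temp, coldest first."""
    query = (
        "SELECT station_id, temperature_c "
        "FROM readings "
        "WHERE temperature_c >= ? "
        "ORDER BY temperature_c"
    )
    # The ? placeholder lets the driver bind the value safely,
    # instead of building the SQL string by hand.
    return conn.execute(query, (min_temp,)).fetchall()


# Hypothetical in-memory table standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (station_id TEXT, temperature_c REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("A1", 21.5), ("B2", 3.0), ("C3", 30.25)],
)

rows = fetch_readings(conn, 10.0)
print(rows)
```

Filtering in the query rather than after the fact keeps the collected data limited to what the requirements actually call for.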

Data Management
Handle the data in accordance with best practices (e.g. data quality)
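A basic data quality check can be as simple as counting missing values and duplicate keys right after collection. The quality_report function and the sample records below are hypothetical, and which checks matter will depend on the project; this is a sketch, not a complete quality process.

```python
from typing import Any


def quality_report(records: list[dict[str, Any]], key: str) -> dict[str, int]:
    """Count rows, rows with a missing value, and rows that repeat an earlier key."""
    seen = set()
    rows_with_missing = 0
    duplicate_keys = 0
    for record in records:
        # Treat None and the empty string as missing values.
        if any(value is None or value == "" for value in record.values()):
            rows_with_missing += 1
        identifier = record.get(key)
        if identifier in seen:
            duplicate_keys += 1
        else:
            seen.add(identifier)
    return {
        "rows": len(records),
        "rows_with_missing": rows_with_missing,
        "duplicate_keys": duplicate_keys,
    }


# Made-up sample records for illustration.
SAMPLE = [
    {"station_id": "A1", "temperature_c": 21.5},
    {"station_id": "B2", "temperature_c": None},
    {"station_id": "A1", "temperature_c": 22.0},
]

print(quality_report(SAMPLE, key="station_id"))
```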
