Understanding Database Fundamentals: The Foundation of Data Science
Chapter 1: The Data Hierarchy
Just as Maslow's hierarchy of needs ranks human requirements by priority, data in today's information-centric society can be arranged in a hierarchy of its own. Before moving on to advanced topics such as artificial intelligence and deep learning, the foundational levels of data handling must be in place.
This foundational phase typically begins with the acquisition, movement, and storage of data. Without effective methods to capture and retain data, it becomes impractical to proceed to subsequent stages such as analytics and predictive modeling.
This discussion focuses on the first two tiers of the hierarchy, the ones concerned with capturing data and with moving and storing it, and on the fundamental role databases play in data science.
Section 1.1: Role of Data Engineers
Data and infrastructure engineers are primarily responsible for the following tasks:
- Capturing and storing data.
- Establishing relationships among various data segments.
- Filtering data to highlight valuable insights.
- Searching data to retrieve relevant records.
- Performing CRUD (Create, Read, Update, Delete) operations to manage data effectively.
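As a rough illustration of the tasks listed above, the following MySQL sketch walks through filtering, searching, and the four CRUD operations. The `staff` table, its columns, and the sample rows are hypothetical and exist only for this example; the sketch also assumes a database has already been selected.

```sql
-- Hypothetical table used only for this illustration
CREATE TABLE staff (
    id   INT PRIMARY KEY,
    name VARCHAR(50),
    age  INT
);

-- Create: add new rows
INSERT INTO staff (id, name, age) VALUES
    (1, 'Alice', 34),
    (2, 'Bob',   27),
    (3, 'Carol', 41);

-- Read (filter): keep only the rows that match a condition
SELECT id, name, age FROM staff WHERE age > 30;

-- Read (search): find records whose name starts with 'Al'
SELECT id, name FROM staff WHERE name LIKE 'Al%';

-- Update: change a value in an existing row
UPDATE staff SET age = 28 WHERE id = 2;

-- Delete: remove a row
DELETE FROM staff WHERE id = 3;
```

Each statement targets rows through a `WHERE` clause; omitting it from `UPDATE` or `DELETE` would affect every row in the table.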
Subsection 1.1.1: Understanding Databases
A database can be understood as an electronic repository in which data is systematically organized by specific attributes (such as gender, age, or height) and stored in entities that those attributes describe.
In a relational database, entities are represented as tables made up of rows and columns. Some commonly used types of databases are:
- Relational database (SQL database): stores structured data in tabular form.
- Object-oriented database: organizes data as objects rather than relations.
- Graph database: represents data as nodes and the relationships between them as edges.
- Document database: stores semi-structured data as documents, commonly in JSON format.
Databases can reside in the cloud or on dedicated local servers.
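To make the relational model described in this subsection more concrete, here is a minimal MySQL sketch of two related tables. The `customers` and `orders` tables and their contents are assumptions made for illustration, not part of any schema discussed here.

```sql
-- Parent table: one row per customer (hypothetical schema)
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(50)
);

-- Child table: each order points back to a customer through a foreign key
CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    order_date  DATE,
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
);

INSERT INTO customers (customer_id, name) VALUES (1, 'Alice'), (2, 'Bob');

INSERT INTO orders (order_id, customer_id, order_date) VALUES
    (100, 1, '2024-01-15'),
    (101, 1, '2024-02-03'),
    (102, 2, '2024-02-10');

-- Follow the relationship: list each order next to the customer who placed it
SELECT c.name, o.order_id, o.order_date
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id
ORDER BY o.order_id;
```

The foreign key is what establishes the relationship between the two tables: an order cannot reference a customer that does not exist.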
Section 1.2: Data Types in SQL
When designing a table for data storage, understanding the data types defined by Structured Query Language (SQL) is vital. Some commonly used data types include:
- INT, TINYINT, BIGINT, FLOAT, REAL: For numerical data.
- CHAR, VARCHAR: For character and string data.
- BOOL: For boolean values (in MySQL, a synonym for TINYINT(1)).
- DATE, TIME, TIMESTAMP: For date and time values.
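The sketch below shows how the data types listed above might appear together in a single MySQL table definition. The `event_log` table and its column names are hypothetical, chosen only to give each type a plausible use.

```sql
-- Hypothetical table that uses one column per data type discussed above
CREATE TABLE event_log (
    event_id         BIGINT,        -- large whole numbers
    retry_count      TINYINT,       -- very small whole numbers
    duration_seconds FLOAT,         -- approximate decimal values
    country_code     CHAR(2),       -- fixed-length text
    message          VARCHAR(200),  -- variable-length text, up to 200 characters
    is_error         BOOL,          -- TRUE or FALSE
    event_date       DATE,          -- calendar date
    event_time       TIME,          -- time of day
    created_at       TIMESTAMP      -- date and time combined
);

-- One sample row, with a value for each column in declaration order
INSERT INTO event_log VALUES
    (1, 0, 1.25, 'US', 'Nightly import finished', FALSE,
     '2024-03-01', '02:15:00', '2024-03-01 02:15:00');
```

Choosing the narrowest type that fits the data, such as `TINYINT` for a small counter, keeps storage compact and lets the database reject values that do not belong in the column.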
Every table also needs a primary key: a column, or combination of columns, whose value uniquely identifies each row.
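A small MySQL sketch can show what a primary key does in practice. The `member` table and its rows are assumptions made for this example.

```sql
-- Hypothetical table whose member_id column is the primary key
CREATE TABLE member (
    member_id INT PRIMARY KEY,
    name      VARCHAR(50),
    age       INT
);

-- Two members may share a name and an age, as long as their keys differ
INSERT INTO member (member_id, name, age) VALUES (1, 'Alice', 34);
INSERT INTO member (member_id, name, age) VALUES (2, 'Alice', 34);

-- The key picks out exactly one row, even though the other columns match
SELECT name, age FROM member WHERE member_id = 2;

-- MySQL rejects this statement with a duplicate-entry error,
-- because a row with member_id 1 already exists
INSERT INTO member (member_id, name, age) VALUES (1, 'Dave', 52);
```

Without the key, the two rows for 'Alice' would be indistinguishable, and there would be no reliable way to update or delete just one of them.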
Chapter 2: Creating and Managing Databases
To illustrate how to create a database and a corresponding table in MySQL, consider the following commands:
CREATE DATABASE test; -- Initiate a new database
USE test; -- Select this database for use
CREATE TABLE customer_data (ID INT PRIMARY KEY, name VARCHAR(50), age INT); -- Create a table named customer_data with three columns, using ID as the primary key
SHOW COLUMNS FROM customer_data; -- Display the columns of the newly created table
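Running the commands above a second time fails, because the database and the table already exist. One possible variant, sketched here under the assumption that skipping existing objects is acceptable, uses `IF NOT EXISTS` so the script can be re-run safely.

```sql
-- Create the database only if it is not already present
CREATE DATABASE IF NOT EXISTS test;
USE test;

-- Create the table only if it is not already present
CREATE TABLE IF NOT EXISTS customer_data (
    ID   INT,
    name VARCHAR(50),
    age  INT
);

-- List the tables in the current database
SHOW TABLES;

-- DESCRIBE is a shorthand for SHOW COLUMNS FROM
DESCRIBE customer_data;
```

Note that `IF NOT EXISTS` leaves an existing table untouched, so it will not apply changes made to the column list after the table was first created.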
In our next installment, we will explore additional commands to effectively manage a database.
This video, titled "Introduction to Database Management Systems - Part 1," provides a comprehensive overview of database management principles, laying the groundwork for effective data handling.
The second video, "Introduction to Databases Part 1: The Table," dives into the fundamental structure of databases, focusing on the significance of tables in storing and organizing data.