Identifying Consecutive Numbers and Dates Using SQL Techniques
Written on
Welcome!
In this article, we will delve into a core concept frequently seen in SQL challenges on platforms such as LeetCode—detecting consecutive numbers.
What You Will Learn
Here’s what this article will cover:
- A detailed guide on recognizing consecutive numbers.
- Python code samples utilizing the SQLite module for hands-on practice.
- Illustrations of this concept through various LeetCode problems:
- Consecutive Available Seats (Easy)
- Find the Start and End Number of Continuous Ranges (Medium)
- Active Users (Medium)
Let’s get started!
Understanding the Process of Finding Consecutive Numbers
We will break down the method step by step. Using a basic dataset (let’s refer to it as ids), depicted in the following graph, our objective is to highlight the consecutive numbers marked in orange.
First, we will create a Common Table Expression (CTE) that includes an additional column named orders, which contains consecutive integers starting from one. This new column must correspond to the order of the target column id.
SELECT
id,
ROW_NUMBER() OVER(ORDER BY id) AS 'orders'
FROM ids
Next, we will subtract the orders column from the id column. This subtraction helps us recognize discontinuities in the id, where the gaps between the id and orders will increase. This enables us to group the data effectively. If the differences remain constant, it indicates that the numbers are consecutive.
SELECT
id,
ROW_NUMBER() OVER(ORDER BY id) AS 'orders',
(id - ROW_NUMBER() OVER(ORDER BY id)) AS 'gaps'
FROM ids
To extract the consecutive numbers, we can simply use COUNT() with a window function, filtering out groups that consist of only one row.
WITH t1 AS (
SELECT
id,
(id - ROW_NUMBER() OVER(ORDER BY id)) AS 'gaps'
FROM ids
), t2 AS (
SELECT
id,
COUNT(id) OVER(PARTITION BY gaps) AS 'group_cnt'
FROM t1
)
SELECT id
FROM t2
WHERE group_cnt > 1
Alternatively, this issue can be approached using GROUP BY instead of window functions, though it may necessitate an additional join or filtering step. Personally, I find the window function method more straightforward.
WITH t1 AS (
SELECT
id,
(id - ROW_NUMBER() OVER(ORDER BY id)) AS 'gaps'
FROM ids
), t2 AS (
SELECT
gaps
FROM t1
GROUP BY gaps
HAVING COUNT(id) > 1
)
SELECT id
FROM t1
WHERE gaps IN (SELECT gaps FROM t2)
More Examples of Gaps Calculation
Let’s clarify the concept of gaps with a few straightforward examples.
- If all the ids in the dataset are in consecutive order, the differences between ids and orders will all be identical.
- Note: I’ve purposely started the IDs from 4 to demonstrate that even with a different starting point, the gaps between the columns remain consistent for grouping.
- Conversely, if none of the IDs are in consecutive order, the differences between the id and orders columns will continuously accumulate, resulting in each row forming its own group.
That’s essentially the approach! We can leverage the gaps information to execute a wide range of SQL queries.
Example Code in Python
Here is a code snippet based on the previous example using the built-in SQLite package in Python. Feel free to test it on your own machine.
Load the module & generate data:
import sqlite3
import pandas as pd
# Create a connection to the SQLite database
conn = sqlite3.connect('ids.db')
# Create a DataFrame with the id values
df = pd.DataFrame({'id': [1, 2, 3, 5, 7, 9, 10]})
# Write the DataFrame to a SQLite database table named "ids"
df.to_sql('ids', conn, if_exists='replace')
# Helper function to perform the query:
def run_query(query):
conn = sqlite3.connect('ids.db')
output = pd.read_sql_query(query, conn)
print(output.to_string(index=False)) # Display output
Run a query to obtain the gaps information:
query = """
SELECT
id,
ROW_NUMBER() OVER(ORDER BY id) AS 'orders',
(id - ROW_NUMBER() OVER(ORDER BY id)) AS 'gaps'
FROM ids
"""
run_query(query)
LeetCode Problems: From Easy to Hard
Now, let’s tackle some consecutive number challenges on LeetCode!
603. Consecutive Available Seats (Easy)
In LeetCode problem 603, our goal is to find consecutive free seats (denoted by ‘free=1’). To accomplish this, we need to take an additional step:
Filter the rows to keep only the free seats.
Identify the consecutive seat IDs within the remaining rows.
WITH t1 AS (
SELECT
seat_id,
(seat_id - ROW_NUMBER() OVER(ORDER BY seat_id)) AS 'gaps'
FROM Cinema
WHERE free = 1
), t2 AS (
SELECT
seat_id,
COUNT(seat_id) OVER(PARTITION BY gaps) AS 'group_cnt'
FROM t1
)
SELECT seat_id
FROM t2
WHERE group_cnt > 1
ORDER BY seat_id
1285. Find the Start and End Number of Continuous Ranges
The objective of this problem is to identify the starting and ending numbers of each consecutive range of numbers.
First, calculate the gaps to discover the consecutive numbers.
Then, utilize GROUP BY on gaps to find the minimum and maximum values within each consecutive number group.
WITH t1 AS (
SELECT
log_id,
(id - (ROW_NUMBER() OVER(ORDER BY log_id))) AS 'gaps'
FROM Logs
)
SELECT MIN(id) AS 'start_id', MAX(log_id) AS 'end_id'
FROM t1
GROUP BY gaps
ORDER BY 1
1454. Active Users (Medium)
The objective of this task is to identify users who have logged in for at least five consecutive days.
- Note: For simplicity, I’m focusing on illustrating the concept of consecutive numbers here. If your main interest is solving the problem, you can find my answer linked here.
To avoid double counting for the same day, we first filter the dataset to keep only unique rows.
SELECT DISTINCT * FROM Logins
To calculate the gaps as described previously, we assign consecutive numbers to each login date WITHIN each user group.
- Note: PARTITION BY id is crucial because we want to identify consecutive days for each user ID. This is different from previous examples where we were looking for consecutive numbers across the entire dataset.
ROW_NUMBER() OVER(PARTITION BY id ORDER BY login_date)
To calculate the gaps between login dates, we need to convert the dates into numeric values. One way is to convert dates into days, but this approach can be memory-intensive.
Instead, I’m using a more efficient method by calculating the date difference between each login date and the earliest login date in the dataset.
DATEDIFF(login_date, (SELECT MIN(login_date) FROM Logins)) + 1
- Note: Subtracting the DATEDIFF() from ROW_NUMBER() can sometimes be tricky. Some SQL variants (e.g., MySQL) may reject subtracting an unsigned value (like ROW_NUMBER()) from a signed value. A simple fix is to ensure all DATEDIFF() values are greater than 0 by adding 1 to the output.
SELECT
id,
login_date,
(DATEDIFF(login_date, (SELECT MIN(login_date) FROM Logins)) + 1) -
(ROW_NUMBER() OVER(PARTITION BY id ORDER BY login_date)) AS 'gaps'
FROM (SELECT DISTINCT * FROM Logins) l
The rest is quite simple! Similar to prior examples, we can use GROUP BY and then count the rows in each group to find IDs with at least 5 consecutive login days.
WITH t1 AS (
SELECT
id,
login_date,
(DATEDIFF(login_date, (SELECT MIN(login_date) FROM Logins)) + 1) -
(ROW_NUMBER() OVER(PARTITION BY id ORDER BY login_date)) AS 'gaps'
FROM (SELECT DISTINCT * FROM Logins) l
)
SELECT DISTINCT id
FROM t1
GROUP BY id, gaps
HAVING COUNT(id) >= 5
That’s it! If you have any questions, feel free to leave a comment below. Otherwise, happy coding!
Reference
- LeetCode.com