Hanukkah of Data 5783
Solving the data version of Advent of Code in R
The madpersons behind the retro-chic data analysis tool VisiData have created an alternative to the infamous “Advent of Code”. Instead of programming to solve yet more optimisation puzzles, and instead of the hackneyed Christmas theme, the Hanukkah of Data requires data analysis to solve a murder mystery rug-finding mission.
I only just found out about it, so although one puzzle was released for each day of real-life Chanukah, I was able to gorge on the first 7 puzzles, needing only to wait for the last one. Each puzzle is represented by an ASCII candle in an ASCII world — quite an impressive achievement in its own right.
Here’s how I solved all 8 puzzles in R. The data comes on a password-protected USB drive (the password being the year of the previous Hanukkah of Data, which is a nice meta-puzzle). I can probably give more details (aka make the code clearer) upon request.
library(tidyverse, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
# Download and extract the data
download.file(
url = "https://hanukkah.bluebird.sh/5783/noahs-csv.zip",
destfile = "noahs-csv.zip"
)
system("unzip -P 5777 -o noahs-csv.zip")
# Load the four tables
customers <- read_csv("noahs-customers.csv")
orders <- read_csv("noahs-orders.csv")
items <- read_csv("noahs-orders_items.csv")
products <- read_csv("noahs-products.csv")
# Create a denormalized view for analysis
orders_plus <- orders |>
inner_join(items, by = "orderid") |>
inner_join(products, by = "sku") |>
inner_join(customers, by = "customerid")
Puzzle 1: The Private Investigator
Sarah needs help finding her grandmother’s rug, and a cashier remembers a private investigator who once shopped at Noah’s. His business card was clever: the phone number spelled out his surname on the old T9 phone keypad (2=ABC, 3=DEF, and so on).
So we need to build a letter-to-number converter. The mildly annoying part is that the keypad isn’t perfectly regular — S, V, Y, and Z get shunted around — so they need special handling.
# Convert a single letter to its T9 phone number
letter_to_phone_digit <- function(letter) {
letter_index <- which(letters == letter)
digit <- floor(((letter_index - 1) / 3) + 2)
# Handle irregular keys: S(7), V(8), Y(9), Z(9)
if (letter %in% c("s", "v", "y", "z")) {
return(as.character(digit - 1))
}
return(as.character(digit))
}
# Convert a full name to its phone number equivalent
name_to_phone <- function(name) {
# Extract last name, lowercase, remove non-letters
surname <- (strsplit(name, " ") |> unlist())[-1]
surname_clean <- str_to_lower(surname) |> str_replace_all("[^a-z]", "")
surname_short <- str_sub(surname_clean, end = 10) # Phone numbers are 10 digits
surname_short |>
strsplit("") |>
unlist() |>
map_chr(letter_to_phone_digit) |>
paste0(collapse = "")
}
# Find customers where phone matches name
customers |>
mutate(
name_as_phone = map_chr(name, name_to_phone),
phone_digits = str_replace_all(phone, "-", "")
) |>
filter(phone_digits == name_as_phone) |>
pull(phone)
[1] "488-836-2374"
Puzzle 2: The Rug Contractor
The investigator tells us the rug was cleaned by an outside contractor. A claim ticket reads “2017 spec JD” — the year and the contractor’s initials. Apparently these contractors would meet at Noah’s to discuss jobs over coffee and bagels, which is a fairly civilised way to run a business.
We find orders containing both coffee and bagel SKUs, filter to 2017, and look for the right initials.
# Helper to extract initials from a name
get_initials <- function(name) {
str_extract_all(name, "[A-Z]") |> unlist() |> paste0(collapse = "")
}
# Coffee and bagel SKUs (found by exploring the products table)
coffee_bagel_skus <- c("DLI1464", "BKY4234", "BKY5887")
orders_plus |>
filter(sku %in% coffee_bagel_skus) |>
group_by(orderid) |>
filter(n() > 1) |> # Must have bought multiple items
ungroup() |>
filter(year(shipped) == 2017) |>
mutate(initials = map_chr(name, get_initials)) |>
filter(initials == "JD") |>
pull(phone) |>
unique()
[1] "212-771-8924"
Puzzle 3: The Spider Hat Guy
The contractor gave the rug to someone near South Ozone Park subway station. This person wore a hat with a spider on it, and — usefully for our purposes — was an Aries born in the Year of the Dog.
Pure demographic filtering. Years of the Dog cycle every 12 years, and Aries runs from late March through mid-April.
dog_years <- c(2006, 1994, 1982, 1970, 1958, 1946, 1934)
customers |>
filter(year(birthdate) %in% dog_years) |>
filter(month(birthdate) %in% c(3, 4)) |> # March or April for Aries
filter(citystatezip == "South Ozone Park, NY 11420") |>
pull(phone)
[1] "516-636-7397"
Puzzle 4: The Early Bird Baker
Spider-hat cleaned part of the rug, found a beautiful owl pattern, hung it on his wall. Then he met someone on Tinder who fixed his bike at 5am and received the rug as payment. This person loved arriving at Noah’s before dawn to get the first pastries out of the oven.
The “before dawn” detail is the key discriminator. We want bakery orders placed around 4–5am with same-day pickup.
orders_plus |>
filter(hour(ordered) == 4) |> # 4am orders
filter(ordered == shipped) |> # Same-day delivery (pickup)
filter(str_detect(sku, "BKY")) |> # Bakery items
select(customerid, name, phone) |>
distinct()
# A tibble: 2 × 3
customerid name phone
<dbl> <chr> <chr>
1 5105 Heather Brown 929-518-7269
2 5375 Christina Booker 718-649-9036
One customer fits. (I originally called this variable tinder_girl, which tells you everything about my variable naming discipline.)
customers |>
filter(customerid == 5375) |>
pull(phone)
[1] "718-649-9036"
Puzzle 5: The Cat Lady from Queens
The early-bird baker gave the rug away on Freecycle to a woman from Queens Village who wore a Noah’s Market sweatshirt and had about ten elderly cats, all with cataracts. She spent twenty minutes carefully wrapping the rug in plastic in the rain. Dedicated.
We need a woman in Queens who buys cat-related products.
orders_plus |>
filter(str_detect(citystatezip, "Queens")) |>
filter(str_detect(desc, "Cat")) |>
select(customerid, name, phone) |>
distinct()
# A tibble: 6 × 3
customerid name phone
<dbl> <chr> <chr>
1 7675 Anita Koch 315-492-7411
2 6560 Joseph Weaver 680-839-3746
3 11780 Joshua White 914-632-0184
4 3335 Jose Cook 516-745-9461
5 5486 Sam House 607-836-2966
6 6905 Justin Cochran 716-525-5173
Only one female name in the results. That’s her.
# First result is our match
orders_plus |>
filter(str_detect(citystatezip, "Queens")) |>
filter(str_detect(desc, "Cat")) |>
slice(1) |>
pull(phone)
[1] "315-492-7411"
Puzzle 6: The Frugal Cousin
The cat lady passed the rug to her cousin, who was moving into a new flat with wood floors. This cousin is spectacularly frugal — clips every coupon, stalks every sale at Noah’s.
The dataset includes both unit_price (what the customer paid) and wholesale_cost (what Noah’s paid for the product). A coupon extremist would have a suspicious number of orders where they paid less than wholesale.
# Find customers who frequently buy below wholesale
loss_leaders <- orders_plus |>
filter(wholesale_cost > unit_price) |>
count(customerid, name, phone, sort = TRUE)
# The top customer has far more loss-leader purchases than anyone else
loss_leaders |> head(5)
# A tibble: 5 × 4
customerid name phone n
<dbl> <chr> <chr> <int>
1 8342 Emily Randolph 914-868-0316 26
2 4308 Travis Bartlett 929-906-5980 8
3 2956 Jonathan Williams 332-668-0158 6
4 5788 Joshua Hansen 680-511-5677 6
5 1422 Morgan Smith 585-689-9118 5
Customer 8342 has, by far, the most loss-making orders. That’s the frugal cousin.
customers |>
filter(customerid == 8342) |>
pull(phone)
[1] "914-868-0316"
Puzzle 7: The Ex-Boyfriend
The frugal cousin tells a romantic story: she met a man at Noah’s when he handed back something she’d dropped. They’d bought nearly identical items but preferred each other’s colour choices, so they swapped. They dated, she moved in, it didn’t work out. She left hastily one night and the rug stayed behind.
This is the trickiest puzzle. We need to find two orders placed on the same day for the same product but in different colours. Essentially a self-join on colour-stripped product descriptions.
# Add date components and normalize product descriptions (remove color)
orders_with_dates <- orders_plus |>
mutate(
order_day = day(ordered),
order_month = month(ordered),
order_year = year(ordered),
product_base = str_replace(desc, "\\([a-z]+\\)", "") |> str_squish()
)
# Get the frugal cousin's orders
cousin_orders <- orders_with_dates |>
filter(customerid == 8342)
# Find matching orders: same base product, same date, different color
romantic_match <- cousin_orders |>
inner_join(
orders_with_dates,
by = c("product_base", "order_day", "order_month", "order_year")
) |>
filter(customerid.y != 8342) |> # Not the cousin herself
filter(desc.x != desc.y) |> # Different color variant
arrange(abs(ordered.x - ordered.y)) # Orders closest in time
romantic_match |>
slice(1) |>
pull(phone.y)
[1] "315-618-5263"
Puzzle 8: The Collector
The ex-boyfriend says a woman appeared and vanished from his life, leaving behind a rug depicting a snake hidden in branches. He gave it to his sister in Manhattan, who passed it to “an acquaintance who collects all sorts of junk” — specifically, someone who owns an entire set of Noah’s collectibles.
Noah’s Market apparently sells branded collectibles. We want whoever has accumulated the most.
orders_plus |>
filter(str_detect(desc, "^Noah")) |> # Products starting with "Noah"
group_by(customerid, name, phone) |>
summarise(noah_items = n(), .groups = "drop") |>
filter(noah_items > 200) |> # Serious collector
pull(phone)
[1] "929-906-5980"
The Finale
Entering the final phone number reveals something quite spectacular — but I won’t spoil it. If you haven’t tried the Hanukkah of Data yet, give it a go. They’ve continued the tradition with new datasets and puzzles in subsequent years.
What I particularly enjoyed is how the puzzles mirror real analytical work. You have messy, interconnected datasets. You have vague requirements (“find someone who shops the sales”). You make assumptions, poke at the data, iterate. Eight puzzles, eight phone numbers, one rug recovered. Not bad for a Christmas afternoon.