Hanukkah of Data 5783

Solving the data version of Advent of Code in R

data
analysis
fun
Step-by-step R solutions to all 8 Hanukkah of Data puzzles - a data-focused alternative to Advent of Code with a delightful narrative twist.
Author

Guy Freeman

Published

December 25, 2022

The creators of VisiData, the retro-chic terminal-based data analysis tool, have built something wonderful: a data-focused alternative to Advent of Code. Instead of optimizing algorithms, you’re exploring datasets. Instead of Christmas elves, you’re helping Sarah find her grandmother’s rug.

Hanukkah of Data presents eight puzzles released over the eight nights of Hanukkah. Each puzzle gives you a narrative clue and asks you to find someone’s phone number in a dataset from “Noah’s Market” - a fictional New York grocery store. The puzzles chain together beautifully: each person you find leads you to the next.

I discovered this challenge late and binged through the first seven puzzles in one sitting, then had to wait for the finale. Here’s how I solved all eight puzzles in R.

The Data

Noah’s Market has provided us with four CSV files on a password-protected USB drive (the password is the year of the previous Hanukkah of Data - a nice meta-puzzle):

  • customers: Customer names, addresses, phone numbers, and birthdates
  • orders: Order timestamps and shipping info
  • orders_items: Line items linking orders to products
  • products: Product details including SKUs, descriptions, and pricing

Let’s load everything and create a denormalized view that joins orders with their items, products, and customers. This orders_plus table will be our workhorse for most puzzles.

Code
library(tidyverse, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

# Download and extract the data
download.file(
  url = "https://hanukkah.bluebird.sh/5783/noahs-csv.zip",
  destfile = "noahs-csv.zip"
)
system("unzip -P 5777 -o noahs-csv.zip")

# Load the four tables
customers <- read_csv("noahs-customers.csv")
orders <- read_csv("noahs-orders.csv")
items <- read_csv("noahs-orders_items.csv")
products <- read_csv("noahs-products.csv")

# Create a denormalized view for analysis
orders_plus <- orders |>
  inner_join(items, by = "orderid") |>
  inner_join(products, by = "sku") |>
  inner_join(customers, by = "customerid")

Puzzle 1: The Private Investigator

Sarah needs help finding her grandmother’s rug, and a cashier remembers a private investigator who once shopped at Noah’s. The investigator’s business card was clever: their phone number spelled out their last name using the old T9 phone keypad mapping (2=ABC, 3=DEF, etc.).

The challenge: Find a customer whose phone number, when decoded, matches their last name.

This requires building a letter-to-number converter. The tricky part is that the phone keypad isn’t perfectly regular - the letters S, V, Y, and Z are “pushed” to share keys with fewer letters, so they need special handling.

Code
# Convert a single letter to its T9 phone number
letter_to_phone_digit <- function(letter) {
  letter_index <- which(letters == letter)
  digit <- floor(((letter_index - 1) / 3) + 2)
  # Handle irregular keys: S(7), V(8), Y(9), Z(9)
  if (letter %in% c("s", "v", "y", "z")) {
    return(as.character(digit - 1))
  }
  return(as.character(digit))
}

# Convert a full name to its phone number equivalent
name_to_phone <- function(name) {
  # Extract last name, lowercase, remove non-letters
  surname <- (strsplit(name, " ") |> unlist())[-1]
  surname_clean <- str_to_lower(surname) |> str_replace_all("[^a-z]", "")
  surname_short <- str_sub(surname_clean, end = 10)  # Phone numbers are 10 digits

  surname_short |>
    strsplit("") |>
    unlist() |>
    map_chr(letter_to_phone_digit) |>
    paste0(collapse = "")
}

# Find customers where phone matches name
customers |>
  mutate(
    name_as_phone = map_chr(name, name_to_phone),
    phone_digits = str_replace_all(phone, "-", "")
  ) |>
  filter(phone_digits == name_as_phone) |>
  pull(phone)
[1] "488-836-2374"

Puzzle 2: The Rug Contractor

The private investigator tells us Sarah’s rug was cleaned by an outside contractor. The cleaning company’s records show a claim ticket reading “2017 spec JD” - the year and the contractor’s initials. These contractors would meet at Noah’s to discuss jobs over coffee and bagels.

The challenge: Find someone with initials “JD” who bought both coffee and bagels in 2017.

We need to find orders containing multiple items from the coffee and bagel SKUs, then filter to 2017 and check for matching initials.

Code
# Helper to extract initials from a name
get_initials <- function(name) {
  str_extract_all(name, "[A-Z]") |> unlist() |> paste0(collapse = "")
}

# Coffee and bagel SKUs (found by exploring the products table)
coffee_bagel_skus <- c("DLI1464", "BKY4234", "BKY5887")

orders_plus |>
  filter(sku %in% coffee_bagel_skus) |>
  group_by(orderid) |>
  filter(n() > 1) |>  # Must have bought multiple items
  ungroup() |>
  filter(year(shipped) == 2017) |>
  mutate(initials = map_chr(name, get_initials)) |>
  filter(initials == "JD") |>
  pull(phone) |>
  unique()
[1] "212-771-8924"

Puzzle 3: The Spider Hat Guy

The contractor remembers giving the rug to someone they saw near the South Ozone Park subway station. The person wore a hat with a spider on it, and the contractor recalls they were an Aries born in the Year of the Dog.

The challenge: Find a customer in South Ozone Park, born under Aries (March 21 - April 19) in a Year of the Dog (2006, 1994, 1982, 1970, 1958…).

This is pure demographic filtering. Years of the Dog repeat every 12 years.

Code
dog_years <- c(2006, 1994, 1982, 1970, 1958, 1946, 1934)

customers |>
  filter(year(birthdate) %in% dog_years) |>
  filter(month(birthdate) %in% c(3, 4)) |>  # March or April for Aries
  filter(citystatezip == "South Ozone Park, NY 11420") |>
  pull(phone)
[1] "516-636-7397"

Puzzle 4: The Early Bird Baker

Our spider-hat friend reveals they cleaned part of the rug, found a beautiful owl pattern, and hung it on their wall. Then they met someone on Tinder who fixed their bike at 5am and received the rug as payment. This person loved arriving at Noah’s before dawn to get the first pastries out of the oven.

The challenge: Find someone who ordered bakery items (BKY prefix) very early in the morning with same-day delivery.

The “before dawn” detail is key. We’re looking for orders placed around 4-5am.

Code
orders_plus |>
  filter(hour(ordered) == 4) |>        # 4am orders
  filter(ordered == shipped) |>         # Same-day delivery (pickup)
  filter(str_detect(sku, "BKY")) |>     # Bakery items
  select(customerid, name, phone) |>
  distinct()
# A tibble: 2 × 3
  customerid name             phone       
       <dbl> <chr>            <chr>       
1       5105 Heather Brown    929-518-7269
2       5375 Christina Booker 718-649-9036

Looking at the results, there’s one customer who fits this pattern. (I originally called this variable tinder_girl but that’s not particularly descriptive!)

Code
customers |>
  filter(customerid == 5375) |>
  pull(phone)
[1] "718-649-9036"

Puzzle 5: The Cat Lady from Queens

The early-bird baker gave away the rug on Freecycle to a woman from Queens Village. She wore a Noah’s Market sweatshirt and had about ten elderly cats, all with cataracts. It was raining, and she spent twenty minutes carefully wrapping the rug in plastic.

The challenge: Find a woman in Queens who has bought cat-related products.

We’re looking for someone in Queens whose order history suggests cat ownership.

Code
orders_plus |>
  filter(str_detect(citystatezip, "Queens")) |>
  filter(str_detect(desc, "Cat")) |>
  select(customerid, name, phone) |>
  distinct()
# A tibble: 6 × 3
  customerid name           phone       
       <dbl> <chr>          <chr>       
1       7675 Anita Koch     315-492-7411
2       6560 Joseph Weaver  680-839-3746
3      11780 Joshua White   914-632-0184
4       3335 Jose Cook      516-745-9461
5       5486 Sam House      607-836-2966
6       6905 Justin Cochran 716-525-5173

Only one female name in the results - that’s our cat lady.

Code
# First result is our match
orders_plus |>
  filter(str_detect(citystatezip, "Queens")) |>
  filter(str_detect(desc, "Cat")) |>
  slice(1) |>
  pull(phone)
[1] "315-492-7411"

Puzzle 6: The Frugal Cousin

The cat lady gave the rug to her cousin who was moving into a new apartment with wood floors. This cousin is extremely frugal - she clips every coupon and shops every sale at Noah’s.

The challenge: Find someone who consistently buys items below their wholesale cost.

Noah’s data includes both unit_price (what the customer paid) and wholesale_cost (what Noah’s paid). A truly frugal shopper would have many orders where they paid less than wholesale - taking advantage of deep discounts and coupons.

Code
# Find customers who frequently buy below wholesale
loss_leaders <- orders_plus |>
  filter(wholesale_cost > unit_price) |>
  count(customerid, name, phone, sort = TRUE)

# The top customer has far more loss-leader purchases than anyone else
loss_leaders |> head(5)
# A tibble: 5 × 4
  customerid name              phone            n
       <dbl> <chr>             <chr>        <int>
1       8342 Emily Randolph    914-868-0316    26
2       4308 Travis Bartlett   929-906-5980     8
3       2956 Jonathan Williams 332-668-0158     6
4       5788 Joshua Hansen     680-511-5677     6
5       1422 Morgan Smith      585-689-9118     5

Customer 8342 stands out dramatically - they’re the frugal cousin.

Code
customers |>
  filter(customerid == 8342) |>
  pull(phone)
[1] "914-868-0316"

Puzzle 7: The Ex-Boyfriend

The frugal cousin tells a romantic story: she met a man at Noah’s when he handed back something she’d dropped. They’d bought nearly identical items but preferred each other’s color choices, so they swapped. They dated, she moved in, but it didn’t work out. She left hastily one night, leaving the rug behind.

The challenge: Find someone who bought the same product as the frugal cousin on the same day, but in a different color.

This is the trickiest puzzle. We need to find two orders on the same day for the same product (ignoring color variants) but with different specific colors.

Code
# Add date components and normalize product descriptions (remove color)
orders_with_dates <- orders_plus |>
  mutate(
    order_day = day(ordered),
    order_month = month(ordered),
    order_year = year(ordered),
    product_base = str_replace(desc, "\\([a-z]+\\)", "") |> str_squish()
  )

# Get the frugal cousin's orders
cousin_orders <- orders_with_dates |>
  filter(customerid == 8342)

# Find matching orders: same base product, same date, different color
romantic_match <- cousin_orders |>
  inner_join(
    orders_with_dates,
    by = c("product_base", "order_day", "order_month", "order_year")
  ) |>
  filter(customerid.y != 8342) |>       # Not the cousin herself
  filter(desc.x != desc.y) |>            # Different color variant
  arrange(abs(ordered.x - ordered.y))    # Orders closest in time

romantic_match |>
  slice(1) |>
  pull(phone.y)
[1] "315-618-5263"

Puzzle 8: The Collector

The ex-boyfriend says a woman mysteriously appeared and then vanished from his life, leaving behind a rug depicting a snake hidden in branches. He gave it to his sister in Manhattan, who passed it to “an acquaintance who collects all sorts of junk” - specifically, someone who “owns an entire set of Noah’s collectibles.”

The challenge: Find someone who has bought a large number of Noah’s-branded products.

Noah’s Market apparently sells branded collectibles. We need to find a customer who’s accumulated an impressive collection.

Code
orders_plus |>
  filter(str_detect(desc, "^Noah")) |>    # Products starting with "Noah"
  group_by(customerid, name, phone) |>
  summarise(noah_items = n(), .groups = "drop") |>
  filter(noah_items > 200) |>              # Serious collector
  pull(phone)
[1] "929-906-5980"

The Finale

Entering the final phone number reveals something spectacular - but I won’t spoil it here. The Hanukkah of Data team has crafted something genuinely delightful.

What I love about this challenge is how it mirrors real data analysis work. You have messy, interconnected datasets. You have vague requirements (“find someone who shops the sales”). You have to make assumptions, explore the data, and iterate. The puzzle designers cleverly embedded multiple valid approaches for most puzzles while ensuring each has a unique solution.

If you haven’t tried it yet, give it a go. And if you’re looking for more years to solve, they’ve continued the tradition with new datasets and puzzles.