The Problem: Finding Good Deals Is a Full-Time Job
Every property buyer asks the same question: Is this a good price?
Answering it requires analyzing hundreds of comparable listings, understanding neighborhood trends, and factoring in property condition, size, building type, and dozens of other variables. Real estate agents do this intuitively, but it takes years of experience, and their assessments are inherently subjective.
For the Croatian property market specifically:
- Listings are fragmented across multiple platforms (Njuskalo, Crozilla, and others)
- Pricing is inconsistent: similar properties are listed at wildly different prices
- Market data is opaque: there are no centralized analytics for regional price trends
- Deal identification is manual: it means scanning hundreds of listings daily to find underpriced properties
What if an AI could aggregate every listing, predict fair market value, and surface the best deals automatically?
What We Built: Intelligent Real Estate Analysis Platform
We built a full-stack platform that continuously scrapes the Croatian property market, predicts fair prices using machine learning, and automatically identifies underpriced properties. Everything is accessible through both a dashboard and an AI chatbot.
The Platform at a Glance
| Component | What It Does |
|---|---|
| Automated Scraping | Collects listings from multiple marketplaces on a schedule |
| ML Price Prediction | 10+ models estimate fair value for any property |
| Deal Detection | Scoring algorithm surfaces underpriced listings |
| Similarity Search | Finds comparable properties for any listing |
| AI Chatbot | Natural language interface for property search and analysis |
| Market Analytics | Regional statistics, trends, and feature correlations |
How It Works: From Raw Listings to Actionable Insights
1. Automated Data Collection
The system scrapes multiple real estate platforms across three Croatian regions (Varazdin, Zagreb, Split) on a configurable schedule:
- APScheduler triggers background scraping jobs automatically
- Firecrawl API handles dynamic JavaScript-rendered pages
- Custom parsers (BeautifulSoup) extract structured data from each platform
- Deduplication engine prevents the same listing from appearing twice
Each listing captures 20+ attributes: price, location, square meters, rooms, floor, building type, condition, heating, parking, construction phase, and more.
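To make the deduplication step concrete, here is a minimal sketch of one way it could work. It is an assumption, not the platform's actual engine: the field names (`location`, `area_m2`, `rooms`, `price_eur`) and the idea of hashing a normalized fingerprint are illustrative choices.

```python
import hashlib
import unicodedata


def normalize_text(value: str) -> str:
    """Lowercase, strip diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.lower().split())


def listing_fingerprint(listing: dict) -> str:
    """Build a stable key from attributes that should agree across platforms.

    The chosen fields and rounding rules are hypothetical.
    """
    parts = [
        normalize_text(listing["location"]),
        str(round(float(listing["area_m2"]))),           # nearest whole square meter
        str(float(listing["rooms"])),                    # 2 and 2.0 compare equal
        str(int(round(float(listing["price_eur"]), -3))),  # nearest 1,000 EUR
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def deduplicate(listings: list[dict]) -> list[dict]:
    """Keep the first listing seen for each fingerprint, preserving order."""
    seen: set[str] = set()
    unique = []
    for listing in listings:
        key = listing_fingerprint(listing)
        if key not in seen:
            seen.add(key)
            unique.append(listing)
    return unique
```

Rounding makes the match tolerant of small differences between platforms, at the cost of missing duplicates whose values fall on either side of a rounding boundary. A production version would likely add fuzzy matching on top of an exact key like this one.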
2. Machine Learning Price Prediction
This is the core of the platform. We trained and evaluated 10+ ML models to find the best predictor for each region. The main model families are:
| Model | Approach | Strength |
|---|---|---|
| Linear Regression | Baseline | Interpretable |
| Ridge / Lasso | Regularized linear | Handles multicollinearity |
| Random Forest | Ensemble trees | Captures non-linear patterns |
| XGBoost | Gradient boosting | High accuracy on tabular data |
| CatBoost | Gradient boosting | Handles categorical features natively |
| Neural Network | Deep learning | Complex feature interactions |
| KNN | Instance-based | Local market patterns |
| SVR | Support vector | Robust to outliers |
The system automatically selects the best-performing model per region and feature set. Five feature configurations (minimal, core, standard, numeric, full) let us balance accuracy against data availability.
Predictions include confidence intervals: each estimate is a range reflecting model uncertainty, not just a single number.
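The selection and interval logic can be sketched in plain Python. This is an illustration under stated assumptions: it takes validation results that have already been computed, uses mean absolute error (`mae`) as the selection metric, and derives the range from held-out residuals. The article does not say which metric or interval method the platform uses, so the record layout and both function names are hypothetical.

```python
from statistics import quantiles


def select_best_models(results: list[dict]) -> dict[tuple[str, str], dict]:
    """For each (region, feature_set) pair, keep the record with the lowest MAE."""
    best: dict[tuple[str, str], dict] = {}
    for record in results:
        key = (record["region"], record["feature_set"])
        if key not in best or record["mae"] < best[key]["mae"]:
            best[key] = record
    return best


def residual_interval(point_estimate: float, residuals: list[float]) -> tuple[float, float]:
    """Approximate 90% range around a prediction.

    `residuals` are (actual - predicted) values from a held-out set.
    quantiles(n=20) returns 19 cut points; the first and last are the
    5th and 95th percentiles of the residual distribution.
    """
    cuts = quantiles(residuals, n=20, method="inclusive")
    return point_estimate + cuts[0], point_estimate + cuts[-1]
```

A residual-based range like this assumes errors are similar in size across the price spectrum. Where they are not, a quantile-regression objective or per-segment residuals would be a better fit.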
3. Deal Detection Algorithm
For every listing, the system:
- Predicts the fair market price using the best model
- Compares the prediction to the asking price
- Calculates a deal score based on the gap
- Factors in property condition, location desirability, and listing age
Properties priced significantly below predicted value surface as potential deals, ranked and ready for review.
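The steps above could be combined into a score along the following lines. The weights, the 0 to 1 scales for condition and desirability, the 10% minimum gap, and the choice to discount listings that have been on the market for a long time are all assumptions made for illustration; the platform's actual formula is not given here.

```python
from dataclasses import dataclass


@dataclass
class ScoredListing:
    listing_id: str
    asking_price: float
    predicted_price: float
    condition: float      # assumed scale: 0.0 (poor) to 1.0 (excellent)
    desirability: float   # assumed scale: 0.0 to 1.0
    days_listed: int


def price_gap(listing: ScoredListing) -> float:
    """Fraction by which the asking price sits below the prediction (negative if above)."""
    return (listing.predicted_price - listing.asking_price) / listing.predicted_price


def deal_score(listing: ScoredListing, stale_after_days: int = 90) -> float:
    """Scale the price gap by property quality and listing freshness."""
    # Condition and desirability together scale the gap between 0.5x and 1.0x.
    quality = 0.5 + 0.25 * listing.condition + 0.25 * listing.desirability
    # Older listings are discounted linearly, down to 0.5x at stale_after_days.
    age = min(listing.days_listed, stale_after_days)
    freshness = 1.0 - 0.5 * age / stale_after_days
    return price_gap(listing) * quality * freshness


def rank_deals(listings: list[ScoredListing], min_gap: float = 0.10) -> list[ScoredListing]:
    """Keep listings at least min_gap below prediction, best score first."""
    candidates = [item for item in listings if price_gap(item) >= min_gap]
    return sorted(candidates, key=deal_score, reverse=True)
```

Keeping the raw gap as a hard filter and the other factors as multipliers means a listing can never become a "deal" on condition or location alone; it has to be priced below the prediction first.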
4. Similarity Search
For any property, the system finds the most comparable listings based on:
- Location proximity
- Square meter range
- Building type and condition
- Price per square meter
- Number of rooms
This gives buyers and agents instant comps without manual searching.
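One simple way to rank comparables on these criteria is a weighted dissimilarity score, sketched below. The field names, the scaling constants (for example, treating 5 km as one unit of distance), and the penalty values are assumptions chosen for readability, not the platform's tuned parameters.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometers."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def dissimilarity(ref: dict, other: dict) -> float:
    """Lower means more similar. Each term is scaled to a roughly comparable range."""
    distance = haversine_km(ref["lat"], ref["lon"], other["lat"], other["lon"]) / 5.0
    area = abs(ref["area_m2"] - other["area_m2"]) / ref["area_m2"]
    price = abs(ref["price_per_m2"] - other["price_per_m2"]) / ref["price_per_m2"]
    rooms = abs(ref["rooms"] - other["rooms"]) / 2.0
    type_penalty = 0.0 if ref["building_type"] == other["building_type"] else 1.0
    condition_penalty = 0.0 if ref["condition"] == other["condition"] else 0.5
    return distance + area + price + rooms + type_penalty + condition_penalty


def find_similar(ref: dict, candidates: list[dict], k: int = 5) -> list[dict]:
    """Return the k candidates most similar to ref, excluding ref itself."""
    others = [c for c in candidates if c["id"] != ref["id"]]
    return sorted(others, key=lambda c: dissimilarity(ref, c))[:k]
```

A linear scan like this is fine for a few thousand listings per region. With much larger datasets, a spatial index or a nearest-neighbor library would be the usual next step.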
The AI Chatbot: Talk to Your Market Data
Natural Language Property Search
Instead of filling out filter forms, users ask questions in plain language (including Croatian):
- "Show me apartments in Zagreb under 150,000 euros with at least 2 bedrooms"
- "What's the average price per square meter in Varazdin?"
- "Find deals on houses in Split that need renovation"
- "Estimate the price for a 65m2 apartment in Zagreb, new construction, 3rd floor"
LLM-Powered Tool Calling
The chatbot uses Gemini with function calling to route queries to the right backend tools:
| User Intent | Tool Called |
|---|---|
| Search listings | search_listings with filters |
| Estimate price | predict_price with property features |
| Find deals | find_deals with criteria |
| Market stats | get_market_statistics for region |
| Find similar | find_similar_properties for reference |
| Renovation cost | estimate_renovation_cost by scope |
| Neighborhood info | get_neighborhood_info for location |
The chatbot interprets the tool responses and presents results conversationally, so no data science expertise is required.
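On the backend, routing a tool call usually comes down to a name-to-function registry. The sketch below assumes the model returns a tool name plus a JSON string of arguments, which is the shape used by OpenAI-compatible APIs; the `tool` decorator, the `dispatch` function, and the placeholder tool bodies are hypothetical and do not call any LLM.

```python
import json
from typing import Callable, Optional

# Registry of callable tools, keyed by the name exposed to the model.
TOOLS: dict[str, Callable[..., dict]] = {}


def tool(func: Callable[..., dict]) -> Callable[..., dict]:
    """Register a function under its own name."""
    TOOLS[func.__name__] = func
    return func


@tool
def search_listings(region: str, max_price: Optional[float] = None, min_bedrooms: int = 0) -> dict:
    # Placeholder: a real implementation would query the listings database.
    filters = {"region": region, "max_price": max_price, "min_bedrooms": min_bedrooms}
    return {"tool": "search_listings", "filters": filters}


@tool
def get_market_statistics(region: str) -> dict:
    # Placeholder: a real implementation would aggregate stored listings.
    return {"tool": "get_market_statistics", "region": region}


def dispatch(tool_call: dict) -> dict:
    """Run the requested tool, returning an error dict instead of raising."""
    name = tool_call["name"]
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    try:
        arguments = json.loads(tool_call["arguments"])
        return TOOLS[name](**arguments)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature.
        return {"error": f"bad arguments for {name}: {exc}"}
```

Returning errors as data rather than raising lets the result be passed back to the model, which can then correct its arguments or explain the problem to the user.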
Technical Architecture
Stack
| Layer | Technology | Why |
|---|---|---|
| Frontend | Next.js 16, React 19, Tailwind 4 | Fast, modern UI with SSR |
| API | FastAPI (Python) | Async endpoints, ML integration |
| ML | scikit-learn, XGBoost, CatBoost | Proven tabular data models |
| LLM | OpenRouter + Gemini 2.5 Flash | Natural language interface |
| Database | Supabase (PostgreSQL) | Managed, scalable storage |
| Scraping | Firecrawl + BeautifulSoup | JS-rendered + static pages |
| Scheduling | APScheduler | Automated data collection |
Data Quality
- Deduplication catches listings posted across multiple platforms
- Outlier detection filters unrealistic prices before model training
- Feature engineering creates derived metrics (price/m2, age estimates)
- Continuous retraining as new data is collected
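As an illustration of the feature engineering and outlier steps, the sketch below derives price per square meter and then drops listings outside an interquartile-range fence. The IQR rule and the `k = 1.5` multiplier are assumptions picked because they are a common default; the article does not state which outlier method the platform uses. Field names are hypothetical.

```python
from statistics import quantiles


def add_price_per_m2(listings: list[dict]) -> list[dict]:
    """Return copies with a derived price_per_m2, skipping non-positive areas."""
    enriched = []
    for listing in listings:
        if listing["area_m2"] > 0:
            per_m2 = listing["price_eur"] / listing["area_m2"]
            enriched.append({**listing, "price_per_m2": per_m2})
    return enriched


def filter_price_outliers(listings: list[dict], k: float = 1.5) -> list[dict]:
    """Drop listings whose price_per_m2 lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(listings) < 4:
        return list(listings)  # too few points to estimate quartiles sensibly
    values = [item["price_per_m2"] for item in listings]
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [item for item in listings if low <= item["price_per_m2"] <= high]
```

Filtering on price per square meter rather than raw price avoids discarding large but fairly priced properties. Running the filter per region would also keep an expensive market from masking outliers in a cheaper one.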
Results
| Metric | Value |
|---|---|
| Listings tracked | Thousands across 3 regions |
| ML models evaluated | 10+ per region |
| Price prediction | Confidence intervals on every estimate |
| Deal detection | Automated scoring for every listing |
| Time saved | Hours of manual comparison eliminated |
What a Real Estate Agent Said:
"I used to spend my mornings scrolling through three different websites comparing prices. Now I check the deals dashboard over coffee and focus on the listings that are actually worth pursuing. The price predictions are surprisingly close to what I'd estimate myself."
— Independent real estate agent, Croatia
Who This Is For
This platform works for:
- Real estate agencies wanting data-driven pricing insights
- Property investors identifying undervalued opportunities
- Individual buyers researching fair market value before making offers
- Market analysts tracking regional price trends
- Property portals adding intelligent features to existing platforms
The same architecture (scrape, predict, detect deals) applies to any market where public listing data exists.

