earthUI: An Interactive Interface for the earth (MARS) Package

William Bert Craytor

Abstract

earthUI is a graphical user interface for the R earth package, which implements Multivariate Adaptive Regression Splines (MARS). It offers three purpose modes—general predictive modeling, real-estate appraisal, and market-area analysis—and guides the user through data import, model configuration, fitting, and the interpretation of earth’s diagnostics and graphical output. This article documents earthUI’s data-format requirements, modeling workflow, output displays, and complete feature reference.

Introduction

What Is earthUI?

earthUI is a graphical user interface for the R earth package, which implements Multivariate Adaptive Regression Splines (MARS) (Friedman 1991). The modeling engine is Stephen Milborrow’s earth package (Milborrow 2024), documented in detail by Milborrow (n.d.-a) and Milborrow (n.d.-b). It runs as a local Shiny application on your own machine: there is no login, no remote server, and no accounts. You launch it from R, import a dataset (CSV or Excel), configure your model, and fit it interactively.

The application provides a complete workflow: data import, variable configuration, model fitting, diagnostic plots, variable importance, model equations, and downloadable reports in HTML, Word, or PDF format.

Three Purpose Modes

When you launch earthUI, a Purpose radio button at the top of the sidebar lets you choose one of three modes: General, For Appraisal, or Market Area Analysis.

In all three modes, the core modeling engine is identical — you are always fitting an Earth (MARS) model. The purpose setting controls which additional tools and interface elements are available.

Switching purposes clears all state. When you change the purpose radio, earthUI resets to a clean default state: imported data, model results, tabs, variable configuration, and earth parameters are all cleared. You must re-import your file after switching. Previously saved settings for that file and purpose combination will then be automatically restored from your last session.

Real Estate–Specific Features

When either For Appraisal or Market Area Analysis is selected, earthUI activates several features designed for real estate analysis, including special column designations, sale age computation, and coordinate rounding.

Getting Started

To use earthUI:

  1. Install the package in R: install.packages("earthUI") or install from source.
  2. Launch the application: run earthUI::launch() in R, or from the command line run Rscript -e 'earthUI::launch()'. The app opens in your web browser on port 7878. You can also access the app directly by navigating to http://localhost:7878 in your browser.
  3. Import your data using the file upload in Section 1 of the sidebar. earthUI accepts CSV and Excel files.
  4. Select your Purpose (General, For Appraisal, or Market Area Analysis).
  5. Configure variables — choose your target(s) and predictors, set data types, and assign any special column roles.
  6. Set Earth parameters — degree, nprune, subset filters, and other earth() arguments.
  7. Fit the model — click “Fit Earth Model” and review the results in the main panel.
  8. Export — download predictions as Excel, generate reports, or (in appraisal mode) compute RCA adjustments and Sales Comparison Grids.

Settings are automatically persisted in your browser’s local storage and restored when you reload the same input file.

MLS Input Data Requirements

For real estate appraisal and market analysis workflows, your input data typically comes from a Multiple Listing Service (MLS) export. This chapter describes the expected file structure and the columns that earthUI can use.

File Format & Structure

earthUI accepts CSV and Excel (.xlsx, .xls) files. On import, column names are automatically converted to snake_case — for example, “Living SqFt” becomes living_sqft, “Contract Date” becomes contract_date, and “Sale Price” becomes sale_price. This normalization ensures consistent column references throughout the workflow. The CSV separator and decimal mark used during import are determined by the locale settings (see Chapter 3, “Locale & Regional Settings”).
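The column-name normalization described above can be approximated in a few lines of base R. This is a minimal sketch, not earthUI's actual implementation: the helper name to_snake_case is hypothetical, and the rule used here (lower-case, then collapse every run of non-alphanumeric characters into one underscore) is an assumption chosen because it reproduces the three examples in the text.

```r
# Hypothetical helper: approximate snake_case normalization of column headers.
to_snake_case <- function(x) {
  x <- tolower(trimws(x))            # lower-case and drop surrounding spaces
  x <- gsub("[^a-z0-9]+", "_", x)    # runs of other characters become "_"
  gsub("^_+|_+$", "", x)             # strip leading/trailing underscores
}

headers <- c("Living SqFt", "Contract Date", "Sale Price")
normalized <- to_snake_case(headers)
print(normalized)
```

Note that this simplified rule would also replace accented or non-Latin letters with underscores, so it is only suitable for illustrating the ASCII examples above.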

Your data file should be a flat table with one row per property and one column per attribute. The first row of the file must contain column headers.

Required Columns for Appraisal Mode

While earthUI works with any set of columns, the full appraisal workflow (RCA adjustments + Sales Comparison Grid) benefits from having the following columns in your MLS export:

Spreadsheet column names can be in any language. Only the “special” type names are in English, so that the R program can recognize them and give them special treatment. The column names themselves appear in the regression models, graphs, and (if doing appraisals) the Intermediate Sales Grid.

Not all columns are required. earthUI adapts — if a column is missing, the corresponding feature is simply omitted. For example, if no concessions column is designated, the Net SP row in the Sales Grid shows Sale Price without a concessions deduction. However, for real estate pricing models certain columns are highly recommended to achieve acceptable fit:

  1. Sale Age: the number of days between the contract sale date and the effective date of the appraisal or analysis. When multi-year sales history is used, especially over periods longer than 5 years, sale_age often plays a central role in estimating the sale price. In fact, it is often so important that without it earthUI fails to produce any model at all.

  2. Living Area — also goes by names such as “Living Sqft,” “GLA” (gross living area) and so on. This is another leading determinant of sale price.

  3. Total Bath Count: the total number of bathrooms, with partial baths (quarter, half, and three-quarter) counted fractionally. For example, two full baths and one half-bath give a value of 2.5.

  4. Garage Bays or Garage Area — the number of garage spaces or the garage square footage.

  5. Lot Size — the land area of the property, typically in square feet or acres.

  6. Longitude, Latitude, and if available Area ID. Adjustments for these will be combined under a single Location adjustment in the Sales Grid.

Special Column Naming Conventions

earthUI identifies columns by their special type designation, not by their column name. You can name your columns anything you like in the MLS export — what matters is that you assign the correct special type in the Variable Configuration table (Chapter 6).

For example, your MLS might export living area as “GLA”, “Living SqFt”, “liv_area”, or “gross_living_area”. After import (where it becomes snake_case), you simply designate it as living_area in the Special dropdown. earthUI will then use it for per-SF residual calculations and Sales Grid grouping regardless of its original name.

Data Quality & Completeness

Subject Row Placement

In Appraisal mode, row 1 must be the subject property. All remaining rows are comparable sales. The subject’s sale price can be left blank (NA) or set to any value — earthUI treats it as NA during fitting regardless.

In Market Area Analysis mode, placing the subject in row 1 is optional. If present, check “Skip first row (subject property)” to exclude it from fitting.

In General mode, there is no special row handling — all rows are treated equally.

General Purpose Mode

Overview

General Purpose mode is the default when you launch earthUI. It provides the complete MARS modeling workflow for any dataset — not just real estate. You can use earthUI for scientific data, financial analysis, engineering studies, or any regression problem where you want to explore non-linear relationships and interactions between variables.

In General mode, the interface omits the real estate–specific features (special columns, sale age, coordinate rounding, RCA). The sidebar is streamlined to focus on variable selection, parameter configuration, model fitting, and export.

The Sidebar Workflow

The sidebar is organized into numbered, collapsible sections that guide you from data import through export:

1. Import Data — File upload accepting CSV and Excel files. For Excel files with multiple sheets, a sheet selector appears. Column names are automatically converted to snake_case.

2. Project Output Folder — A text field specifying where downloads and fit logs are saved (defaults to ~/Downloads).

3. Variable Configuration — Target variable selector (supports multiple targets), predictor table with checkboxes for Include, Factor, and Linear. See Chapter 6 for full details.

4. Earth Call Parameters — All arguments to the earth() function: degree, penalty, nk, pruning method, cross-validation, and more. See Chapter 7 for the complete parameter reference.

5. Fit Earth Model — A single green button that runs the model asynchronously. See Chapter 8 for fitting details.

6. Download Output — Exports predictions, residuals, CQA scores, and per-g-function contributions as an Excel file. See Chapter 9.

7. Download Report — Generates a formatted report (HTML, Word, or PDF) saved to the output folder. See Chapter 12.

Main Panel Tabs

After fitting, the main panel provides nine tabs:

Settings Persistence

earthUI automatically saves your configuration to the browser’s local storage, keyed by both the input filename and the current purpose mode. When you reload the same file under the same purpose, all settings are restored: target selection, predictor checkboxes, data types, earth parameters, and response weights. This means the same file can have different configurations for General, Appraisal, and Market modes. Settings are also backed up to an SQLite database so they persist across browser sessions. A Reset to Defaults button clears all saved settings for the current purpose.

Dark Mode

Click the moon/sun icon in the upper-right corner to toggle between light and dark themes. The theme preference is saved in local storage and persists across sessions.

Locale & Regional Settings

earthUI supports international number, date, and CSV formatting conventions through a country-based locale system. The Country dropdown in Section 1 of the sidebar (below the file upload) selects a preset for 31 supported countries. Each preset configures:

Supported Countries

Override Dropdowns

Below the country selector, four override dropdowns let you change individual settings without switching countries:

When you change the country, all overrides reset to that country’s defaults. Changing an override only affects that one setting.

Saving Defaults

Click Save as my default to store your locale preferences globally. These defaults apply to all future sessions regardless of which data file you load. Per-file settings (target, predictors, parameters) are saved separately in the browser’s local storage, but locale defaults persist across all files via an SQLite database. This two-level approach is designed for organizations like audit firms that work with data from multiple countries — set your most common country as the default, then override per-file when needed.

Appraisal Mode

When you select For Appraisal as the Purpose, earthUI configures itself for single-property valuation. All features described in Chapter 3 remain available; this chapter covers only the appraisal-specific additions.

Subject Row Handling

In appraisal mode, row 1 of your dataset is the subject property and all remaining rows are comparable sales. Your input file must be organized accordingly (see Chapter 2). The subject’s sale price can be left blank or set to any value — earthUI automatically treats it as NA during fitting.

After importing, the Data tab splits into two sections: Subject Property (row 1) and Comparable Sales (rows 2+). Row 1 is always excluded from model fitting — the notification “Skipping row 1 (subject). Fitting on N rows.” confirms this. After fitting, the model still generates predictions for the subject row, shown as est_<target> in the output.

Effective Date & Sale Age

In appraisal and market modes, an Effective Date field appears in the Variable Configuration section (defaulting to today’s date). If you designate a column as contract_date in the Special column dropdown, earthUI computes a sale_age column: the whole number of days between each sale’s contract date and the effective date. This column replaces the original date column as a predictor.

The first time you click Fit after designating a contract date, earthUI creates the sale_age column and notifies you to click Fit again to include it.
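The sale age computation can be illustrated in base R. This is a sketch under stated assumptions: the dates are made up, and the sign convention (effective date minus contract date, so past sales are positive) is assumed rather than confirmed by the text.

```r
# Hypothetical effective date and contract dates (not from any real dataset).
effective_date <- as.Date("2025-06-30")
contract_date  <- as.Date(c("2025-06-01", "2024-06-30", "2023-01-15"))

# Whole days from each contract date to the effective date.
sale_age <- as.integer(difftime(effective_date, contract_date, units = "days"))
print(sale_age)
```

A sale contracted exactly one year before the effective date gets a sale_age of 365 (or 366 across a leap day), which is why filters such as sale_age < 365 select roughly the most recent year of sales.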

Special Column Designations

In appraisal and market modes, a Special dropdown appears for each predictor in the Variable Configuration table. The complete list of special types and their effects:

Only one column per special type is allowed (except display_only). Assigning a special to a second column automatically clears it from the first. A small blue badge appears next to the variable name showing its assigned special type.

RCA Adjustments Overview

The Calculate RCA Adjustments & Download button (sidebar section 7, visible only in appraisal mode after fitting) computes market-derived adjustments for each comparable relative to the subject. The full RCA workflow is described in Chapter 10.

After computing RCA adjustments, the Generate Sales Grid & Download button (sidebar section 8) becomes available. The Sales Grid workflow is described in Chapter 11.

Market Area Analysis Mode

When you select Market Area Analysis as the Purpose, earthUI provides the same real estate–specific features as appraisal mode (special columns, sale age, coordinate rounding) but is oriented toward analyzing a group of properties rather than valuing a single subject.

Differences from Appraisal Mode

When to Use Market Mode

Market Area Analysis mode is appropriate when you are:

Variable Selection

Section 3 of the sidebar — Variable Configuration — is where you choose which columns participate in the model and how they are treated.

Target Variable(s)

The Target (response) variable(s) dropdown at the top of Section 3 lists every column in your dataset. Select one column for a standard single-response model, or multiple columns for a multi-response model. When multiple targets are selected, the model fits all responses simultaneously using earth(cbind(y1, y2, ...) ~ .).

Columns selected as targets are automatically excluded from the predictor list.

The Predictor Table

Below the target selector, a table lists every remaining column with the following fields:

Column    Description
Variable  Column name (full name shown in tooltip if truncated). In appraisal/market modes, a blue badge shows the assigned special type.
Type      Data type dropdown: numeric, integer, character, logical, factor, Date, POSIXct
Inc?      Checkbox: include this column as a predictor in the model
Special   Dropdown (appraisal/market only); see Special Column Types Reference below
Factor    Checkbox: treat this column as a categorical variable
Linear    Checkbox: force linear entry only (no hinge functions)
NAs       Count of missing values. Shown in red when more than 30% of values are missing

A hint line above the table explains the abbreviations: “Type = column data type, Inc = include as predictor, Factor = treat as categorical, Linear = linear-only (no hinges).”

Data Type Detection & Overrides

earthUI automatically detects data types on import. Numeric, integer, logical, factor, and date columns are recognized. Character columns that look like dates (at least 80% of values parse against common date formats) are classified as Date.

You can override any detection by changing the Type dropdown. When you change a column to character or factor, the Factor checkbox is automatically checked. Changing types affects how the column is passed to the earth() function.
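The 80% rule for recognizing date-like character columns can be sketched as follows. The function name looks_like_date and the three candidate formats are assumptions for illustration; the text does not list which formats earthUI actually tries.

```r
# Hypothetical detector: TRUE when at least `threshold` of the non-empty
# values parse under any of the candidate formats.
looks_like_date <- function(x, threshold = 0.8,
                            formats = c("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")) {
  x <- x[!is.na(x) & nzchar(x)]
  if (length(x) == 0) return(FALSE)
  parsed <- rep(FALSE, length(x))
  for (fmt in formats) {
    # as.Date() returns NA for values that do not match the format
    parsed <- parsed | !is.na(as.Date(x, format = fmt))
  }
  mean(parsed) >= threshold
}

mostly_dates <- c("2024-01-15", "2024-02-01", "03/04/2024", "15.01.2024",
                  "2024-05-06", "2024-06-07", "2024-07-08", "2024-08-09",
                  "2024-09-10", "n/a")
mostly_codes <- c("A12", "2024-01-15", "B7")
print(c(looks_like_date(mostly_dates), looks_like_date(mostly_codes)))
```

Ambiguous values such as 03/04/2024 parse under more than one convention, which is one reason a manual Type override remains useful.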

Factor and Linear Flags

Special Column Types Reference

In appraisal and market modes, the Special dropdown provides the following options. Each type can be assigned to at most one column (except display_only, which allows multiple):

Date & Time Types:

Monetary Types:

Size & Location Types:

Age Types:

Display Types:

Multiple Targets

Selecting more than one target variable fits a multi-response Earth model. When multiple targets are selected:

Parameter Selection

Section 4 of the sidebar — Earth Call Parameters — provides access to all arguments accepted by the earth() function. Each parameter has a blue help icon (?) with a tooltip explanation. Parameters are organized into collapsible subsections.

Basic Parameters

Parameter  Default  Description
subset     (empty)  Row filter expression. See “Subset Filtering” below.
weights    NULL     Column selector for case (row) weights. Only numeric columns are listed.
wp         NULL     Response weights for multi-target models. Button opens a dialog with one numeric input per target (default 1.0 each).
keepxy     off      Retain x, y, subset, and weights in the model object.
trace      0        Trace level for fitting output (0 through 5).
glm        none     Optional GLM family: gaussian, binomial, or poisson.
degree     1        Maximum interaction order. Setting degree \(\geq\) 2 auto-enables cross-validation and reveals the Allowed Interactions matrix.
penalty    2        GCV penalty per knot. Higher values produce simpler models.

Forward Pass

Parameter       Default   Description
nk              auto      Maximum terms before pruning. Default: min(200, max(20, 2\(\times\)predictors)) + 1.
thresh          0.001     Forward-step threshold. Smaller values allow more terms.
minspan         0 (auto)  Minimum span between knots. Negative values set the maximum knots per predictor.
endspan         0 (auto)  End span: minimum distance from a knot to the edge of the data.
newvar.penalty  0         Penalty for introducing a new variable (encourages reuse of existing predictors).
fast.k          20        Number of parent terms to consider in the fast MARS algorithm.
fast.beta       1         Controls the fast MARS aging factor.

Allowed Interactions

When degree is set to 2 or higher, an interaction matrix appears below the basic parameters. This is an upper-triangular grid of checkboxes, one for each predictor pair. A checked box means the two predictors are allowed to interact; unchecking it forbids that specific interaction.

Clicking a predictor name (row or column label) toggles all interactions for that variable. Allow All and Clear All checkboxes at the top provide bulk control. The matrix uses sticky headers so column and row labels remain visible when scrolling.

An info alert reminds you: “Interaction terms increase the risk of overfitting. Cross-validation has been enabled (10-fold).”
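One way a checkbox matrix like this could be translated for earth is through the allowed argument of earth(), which takes a function with the arguments (degree, pred, parents, namesx, first). The sketch below is an assumption about how such a mapping might look, not a description of earthUI's internals; make_allowed and the example matrix are hypothetical, and the matrix rows are assumed to be in the same order as earth's predictor columns.

```r
# Hypothetical builder: turn a symmetric logical matrix (TRUE = the pair may
# interact) into a function with the signature earth expects for `allowed`.
make_allowed <- function(allow_matrix) {
  function(degree, pred, parents, namesx, first) {
    if (degree < 2) return(TRUE)           # additive terms are always permitted
    parent_preds <- which(parents != 0)    # predictors already in the parent term
    all(allow_matrix[pred, parent_preds])  # candidate must be allowed with each
  }
}

preds <- c("living_area", "lot_size", "sale_age")
m <- matrix(TRUE, 3, 3, dimnames = list(preds, preds))
m["living_area", "sale_age"] <- m["sale_age", "living_area"] <- FALSE

allowed_fn <- make_allowed(m)
# Candidate: sale_age (index 3) entering a term that already contains living_area.
print(allowed_fn(2, 3, c(1, 0, 0)))
```

With the earth package installed, the resulting function could be supplied as earth(..., degree = 2, allowed = allowed_fn). Factor predictors expand into several columns inside earth, so a real mapping would need to account for that.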

Pruning

Parameter  Default   Description
pmethod    backward  Pruning method: backward, none, exhaustive, forward, seqrep, or cv.
nprune     NULL      Maximum terms after pruning. Leave empty for automatic selection.

Cross-Validation

Parameter  Default  Description
nfold      10       Number of cross-validation folds. Set to 0 to disable CV.
ncross     20       Number of cross-validation repetitions.
stratify   on       Stratify CV samples so each fold has a similar response distribution.

When degree \(\geq\) 2, cross-validation is automatically enabled (nfold set to 10) to help guard against overfitting from interactions.

Subset Filtering

The subset text input accepts an R expression that filters which rows are used for model fitting. You can type an expression directly (e.g., sale_age < 365 & area_id == 460) or use the Build filter… button to construct one visually.

Build Filter Dialog — Click “Build filter…” to open a guided dialog. Each condition row has a column dropdown, operator (<, >, <=, >=, ==, !=), and a value input that adapts to the column type: numeric input for numbers, date picker for dates, and dropdown of unique values for character/factor columns. Conditions are joined with AND (&) or OR (|) connectors. A preview at the bottom shows the expression and how many rows match. Click Apply to insert the expression into the text input.

Date columns must use as.Date("...") or as.POSIXct("...") wrappers in manual expressions. The Build Filter dialog handles this automatically.

Subset filtering is non-destructive — excluded rows remain in the dataset and receive predictions in the Excel export.
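The behaviour of a subset expression can be illustrated with base R. This is a sketch with made-up data: it evaluates the expression text against the data frame's columns to get a logical row mask, and leaves the full dataset intact. How earthUI evaluates the expression internally is not stated in the text, so the use of eval(parse(...)) here is an assumption.

```r
# Made-up sales data for illustration only.
sales <- data.frame(
  sale_age      = c(30, 400, 120, 700),
  area_id       = c(460, 460, 461, 460),
  contract_date = as.Date(c("2025-05-01", "2024-04-01", "2025-02-01", "2023-06-01"))
)

# A manual expression; note the as.Date() wrapper around the date literal.
expr_text <- 'sale_age < 365 & area_id == 460 & contract_date >= as.Date("2025-01-01")'

# Evaluate with the data frame's columns in scope.
keep <- eval(parse(text = expr_text), envir = sales, enclos = baseenv())
keep[is.na(keep)] <- FALSE       # rows with missing values are not selected

fit_rows <- sales[keep, ]        # rows used for fitting
print(c(fitting = nrow(fit_rows), total = nrow(sales)))
```

Because only a mask is computed, the excluded rows are still available afterwards, which matches the non-destructive behaviour described above.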

earthUI displays a recommended value below key parameters in Section 4. These recommendations update reactively based on the number of fitting rows (\(n\)) and selected predictors (\(p\)). The formulas are derived from Friedman’s MARS paper, earth’s internal algorithms, and empirical testing.

Forward Pass Parameters

nk (max terms before pruning):

\[\text{nk} = \min\!\bigl(100,\; \max(21,\; 2p + 1,\; \lfloor n/10 \rfloor)\bigr)\]

\(n\)   \(p{=}5\)   \(p{=}10\)
30      21          21
50      21          21
100     21          21
200     21          21
500     50          50
1000    100         100
1500    100         100

Earth’s default, \(\min(200, \max(20, 2p)) + 1\), does not account for dataset size and can be too small for large datasets, constraining the forward pass before it explores all predictors adequately. The \(\lfloor n/10 \rfloor\) term allows approximately one term per 10 observations, a standard rule of thumb for avoiding overparameterization. The cap of 100 keeps the forward pass from spending time on additional terms that yield diminishing returns.
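The nk recommendation and earth's default can be compared directly in R. The function names below are hypothetical; the formulas are the two given in the text.

```r
# Recommended nk from the formula above: min(100, max(21, 2p + 1, floor(n / 10))).
recommended_nk <- function(n, p) min(100, max(21, 2 * p + 1, floor(n / 10)))

# earth's default as quoted in the text: min(200, max(20, 2p)) + 1.
earth_default_nk <- function(p) min(200, max(20, 2 * p)) + 1

n_values <- c(30, 50, 100, 200, 500, 1000, 1500)
nk_table <- data.frame(
  n    = n_values,
  p5   = sapply(n_values, recommended_nk, p = 5),
  p10  = sapply(n_values, recommended_nk, p = 10)
)
print(nk_table)
print(earth_default_nk(10))
```

For p of 5 or 10, the two formulas agree until n reaches a few hundred rows, after which the \(\lfloor n/10 \rfloor\) term takes over.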

minspan (minimum observations between knots):

\[\text{minspan} = \min\!\bigl(16,\; \lfloor 5 + n/50 \rfloor\bigr)\]

endspan (minimum observations from data boundaries):

\[\text{endspan} = \min\!\bigl(16,\; \lfloor 5 + n/28 \rfloor\bigr)\]

\(n\)   minspan   endspan
30      5         6
50      6         6
100     7         8
200     9         12
300     11        15
500     15        16
1500    16        16

Earth’s auto-calculated minspan uses Friedman’s equation 43: \(\lfloor(-\ln(-\ln 0.95) + \ln(p \cdot n)) / (2.5 \ln 2)\rfloor\), which scales as \(\ln(np)\) and grows too slowly for large datasets. At \(n{=}1500\) it yields only \({\approx}7\), giving roughly 245 candidate knot locations per continuous predictor — too many, allowing the forward pass to fit noise. The recommended formula targets approximately 100 candidates per predictor for large \(n\), while deferring to earth’s auto-calculation for small \(n\) (where it is well-tested). Endspan is set slightly larger than minspan to provide additional boundary protection, which helps when data has thin tails (common in real estate).
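The span formulas can be checked numerically. The sketch below implements the two recommended formulas and the Friedman expression exactly as written in the text; the function names are hypothetical, and alpha = 0.05 corresponds to the 0.95 in the quoted expression.

```r
recommended_minspan <- function(n) min(16, floor(5 + n / 50))
recommended_endspan <- function(n) min(16, floor(5 + n / 28))

# Friedman's equation 43 as quoted above, with 1 - alpha = 0.95.
friedman_minspan <- function(n, p, alpha = 0.05) {
  floor((-log(-log(1 - alpha)) + log(p * n)) / (2.5 * log(2)))
}

n_values <- c(30, 50, 100, 200, 300, 500, 1500)
span_table <- data.frame(
  n       = n_values,
  minspan = sapply(n_values, recommended_minspan),
  endspan = sapply(n_values, recommended_endspan)
)
print(span_table)
print(friedman_minspan(1500, p = 10))
```

Because the Friedman expression grows with \(\ln(np)\), increasing n from 150 to 1500 adds only about two to its value, while the recommended minspan moves from 8 to the cap of 16.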

penalty (GCV penalty per knot):

\[\text{penalty} = \begin{cases} 3 & \text{if degree} > 1 \\ 2 & \text{if degree} = 1 \end{cases}\]

This is earth’s own default — 2 for additive models, 3 for interaction models (the higher penalty compensates for the larger search space).

newvar.penalty (penalty for introducing a new predictor):

Recommended: 0.1 when predictors are correlated (e.g., living area with bedroom count, bathroom count, lot size). This biases the forward pass toward reusing predictors already in the model rather than introducing correlated alternatives. The mechanism: each new predictor’s RSS improvement is multiplied by \(1/(1 + \text{penalty})\). With \(\text{newvar.penalty} = 0.1\), a new variable must improve RSS by at least 10% more than another knot on an existing variable. This produces simpler models with fewer predictors without affecting final coefficient estimates (the penalty is removed after selection). Leave at 0 if predictors are not correlated.
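The effect of the penalty on the forward-pass comparison can be shown with simple arithmetic. This sketch only mirrors the mechanism described above (a new predictor's RSS improvement is scaled by \(1/(1 + \text{penalty})\)); the function name and the RSS figures are made up for illustration.

```r
# Hypothetical helper: RSS improvement as seen by the forward pass.
penalized_gain <- function(rss_gain, is_new_variable, newvar_penalty = 0.1) {
  ifelse(is_new_variable, rss_gain / (1 + newvar_penalty), rss_gain)
}

existing_knot <- penalized_gain(100, is_new_variable = FALSE)  # reuse a predictor
new_var_small <- penalized_gain(105, is_new_variable = TRUE)   # 5% better raw gain
new_var_large <- penalized_gain(111, is_new_variable = TRUE)   # 11% better raw gain

print(c(existing = existing_knot, new_small = new_var_small, new_large = new_var_large))
```

With a penalty of 0.1, the candidate that is 5% better in raw terms loses to the existing predictor, while the one that is 11% better still wins.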

Pruning Parameters

pmethod: Recommended: backward. The default GCV-based backward pruning is deterministic — the same data always produces the same model, with no dependence on random seeds. This is important for reproducibility, especially in appraisal work. The cv method uses cross-validation for pruning which introduces seed dependence.

nprune: Recommended: leave empty (NULL). Let GCV select the optimal number of terms. Setting nprune imposes a hard cap that overrides GCV’s judgment.

Cross-Validation Parameters

nfold (CV folds):

\[\text{nfold} = \min\!\bigl(15,\; \max(10,\; \lfloor n/100 \rfloor)\bigr)\]

\(n\)   nfold
30      10
100     10
500     10
1000    10
1500    15

With pmethod = "backward", cross-validation does not affect the model — it only computes the diagnostic CVR and provides residuals for the variance model. The floor of 10 and cap of 15 reflect this: enough folds for a stable diagnostic without unnecessary computation.

ncross (CV repetitions):

\[\text{ncross} = \max\!\bigl(3,\; \lceil 100/n \rceil\bigr)\]

\(n\)   ncross   total residuals
30      4        120
50      3        150
100     3        300
1500    3        4500

The variance model (varmod.method = "lm") fits on the cross-validation residuals. The formula targets at least 100 total residuals (\(n \times \text{ncross}\)) for a stable variance estimate, with a floor of 3 (required by earth when a variance model is enabled).
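The two cross-validation recommendations can be reproduced with a short R sketch. The function names are hypothetical; the formulas are the ones given above for nfold and ncross.

```r
recommended_nfold  <- function(n) min(15, max(10, floor(n / 100)))
recommended_ncross <- function(n) max(3, ceiling(100 / n))

n_values <- c(30, 50, 100, 1500)
cv_table <- data.frame(
  n      = n_values,
  nfold  = sapply(n_values, recommended_nfold),
  ncross = sapply(n_values, recommended_ncross)
)
# Each repetition yields one out-of-fold residual per row.
cv_table$total_residuals <- cv_table$n * cv_table$ncross
print(cv_table)
```

The floor of 3 on ncross dominates for all but the smallest datasets, so the ceiling term only matters when n is below 34.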

Variance Model

varmod.method: Recommended: lm. This fits prediction intervals using a linear regression of absolute CV residuals on the predicted response. It requires nfold > 0 and ncross >= 3.

Earth Output File

After every successful fit, earthUI writes <filename>_earth_output_<timestamp>.txt to the output folder. This file contains the model terms, summary statistics (RSq, GRSq, CVRSq), variance model details, and trace log. One file is created per fit, providing a cumulative record for comparing parameter configurations.

Settings Defaults

A radio button at the top of Section 4 controls which defaults are loaded:

Click Save current as default to store the current parameter configuration as your personal defaults.

Fitting the Earth Model

The Fit Button

Section 5 of the sidebar contains a single green button: Fit Earth Model. This button is always visible (not inside a collapsible section). Before fitting, earthUI validates your configuration — a target must be selected and at least one predictor must be included. In appraisal/market modes, latitude and longitude columns are rounded, and sale age is computed from the effective date and contract date (if designated).

Fitting Modal & Trace Output

When you click Fit, a dark modal overlay appears with:

earthUI fits models asynchronously using callr::r_bg(), which runs the computation in a separate R process. This keeps the application responsive during long-running fits. A polling observer reads output from the background process every 300 ms and streams it to the trace log.

When fitting completes, a “Done in X.Xs” message appears and a close button (X) is added to the modal. The modal does not auto-dismiss — you close it when you are ready to review results.

On success, a green checkmark is appended to the Fit button. If an error occurs, the error message is displayed in the trace log and a notification appears.

Fit Log

Every fit (success or failure) writes a log file to your output folder:

Downloading Data

After fitting, download an Excel file with predictions and diagnostics. This output is used in Step 7 (RCA) to assign a CQA (Condition/Quality/Appeal) rating to the subject property. The output is sorted by residual_sf and cqa_sf to help you assess where the subject falls in the ranking. If the model is of good quality, the properties should be ranked from least appealing to most appealing based on residual features that did not go into the regression. The middle value should be approximately 0, with negative values in the lower half and positive values in the upper half. You should find the worst-quality homes, or “fixers,” near the bottom of the ranking and the nicest homes at the top. There will usually be exceptions for anomalies such as foreclosures, short sales, probate (inheritance-related) sales, and quick sales prompted by a job change or other reasons. Investigating an anomaly usually turns up a pertinent reason for the unusual price.

The button label adapts to the purpose mode:

The filename format is <datafile>_modified_<YYYYMMDD_HHMMSS>.xlsx.

Output Columns

For each target variable, the following columns are appended:

Column                    Description
est_<target>              Model prediction, e.g. est_sale_price (1 decimal place)
residual                  Actual minus predicted (1 dp)
cqa                       Comparative Quality Analysis score (2 dp, range 0–10)
residual_sf               Residual divided by living area (if designated, 1 dp)
cqa_sf                    CQA calculated from ranking via residual_sf (2 dp)
<variable>_contribution   Per-g-function contribution value (1 dp, one column per g-function)
basis                     Intercept value contribution, same for all properties (1 dp)
calc_residual             Actual minus (basis + all contributions); verification column (1 dp)

For multi-target models, a _<i> suffix is added to distinguish columns for each target.
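The relationship between the prediction, the contributions, and the verification column can be sketched with made-up numbers. The values and column names below are hypothetical; the point is only that basis plus all contributions should reproduce the prediction, so calc_residual should match residual.

```r
# Made-up values for two properties.
actual <- c(300000, 230000)
est    <- c(285000, 237000)          # stands in for the model's predictions
basis  <- 250000                     # intercept, same for every property
contrib <- cbind(
  living_area_contribution = c(40000, -15000),
  sale_age_contribution    = c(-5000,   2000)
)

residual      <- actual - est
calc_residual <- actual - (basis + rowSums(contrib))

print(data.frame(residual, calc_residual))
```

If the two columns disagree by more than rounding, some part of the model is not represented in the contribution columns.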

Column Ordering & Excel Formatting

The ranking columns are placed at the leftmost position in the output file, in this order: residual_sf, cqa_sf, residual, cqa. This makes it easy to scan the sorted list and evaluate where the subject falls in the CQA distribution.

These columns have Excel number formatting applied:

Column        Format                      Example
residual_sf   Numeric, 2 decimal places   12.34
cqa_sf        Numeric, 2 decimal places   7.25
residual      Numeric, 0 decimal places   15,200
cqa           Numeric, 2 decimal places   6.80

CQA Scores

The CQA (Comparative Quality Analysis) score ranks each row’s residual against all other residuals. For a given row, the CQA is the proportion of rows with a smaller signed residual, multiplied by 10. This produces a 0–10 scale where:

When a living_area column is designated, cqa_sf provides the same ranking based on per-square-foot residuals.
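The CQA ranking can be sketched in base R. This is an illustration under an explicit assumption: the proportion is taken over the other rows (denominator n - 1), which makes the scores span exactly 0 to 10. The text does not state the denominator earthUI uses, and the function name cqa_score is hypothetical.

```r
# Hypothetical CQA: 10 x the share of other rows with a smaller signed residual.
# Assumes at least two rows and no missing residuals.
cqa_score <- function(residuals) {
  n <- length(residuals)
  sapply(residuals, function(r) 10 * sum(residuals < r) / (n - 1))
}

residuals <- c(-20000, -5000, 0, 8000, 30000)
scores <- cqa_score(residuals)
print(round(scores, 2))
```

Passing per-square-foot residuals instead of total residuals gives the cqa_sf variant of the same ranking.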

Sorting & Subject Row

In appraisal and market modes, comparable rows are sorted by residual_sf descending (or residual if no living area is designated). The subject row (row 1) remains in position 1 and is not sorted. In general mode, rows are exported in their original order.

This sorting, combined with the leftmost ranking columns, allows the appraiser to quickly scan the comparables from most under-predicted (largest positive residual) to most over-predicted (largest negative residual), and assess where the subject’s assigned CQA score falls in that distribution.
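The sort order described in this section, with the subject held in row 1, can be sketched as follows. The data are made up and the identifiers are hypothetical.

```r
# Made-up output rows; row 1 is the subject, whose residual is unknown.
out <- data.frame(
  id          = c("subject", "comp_a", "comp_b", "comp_c"),
  residual_sf = c(NA, -3.2, 12.5, 0.4)
)

comps  <- out[-1, ]
# Sort comparables by per-square-foot residual, largest first.
sorted <- rbind(out[1, ], comps[order(comps$residual_sf, decreasing = TRUE), ])
print(sorted)
```

Since the residual is actual minus predicted, the comparable at the top of the sorted block is the one that sold furthest above the model's estimate per square foot.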

Factor levels in the prediction data are aligned with the training data before calling predict(). Rows with unseen factor levels will produce NA predictions.

mgcvUI Auto-Export

On every successful fit with degree \(\leq\) 2, earthUI automatically saves the full result object as an .rds file to the Project Output Folder. The filename follows the pattern <datafile>_earthUI_result_<YYYYMMDD_HHMMSS>.rds.

This file can be loaded by mgcvUI (a companion Shiny app for GAM modeling) using readRDS(). mgcvUI uses the earth model’s knot locations and basis functions as starting points for GAM smooth terms, enabling a seamless transition from MARS to GAM modeling.

Models with degree > 2 are skipped because mgcvUI only supports pairwise interactions. A manual Export for mgcvUI button is also available in the sidebar for on-demand export.

glmnetUI Import

The same .rds file saved for mgcvUI can also be imported into glmnetUI (a companion Shiny app for elastic net regression). In glmnetUI, navigate to Section 2: Import from earthUI and use the Browse button to select the .rds file.

glmnetUI uses the standard model.matrix() approach for earth objects: the earth model’s basis functions (hinges, interactions, and linear terms) become the columns of glmnet’s design matrix. glmnet then performs regularized regression on these basis columns, selecting and shrinking them via lasso/elastic net. This combines earth’s adaptive basis construction with glmnet’s regularization.

Key Benefits

Important: Weight Column Must Match

Workflow

  1. Fit an earth model in earthUI (any degree).
  2. The .rds result file is saved automatically to the Project Output Folder.
  3. In glmnetUI, open Section 2: Import from earthUI and browse to the .rds file.
  4. Verify that the same data file and same weight column are used in both apps.
  5. Configure glmnet parameters in Section 5 (the Interaction Matrix is automatically disabled when an earthUI import is active).
  6. Click Fit Model. The earth basis columns appear in the Equation, Coefficients, and Contributions tabs.

RCA Calculations & Downloading

The Calculate RCA Adjustments & Download button appears in sidebar section 7, visible only in appraisal mode after a model has been fitted. RCA (Reconciliation by Comparable Adjustment) produces market-derived, per-comparable adjustments relative to the subject property.

Opening the RCA Dialog

Clicking the button opens a small modal dialog with:

CQA Score Interpolation

When you click Generate, earthUI interpolates the subject’s residual from the comparable CQA/residual pairs:

  1. The comparables’ CQA scores and residuals are sorted by CQA ascending
  2. Linear interpolation (stats::approx()) maps your entered CQA value to a residual
  3. If using CQA per SF, the per-SF residual is converted back to a total residual by multiplying by the subject’s living area
  4. The subject’s estimated value is computed as: model prediction + interpolated residual
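The four steps above can be sketched in base R. The helper name and its arguments are hypothetical and not part of earthUI. The handling of CQA values outside the comparables' range (rule = 2, which clamps to the nearest endpoint) and of tied CQA scores (ties = mean) are assumptions that the text does not specify.

```r
# Hypothetical helper, not part of earthUI's API.
interpolate_subject_value <- function(comp_cqa, comp_resid, subject_cqa,
                                      subject_pred, living_area = NULL) {
  # Step 1: sort the comparable pairs by CQA, ascending.
  ord <- order(comp_cqa)
  # Step 2: linear interpolation from CQA to residual.
  # rule = 2 and ties = mean are assumptions (see lead-in).
  resid <- stats::approx(x = comp_cqa[ord], y = comp_resid[ord],
                         xout = subject_cqa, rule = 2, ties = mean)$y
  # Step 3: per-SF residuals are scaled back to a total by living area.
  if (!is.null(living_area)) resid <- resid * living_area
  # Step 4: estimated value = model prediction + interpolated residual.
  list(residual = resid, value = subject_pred + resid)
}

# Made-up comparables: total residuals, then per-SF residuals.
total <- interpolate_subject_value(
  comp_cqa = c(6, 2, 4), comp_resid = c(20000, -10000, 0),
  subject_cqa = 5, subject_pred = 500000
)
per_sf <- interpolate_subject_value(
  comp_cqa = c(6, 2, 4), comp_resid = c(10, -5, 0),
  subject_cqa = 5, subject_pred = 500000, living_area = 2000
)
print(total$value)
print(per_sf$value)
```

In both calls a CQA of 5 falls halfway between the comparables scored 4 and 6, so the interpolated residual is halfway between their residuals.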

Output Columns

The RCA Excel file (<datafile>_adjusted_<YYYYMMDD_HHMMSS>.xlsx) includes all intermediate output columns plus:

  Column                   Description
  ------------------------ --------------------------------------------------------------
  subject_value            Model prediction + interpolated residual (row 1 only)
  subject_cqa              The CQA score you entered (row 1 only)
  <variable>_adjustment    Subject contribution minus comp contribution (per g-function)
  residual_adjustment      Subject residual minus comp residual
  net_adjustments          Sum of all adjustments (contribution + residual)
  gross_adjustments        Sum of absolute values of all adjustments
  adjusted_sale_price      Comp sale price + net adjustments

The adjustment columns show, for each comparable, how much of its difference from the subject the model attributes to each variable. adjusted_sale_price is the comparable’s sale price after applying all model-derived adjustments; across comparables, these adjusted sale prices should cluster around the subject’s estimated value.
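The column arithmetic can be checked on toy numbers. The sketch below is an illustration only: every figure and the two variable names are made up, and the sale prices were constructed as intercept (350,000) + contributions + residual so that the result is easy to verify by hand.

```r
# Toy inputs; all numbers and variable names are invented for illustration.
subject <- list(contrib = c(living_area = 120000, lot_size = 30000),
                residual = 10000)
comps <- data.frame(
  sale_price  = c(480000, 508000),
  living_area = c(100000, 130000),  # per-variable contributions
  lot_size    = c(25000, 30000),
  residual    = c(5000, -2000)
)

out  <- comps
vars <- names(subject$contrib)

# <variable>_adjustment: subject contribution minus comp contribution.
for (v in vars) {
  out[[paste0(v, "_adjustment")]] <- subject$contrib[[v]] - comps[[v]]
}
# residual_adjustment: subject residual minus comp residual.
out$residual_adjustment <- subject$residual - comps$residual

adj_cols <- c(paste0(vars, "_adjustment"), "residual_adjustment")
out$net_adjustments     <- rowSums(out[adj_cols])
out$gross_adjustments   <- rowSums(abs(out[adj_cols]))
out$adjusted_sale_price <- out$sale_price + out$net_adjustments

print(out[c("net_adjustments", "gross_adjustments", "adjusted_sale_price")])
```

In this toy data both comparables adjust to 510,000, which equals the subject's value (350,000 + 150,000 + 10,000), because every component of the constructed prices is adjusted.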

Sales Comparison Grid

Overview & Purpose

The Sales Comparison Grid is an Excel workbook generated in appraisal mode (sidebar section 8) after computing RCA adjustments. It presents the subject property alongside selected comparables in a structured grid format, with Excel formulas that automatically compute adjustments and an adjusted sale price for each comparable.

The grid is designed for the appraiser’s workfile — it combines the regression-derived adjustments from the Earth model with editable cells where the appraiser can allocate the CQA residual to specific property features. The output filename is SalesGrid_<YYYYMMDD_HHMMSS>.xlsx.

Comp Selection Modal

Clicking “Generate Sales Grid & Download” opens a modal dialog where you select which comparables to include:

After confirming your selection, the selected comps are sorted by gross adjustment percentage ascending, so the most similar comparables appear on Sheet 1.
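A minimal sketch of this ordering and of the three-comps-per-sheet paging follows. It assumes that gross adjustment percentage means gross adjustments divided by sale price, which the text does not spell out; the data are made up.

```r
# Toy comps; the gross_pct definition below is an assumption.
comps <- data.frame(
  id = c("A", "B", "C", "D", "E"),
  sale_price = c(500000, 400000, 450000, 520000, 480000),
  gross_adjustments = c(50000, 20000, 9000, 78000, 28800),
  stringsAsFactors = FALSE
)
comps$gross_pct <- 100 * comps$gross_adjustments / comps$sale_price

# Sort ascending so the most similar comps come first.
sorted <- comps[order(comps$gross_pct), ]

# Three comps per sheet: sheet 1 holds the three smallest percentages.
sheet_no <- ceiling(seq_len(nrow(sorted)) / 3)
sheets <- split(sorted$id, sheet_no)
print(sheets)
```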

Grid Layout

Each sheet has a 20-column layout accommodating the subject property plus 3 comparables. The columns for each entity are:

The rows from top to bottom are:

  1. Title — sheet name (e.g., “Intermediate Sales Comparable Grid — Sheet 1 of 3”)
  2. Headers — “Subject” and comp column headers with address
  3. Address — full property address
  4. APN | MLS# | DOM | Subj.Prox — parcel number, listing ID, days on market, and Haversine distance (miles) from subject
  5. Sales Price | Concess. | Net SP — sale price, concessions, and Net Sale Price formula
  6. Regression Features header row
  7. BASE VALUE — the model intercept
  8. Date of Sale | OffMkt | OnMkt — contract date, sale age, and DOM
  9. Grouped rows (conditional) — Location, Site Size, and/or Age groups
  10. Model variable rows — one row per predictor (excluding grouped variables)
  11. Blank separator row
  12. Residual Features header row
  13. CQA|Residual — CQA score + remaining residual formula
  14. Residual feature rows — named + blank rows for appraiser entry
  15. Total VC / Net Adjustment
  16. Net Adjustment %
  17. Gross Adjustment %
  18. Adjusted Sale Price — formula row
  19. Copyright footer
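For reference, the Haversine distance shown in the Subj.Prox cell of row 4 can be computed as in the sketch below. The function name is hypothetical, and the Earth radius of 3958.8 miles is an assumed mean value; the constant earthUI uses is not stated. The coordinates are made up.

```r
# Haversine great-circle distance in miles (assumed mean Earth radius).
haversine_miles <- function(lat1, lon1, lat2, lon2, radius = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1.
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Subject versus two comps; the function is vectorized over the comps.
d <- haversine_miles(38.50, -121.50, c(38.50, 38.55), c(-121.50, -121.45))
print(round(d, 2))
```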

Sale Price, Concessions & Net SP

Row 5 shows three values for each comparable:

The subject column shows “N/A” for sale price (since it is unknown) and any concessions value if available.

Grouped Rows (Location, Site, Age)

When certain special types are designated and those variables appear in the model, earthUI creates grouped rows that combine related variables:

Loc: Long | Lat | Area — Appears when any of longitude, latitude, or area are in the model. Shows factual values for each constituent variable, a combined Value Contribution (sum of the individual VCs), and for comps, an adjustment (subject combined VC \(-\) comp combined VC). Styled with light blue background on VC cells.

Site Size | Dimensions — Appears when lot_size or site_dimensions are in the model. Same structure as the Location group.

Actual Age | Effective Age — Appears when actual_age or effective_age are in the model. Same structure.

Variables consumed by grouped rows are excluded from the individual model variable rows below, preventing double-counting in the adjustment totals.

Model Variable Rows

Below the grouped rows (or directly below BASE VALUE if no grouped rows), one row per remaining model predictor shows:

CQA|Residual & Residual Feature Rows

The CQA|Residual row shows each property’s CQA score and contains a formula for the remaining residual — the portion of the residual not yet allocated to specific features. The formula is:

\[\text{Remaining Residual} = \text{Total Residual} - \sum(\text{Residual Feature VCs below})\]

Below this are the residual feature rows: 5 named rows (Location, View/Appeal, Condition, Quality, Other) plus 6 blank rows. These are input cells where the appraiser enters value contributions for features not captured by the model. As the appraiser enters values, the Remaining Residual updates automatically to show how much of the residual is still unallocated.

For each comp, the adjustment column contains a formula: subject feature VC \(-\) comp feature VC.
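A small numeric example of both formulas, with made-up figures, may help. In the workbook these are Excel formulas; the R below only reproduces the arithmetic.

```r
# Made-up figures for one property; NA marks a cell left blank.
total_residual <- 12000
feature_vc <- c(Location = 3000, `View/Appeal` = NA, Condition = 5000,
                Quality = NA, Other = NA)

# Remaining Residual = Total Residual - sum of the residual feature VCs.
remaining <- total_residual - sum(feature_vc, na.rm = TRUE)

# Per-feature comp adjustment: subject feature VC minus comp feature VC.
subject_vc <- c(Location = 4000, Condition = 5000)
comp_vc    <- c(Location = 3000, Condition = 6500)
adjustment <- subject_vc - comp_vc

print(remaining)
print(adjustment)
```

With 8,000 of the 12,000 residual allocated, 4,000 remains; the comp receives a +1,000 Location adjustment and a -1,500 Condition adjustment.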

Adjusted Sale Price Formula

The Adjusted Sale Price row contains Excel formulas:

Sheet Protection & Editable Cells

Each sheet is protected to prevent accidental modification of formulas and data. The protection locks:

The only unlocked (editable) cells are the residual feature Value Contribution inputs — the cells under the CQA|Residual row where the appraiser enters breakdowns. These are styled with a light yellow background to indicate they are editable.
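The locked-sheet-with-editable-inputs pattern can be reproduced with the openxlsx package, as sketched below. This is an assumption about tooling: the text does not say which library earthUI uses, and the cell layout here is a four-row toy, not the real grid.

```r
library(openxlsx)

wb <- createWorkbook()
addWorksheet(wb, "Sheet 1")

# Column A: labels. B1: total residual. B3:B4: appraiser inputs (left blank).
writeData(wb, "Sheet 1",
          c("Total Residual", "Remaining Residual", "Location", "Condition"),
          startCol = 1, startRow = 1)
writeData(wb, "Sheet 1", 12000, startCol = 2, startRow = 1)

# B2: remaining residual, recalculated by Excel as inputs are entered.
writeFormula(wb, "Sheet 1", x = "B1-SUM(B3:B4)", startCol = 2, startRow = 2)

# Unlock only the input cells and shade them light yellow.
editable <- createStyle(locked = FALSE, fgFill = "#FFFFCC")
addStyle(wb, "Sheet 1", editable, rows = 3:4, cols = 2, gridExpand = TRUE)

# Protect the sheet; cells are locked by default unless styled otherwise.
protectWorksheet(wb, "Sheet 1", protect = TRUE)

out_file <- tempfile(fileext = ".xlsx")
saveWorkbook(wb, out_file, overwrite = TRUE)
```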

Working with the Grid in Excel

  1. Open the file in Excel or a compatible spreadsheet application.
  2. Review the regression adjustments: the model-derived adjustments are pre-populated and locked.
  3. Allocate the residual: in the residual feature rows (yellow cells), enter your assessment of how much of the remaining residual is attributable to each feature (Location, View, Condition, Quality, etc.). The Remaining Residual updates as you allocate.
  4. Check the Adjusted Sale Price: this formula automatically updates as you enter residual feature values.
  5. Multiple sheets: if you selected more than 3 comps, navigate between sheets. Each sheet shows the same subject in the left column with up to 3 different comps.

Downloading Reports

After fitting a model, the Download Report section (sidebar section 7, 8, or 9 depending on the purpose mode) lets you generate a comprehensive formatted report.

Report Formats

Three formats are available via the format dropdown: HTML, Word, and PDF.

Reports are rendered via Quarto and saved directly to the Project Output Folder as <datafile>_report_<YYYYMMDD_HHMMSS>.<format>. Rendering runs in the background — a modal dialog shows elapsed time and Quarto progress while the app remains responsive. A notification confirms the file location on completion.
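The naming pattern can be reproduced as follows. The helper is hypothetical, and the file extension used for each format (for example, which extension a Word report gets) is an assumption.

```r
# Hypothetical helper mirroring the documented naming pattern.
report_filename <- function(datafile, ext, when = Sys.time()) {
  stem  <- tools::file_path_sans_ext(basename(datafile))
  stamp <- format(when, "%Y%m%d_%H%M%S")
  paste0(stem, "_report_", stamp, ".", ext)
}

fixed <- as.POSIXct("2025-01-02 03:04:05", tz = "UTC")
name_fixed <- report_filename("data/Appraisal_1.csv", "html", fixed)
name_now   <- report_filename("data/Appraisal_1.csv", "pdf")
print(name_fixed)
print(name_now)
```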

All operations (model fitting, data downloads, RCA calculations, sales grid generation, and report rendering) are logged with start/end timestamps and elapsed times to <datafile>_earthui_log.txt in the output folder, for troubleshooting and performance monitoring.
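A timing log of this kind can be sketched with a small wrapper. The function name and the line format are hypothetical, since earthUI's actual log layout is not shown here, and lm() stands in for a real operation.

```r
# Hypothetical logging wrapper; the log line format is an assumption.
log_operation <- function(label, expr, log_file) {
  stamp <- function(t) format(t, "%Y-%m-%d %H:%M:%S")
  start <- Sys.time()
  cat(sprintf("[%s] START %s\n", stamp(start), label),
      file = log_file, append = TRUE)
  result <- force(expr)  # the lazily passed expression is evaluated here
  end <- Sys.time()
  elapsed <- as.numeric(difftime(end, start, units = "secs"))
  cat(sprintf("[%s] END   %s (%.2f s)\n", stamp(end), label, elapsed),
      file = log_file, append = TRUE)
  result
}

log_file <- file.path(tempdir(), "Appraisal_1_earthui_log.txt")
fit <- log_operation("model fitting",
                     lm(Volume ~ Girth, data = trees), log_file)
writeLines(readLines(log_file))
```

Because R evaluates arguments lazily, the wrapped expression runs only at force(expr), so the elapsed time covers the operation itself.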

Report Contents

Every report includes the following sections:

  1. Dataset Description: number of observations, target variable(s), predictors used, categorical predictors
  2. Model Specification: degree, cross-validation status, number of terms, predictors in model
  3. Allowed Interactions (degree \(\geq\) 2 only): the interaction matrix with checkmarks, formatted for the output type
  4. Results Summary: R\(^2\), GRSq, GCV, RSS, and CV R\(^2\) (if cross-validation was enabled)
  5. Model Equation: the complete Earth model equation in LaTeX notation
  6. Coefficients & Basis Functions: table of all terms, their coefficients, and the basis functions that define them
  7. Variable Importance: bar chart and ranked table of predictor importance scores
  8. g-Function Contributions: plots for each g-function (line plots for univariate terms, 3D perspective and contour plots for bivariate interactions)
  9. Correlation Matrix: heatmap of predictor correlations
  10. Diagnostics: Residuals vs Fitted, Normal Q-Q, and Actual vs Predicted plots
  11. ANOVA Decomposition: table showing each basis function’s contribution to RSS
  12. Earth Output: raw text output from summary() called on the fitted earth object, including pruning pass details

For multi-response models, sections 4–8 and 10 are repeated for each target variable.
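Several of these figures can be read directly from a fitted earth object, as in the sketch below. It illustrates where the numbers come from and is not the report's own code; the trees dataset is a stand-in, and the cross-validated R\(^2\) is omitted because no cross-validation is run here.

```r
library(earth)

# Stand-in model on the built-in trees data.
fit <- earth(Volume ~ Girth + Height, data = trees)

# Results Summary figures stored on the fitted object.
fit_stats <- c(RSq = fit$rsq, GRSq = fit$grsq, GCV = fit$gcv, RSS = fit$rss)
print(round(fit_stats, 4))

# Coefficients of the selected basis functions, and variable importance.
print(coef(fit))
print(evimp(fit))

# Raw text output, captured as it might be for an "Earth Output" section.
earth_text <- capture.output(print(summary(fit)))
```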

Demo Dataset: Appraisal_1.csv

earthUI includes a demo MLS dataset for exploring the appraisal workflow. Load it programmatically with:

library(earthUI)
# Locate the demo CSV shipped with the package, then import it.
demo_file <- system.file("extdata", "Appraisal_1.csv", package = "earthUI")
df <- import_data(demo_file)

Or import it directly through the Shiny app file upload.

Description

The file contains 1,502 residential sales (plus 1 subject property in row 1) from a simulated MLS export. The data represents single-family home sales in a multi-area market with a range of property sizes, ages, and locations.

This is not real data, but is based on a realistic neighborhood in Northern California. All identification information has been altered or removed.

Columns

Suggested Quick Start

References

Friedman, Jerome H. 1991. “Multivariate Adaptive Regression Splines.” The Annals of Statistics 19 (1): 1–141. https://doi.org/10.1214/aos/1176347963.
Milborrow, Stephen. 2024. “Earth: Multivariate Adaptive Regression Splines.” October 1. https://CRAN.R-project.org/package=earth.
Milborrow, Stephen. n.d.-a. “Notes on the Earth Package.” http://www.milbo.org/doc/earth-notes.pdf.
Milborrow, Stephen. n.d.-b. “Variance Models in Earth.” http://www.milbo.org/doc/earth-varmod.pdf.