earthUI: An Interactive Interface for the earth (MARS) Package

William Bert Craytor

Abstract

earthUI is a graphical user interface for the R earth package, which implements Multivariate Adaptive Regression Splines (MARS). It offers three purpose modes—general predictive modeling, real-estate appraisal, and market-area analysis—and guides the user through data import, model configuration, fitting, and the interpretation of earth’s diagnostics and graphical output. This article documents earthUI’s data-format requirements, modeling workflow, output displays, and complete feature reference.

Introduction

What Is earthUI?

earthUI is a graphical user interface for the R earth package, which implements Multivariate Adaptive Regression Splines (MARS) (Friedman 1991). The modeling engine is Stephen Milborrow’s earth package (Milborrow 2024), documented in detail by Milborrow (n.d.-a) and Milborrow (n.d.-b). earthUI runs as a local Shiny application on your own machine: there is no login, no remote server, and no accounts. You launch it from R, import a dataset (CSV or Excel), configure your model, and fit it interactively.

The application provides a complete workflow: data import, variable configuration, model fitting, diagnostic plots, variable importance, model equations, and downloadable reports in HTML, Word, or PDF format.

Three Purpose Modes

When you launch earthUI, a Purpose radio button at the top of the sidebar lets you choose one of three modes:

General — Earth regression for any type of population or dataset. This is the default mode. It provides the full MARS modeling workflow without any domain-specific additions.
For Appraisal — Earth regression tailored for real estate appraisal. Adds features specific to single-property valuation, including subject property handling, special column designations, Reconciliation by Comparable Adjustment (RCA), and Sales Comparison Grid generation.
Market Area Analysis — Earth regression tailored for market area studies. Adds features for analyzing groups of properties in a defined market, with an optional “Skip first row” checkbox for subject property exclusion.

In all three modes, the core modeling engine is identical — you are always fitting an Earth (MARS) model. The purpose setting controls which additional tools and interface elements are available.

Switching purposes clears all state. When you change the purpose radio, earthUI resets to a clean default state: imported data, model results, tabs, variable configuration, and earth parameters are all cleared. You must re-import your file after switching. Previously saved settings for that file and purpose combination will then be automatically restored from your last session.

Real Estate–Specific Features

When either For Appraisal or Market Area Analysis is selected, earthUI activates several features designed for real estate analysis:

Special column designations — Each predictor can be tagged with a special role such as contract_date, dom, concessions, latitude, longitude, living_area, lot_size, actual_age, effective_age, area, site_dimensions, or display_only. These designations control how the column is handled during fitting, output, and Sales Grid generation. See Chapter 6 for the complete reference.
Rounding of latitude and longitude — Columns designated as latitude or longitude are automatically rounded to 3 decimal places to prevent overfitting.
Sale Age column — When a column is designated as contract_date and an Effective Date is provided, earthUI computes a sale_age column (days between sale date and effective date) and substitutes it as a predictor.
RCA computations (appraisal only) — In appraisal mode, after fitting the model earthUI can compute Reconciliation by Comparable Adjustment (RCA) output, which produces per-comparable adjustments, net/gross adjustment summaries, and an adjusted sale price for the subject property.
Sales Comparison Grid (appraisal only) — Generates a multi-sheet Excel workbook with formulas for comparable adjustments, residual feature breakdowns, and adjusted sale prices. See Chapter 11.

Getting Started

To use earthUI:

Install the package in R: install.packages("earthUI") or install from source.
Launch the application: run earthUI::launch() in R, or from the command line run Rscript -e 'earthUI::launch()'. The app opens in your web browser on port 7878. You can also access the app directly by navigating to http://localhost:7878 in your browser.
Select your Purpose (General, For Appraisal, or Market Area Analysis). Do this before importing, because changing the purpose later clears the imported data.
Import your data using the file upload in Section 1 of the sidebar. earthUI accepts CSV and Excel files.
Configure variables — choose your target(s) and predictors, set data types, and assign any special column roles.
Set Earth parameters — degree, nprune, subset filters, and other earth() arguments.
Fit the model — click “Fit Earth Model” and review the results in the main panel.
Export — download predictions as Excel, generate reports, or (in appraisal mode) compute RCA adjustments and Sales Comparison Grids.

Settings are automatically persisted in your browser’s local storage and restored when you reload the same input file.

MLS Input Data Requirements

For real estate appraisal and market analysis workflows, your input data typically comes from a Multiple Listing Service (MLS) export. This chapter describes the expected file structure and the columns that earthUI can use.

File Format & Structure

earthUI accepts CSV and Excel (.xlsx, .xls) files. On import, column names are automatically converted to snake_case — for example, “Living SqFt” becomes living_sqft, “Contract Date” becomes contract_date, and “Sale Price” becomes sale_price. This normalization ensures consistent column references throughout the workflow. The CSV separator and decimal mark used during import are determined by the locale settings (see Chapter 3, “Locale & Regional Settings”).
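The header normalization can be pictured with a short sketch. It is written in Python purely for illustration (earthUI itself is an R package), and the function name to_snake_case and its rules are assumptions chosen to reproduce the three examples above; earthUI's actual conversion may treat edge cases differently.

```python
import re


def to_snake_case(name: str) -> str:
    """Hypothetical helper: lower-case a header and join its words with underscores."""
    lowered = name.strip().lower()
    # Collapse every run of characters that is not a letter or digit into one underscore.
    collapsed = re.sub(r"[^a-z0-9]+", "_", lowered)
    # Drop underscores left over at either end (e.g. from trailing punctuation).
    return collapsed.strip("_")


for header in ("Living SqFt", "Contract Date", "Sale Price"):
    print(header, "->", to_snake_case(header))
```

Under these assumed rules, mixed-case abbreviations such as "SqFt" are simply lower-cased rather than split into separate words, which matches the living_sqft example.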

Your data file should be a flat table with one row per property and one column per attribute. The first row of the file must contain column headers.

Required Columns for Appraisal Mode

While earthUI works with any set of columns, the full appraisal workflow (RCA adjustments + Sales Comparison Grid) benefits from having the columns described below in your MLS export.

Spreadsheet column names can be in a foreign language — the “special” names are in English so that the R program can give them special treatment. Otherwise, the given column names show up in the regression models, graphs, and (if doing appraisals) the Intermediate Sales Grid.

Not all columns are required. earthUI adapts — if a column is missing, the corresponding feature is simply omitted. For example, if no concessions column is designated, the Net SP row in the Sales Grid shows Sale Price without a concessions deduction. However, for real estate pricing models certain columns are highly recommended to achieve acceptable fit:

Sale Age — the number of days between the contract sale date and the effective date of the appraisal or analysis. If multi-year sales history is being used, especially for periods over 5 years, sale_age often plays a central role in estimating the sale price. In fact it is often so important that without it, earthUI fails to provide any model at all.
Living Area — also goes by names such as “Living Sqft,” “GLA” (gross living area) and so on. This is another leading determinant of sale price.
Total Bath Count — the total number of full, quarter, half, and 3/4 bathrooms. For example, two full baths and one half-bath would be a value of 2.5.
Garage Bays or Garage Area — the number of garage spaces or the garage square footage.
Lot Size — the land area of the property, typically in square feet or acres.
Longitude, Latitude, and if available Area ID. Adjustments for these will be combined under a single Location adjustment in the Sales Grid.
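For the Total Bath Count column in the list above, the following sketch shows one way to derive a single value from separate bathroom counts. It is a Python illustration only; the function name and the fractional weights for 3/4 and quarter baths are assumptions that follow the naming of those bath types, and only the 2.5 example comes from the text.

```python
def total_bath_count(full=0, three_quarter=0, half=0, quarter=0):
    """Hypothetical helper: weight each bathroom type by its nominal fraction."""
    return full + 0.75 * three_quarter + 0.5 * half + 0.25 * quarter


# Two full baths and one half-bath, as in the example above.
print(total_bath_count(full=2, half=1))
```

If your MLS already exports a combined bath count, no such preprocessing is needed.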

Special Column Naming Conventions

earthUI identifies columns by their special type designation, not by their column name. You can name your columns anything you like in the MLS export — what matters is that you assign the correct special type in the Variable Configuration table (Chapter 6).

For example, your MLS might export living area as “GLA”, “Living SqFt”, “liv_area”, or “gross_living_area”. After import (where it becomes snake_case), you simply designate it as living_area in the Special dropdown. earthUI will then use it for per-SF residual calculations and Sales Grid grouping regardless of its original name.

Data Quality & Completeness

Missing values (NA): Rows with NA values in any predictor or target column are automatically removed before fitting. The na.action is always set to na.fail internally, with NA removal handled before the call to earth().
Date columns: Dates should be in a format R can parse (e.g., 2025-06-15, 06/15/2025, June 15, 2025). earthUI auto-detects date columns when at least 80% of values parse successfully.
Numeric columns: Sale price, living area, lot size, concessions, and similar fields must be numeric. If your MLS exports prices with currency symbols or thousands separators (e.g., “$350,000” or “350.000,00”), you may need to clean these before import.
Factor columns: Categorical variables like area ID, style, or condition should contain a manageable number of unique values. earthUI auto-detects factors but you can override the detection.
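The 80% rule for date columns described above can be sketched as follows. This is a Python illustration, not earthUI's R implementation: the function names, the list of formats tried, and the choice to ignore empty cells when computing the share are all assumptions.

```python
from datetime import datetime

# Assumed formats, matching the three examples in the text.
# Note: %B (full month name) depends on the active locale.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")


def parses_as_date(text):
    """Return True if the text matches any of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text.strip(), fmt)
            return True
        except ValueError:
            continue
    return False


def looks_like_date_column(values, threshold=0.8):
    """Classify a column as Date when at least `threshold` of its non-empty values parse."""
    non_empty = [v for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return False
    hits = sum(parses_as_date(v) for v in non_empty)
    return hits / len(non_empty) >= threshold


column = ["2025-06-15", "06/15/2025", "June 15, 2025", "2024-12-01", "unknown"]
print(looks_like_date_column(column))
```

In this example four of five values parse, which sits exactly on the 80% threshold, so the column is treated as a date column.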

Subject Row Placement

In Appraisal mode, row 1 must be the subject property. All remaining rows are comparable sales. The subject’s sale price can be left blank (NA) or set to any value — earthUI treats it as NA during fitting regardless.

In Market Area Analysis mode, placing the subject in row 1 is optional. If present, check “Skip first row (subject property)” to exclude it from fitting.

In General mode, there is no special row handling — all rows are treated equally.

General Purpose Mode

Overview

General Purpose mode is the default when you launch earthUI. It provides the complete MARS modeling workflow for any dataset — not just real estate. You can use earthUI for scientific data, financial analysis, engineering studies, or any regression problem where you want to explore non-linear relationships and interactions between variables.

In General mode, the interface omits the real estate–specific features (special columns, sale age, coordinate rounding, RCA). The sidebar is streamlined to focus on variable selection, parameter configuration, model fitting, and export.

The sidebar is organized into numbered, collapsible sections that guide you from data import through export:

1. Import Data — File upload accepting CSV and Excel files. For Excel files with multiple sheets, a sheet selector appears. Column names are automatically converted to snake_case.

2. Project Output Folder — A text field specifying where downloads and fit logs are saved (defaults to ~/Downloads).

3. Variable Configuration — Target variable selector (supports multiple targets), predictor table with checkboxes for Include, Factor, and Linear. See Chapter 6 for full details.

4. Earth Call Parameters — All arguments to the earth() function: degree, penalty, nk, pruning method, cross-validation, and more. See Chapter 7 for the complete parameter reference.

5. Fit Earth Model — A single green button that runs the model asynchronously. See Chapter 8 for fitting details.

6. Download Output — Exports predictions, residuals, CQA scores, and per-g-function contributions as an Excel file. See Chapter 9.

7. Download Report — Generates a formatted report (HTML, Word, or PDF) saved to the output folder. See Chapter 12.

Main Panel Tabs

After fitting, the main panel provides nine tabs:

Settings Persistence

earthUI automatically saves your configuration to the browser’s local storage, keyed by both the input filename and the current purpose mode. When you reload the same file under the same purpose, all settings are restored: target selection, predictor checkboxes, data types, earth parameters, and response weights. This means the same file can have different configurations for General, Appraisal, and Market modes. Settings are also backed up to an SQLite database so they persist across browser sessions. A Reset to Defaults button clears all saved settings for the current purpose.

Dark Mode

Click the moon/sun icon in the upper-right corner to toggle between light and dark themes. The theme preference is saved in local storage and persists across sessions.

Locale & Regional Settings

earthUI supports international number, date, and CSV formatting conventions through a country-based locale system. The Country dropdown in Section 1 of the sidebar (below the file upload) selects a preset for one of 31 supported countries. Each preset configures:

CSV separator — comma (,) for US/UK/Japan or semicolon (;) for most of Europe, where the comma is used as a decimal mark.
Decimal mark — period (.) or comma (,).
Thousands separator — comma (US/UK/Japan), period (Germany/Italy/Spain), space (Finland/France/Poland/Baltics/Ukraine/Russia), or apostrophe (Switzerland).
Date format — MM/DD/YYYY (US), DD/MM/YYYY (most of Europe), or YYYY-MM-DD (Sweden/Lithuania/Japan/Canada).
Paper size — Letter (US/Canada/Mexico) or A4 (everywhere else).
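To see how the decimal mark and thousands separator listed above interact when a number is read, consider this sketch. It is a Python illustration with a hypothetical function name and signature; it does not reproduce earthUI's import code, and stripping currency symbols is an added assumption.

```python
import re


def parse_localized_number(text, decimal_mark=".", thousands_sep=","):
    """Hypothetical helper: convert a locale-formatted number string to a float."""
    cleaned = text.strip()
    # Remove the grouping separator first so it cannot be confused with the decimal mark.
    cleaned = cleaned.replace(thousands_sep, "")
    # Normalize the decimal mark to a period.
    cleaned = cleaned.replace(decimal_mark, ".")
    # Drop anything that is not a digit, period, or minus sign (e.g. currency symbols).
    cleaned = re.sub(r"[^0-9.\-]", "", cleaned)
    return float(cleaned)


print(parse_localized_number("$350,000"))                                   # US style
print(parse_localized_number("350.000,00", decimal_mark=",", thousands_sep="."))  # German style
print(parse_localized_number("1 234,5", decimal_mark=",", thousands_sep=" "))     # French style
```

The order of the two replacements matters: removing the thousands separator before converting the decimal mark keeps "350.000,00" from being misread as 350.0.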

Supported Countries

Override Dropdowns

Below the country selector, four override dropdowns let you change individual settings without switching countries:

Paper — Letter or A4 (affects PDF report page size)
CSV sep — comma or semicolon (used when importing CSV files)
Decimal — period or comma (used in number display on plots and axes)
Date — MDY, DMY, or YMD (controls the order in which date formats are tried when parsing date columns)

When you change the country, all overrides reset to that country’s defaults. Changing an override only affects that one setting.

Saving Defaults

Click Save as my default to store your locale preferences globally. These defaults apply to all future sessions regardless of which data file you load. Per-file settings (target, predictors, parameters) are saved separately in the browser’s local storage, but locale defaults persist across all files via an SQLite database. This two-level approach is designed for organizations like audit firms that work with data from multiple countries — set your most common country as the default, then override per-file when needed.

Appraisal Mode

When you select For Appraisal as the Purpose, earthUI configures itself for single-property valuation. All features described in Chapter 3 remain available; this chapter covers only the appraisal-specific additions.

Subject Row Handling

In appraisal mode, row 1 of your dataset is the subject property and all remaining rows are comparable sales. Your input file must be organized accordingly (see Chapter 2). The subject’s sale price can be left blank or set to any value — earthUI automatically treats it as NA during fitting.

After importing, the Data tab splits into two sections: Subject Property (row 1) and Comparable Sales (rows 2+). Row 1 is always excluded from model fitting — the notification “Skipping row 1 (subject). Fitting on N rows.” confirms this. After fitting, the model still generates predictions for the subject row, shown as est_<target> in the output.

Effective Date & Sale Age

In appraisal and market modes, an Effective Date field appears in the Variable Configuration section (defaulting to today’s date). If you designate a column as contract_date in the Special column dropdown, earthUI computes a sale_age column: the number of days, as an integer, between each sale’s contract date and the effective date. This column replaces the original date column as a predictor.

The first time you click Fit after designating a contract date, earthUI creates the sale_age column and notifies you to click Fit again to include it.
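The sale_age calculation amounts to a date difference in whole days. The sketch below is a Python illustration; the function name is hypothetical, and the sign convention (effective date minus contract date, so that past sales have positive ages) is an assumption.

```python
from datetime import date


def sale_age_days(contract_date: date, effective_date: date) -> int:
    """Hypothetical helper: whole days from the contract date to the effective date."""
    return (effective_date - contract_date).days


effective = date(2025, 7, 15)
for contract in (date(2025, 6, 15), date(2024, 7, 15)):
    print(contract, "->", sale_age_days(contract, effective))
```

A sale contracted on 2025-06-15 with an effective date of 2025-07-15 has a sale age of 30 days under this convention.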

Special Column Designations

In appraisal and market modes, a Special dropdown appears for each predictor in the Variable Configuration table. The complete list of special types and their effects is given under Special Column Types Reference in Chapter 6.

Only one column per special type is allowed (except display_only). Assigning a special to a second column automatically clears it from the first. A small blue badge appears next to the variable name showing its assigned special type.

RCA Adjustments Overview

The Calculate RCA Adjustments & Download button (sidebar section 7, visible only in appraisal mode after fitting) computes market-derived adjustments for each comparable relative to the subject. The full RCA workflow is described in Chapter 10.

After computing RCA adjustments, the Generate Sales Grid & Download button (sidebar section 8) becomes available. The Sales Grid workflow is described in Chapter 11.

Market Area Analysis Mode

When you select Market Area Analysis as the Purpose, earthUI provides the same real estate–specific features as appraisal mode (special columns, sale age, coordinate rounding) but is oriented toward analyzing a group of properties rather than valuing a single subject.

Differences from Appraisal Mode

Skip first row is optional — a checkbox labeled “Skip first row (subject property)” appears below the Purpose selector. When checked, row 1 is excluded from fitting (like appraisal mode). When unchecked, all rows are included.
No RCA or Sales Grid sections — the “Calculate RCA Adjustments & Download” and “Generate Sales Grid & Download” steps are not available. Market mode focuses on model fitting and output, not per-comparable adjustments.
Sidebar numbering — without the RCA and Sales Grid steps, the Download Report section is numbered 7 instead of 9.

When to Use Market Mode

Market Area Analysis mode is appropriate when you are:

Building a regression model for a neighborhood or market area to understand value drivers
Analyzing how variables like square footage, age, lot size, and location affect sale prices across a group of properties
Preparing support for a market conditions analysis or neighborhood delineation
Working with a dataset that includes a subject property in row 1 but you want the option of including or excluding it from the fit

Variable Selection

Section 3 of the sidebar — Variable Configuration — is where you choose which columns participate in the model and how they are treated.

Target Variable(s)

The Target (response) variable(s) dropdown at the top of Section 3 lists every column in your dataset. Select one column for a standard single-response model, or multiple columns for a multi-response model. When multiple targets are selected, the model fits all responses simultaneously using earth(cbind(y1, y2, ...) ~ .).

Columns selected as targets are automatically excluded from the predictor list.

The Predictor Table

Below the target selector, a table lists every remaining column with the following fields:

Column	Description
Variable	Column name (full name shown in tooltip if truncated). In appraisal/market modes, a blue badge shows the assigned special type.
Type	Data type dropdown: `numeric`, `integer`, `character`, `logical`, `factor`, `Date`, `POSIXct`
Inc?	Checkbox — include this column as a predictor in the model
Special	Dropdown (appraisal/market only) — see Special Column Types Reference below
Factor	Checkbox — treat this column as a categorical variable
Linear	Checkbox — force linear entry only (no hinge functions)
NAs	Count of missing values. Shown in red when more than 30% of values are missing

A hint line above the table explains the abbreviations: “Type = column data type, Inc = include as predictor, Factor = treat as categorical, Linear = linear-only (no hinges).”

Data Type Detection & Overrides

earthUI automatically detects data types on import. Numeric, integer, logical, factor, and date columns are recognized. Character columns that look like dates (at least 80% of values parse against common date formats) are classified as Date.

You can override any detection by changing the Type dropdown. When you change a column to character or factor, the Factor checkbox is automatically checked. Changing types affects how the column is passed to the earth() function.

Factor and Linear Flags

Factor — When checked, the column is treated as a categorical variable. This is appropriate for columns like style, area_id, or grade that represent discrete groups rather than continuous measurements. Factor columns enter the model as indicator (dummy) variables.
Linear — When checked, the column is forced to enter the model linearly — no hinge (piecewise-linear) functions are created for it. This is useful when you know a variable has a strictly linear relationship with the target.

Special Column Types Reference

In appraisal and market modes, the Special dropdown provides the following options. Each type can be assigned to at most one column (except display_only, which allows multiple):

Date & Time Types:

contract_date — Triggers automatic sale_age computation. The original date column is replaced by an integer column measuring days between the sale date and the Effective Date.
listing_date — Used as a fallback for computing Days on Market (DOM = contract date $-$ listing date) when no explicit dom column is designated.
dom — Identifies the Days on Market column. Displayed in the Sales Grid’s APN row and Date of Sale row.

Monetary Types:

concessions — Identifies sale concessions (seller credits, buyer incentives, etc.). Used in the Sales Grid to compute Net Sale Price: Net SP = Sale Price $-$ Concessions.

Size & Location Types:

latitude — Values are automatically rounded to 3 decimal places to prevent overfitting. Used for Haversine proximity calculations (distance from subject to each comp) and grouped in the Location row of the Sales Grid.
longitude — Same rounding treatment as latitude. Used for proximity and the Location group.
area — Typically a neighborhood or area identifier. Grouped with latitude and longitude in the “Loc: Long | Lat | Area” row of the Sales Grid.
living_area — Enables per-square-foot residual calculations (residual_sf and cqa_sf) in the download output.
lot_size — Grouped in the “Site Size | Dimensions” row of the Sales Grid.
site_dimensions — Grouped with lot size in the Sales Grid (e.g., “75x120”).
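The coordinate rounding and Haversine proximity calculation mentioned for latitude and longitude can be sketched as follows. This is a Python illustration under stated assumptions: the function names are hypothetical, the distance unit (kilometers) and the mean Earth radius of 6371 km are choices made here, and earthUI may report distances in other units.

```python
import math


def round_coordinate(value, digits=3):
    """Round a latitude or longitude to 3 decimal places, as described above."""
    return round(value, digits)


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Made-up coordinates for a subject and one comparable.
subject = (round_coordinate(37.774929), round_coordinate(-122.419416))
comp = (round_coordinate(37.804363), round_coordinate(-122.271111))
print(subject, comp, round(haversine_km(*subject, *comp), 2))
```

Rounding to three decimals places coordinates on a grid of roughly 100 m in latitude, which keeps the model from placing knots on individual parcels.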

Age Types:

actual_age — Grouped in the “Actual Age | Effective Age” row of the Sales Grid.
effective_age — Grouped with actual age in the Sales Grid.

Display Types:

display_only — The column is included in Excel exports but excluded from model fitting entirely. Use this for address fields, MLS numbers, parcel IDs, or other reference data that should not be a predictor. Multiple columns can have this designation.

Multiple Targets

Selecting more than one target variable fits a multi-response Earth model. When multiple targets are selected:

The model predicts all responses simultaneously, sharing basis functions across targets
The wp (response weights) button becomes active, allowing you to assign a numeric weight to each target
Variance models (varmod.method) are disabled (not supported for multi-response)
Results tabs display per-response metrics, equations, and diagnostic plots

Parameter Selection

Section 4 of the sidebar — Earth Call Parameters — provides access to all arguments accepted by the earth() function. Each parameter has a blue help icon (?) with a tooltip explanation. Parameters are organized into collapsible subsections.

Basic Parameters

Parameter	Default	Description
subset	(empty)	Row filter expression. See “Subset Filtering” below.
weights	NULL	Column selector for case (row) weights. Only numeric columns are listed.
wp	NULL	Response weights for multi-target models. Button opens a dialog with one numeric input per target (default 1.0 each).
keepxy	off	Retain x, y, subset, and weights in the model object.
trace	0	Trace level for fitting output (0 through 5).
glm	none	Optional GLM family: `gaussian`, `binomial`, or `poisson`.
degree	1	Maximum interaction order. Setting degree $\geq$ 2 auto-enables cross-validation and reveals the Allowed Interactions matrix.
penalty	2	GCV penalty per knot. Higher values produce simpler models.

Forward Pass

Parameter	Default	Description
nk	auto	Maximum terms before pruning. Default: min(200, max(20, 2$\times$predictors)) + 1.
thresh	0.001	Forward-step threshold. Smaller values allow more terms.
minspan	0 (auto)	Minimum span between knots. Negative values set the maximum knots per predictor.
endspan	0 (auto)	End span — minimum distance from a knot to the edge of the data.
newvar.penalty	0	Penalty for introducing a new variable (encourages reuse of existing predictors).
fast.k	20	Number of parent terms to consider in the fast MARS algorithm.
fast.beta	1	Controls the fast MARS aging factor.

Allowed Interactions

When degree is set to 2 or higher, an interaction matrix appears below the basic parameters. This is an upper-triangular grid of checkboxes, one for each predictor pair. A checked box means the two predictors are allowed to interact; unchecking it forbids that specific interaction.

Clicking a predictor name (row or column label) toggles all interactions for that variable. Allow All and Clear All checkboxes at the top provide bulk control. The matrix uses sticky headers so column and row labels remain visible when scrolling.

An info alert reminds you: “Interaction terms increase the risk of overfitting. Cross-validation has been enabled (10-fold).”

Pruning

Parameter	Default	Description
pmethod	backward	Pruning method: `backward`, `none`, `exhaustive`, `forward`, `seqrep`, or `cv`.
nprune	NULL	Maximum terms after pruning. Leave empty for automatic selection.

Cross-Validation

Parameter	Default	Description
nfold	10	Number of cross-validation folds. Set to 0 to disable CV.
ncross	20	Number of cross-validation repetitions.
stratify	on	Stratify CV samples so each fold has a similar response distribution.

When degree $\geq$ 2, cross-validation is automatically enabled (nfold set to 10) to help guard against overfitting from interactions.

Subset Filtering

The subset text input accepts an R expression that filters which rows are used for model fitting. You can type an expression directly (e.g., sale_age < 365 & area_id == 460) or use the Build filter… button to construct one visually.

Build Filter Dialog — Click “Build filter…” to open a guided dialog. Each condition row has a column dropdown, operator (<, >, <=, >=, ==, !=), and a value input that adapts to the column type: numeric input for numbers, date picker for dates, and dropdown of unique values for character/factor columns. Conditions are joined with AND (&) or OR (|) connectors. A preview at the bottom shows the expression and how many rows match. Click Apply to insert the expression into the text input.

Date columns must use as.Date("...") or as.POSIXct("...") wrappers in manual expressions. The Build Filter dialog handles this automatically.

Subset filtering is non-destructive — excluded rows remain in the dataset and receive predictions in the Excel export.
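The non-destructive behavior can be pictured as a boolean mask over the rows rather than a deletion. The sketch below is a Python illustration with made-up data; it mirrors the example expression sale_age < 365 & area_id == 460, and the variable names are hypothetical.

```python
# Made-up rows standing in for an imported dataset.
rows = [
    {"sale_age": 120, "area_id": 460},
    {"sale_age": 400, "area_id": 460},
    {"sale_age": 30, "area_id": 512},
]


def subset_mask(rows):
    """True for rows that satisfy: sale_age < 365 & area_id == 460."""
    return [row["sale_age"] < 365 and row["area_id"] == 460 for row in rows]


mask = subset_mask(rows)
fitting_rows = [row for row, keep in zip(rows, mask) if keep]

# The full dataset is untouched; only the fitting set is narrowed.
print(len(rows), "rows in dataset,", len(fitting_rows), "used for fitting")
```

Because the original rows are kept, a fitted model can still produce predictions for the rows the filter excluded.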

Recommended Parameter Values

earthUI displays a recommended value below key parameters in Section 4. These recommendations update reactively based on the number of fitting rows ($n$) and selected predictors ($p$). The formulas are derived from Friedman’s MARS paper, earth’s internal algorithms, and empirical testing.

Forward Pass Parameters

nk (max terms before pruning):

\[\text{nk} = \min\!\bigl(100,\; \max(21,\; 2p + 1,\; \lfloor n/10 \rfloor)\bigr)\]

$n$	$p{=}5$	$p{=}10$
30	21	21
50	21	21
100	21	21
200	21	21
500	50	50
1000	100	100
1500	100	100

Earth’s default, $\min(200, \max(20, 2p)) + 1$, does not account for dataset size and can be too small for large datasets, constraining the forward pass before it explores all predictors adequately. The $\lfloor n/10 \rfloor$ term allows approximately one term per 10 observations — a standard rule of thumb for avoiding overparameterization. The cap of 100 prevents diminishing returns.
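The two nk rules can be compared side by side. The sketch below is a Python transcription of the recommended formula and of earth's default as stated above; the function names are hypothetical and the code is not taken from earthUI.

```python
def earth_default_nk(p):
    """earth's default as stated above: min(200, max(20, 2p)) + 1."""
    return min(200, max(20, 2 * p)) + 1


def recommended_nk(n, p):
    """Recommended value: min(100, max(21, 2p + 1, floor(n / 10)))."""
    return min(100, max(21, 2 * p + 1, n // 10))


for n in (30, 200, 500, 1000, 1500):
    print(n, recommended_nk(n, p=5), recommended_nk(n, p=10), earth_default_nk(10))
```

For p = 10 the default stays at 21 regardless of n, while the recommended value grows with the dataset until it reaches the cap of 100.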

minspan (minimum observations between knots):

\[\text{minspan} = \min\!\bigl(16,\; \lfloor 5 + n/50 \rfloor\bigr)\]

endspan (minimum observations from data boundaries):

\[\text{endspan} = \min\!\bigl(16,\; \lfloor 5 + n/28 \rfloor\bigr)\]

$n$	minspan	endspan
30	5	6
50	6	6
100	7	8
200	9	12
300	11	15
500	15	16
1500	16	16

Earth’s auto-calculated minspan uses Friedman’s equation 43: $\lfloor(-\ln(-\ln 0.95) + \ln(p \cdot n)) / (2.5 \ln 2)\rfloor$, which scales as $\ln(np)$ and grows too slowly for large datasets. At $n{=}1500$ it yields only ${\approx}7$, giving roughly 245 candidate knot locations per continuous predictor — too many, allowing the forward pass to fit noise. The recommended formula targets approximately 100 candidates per predictor for large $n$, while deferring to earth’s auto-calculation for small $n$ (where it is well-tested). Endspan is set slightly larger than minspan to provide additional boundary protection, which helps when data has thin tails (common in real estate).
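The span formulas can be checked numerically. The sketch below is a Python transcription of the recommended minspan and endspan rules and of the equation-43 expression quoted above; the function names are hypothetical, and p = 10 in the comparison is an illustrative assumption.

```python
import math


def recommended_minspan(n):
    """min(16, floor(5 + n / 50)), using integer division for the floor."""
    return min(16, 5 + n // 50)


def recommended_endspan(n):
    """min(16, floor(5 + n / 28)), using integer division for the floor."""
    return min(16, 5 + n // 28)


def friedman_minspan(n, p, alpha=0.05):
    """The auto-calculated minspan expression quoted above (alpha = 0.05 gives the 0.95 term)."""
    numerator = -math.log(-math.log(1 - alpha)) + math.log(p * n)
    return math.floor(numerator / (2.5 * math.log(2)))


for n in (30, 100, 300, 500, 1500):
    print(n, recommended_minspan(n), recommended_endspan(n), friedman_minspan(n, p=10))
```

At n = 1500 the recommended minspan of 16 leaves about 1500 / 16, or roughly 94, candidate knots per predictor, in line with the target of approximately 100.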

penalty (GCV penalty per knot):

\[\text{penalty} = \begin{cases} 3 & \text{if degree} > 1 \\ 2 & \text{if degree} = 1 \end{cases}\]

This is earth’s own default — 2 for additive models, 3 for interaction models (the higher penalty compensates for the larger search space).

newvar.penalty (penalty for introducing a new predictor):

Recommended: 0.1 when predictors are correlated (e.g., living area with bedroom count, bathroom count, lot size). This biases the forward pass toward reusing predictors already in the model rather than introducing correlated alternatives. The mechanism: each new predictor’s RSS improvement is multiplied by $1/(1 + \text{penalty})$. With $\text{newvar.penalty} = 0.1$, a new variable must improve RSS by at least 10% more than another knot on an existing variable. This produces simpler models with fewer predictors without affecting final coefficient estimates (the penalty is removed after selection). Leave at 0 if predictors are not correlated.
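The effect of the multiplier can be illustrated with made-up numbers. This Python sketch only mimics the comparison described above; the function name and the RSS figures are hypothetical, and the real forward pass inside earth is considerably more involved.

```python
def penalized_improvement(rss_improvement, is_new_variable, newvar_penalty=0.1):
    """Scale the RSS improvement by 1 / (1 + penalty) when the term uses a new predictor."""
    if is_new_variable:
        return rss_improvement / (1.0 + newvar_penalty)
    return rss_improvement


existing = penalized_improvement(100.0, is_new_variable=False)

# A new predictor that is only 5% better loses to another knot on an existing one.
print(penalized_improvement(105.0, is_new_variable=True) > existing)

# A new predictor that is 20% better still wins.
print(penalized_improvement(120.0, is_new_variable=True) > existing)
```

The break-even point sits at an improvement 10% larger than the competing term, which is the threshold quoted above for a penalty of 0.1.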

Pruning Parameters

pmethod: Recommended: backward. The default GCV-based backward pruning is deterministic: the same data always produces the same model, with no dependence on random seeds. This is important for reproducibility, especially in appraisal work. The cv method uses cross-validation for pruning, which introduces seed dependence.

nprune: Recommended: leave empty (NULL). Let GCV select the optimal number of terms. Setting nprune imposes a hard cap that overrides GCV’s judgment.

Cross-Validation Parameters

nfold (CV folds):

\[\text{nfold} = \min\!\bigl(15,\; \max(10,\; \lfloor n/100 \rfloor)\bigr)\]

$n$	nfold
30	10
100	10
500	10
1000	10
1500	15

With pmethod = "backward", cross-validation does not affect the model; it only computes the diagnostic CVRSq and provides residuals for the variance model. The floor of 10 and cap of 15 reflect this: enough folds for a stable diagnostic without unnecessary computation.

ncross (CV repetitions):

\[\text{ncross} = \max\!\bigl(3,\; \lceil 100/n \rceil\bigr)\]

$n$	ncross	total residuals
30	4	120
50	3	150
100	3	300
1500	3	4500

The variance model (varmod.method = "lm") fits on the cross-validation residuals. The formula targets at least 100 total residuals ($n \times \text{ncross}$) for a stable variance estimate, with a floor of 3 (required by earth when a variance model is enabled).
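The two cross-validation formulas and the resulting residual counts can be reproduced with a few lines. The sketch below is a Python transcription of the nfold and ncross rules given above; the function names are hypothetical.

```python
def recommended_nfold(n):
    """min(15, max(10, floor(n / 100)))."""
    return min(15, max(10, n // 100))


def recommended_ncross(n):
    """max(3, ceil(100 / n)), with the ceiling computed in integer arithmetic."""
    return max(3, -(-100 // n))


for n in (30, 50, 100, 1500):
    ncross = recommended_ncross(n)
    # Total residuals available to the variance model: n times ncross.
    print(n, recommended_nfold(n), ncross, n * ncross)
```

For every n in the tables, the floor of 3 repetitions is what binds except at n = 30, where a fourth repetition is needed to pass 100 total residuals.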

Variance Model

varmod.method: Recommended: lm. This fits prediction intervals using a linear regression of absolute CV residuals on the predicted response. It requires nfold > 0 and ncross >= 3.

Earth Output File

After every successful fit, earthUI writes <filename>_earth_output_<timestamp>.txt to the output folder. This file contains the model terms, summary statistics (RSq, GRSq, CVRSq), variance model details, and trace log. One file is created per fit, providing a cumulative record for comparing parameter configurations.

Settings Defaults

A radio button at the top of Section 4 controls which defaults are loaded:

Use last settings for input file — restores the settings you used last time with this particular file
Use default settings — applies your saved custom defaults
Earth defaults — resets all parameters to factory values

Click Save current as default to store the current parameter configuration as your personal defaults.

Fitting the Earth Model

The Fit Button

Section 5 of the sidebar contains a single green button: Fit Earth Model. This button is always visible (not inside a collapsible section). Before fitting, earthUI validates your configuration — a target must be selected and at least one predictor must be included. In appraisal/market modes, latitude and longitude columns are rounded, and sale age is computed from the effective date and contract date (if designated).

Fitting Modal & Trace Output

When you click Fit, a dark modal overlay appears with:

Header — “Fitting Earth Model” with an elapsed timer counting seconds
Trace log — a scrollable, monospace text area showing real-time output from the earth() function: dataset info, forward pass progress, cross-validation folds, and completion time
Color coding — default lines in green, cross-validation lines in yellow, errors in red

earthUI fits models asynchronously using callr::r_bg(), which runs the computation in a separate R process. This keeps the application responsive during long-running fits. A polling observer reads output from the background process every 300 ms and streams it to the trace log.
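
The background-process pattern described above can be sketched outside Shiny as follows. The helper name run_with_trace() and the toy worker are assumptions for illustration; inside the app the polling would be driven by a Shiny observer rather than the blocking loop shown here.

```r
library(callr)

# Hypothetical helper: run `fun` in a separate R process and collect the
# lines it prints, polling roughly every `poll_ms` milliseconds.
run_with_trace <- function(fun, args = list(), poll_ms = 300) {
  # stdout is piped back to this session; stderr is merged into stdout
  proc <- r_bg(fun, args = args, stdout = "|", stderr = "2>&1")
  trace <- character()
  repeat {
    proc$poll_io(poll_ms)                        # wait for output or timeout
    trace <- c(trace, proc$read_output_lines())  # drain what is available
    if (!proc$is_alive() && !proc$is_incomplete_output()) break
  }
  list(result = proc$get_result(), trace = trace)
}

# Toy worker standing in for the earth() call
out <- run_with_trace(function(n) {
  cat("Forward pass\n")
  cat("CV fold 1\n")
  n * 2
}, args = list(n = 21))

out$trace   # lines printed by the background process
out$result  # value returned by the worker function
```

Because the worker runs in its own process, the calling session only has to read text from a pipe, which is why the user interface can stay responsive while the model is being fitted.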

When fitting completes, a “Done in X.Xs” message appears and a close button (X) is added to the modal. The modal does not auto-dismiss — you close it when you are ready to review results.

On success, a green checkmark is appended to the Fit button. If an error occurs, the error message is displayed in the trace log and a notification appears.

Fit Log

Every fit (success or failure) writes a log file to your output folder:

Filename: <datafile>_earth_log_<YYYYMMDD_HHMMSS>.txt
Contents: timestamp, all trace output from the fitting process, and any error messages
Location: the Project Output Folder specified in Section 2 (defaults to ~/Downloads)
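
The filename pattern above can be reproduced with base R. This is a sketch under stated assumptions: timestamped_name() is a hypothetical helper, not an earthUI function, and the example path and time are made up.

```r
# Hypothetical helper: build "<datafile>_<tag>_<YYYYMMDD_HHMMSS>.<ext>"
timestamped_name <- function(data_path, tag, ext, when = Sys.time(),
                             out_dir = "~/Downloads") {
  stem <- tools::file_path_sans_ext(basename(data_path))  # drop folder and extension
  stamp <- format(when, "%Y%m%d_%H%M%S")
  file.path(out_dir, sprintf("%s_%s_%s.%s", stem, tag, stamp, ext))
}

when <- as.POSIXct("2025-03-14 09:26:53", tz = "UTC")
timestamped_name("sales/Appraisal_1.csv", "earth_log", "txt", when)
# "~/Downloads/Appraisal_1_earth_log_20250314_092653.txt"
```

Because the timestamp is part of the name, successive fits never overwrite each other's logs.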

Downloading Data

After fitting, you can download an Excel file containing predictions and diagnostics. This output is used in Step 7 (RCA) to assign a CQA (Condition/Quality/Appeal) rating to the subject property. The output is sorted by residual_sf and cqa_sf to help you assess where the subject falls in the ranking. If the model is of good quality, the properties should be ranked from least appealing to most appealing based on residual features that did not go into the regression. The middle value should be approximately 0, with negative values in the lower half and positive values in the upper half. You should find the lowest-quality homes, or “fixers,” near the bottom of the ranking and the nicest homes at the top. Exceptions usually arise from anomalies such as foreclosures, short sales, probate (inheritance-related) sales, and quick sales prompted by a job change or other circumstances. Investigating an anomaly usually turns up a pertinent reason for the unusual price.

The button label adapts to the purpose mode:

General/Market: Download Output (Excel)
Appraisal: Download Intermediate Output (Excel)

The filename format is <datafile>_modified_<YYYYMMDD_HHMMSS>.xlsx.

Output Columns

For each target variable, the following columns are appended:

Column	Description
`est_<target>`	Model prediction, e.g. `est_sale_price` (1 decimal place)
`residual`	Actual minus predicted (1 dp)
`cqa`	Comparative Quality Analysis score (2 dp, range 0–10)
`residual_sf`	Residual divided by living area (if designated, 1 dp)
`cqa_sf`	CQA calculated from ranking via residual_sf (2 dp)
`<variable>_contribution`	Per-g-function contribution value (1 dp, one column per g-function)
`basis`	Intercept value contribution, same for all properties (1 dp)
`calc_residual`	Actual minus (basis + all contributions) — verification column (1 dp)

For multi-target models, a _<i> suffix is added to distinguish columns for each target.
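
The relationship between the prediction, the contribution columns, and the verification column can be illustrated with one made-up row. This sketch assumes the prediction equals the basis plus the sum of the contribution columns, which is what the calc_residual definition implies; the column names other than those in the table are hypothetical.

```r
# Toy row; all numbers are made up for illustration
row <- data.frame(
  sale_price = 512000,
  basis = 250000,
  living_area_contribution = 180500.4,
  lot_size_contribution = 61250.2
)

# Every column ending in "_contribution" is one g-function contribution
contrib_cols <- grep("_contribution$", names(row), value = TRUE)
fitted_value <- row$basis + sum(row[1, contrib_cols])

row$est_sale_price <- round(fitted_value, 1)
row$residual <- round(row$sale_price - fitted_value, 1)
# Verification: rebuild the residual from basis + contributions
row$calc_residual <- round(row$sale_price - (row$basis + sum(row[1, contrib_cols])), 1)

row[, c("est_sale_price", "residual", "calc_residual")]
```

If calc_residual and residual ever disagree by more than rounding, the contribution columns do not add up to the model prediction.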

Column Ordering & Excel Formatting

The ranking columns are placed at the leftmost position in the output file, in this order: residual_sf, cqa_sf, residual, cqa. This makes it easy to scan the sorted list and evaluate where the subject falls in the CQA distribution.

These columns have Excel number formatting applied:

Column	Format	Example
`residual_sf`	Numeric, 2 decimal places	12.34
`cqa_sf`	Numeric, 2 decimal places	7.25
`residual`	Numeric, 0 decimal places	15,200
`cqa`	Numeric, 2 decimal places	6.80

CQA Scores

The CQA (Comparative Quality Analysis) score ranks each row’s residual against all other residuals. For a given row, the CQA is the fraction of rows with a smaller signed residual, multiplied by 10. This produces a 0–10 scale where:

High CQA ($\approx$ 9–10) — the property sold for much more than the model predicted (large positive residual)
Low CQA ($\approx$ 0–1) — the property sold for much less than predicted (large negative residual)
CQA $\approx$ 5 — the residual is near the median of all residuals

When a living_area column is designated, cqa_sf provides the same ranking based on per-square-foot residuals.
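
The ranking rule can be written compactly in R. This is one reading of the definition above, assuming the denominator is the total number of rows; cqa_score() is an illustrative name and the residuals are made up.

```r
# CQA = 10 * (number of rows with a strictly smaller residual) / (number of rows)
cqa_score <- function(residual) {
  n_smaller <- rank(residual, ties.method = "min") - 1
  round(10 * n_smaller / length(residual), 2)
}

residual <- c(-42000, -8000, 500, 9000, 31000)
cqa_score(residual)
# 0 2 4 6 8

# Per-square-foot variant: rank residual / living area instead
living_area <- c(1500, 1600, 2000, 1800, 2500)
cqa_score(residual / living_area)
```

Under this reading the highest possible score is just below 10, which matches the 0.00–9.99 input range used later for the subject's CQA.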

Sorting & Subject Row

In appraisal and market modes, comparable rows are sorted by residual_sf descending (or residual if no living area is designated). The subject row (row 1) remains in position 1 and is not sorted. In general mode, rows are exported in their original order.

This sorting, combined with the leftmost ranking columns, allows the appraiser to quickly scan the comparables from most under-predicted (sold for more than the model predicted) to most over-predicted (sold for less), and assess where the subject’s assigned CQA score falls in that distribution.
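
The sort described above, with the subject pinned to row 1, can be sketched like this. The data frame and its values are hypothetical.

```r
# Row 1 is the subject (no sale price, so no residual); the rest are comps
out <- data.frame(
  id = c("subject", "comp_a", "comp_b", "comp_c"),
  residual_sf = c(NA, -12.5, 30.1, 4.2)
)

comps <- out[-1, ]
sorted <- rbind(
  out[1, ],                                              # subject stays first
  comps[order(comps$residual_sf, decreasing = TRUE), ]   # comps, largest residual first
)
sorted$id
# "subject" "comp_b" "comp_c" "comp_a"
```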

Factor levels in the prediction data are aligned with the training data before calling predict(). Rows with unseen factor levels will produce NA predictions.
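
A minimal base R illustration of the level alignment follows. The column name and level values are made up, and the sketch shows only the alignment step, not earthUI's internal code.

```r
train <- data.frame(area = factor(c("North", "South", "North")))
new <- data.frame(area = c("South", "East"))   # "East" was never seen in training

# Re-encode the new data with the training levels; unseen values become NA
new$area <- factor(new$area, levels = levels(train$area))

levels(new$area)   # "North" "South"
is.na(new$area)    # FALSE TRUE
```

A row whose factor value is NA after alignment is the kind of row for which the prediction comes back as NA.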

mgcvUI Auto-Export

On every successful fit with degree $\leq$ 2, earthUI automatically saves the full result object as an .rds file to the Project Output Folder. The filename follows the pattern <datafile>_earthUI_result_<YYYYMMDD_HHMMSS>.rds.

This file can be loaded by mgcvUI (a companion Shiny app for GAM modeling) using readRDS(). mgcvUI uses the earth model’s knot locations and basis functions as starting points for GAM smooth terms, enabling a seamless transition from MARS to GAM modeling.

Models with degree > 2 are skipped because mgcvUI only supports pairwise interactions. A manual Export for mgcvUI button is also available in the sidebar for on-demand export.

glmnetUI Import

The same .rds file saved for mgcvUI can also be imported into glmnetUI (a companion Shiny app for elastic net regression). In glmnetUI, navigate to Section 2: Import from earthUI and use the Browse button to select the .rds file.

glmnetUI uses the standard model.matrix() approach (earth provides a model.matrix method for its fitted models): the earth model’s basis functions (hinges, interactions, and linear terms) become the columns of glmnet’s design matrix. glmnet then performs regularized regression on these basis columns, selecting and shrinking them via lasso/elastic net. This combines earth’s adaptive basis construction with glmnet’s regularization.
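
To make the idea of basis columns concrete, the sketch below builds a few hinge columns by hand in base R. The knots, variable names, and data are invented for illustration; in practice the matrix would come from the fitted earth model rather than from hand-written hinges.

```r
# Hinge (MARS basis) function: max(0, x - knot), or its mirror max(0, knot - x)
hinge <- function(x, knot, mirror = FALSE) {
  if (mirror) pmax(0, knot - x) else pmax(0, x - knot)
}

living_area <- c(900, 1400, 2000, 2600)   # made-up values
age <- c(60, 35, 12, 3)

# Each basis function becomes one column of the design matrix
basis <- cbind(
  "h(living_area-1500)" = hinge(living_area, 1500),
  "h(1500-living_area)" = hinge(living_area, 1500, mirror = TRUE),
  "h(living_area-1500)*h(40-age)" =
    hinge(living_area, 1500) * hinge(age, 40, mirror = TRUE)
)
basis

# With a fitted earth model `fit`, model.matrix(fit) would supply the basis
# matrix, which could then be passed as the x argument of glmnet::glmnet().
```

The interaction column is simply the product of two hinges, so it is nonzero only where both conditions hold (here, living area above 1500 and age below 40).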

Key Benefits

Interpretable interactions: earth’s hinge-based interactions (including 3-way terms) have clear geometric meaning, unlike plain cross-product interaction columns supplied to glmnet, which are difficult to explain in court or audit settings.
Automatic nonlinearity: earth’s hinge functions capture nonlinear relationships that a standard glmnet model with raw predictors would miss.
Variable selection: glmnet can further prune the earth basis, dropping hinge terms that don’t contribute.

Important: Weight Column Must Match

Workflow

Fit an earth model in earthUI (any degree).
The .rds result file is saved automatically to the Project Output Folder.
In glmnetUI, open Section 2: Import from earthUI and browse to the .rds file.
Verify that the same data file and same weight column are used in both apps.
Configure glmnet parameters in Section 5 (the Interaction Matrix is automatically disabled when an earthUI import is active).
Click Fit Model. The earth basis columns appear in the Equation, Coefficients, and Contributions tabs.

RCA Calculations & Downloading

The Calculate RCA Adjustments & Download button appears in sidebar section 7, visible only in appraisal mode after a model has been fitted. RCA (Reconciliation by Comparable Adjustment) produces market-derived, per-comparable adjustments relative to the subject property.

Opening the RCA Dialog

Clicking the button opens a small modal dialog with:

Score type — radio buttons to choose CQA or CQA per SF (the SF option is available only when a living_area column is designated). If you choose “CQA per SF,” the subject’s residual is computed as its living area multiplied by the residual_sf value that corresponds to the cqa_sf score you enter.
CQA value — numeric input for the subject’s CQA score (0.00–9.99, default 5.00). This score represents where you believe the subject falls in the quality distribution of the comparables.
Generate button — computes the RCA output and downloads it as Excel

CQA Score Interpolation

When you click Generate, earthUI interpolates the subject’s residual from the comparable CQA/residual pairs:

The comparables’ CQA scores and residuals are sorted by CQA ascending
Linear interpolation (stats::approx()) maps your entered CQA value to a residual
If using CQA per SF, the per-SF residual is converted back to a total residual by multiplying by the subject’s living area
The subject’s estimated value is computed as: model prediction + interpolated residual
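
The interpolation steps above can be sketched with stats::approx(). The helper name, the comparable values, the model prediction of 494,000, and the living area of 1,800 sq ft are all made up; the use of rule = 2 (clamping outside the observed CQA range) is an assumption, since the text does not say how out-of-range values are handled.

```r
# Hypothetical helper: map an entered CQA score to a residual
interpolate_subject_residual <- function(cqa, residual, subject_cqa) {
  ord <- order(cqa)   # sort the comps by CQA ascending
  # rule = 2 clamps to the end values outside the observed range (assumption)
  stats::approx(cqa[ord], residual[ord], xout = subject_cqa, rule = 2)$y
}

comp_cqa <- c(8, 0, 4, 6, 2)
comp_residual <- c(31000, -42000, 500, 9000, -8000)

subject_residual <- interpolate_subject_residual(comp_cqa, comp_residual, 5)
subject_residual                       # 4750, halfway between 500 and 9000
subject_value <- 494000 + subject_residual   # model prediction + residual

# "CQA per SF" variant: interpolate residual_sf, then scale by living area
comp_residual_sf <- c(20, -25, 1, 6, -5)
subject_residual_sf <- interpolate_subject_residual(comp_cqa, comp_residual_sf, 5)
subject_residual_sf * 1800             # 3.5 per sq ft * 1800 sq ft = 6300
```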

Output Columns

The RCA Excel file (<datafile>_adjusted_<YYYYMMDD_HHMMSS>.xlsx) includes all intermediate output columns plus:

Column	Description
`subject_value`	Model prediction + interpolated residual (row 1 only)
`subject_cqa`	The CQA score you entered (row 1 only)
`<variable>_adjustment`	Subject contribution minus comp contribution (per g-function)
`residual_adjustment`	Subject residual minus comp residual
`net_adjustments`	Sum of all adjustments (contribution + residual)
`gross_adjustments`	Sum of absolute values of all adjustments
`adjusted_sale_price`	Comp sale price + net adjustments

The adjustment columns tell you, for each comparable, how much of its difference from the subject the model attributes to each variable. adjusted_sale_price is the comparable’s sale price after applying all model-derived adjustments; together, the adjusted sale prices should cluster around the subject’s estimated value.
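
The column definitions in the table can be traced with two made-up comparables. All numbers and the two predictor names are hypothetical; the arithmetic follows the table row by row.

```r
# Subject contributions (per g-function) and interpolated residual
subject <- c(living_area = 182000, lot_size = 60000)
subject_residual <- 4750

comps <- data.frame(
  sale_price = c(500000, 486000),
  living_area_contribution = c(175000, 190000),
  lot_size_contribution = c(64000, 52000),
  residual = c(9000, -8000)
)

# Each adjustment is subject minus comp
adj <- data.frame(
  living_area_adjustment = subject[["living_area"]] - comps$living_area_contribution,
  lot_size_adjustment = subject[["lot_size"]] - comps$lot_size_contribution,
  residual_adjustment = subject_residual - comps$residual
)

comps$net_adjustments <- rowSums(adj)          # signed sum
comps$gross_adjustments <- rowSums(abs(adj))   # sum of absolute values
comps$adjusted_sale_price <- comps$sale_price + comps$net_adjustments
cbind(adj, comps[, c("net_adjustments", "gross_adjustments", "adjusted_sale_price")])
```

In this toy example both adjusted sale prices come out to 498,750, because each comp's sale price was constructed as the same basis (252,000) plus its contributions plus its residual.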

Sales Comparison Grid

Overview & Purpose

The Sales Comparison Grid is an Excel workbook generated in appraisal mode (sidebar section 8) after computing RCA adjustments. It presents the subject property alongside selected comparables in a structured grid format, with Excel formulas that automatically compute adjustments and an adjusted sale price for each comparable.

The grid is designed for the appraiser’s workfile — it combines the regression-derived adjustments from the Earth model with editable cells where the appraiser can allocate the CQA residual to specific property features. The output filename is SalesGrid_<YYYYMMDD_HHMMSS>.xlsx.

Comp Selection Modal

Clicking “Generate Sales Grid & Download” opens a modal dialog where you select which comparables to include:

Recommended comps are pre-checked. These are comparables with a gross adjustment percentage below 25% of sale price, sorted by gross adjustment percentage ascending (smallest adjustments first).
Additional comps are listed below, unchecked by default. These have larger gross adjustments but are still available for selection.
A maximum of 30 comps can be selected, producing up to 10 sheets (3 comps per sheet).

After confirming your selection, the selected comps are sorted by gross adjustment percentage ascending, so the most similar comparables appear on Sheet 1.
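
The default selection rule can be sketched as follows. select_comps() and the example data are hypothetical, and the sketch covers only the pre-checked recommended set, not comps that a user adds manually.

```r
# Hypothetical helper: recommended comps, ordered and assigned to sheets
select_comps <- function(comps, max_pct = 25, max_comps = 30, per_sheet = 3) {
  comps$gross_pct <- 100 * comps$gross_adjustments / comps$sale_price
  comps <- comps[order(comps$gross_pct), ]                        # smallest first
  chosen <- head(comps[comps$gross_pct < max_pct, ], max_comps)   # cap at 30
  chosen$sheet <- (seq_len(nrow(chosen)) - 1) %/% per_sheet + 1   # 3 comps per sheet
  chosen
}

comps <- data.frame(
  id = c("A", "B", "C", "D", "E"),
  sale_price = c(500000, 486000, 400000, 450000, 520000),
  gross_adjustments = c(15250, 28750, 140000, 9000, 60000)
)

grid <- select_comps(comps)
grid[, c("id", "gross_pct", "sheet")]
```

Comp C is excluded because its gross adjustment is 35% of its sale price; the remaining four comps fill Sheet 1 (D, A, B) and start Sheet 2 (E).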

Grid Layout

Each sheet has a 20-column layout accommodating the subject property plus 3 comparables. The columns for each entity are:

The rows from top to bottom are:

Title — sheet name (e.g., “Intermediate Sales Comparable Grid — Sheet 1 of 3”)
Headers — “Subject” and comp column headers with address
Address — full property address
APN | MLS# | DOM | Subj.Prox — parcel number, listing ID, days on market, and Haversine distance (miles) from subject
Sales Price | Concess. | Net SP — sale price, concessions, and Net Sale Price formula
Regression Features header row
BASE VALUE — the model intercept
Date of Sale | OffMkt | OnMkt — contract date, sale age, and DOM
Grouped rows (conditional) — Location, Site Size, and/or Age groups
Model variable rows — one row per predictor (excluding grouped variables)
Blank separator row
Residual Features header row
CQA|Residual — CQA score + remaining residual formula
Residual feature rows — named + blank rows for appraiser entry
Total VC / Net Adjustment
Net Adjustment %
Gross Adjustment %
Adjusted Sale Price — formula row
Copyright footer
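
The Subj.Prox value in the fourth row is a Haversine (great-circle) distance. A minimal base R sketch of that formula is shown below; the function name and the mean Earth radius of 3,958.8 miles are assumptions, not values taken from earthUI.

```r
# Great-circle distance in miles between two latitude/longitude points (degrees)
haversine_miles <- function(lat1, lon1, lat2, lon2, radius_miles = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_miles * asin(pmin(1, sqrt(a)))   # pmin guards against rounding above 1
}

# Made-up subject and comp coordinates
haversine_miles(38.5816, -121.4944, 38.6000, -121.4500)
```

The function is vectorized, so a whole column of comp coordinates can be compared against the subject in one call.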

Sale Price, Concessions & Net SP

Row 5 shows three values for each comparable:

Sale Price — the comparable’s sale price from the data
Concessions — the value from the column designated as concessions (0 if not designated)
Net SP — an Excel formula: Sale Price $-$ Concessions

The subject column shows “N/A” for sale price (since it is unknown) and any concessions value if available.

Grouped Rows (Location, Site, Age)

When certain special types are designated and those variables appear in the model, earthUI creates grouped rows that combine related variables:

Loc: Long | Lat | Area — Appears when any of longitude, latitude, or area are in the model. Shows factual values for each constituent variable, a combined Value Contribution (sum of the individual VCs), and for comps, an adjustment (subject combined VC $-$ comp combined VC). Styled with light blue background on VC cells.

Site Size | Dimensions — Appears when lot_size or site_dimensions are in the model. Same structure as the Location group.

Actual Age | Effective Age — Appears when actual_age or effective_age are in the model. Same structure.

Variables consumed by grouped rows are excluded from the individual model variable rows below, preventing double-counting in the adjustment totals.

Model Variable Rows

Below the grouped rows (or directly below BASE VALUE if no grouped rows), one row per remaining model predictor shows:

Factual values for subject and each comp (the actual data values)
Value Contribution (VC) — the g-function’s contribution to the predicted value for that row
Adjustment (comps only) — subject VC minus comp VC

CQA|Residual & Residual Feature Rows

The CQA|Residual row shows each property’s CQA score and contains a formula for the remaining residual — the portion of the residual not yet allocated to specific features. The formula is:

\[\text{Remaining Residual} = \text{Total Residual} - \sum(\text{Residual Feature VCs below})\]

Below this are the residual feature rows: 5 named rows (Location, View/Appeal, Condition, Quality, Other) plus 6 blank rows. These are input cells where the appraiser enters value contributions for features not captured by the model. As the appraiser fills in values, the Remaining Residual value decreases automatically.

For each comp, the adjustment column contains a formula: subject feature VC $-$ comp feature VC.
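
A small worked example of the two formulas, using made-up numbers, shows how allocating the residual works:

```r
total_residual <- 4750   # made-up subject residual

# Appraiser-entered value contributions for the five named rows
feature_vc <- c(Location = 0, "View/Appeal" = 1000, Condition = 3000,
                Quality = 0, Other = 0)

# Remaining Residual = Total Residual - sum of the feature VCs
remaining_residual <- total_residual - sum(feature_vc)
remaining_residual   # 750

# Comp-side adjustment for each feature: subject VC minus comp VC
comp_feature_vc <- c(Location = 0, "View/Appeal" = 0, Condition = 1500,
                     Quality = 0, Other = 0)
feature_adjustment <- feature_vc - comp_feature_vc
feature_adjustment
```

Allocating 4,000 of the 4,750 residual leaves 750 unexplained; a remaining residual of 0 would mean the whole residual has been attributed to named features.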

Adjusted Sale Price Formula

The Adjusted Sale Price row contains Excel formulas:

Subject: Sum of all Value Contribution cells from BASE VALUE through the last residual feature row. This represents the model’s total prediction for the subject plus any appraiser-allocated residual features.
Comps: Net SP + sum of all Adjustment cells from the first grouped or variable row through the last residual feature row. This gives the comparable’s sale price after all regression-derived and appraiser-entered adjustments.
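
The two column formulas can be traced with one subject and one comp. The numbers are invented, and the sketch assumes the CQA|Residual row carries an adjustment cell for comps like the rows around it.

```r
# Subject column: sum of every Value Contribution cell, BASE VALUE downward
subject_vc <- c(base_value = 252000, living_area = 182000, lot_size = 60000,
                remaining_residual = 750, view_appeal = 1000, condition = 3000)
subject_adjusted <- sum(subject_vc)

# Comp column: Net SP plus every Adjustment cell
sale_price <- 503000
concessions <- 3000
net_sp <- sale_price - concessions
comp_adjustments <- c(living_area = 7000, lot_size = -4000,
                      remaining_residual = -8250, view_appeal = 1000,
                      condition = 3000)
comp_adjusted <- net_sp + sum(comp_adjustments)

c(subject = subject_adjusted, comp = comp_adjusted)
# both 498750 in this example
```

Moving value between the remaining residual and a named feature row changes neither total, since the sum over the residual rows stays the same.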

Sheet Protection & Editable Cells

Each sheet is protected to prevent accidental modification of formulas and data. The protection locks:

All formula cells (Net SP, remaining residual, adjustments, adjusted sale price, etc.)
All data cells (addresses, sale prices, factual values, value contributions from the model)
Labels and headers

The only unlocked (editable) cells are the residual feature Value Contribution inputs — the cells under the CQA|Residual row where the appraiser enters breakdowns. These are styled with a light yellow background to indicate they are editable.

Working with the Grid in Excel

Open the file in Excel or a compatible spreadsheet application.
Review the regression adjustments — the model-derived adjustments are pre-populated and locked.
Allocate the residual — in the residual feature rows (yellow cells), enter your assessment of how much of the remaining residual is attributable to each feature (Location, View, Condition, Quality, etc.). The Remaining Residual formula will decrease as you allocate.
Check the Adjusted Sale Price — this formula automatically updates as you enter residual feature values.
Multiple sheets — if you selected more than 3 comps, navigate between sheets. Each sheet has the same subject in the left column with 3 different comps.

Downloading Reports

After fitting a model, the Download Report section (sidebar section 7, 8, or 9 depending on the purpose mode) lets you generate a comprehensive formatted report.

Report Formats

Three formats are available via the format dropdown:

HTML — an .html file with KaTeX math rendering, embedded images, and a table of contents. Uses the Flatly Bootstrap theme and Roboto Condensed font.
Word — a .docx file suitable for editing and distribution. Includes a table of contents, page numbers, and uses a custom reference template for consistent styling.
PDF — typeset with LuaLaTeX for professional-quality output. Paper size follows the locale setting (Letter or A4). Includes landscape pages for large interaction matrices and uses Roboto Condensed with Latin Modern Math for equations.

Reports are rendered via Quarto and saved directly to the Project Output Folder as <datafile>_report_<YYYYMMDD_HHMMSS>.<format>. Rendering runs in the background — a modal dialog shows elapsed time and Quarto progress while the app remains responsive. A notification confirms the file location on completion.

All operations (model fitting, data downloads, RCA calculations, sales grid generation, and report rendering) are logged with start/end timestamps and elapsed times to <datafile>_earthui_log.txt in the output folder, for troubleshooting and performance monitoring.

Report Contents

Every report includes the following sections:

Dataset Description — number of observations, target variable(s), predictors used, categorical predictors
Model Specification — degree, cross-validation status, number of terms, predictors in model
Allowed Interactions (degree $\geq$ 2 only) — the interaction matrix with checkmarks, formatted for the output type
Results Summary — R$^2$, GRSq, GCV, RSS, and CV R$^2$ (if cross-validation was enabled)
Model Equation — the complete Earth model equation in LaTeX notation
Coefficients & Basis Functions — table of all terms, their coefficients, and the basis functions that define them
Variable Importance — bar chart and ranked table of predictor importance scores
g-Function Contributions — plots for each g-function: line plots for univariate terms, 3D perspective and contour plots for bivariate interactions
Correlation Matrix — heatmap of predictor correlations
Diagnostics — Residuals vs Fitted, Normal Q-Q, and Actual vs Predicted plots
ANOVA Decomposition — table showing each basis function’s contribution to RSS
Earth Output — raw text output from summary() applied to the fitted earth model, including pruning pass details

For multi-response models, sections 4–8 and 10 are repeated for each target variable.

Demo Dataset: Appraisal_1.csv

earthUI includes a demo MLS dataset for exploring the appraisal workflow. Load it programmatically with:

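library(earthUI)  # provides import_data()
# Locate the demo file shipped with the package, then import it
demo_file <- system.file("extdata", "Appraisal_1.csv", package = "earthUI")
df <- import_data(demo_file)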

Or import it directly through the Shiny app file upload.

Description

The file contains 1,502 residential sales (plus 1 subject property in row 1) from a simulated MLS export. The data represents single-family home sales in a multi-area market with a range of property sizes, ages, and locations.

This is not real data, but it is modeled on a realistic neighborhood in Northern California. All identifying information has been altered or removed.

Columns

Suggested Quick Start

References

Friedman, Jerome H. 1991. “Multivariate Adaptive Regression Splines.” The Annals of Statistics 19 (1): 1–141. https://doi.org/10.1214/aos/1176347963.

Milborrow, Stephen. 2024. “Earth: Multivariate Adaptive Regression Splines.” October 1. https://CRAN.R-project.org/package=earth.

Milborrow, Stephen. n.d.-a. “Notes on the Earth Package.” http://www.milbo.org/doc/earth-notes.pdf.

Milborrow, Stephen. n.d.-b. “Variance Models in Earth.” http://www.milbo.org/doc/earth-varmod.pdf.
