mgcvUI is a graphical user interface for the R mgcv
package, which fits Generalized Additive Models (GAMs) using penalized
regression splines with automatic smoothness selection. It offers three
purpose modes—general predictive modeling, real-estate appraisal, and
market-area analysis—and guides the user through data import,
smooth-term specification, model fitting, diagnostic and effect plots,
and downloadable reports. This article documents mgcvUI’s data-format
requirements, modeling workflow, output displays, and complete feature
reference.
Generalized Additive Models (GAMs) were formalized by Trevor Hastie and Robert Tibshirani in their 1990 monograph. GAMs extend the linear model by replacing each linear term \(\beta_j x_j\) with a smooth function \(f_j(x_j)\), so the model becomes:
\[y = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_p(x_p) + \varepsilon\]
Each \(f_j\) is estimated from the data using a penalized regression spline. The smooth functions allow the model to capture nonlinear relationships without the user specifying a functional form in advance. The penalty controls the wiggliness of each smooth, preventing overfitting while allowing enough flexibility to track real patterns.
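The model above maps directly onto a formula in R's mgcv package. The following minimal sketch uses simulated data (the variable names x1 and x2 and the data-generating function are illustrative assumptions) to fit two penalized regression splines, with REML choosing how strongly each one is penalized:

```r
library(mgcv)  # recommended package distributed with R

set.seed(1)
n  <- 300
x1 <- runif(n)
x2 <- runif(n)
# Simulated truth: a nonlinear effect of x1, a linear effect of x2, plus noise
y  <- 2 + sin(2 * pi * x1) + 0.5 * x2 + rnorm(n, sd = 0.3)
dat <- data.frame(y, x1, x2)

# Each s() term is a penalized regression spline f_j(x_j);
# REML estimates the smoothing parameter (penalty strength) of each one
fit <- gam(y ~ s(x1) + s(x2), data = dat, method = "REML")

# Effective degrees of freedom per smooth: close to 1 means nearly linear,
# larger values mean a wigglier estimated f_j
summary(fit)$edf
```

The effective degrees of freedom for s(x1) should come out well above those for s(x2): the penalty lets the sine-shaped effect bend while keeping the linear effect nearly straight.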
The R package (Mixed GAM Computation Vehicle) was developed by Simon Wood at the University of Bath (Wood 2017). First released in 2001, it is now the standard R implementation of GAMs and ships with every R installation. Its underlying methods are described in a series of papers covering stable multiple smoothing parameter estimation (Wood 2004), thin-plate regression splines (Wood 2003), fast stable REML estimation (Wood 2011), and general smooth model selection (Wood et al. 2016). Key features include:
mgcvUI, earthUI, and glmnetUI are companion applications that use different modeling engines. The following table summarizes the key differences:
MARS (Multivariate Adaptive Regression Splines) was introduced by Jerome Friedman in 1991. It builds piecewise linear models by adaptively selecting hinge functions and their knot positions from the data. The R implementation is the package by Stephen Milborrow.
Elastic net regression, implemented in the package by Friedman, Hastie, and Tibshirani (2010), combines Lasso (\(L_1\)) and ridge (\(L_2\)) penalties for simultaneous variable selection and coefficient shrinkage.
A powerful workflow combines earthUI’s automatic knot discovery with mgcvUI’s smooth estimation. When earthUI exports an result file and mgcvUI imports it:
This pipeline bridges exploratory modeling (earth) with confirmatory modeling (GAM): earth's data-driven knot positions and detected interactions serve as anchor points for the GAM's penalized smooth terms.
For real estate appraisal and similar applications requiring interpretable, defensible models:
mgcvUI is a graphical user interface for the R package. It runs as a local Shiny application — there is no login, no server, and no accounts. You launch it from R, import a dataset (CSV or Excel), configure your model, and fit a GAM interactively.
The application provides a complete workflow: data import, variable configuration with smooth term specification, model fitting with background processing, diagnostic plots, smooth partial effect curves, and downloadable reports in Word, PDF, or HTML format.
Generalized Additive Models replace the linear relationship \(\beta x\) with a smooth function \(f(x)\) for each predictor. This means:
When you launch mgcvUI, a Purpose radio button at the top of the sidebar lets you choose one of three modes:
In all three modes, the core modeling engine is identical — you are always fitting a GAM via . The purpose setting controls which additional tools and interface elements are available.
When either For Appraisal or Market Area Analysis is selected, mgcvUI activates several features designed for real estate analysis:
- Special column types: each predictor can be designated with a special type such as contract_date, dom, concessions, latitude, longitude, living_area, lot_size, actual_age, effective_age, area, site_dimensions, or display_only. These designations control how the column is handled during fitting and output.
- Sale age: when a column is designated as contract_date and an Effective Date is provided, mgcvUI computes a sale_age column (days between sale date and effective date) and substitutes it as a predictor.

To use mgcvUI:

- Install: install.packages("mgcvUI") or install from source.
- Launch: run mgcvUI::mgcvUI() in R. The app opens in your web browser on port 7880. The app remembers your last-used purpose mode and restores it automatically.
- Optionally, import an earthUI .rds file to seed the GAM with earth-discovered knots. Skip this step if you do not have an earthUI result.

Settings are automatically persisted in a SQLite database and restored when you reload the same input file.
For real estate appraisal and market analysis workflows, your input data typically comes from a Multiple Listing Service (MLS) export. This chapter describes the expected file structure and the columns that mgcvUI can use.
mgcvUI accepts CSV and Excel
(.xlsx, .xls) files. On import, column names
are automatically converted to snake_case — for example,
“Living SqFt” becomes living_sqft, “Contract Date” becomes
contract_date, and “Sale Price” becomes
sale_price. This normalization ensures consistent column
references throughout the workflow. The CSV separator and decimal mark
used during import are determined by the locale settings (see Chapter 3,
“Locale & Regional Settings”).
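The renaming rule can be approximated in a few lines of base R. This is a hypothetical helper (to_snake_case is not an mgcvUI function) that reproduces the three examples above; mgcvUI's actual handling of edge cases such as camelCase or leading digits may differ:

```r
# Hypothetical sketch of the snake_case column renaming, not mgcvUI's own code
to_snake_case <- function(x) {
  x <- tolower(trimws(x))
  # Collapse each run of spaces/punctuation into a single underscore;
  # [:alnum:] is locale-aware, so accented letters are kept
  x <- gsub("[^[:alnum:]]+", "_", x)
  gsub("^_+|_+$", "", x)  # drop leading/trailing underscores
}

to_snake_case(c("Living SqFt", "Contract Date", "Sale Price"))
# returns c("living_sqft", "contract_date", "sale_price")
```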
Your data file should be a flat table with one row per property and one column per attribute. The first row of the file must contain column headers.
While mgcvUI works with any set of columns, the full appraisal workflow benefits from having the following columns:
Spreadsheet column names can be in any language. Only the special type designations (such as contract_date) are fixed English identifiers, because mgcvUI uses them to give those columns special treatment. Otherwise, your own column names (after snake_case conversion) appear in the regression models, graphs, and, when doing appraisals, the output reports.
Not all columns are required. mgcvUI adapts — if a column is missing, the corresponding feature is simply omitted. However, for real estate pricing models certain columns are highly recommended:
sale_age often plays a central role.

mgcvUI identifies columns by their special type designation, not by their column name. You can name your columns anything you like in the MLS export; what matters is that you assign the correct special type in the Variable Configuration table (Chapter 6).
In Appraisal mode, row 1 must be the subject property. All remaining rows are comparable sales. The subject row is excluded from model fitting. After fitting, the model still generates predictions for the subject row.
In Market Area Analysis mode, placing the subject in row 1 is optional.
In General mode, there is no special row handling — all rows are treated equally.
General Purpose mode is the default when you launch mgcvUI. It provides the complete GAM workflow for any dataset — not just real estate. You can use mgcvUI for scientific data, financial analysis, engineering studies, or any regression problem where smooth nonlinear relationships are expected.
In General mode, the interface omits the real estate–specific features (special columns, sale age, RCA). The sidebar is streamlined to focus on variable selection, parameter configuration, model fitting, and export.
The sidebar is organized into numbered, collapsible sections that guide you from data import through export:
1. Import Data — File upload accepting CSV and Excel files. For Excel files with multiple sheets, a sheet selector appears. Column names are automatically converted to snake_case.
2. Import from earthUI (optional) — Import an earthUI result file to seed the GAM with earth-discovered knot positions. This step is optional — skip it if you do not have an earthUI result to import.
3. Project Output Folder — A text field specifying
where downloads are saved (defaults to ~/Downloads).
4. Variable Configuration — Target variable selector, response transform (none, log, log10), predictor table with checkboxes for Include, Factor, and Linear. The Special column appears only in Appraisal and Market modes. See Chapter 6 for full details.
5. Mgcv Call Parameters — All model configuration: parameter presets, family, method, gamma, cross-validation, select, basis type, k, tensor type, interaction matrix, and advanced parameters. See Chapter 7 for the complete parameter reference.
6. Fit Mgcv GAM Model — The button that runs the model.
7. Download Output — Exports predictions, residuals, CQA scores, and per-variable contributions as an Excel file. Available in all purpose modes.
8. Download Report — Generates a formatted report (Word, PDF, or HTML) saved to the output folder. In Appraisal/Market modes, steps 8–10 are RCA Adjustments, Sales Grid, and Download Report; in General mode, which has no RCA or Sales Grid steps, Download Report is step 8.
Section 2 of the sidebar — Import from earthUI — lets you import an earthUI result file. This enables the earth–mgcv pipeline: earth’s data-driven knot positions become anchor points for GAM smooth terms.
When an earthUI result is imported, mgcvUI:
A Clear button removes the imported earth data and resets to standalone mode.
After data import, the main panel provides the following tabs (model-dependent tabs populate after fitting):
mgcvUI automatically saves your configuration to an SQLite database, keyed by the input filename. When you reload the same file, all settings are restored: target selection, predictor checkboxes, data types, mgcv parameters, and interaction matrix. The last-used purpose mode is also persisted globally and restored when the app is relaunched.
Click the moon/sun icon in the upper-right corner to toggle between Nord Light and Nord Dark themes. The theme preference is saved in localStorage and persists across sessions. All UI elements adapt to the selected theme.
mgcvUI supports international number and CSV formatting conventions through a country-based locale system. The Settings dropdown in the title bar provides Country and Paper selectors for 30+ supported countries. Each preset configures:
- CSV separator: comma (,) for US/UK/Japan or semicolon (;) for most of Europe.
- Decimal mark: period (.) or comma (,).

Click Save as my default to store your locale preferences globally.
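The two conventions correspond to base R's read.csv() (comma separator, period decimal mark) and read.csv2() (semicolon separator, comma decimal mark). Whether mgcvUI calls these particular functions internally is an assumption; the sketch only shows that both formats can encode the same data:

```r
# US/UK/Japan style: comma-separated fields, period as decimal mark
us <- read.csv(text = "sale_price,living_area\n350000.5,1800")

# Style used in most of Europe: semicolon-separated fields, comma as decimal mark
eu <- read.csv2(text = "sale_price;living_area\n350000,5;1800")

all.equal(us, eu)   # TRUE: both parse to the same data frame
```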
When you select For Appraisal as the Purpose, mgcvUI configures itself for single-property valuation. All features described in Chapter 3 remain available; this chapter covers only the appraisal-specific additions.
In appraisal mode, row 1 of your dataset is the subject property and all remaining rows are comparable sales. Your input file must be organized accordingly (see Chapter 2). The subject’s sale price can be left blank or set to any value — mgcvUI automatically treats it as NA during fitting.
After fitting, the model generates predictions for the subject row, which is the basis for the RCA adjustment workflow.
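The subject-row behavior can be mimicked directly with mgcv. In this minimal sketch on simulated data (column names and values are illustrative assumptions, not taken from the demo file), row 1's price is set to NA, so gam() drops that row under R's default na.omit handling, yet predict() still returns a value for it:

```r
library(mgcv)

set.seed(42)
n <- 200
dat <- data.frame(living_area = runif(n, 900, 3500))
dat$sale_price <- 150000 + 180 * dat$living_area + rnorm(n, sd = 20000)

# Row 1 plays the subject: its price is unknown, so it is set to NA
dat$sale_price[1] <- NA

# Rows with a missing response are dropped before fitting,
# so the subject does not influence the estimated smooth
fit <- gam(sale_price ~ s(living_area), data = dat, method = "REML")

# The fitted model can still predict the subject's value
subject_pred <- predict(fit, newdata = dat[1, , drop = FALSE])
subject_pred
```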
In appraisal and market modes, an Effective Date
field appears in the Variable Configuration section (defaulting to
today’s date). If you designate a column as contract_date
in the Special dropdown, mgcvUI computes a sale_age column
— the number of integer days between each sale’s contract date and the
effective date. This column is added as a predictor.
When the Effective Date changes, sale_age is
automatically recomputed.
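The sale_age rule reduces to a date difference. This sketch assumes the sign convention "effective date minus contract date", so older sales get larger ages; compute_sale_age is a hypothetical helper, not part of mgcvUI's API:

```r
# Hypothetical sketch: integer days from each contract date to the Effective Date
compute_sale_age <- function(contract_date, effective_date = Sys.Date()) {
  as.integer(as.Date(effective_date) - as.Date(contract_date))
}

contract_date <- as.Date(c("2024-01-15", "2024-06-01", "2024-12-31"))
compute_sale_age(contract_date, effective_date = as.Date("2025-01-01"))
# returns c(352L, 214L, 1L)
```

Because the function takes the Effective Date as an argument, recomputing sale_age after the date changes is simply a second call with the new value.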
In appraisal and market modes, a Special dropdown appears for each predictor in the Variable Configuration table. See Chapter 6 for the complete reference of special types and their effects.
The Calculate RCA Adjustments & Download button (visible only in appraisal mode after fitting) computes market-derived adjustments for each comparable relative to the subject. The full RCA workflow is described in Chapter 11.
When you select Market Area Analysis as the Purpose, mgcvUI provides the same real estate–specific features as appraisal mode (special columns, sale age) but is oriented toward analyzing a group of properties rather than valuing a single subject.
Market Area Analysis mode is appropriate when you are:
Section 4 of the sidebar — Variable Configuration — is where you choose which columns participate in the model and how they are treated.
The Target (response) variable dropdown at the top of Section 4 lists every numeric column in your dataset. Select one column as the response variable (e.g., sale price). The target column is automatically excluded from the predictor list.
Below the target selector, a Response Transform dropdown offers three options:
When a log transform is selected, values \(\leq 0\) in the response are automatically filtered out. All output (predictions, contributions, residuals) is automatically back-transformed to the original scale.
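A minimal sketch of the same two steps (filtering non-positive responses, then back-transforming predictions) with mgcv on simulated data. The data and column names are illustrative assumptions, and this is not mgcvUI's internal code:

```r
library(mgcv)

set.seed(7)
n <- 300
dat <- data.frame(living_area = runif(n, 800, 4000))
dat$sale_price <- exp(11 + 0.0004 * dat$living_area + rnorm(n, sd = 0.1))
dat$sale_price[1:3] <- c(0, -5, NA)   # values a log response cannot use

# log() is undefined for values <= 0, so keep only positive responses
fit_dat <- dat[!is.na(dat$sale_price) & dat$sale_price > 0, ]

fit <- gam(log(sale_price) ~ s(living_area), data = fit_dat, method = "REML")

# Predictions are on the log scale; exp() returns them to dollars
pred_dollars <- exp(predict(fit, newdata = fit_dat))
```

Note that exponentiating a log-scale prediction estimates a conditional median rather than a mean; the text above does not say whether mgcvUI applies a bias correction.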
Below the target selector, a scrollable table lists every remaining column. Checkbox columns use rotated vertical headers for compactness. The column order is:
Factor vs. Smooth vs. Linear: By default, included numeric variables get a smooth term \(f(x)\). Checking Factor creates a categorical term (one coefficient per level). Checking Linear creates a simple linear term \(\beta x\) instead of a smooth. Variables marked as both Factor and Linear are treated as Factor.
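The checkbox rules translate into an mgcv formula in a straightforward way. The helper below is hypothetical (build_gam_formula is an illustration, not mgcvUI's internal code) and assumes the column names shown:

```r
# Hypothetical sketch: map Include / Factor / Linear choices to a gam() formula
build_gam_formula <- function(target, include,
                              factor_vars = character(), linear_vars = character()) {
  linear_vars <- setdiff(linear_vars, factor_vars)  # Factor wins over Linear
  terms <- vapply(include, function(v) {
    # Factor and Linear variables enter as plain terms; everything else gets s()
    if (v %in% factor_vars || v %in% linear_vars) v else sprintf("s(%s)", v)
  }, character(1), USE.NAMES = FALSE)
  reformulate(terms, response = target)
}

f <- build_gam_formula(
  target      = "sale_price",
  include     = c("living_area", "lot_size", "area_id"),
  factor_vars = "area_id",
  linear_vars = c("lot_size", "area_id")   # area_id is also Factor, so Factor wins
)
f
# sale_price ~ s(living_area) + lot_size + area_id
```

A plain term only acts as a categorical term if the column is a factor, so in this sketch area_id would also need to be converted with factor() in the data before fitting.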
mgcvUI automatically detects data types on import. Numeric, integer, logical, factor, and date columns are recognized. You can override any detection by changing the Type dropdown. Changing types affects how the column is treated in the model.
In appraisal and market modes, the Special dropdown provides the following options:
Weighting:

- weights: observation weight column (only one allowed; rows with weight = 0 are excluded from fitting)

Date & Time Types:

- contract_date: triggers automatic sale_age computation from the Effective Date
- listing_date: used as a fallback for computing Days on Market
- dom: identifies the Days on Market column

Monetary Types:

- concessions: identifies sale concessions (seller credits, buyer incentives, etc.)

Size & Location Types:

- latitude: values automatically rounded to 3 decimal places
- longitude: same rounding treatment as latitude
- area: market area or neighborhood identifier
- living_area: enables per-square-foot residual calculations (residual_sf and cqa_sf)
- lot_size: site size column
- site_dimensions: grouped with lot size

Age Types:

- actual_age: property age column
- effective_age: effective property age

Display Types:

- display_only: column is included in Excel exports but excluded from model fitting entirely. Use this for address fields, MLS numbers, parcel IDs, or other reference data.

Section 5 of the sidebar (Mgcv Call Parameters) provides access to all configuration options for the GAM. Each parameter has a blue help icon (?) with a tooltip explanation.
A dropdown at the top offers two presets that configure sensible defaults for common workflows:
The Earth Pipeline preset is automatically selected when earthUI knots are imported. The cubic regression spline () basis is required for earth knot integration because it allows specifying exact knot positions.
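In mgcv, exact knot positions for a cubic regression spline are supplied through gam()'s knots argument, with k equal to the number of knots. The sketch below uses made-up knot positions standing in for earth-discovered ones; how mgcvUI actually converts an earthUI result into knots (for example, whether it adds the data extremes as end knots) is an assumption here:

```r
library(mgcv)

set.seed(5)
n <- 300
dat <- data.frame(living_area = runif(n, 900, 3500))
# Piecewise-linear truth with a bend at 2000 sq ft, similar to a MARS hinge
dat$sale_price <- 200000 + 120 * dat$living_area +
  80 * pmax(dat$living_area - 2000, 0) + rnorm(n, sd = 15000)

# Hypothetical knot set: data extremes plus interior "earth" knots
earth_knots <- c(min(dat$living_area), 1500, 2000, 2500, max(dat$living_area))

# bs = "cr" accepts user-specified knots; k must match the number of knots
fit <- gam(sale_price ~ s(living_area, bs = "cr", k = length(earth_knots)),
           knots = list(living_area = earth_knots),
           data = dat, method = "REML")

fit$smooth[[1]]$bs.dim   # basis dimension equals the number of supplied knots
```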
Choose the distribution family for your response variable:
| Family | Use Case |
|---|---|
| gaussian | Continuous responses (e.g., sale price). Most common. |
| Gamma | Positive continuous responses with variance proportional to the mean. |
| poisson | Count data (e.g., number of sales). |
| binomial | Binary outcomes. |
| inverse.gaussian | Highly right-skewed positive data. |
The smoothing parameter estimation method:
| Method | Description |
|---|---|
| REML | Restricted Maximum Likelihood (default, recommended). Most robust against overfitting. |
| GCV.Cp | Generalized Cross-Validation. Tends to select slightly more complex models. |
| ML | Maximum Likelihood. Similar to REML but can undersmooth. |
| P-REML | Pearson REML variant. |
| P-ML | Pearson ML variant. |
| GACV.Cp | Generalized Approximate Cross-Validation. |
The gamma parameter multiplies the effective degrees of freedom in the smoothing criterion, encouraging smoother (less wiggly) fits. Default is 1.2.
| Value | Effect |
|---|---|
| 1.0 | Standard smoothing |
| 1.2–1.4 | Slightly smoother than mgcv's standard value of 1.0 (recommended) |
| 2.0+ | Much smoother — good for noisy data or small samples |
Higher gamma values guard against overfitting by penalizing complexity more heavily. The Earth Pipeline preset uses 1.4 for additional smoothing when refining earth’s knots.
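The effect of gamma can be checked by refitting the same model with two values and comparing total effective degrees of freedom. A sketch on simulated data (the data and the values 1 and 2 are illustrative):

```r
library(mgcv)

set.seed(2)
n <- 200
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + 0.5 * cos(6 * pi * dat$x) + rnorm(n, sd = 0.5)

fit_g1 <- gam(y ~ s(x, k = 20), data = dat, method = "REML", gamma = 1)
fit_g2 <- gam(y ~ s(x, k = 20), data = dat, method = "REML", gamma = 2)

# Total effective degrees of freedom; the gamma = 2 fit is expected to be smaller
c(gamma_1 = sum(fit_g1$edf), gamma_2 = sum(fit_g2$edf))
```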
When checked (default), mgcvUI runs 10-fold cross-validation after fitting to compute a CV R-squared. This provides an honest estimate of out-of-sample predictive power.
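A k-fold CV R-squared refits the model on all but one fold, predicts the held-out fold, and compares the pooled out-of-fold predictions with the observed response. This is a hedged sketch: cv_r_squared is a hypothetical helper, and mgcvUI's exact fold assignment and R-squared definition are assumptions:

```r
library(mgcv)

cv_r_squared <- function(formula, data, response, folds = 10, seed = 1) {
  set.seed(seed)  # reproducible fold assignment (note: resets the global RNG)
  fold_id <- sample(rep(seq_len(folds), length.out = nrow(data)))
  pred <- numeric(nrow(data))
  for (i in seq_len(folds)) {
    held_out <- fold_id == i
    fit <- gam(formula, data = data[!held_out, ], method = "REML")
    pred[held_out] <- predict(fit, newdata = data[held_out, ])
  }
  y <- data[[response]]
  # Out-of-fold R-squared: 1 - SSE / total sum of squares
  1 - sum((y - pred)^2) / sum((y - mean(y))^2)
}

set.seed(4)
dat <- data.frame(x = runif(300))
dat$y <- sin(2 * pi * dat$x) + rnorm(300, sd = 0.3)
cv_r_squared(y ~ s(x), dat, response = "y")
```

Every prediction that enters the score comes from a model that never saw that row, which is what makes the number an honest out-of-sample estimate.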
When checked (default), adds an extra penalty to each smooth term that can shrink it to zero. This enables automatic variable selection — unimportant smooth terms are effectively removed from the model. Implemented via in .
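In mgcv itself, this double-penalty shrinkage is requested with select = TRUE in gam(). In the sketch below, x3 is pure noise by construction (an illustrative assumption), so its smooth is expected to be shrunk to near-zero effective degrees of freedom:

```r
library(mgcv)

set.seed(8)
n <- 300
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
# x3 does not enter the true model
dat$y <- sin(2 * pi * dat$x1) + 2 * dat$x2 + rnorm(n, sd = 0.3)

fit <- gam(y ~ s(x1) + s(x2) + s(x3), data = dat,
           method = "REML", select = TRUE)

round(summary(fit)$edf, 3)   # edf for s(x3) is expected to be near 0
```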
The spline basis function type for smooth terms:
| Basis | Full Name | Description |
|---|---|---|
| tp | Thin plate regression spline | Default. Isotropic (no knot placement needed). Good general-purpose choice. |
| cr | Cubic regression spline | Allows explicit knot placement. Required for earth knot integration. |
| ps | P-spline | Evenly-spaced B-spline basis with difference penalty. Computationally efficient. |
| bs | B-spline | Flexible B-spline basis. |
The basis dimension \(k\) controls the maximum wiggliness of each smooth. A value of 0 (the default) means “automatic” — mgcvUI uses \(k = 10\) or the number of earth knots, whichever is appropriate.
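Because k caps the basis size, a single smooth can use at most k − 1 effective degrees of freedom once mgcv's identifiability constraint removes one coefficient. The sketch below (simulated data, illustrative k values) shows a deliberately small k limiting a wiggly fit:

```r
library(mgcv)

set.seed(9)
n <- 400
dat <- data.frame(x = runif(n))
dat$y <- sin(6 * pi * dat$x) + rnorm(n, sd = 0.2)   # three full oscillations

fit_k5  <- gam(y ~ s(x, k = 5),  data = dat, method = "REML")
fit_k30 <- gam(y ~ s(x, k = 30), data = dat, method = "REML")

# edf is bounded by k - 1, so the k = 5 fit cannot follow all three cycles
c(k5 = summary(fit_k5)$edf, k30 = summary(fit_k30)$edf)
```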
For interactions between continuous variables, two tensor product types are available:
| Type | Description |
|---|---|
| ti | Tensor interaction: models only the interaction effect, with main effects handled separately by univariate terms. Preferred for interpretability. |
| te | Tensor product: models the entire joint effect, including main effects. Harder to decompose for RCA adjustments. |
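In mgcv syntax the two choices look like this (simulated data; variable names illustrative). The ti() version keeps separate main-effect smooths that can be read off individually, while te() folds everything into a single joint surface:

```r
library(mgcv)

set.seed(10)
n <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + dat$x2 + 2 * dat$x1 * dat$x2 + rnorm(n, sd = 0.3)

# ti: main effects as univariate smooths, plus a pure interaction term
fit_ti <- gam(y ~ s(x1) + s(x2) + ti(x1, x2), data = dat, method = "REML")

# te: a single smooth representing the whole joint effect
fit_te <- gam(y ~ te(x1, x2), data = dat, method = "REML")

sapply(fit_ti$smooth, function(s) s$label)  # labels of the three smooths
sapply(fit_te$smooth, function(s) s$label)  # label of the single joint smooth
```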
A collapsible Allowed Interactions section displays an upper-triangular checkbox matrix for all included smooth (non-linear, non-factor) predictors. Each checkbox enables a tensor product smooth between the corresponding variable pair.
When earthUI knots are imported, earth-detected interactions are pre-checked and locked (highlighted with a yellow background). If the earth model used degree = 1 (no interactions), an informational message is shown.
A separate Factor-by-Smooth Interactions matrix appears when both factor and smooth variables are included. Each checkbox creates a term — a separate smooth curve for each factor level.
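In mgcv, a factor-by-smooth interaction is written s(x, by = f). Because each per-level curve is centered, the factor's main effect is normally included as well. A sketch with a hypothetical area factor and simulated prices:

```r
library(mgcv)

set.seed(12)
n <- 450
dat <- data.frame(living_area = runif(n, 900, 3500),
                  area = factor(sample(c("A", "B", "C"), n, replace = TRUE)))
# Price per square foot differs by area in the simulated truth
slope <- c(A = 100, B = 150, C = 200)[as.character(dat$area)]
dat$sale_price <- 100000 + slope * dat$living_area + rnorm(n, sd = 20000)

# One smooth of living_area per area level, plus the area main effect
fit <- gam(sale_price ~ area + s(living_area, by = area),
           data = dat, method = "REML")

length(fit$smooth)   # one smooth per factor level
```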
A collapsible “Advanced” section exposes additional settings:
outer/newton (default),
outer/bfgs, or efs (extended
Fellner-Schall).Section 6 of the sidebar contains the Fit Mgcv GAM Model button. Clicking it runs the model with your current configuration.
When you click Fit, mgcvUI:
When the package is available, model fitting runs in a background process so the app remains responsive. A dark-themed terminal-style progress overlay shows:
If is not available, fitting runs synchronously (the app freezes until fitting completes).
After successful fitting, a white checkmark appears on the Fit button and a status line shows: “R-sq = X, CV R-sq = X, Dev = X%, AIC = X, n = X.”
Shows the imported data as an interactive DataTable with horizontal scrolling and 15 rows per page.
Displays the fitted GAM model in two parts:
Model equation (MathJax-rendered):
Smooth Function Definitions table:
Six metric cards displayed across the top:
Below the cards:
A horizontal bar chart showing the relative importance of each model term:
Below the chart, a DataTable shows Term, Type, EDF, Statistic, and p-value, sorted by statistic descending.
Every model term that contributes to the predicted value has an interactive plotly contribution plot, displayed in a bordered card. The tab shows plots for all term types:
For log-transformed models, the y-axis is back-transformed to dollar contributions using the formula \(\bar{y} \times (e^{f(x)} - 1)\), making the curves directly interpretable in dollar terms.
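A sketch of that conversion using mgcv's per-term predictions. It assumes \(\bar{y}\) is the arithmetic mean of the original-scale response; dollar_contribution is a hypothetical helper and the data are simulated:

```r
library(mgcv)

# Convert a log-scale partial effect f(x) into dollars: ybar * (exp(f) - 1)
dollar_contribution <- function(f, ybar) ybar * (exp(f) - 1)

set.seed(3)
n <- 250
dat <- data.frame(living_area = runif(n, 800, 4000))
dat$sale_price <- exp(11 + 0.0004 * dat$living_area + rnorm(n, sd = 0.1))

fit <- gam(log(sale_price) ~ s(living_area), data = dat, method = "REML")

# predict(type = "terms") returns each smooth's centered contribution f_j(x)
f_log   <- predict(fit, type = "terms")[, "s(living_area)"]
contrib <- dollar_contribution(f_log, ybar = mean(dat$sale_price))
```

For example, a log-scale contribution of log(1.1) with \(\bar{y} = 500{,}000\) corresponds to a $50,000 contribution.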
For latitude/longitude variables, the slope is shown per 0.001 degrees rather than per degree, since that is a more meaningful geographic increment.
One colored line per factor level, overlaid on a single plot. Useful for seeing how a predictor’s effect varies across groups (e.g., different neighborhoods).
Heatmap showing the 2D contribution surface. Color scale: red (negative) \(\to\) white (zero) \(\to\) blue (positive). Hover shows both variable values and the contribution amount.
A heatmap matrix showing Pearson correlations among all numeric predictors. Available immediately after data import — no model fitting required.
Two sets of diagnostic visualizations:
Diagnostic panel (via ): Four plots — residuals vs fitted, Q-Q plot, histogram of residuals, and response vs fitted values.
Actual vs Predicted scatter plot: Observed vs predicted values with a 45-degree dashed reference line. For log-transformed models, values are back-transformed to the original scale.
Three histogram plots displayed after RCA computation (appraisal/market mode only):
Each histogram shows mean, median, and standard deviation in the subtitle, with dashed reference lines for mean and median.
A combined ANOVA table merging parametric and smooth terms from the GAM summary. Shows Term, Type (parametric or smooth), and all associated statistics.
Raw text output showing:
When earthUI knots are imported, this tab compares the direction of each variable in the earth model with its direction in the GAM:
A status line shows “All directions consistent” or the number of variables with inconsistencies.
Concurvity is the smooth analogue of multicollinearity. This tab shows:
Values above 0.8 in the “worst” row indicate that a smooth can be well approximated by the other smooths, suggesting redundancy.
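The same table can be produced in the R console with mgcv's concurvity(). In this sketch, x2 is built as a near copy of x1 (an illustrative assumption), so its "worst" value is expected to exceed 0.8:

```r
library(mgcv)

set.seed(13)
n <- 300
x1 <- runif(n)
x2 <- x1 + rnorm(n, sd = 0.05)     # nearly a copy of x1
x3 <- runif(n)                     # unrelated predictor
y  <- sin(2 * pi * x1) + x3 + rnorm(n, sd = 0.3)
dat <- data.frame(y, x1, x2, x3)

fit <- gam(y ~ s(x1) + s(x2) + s(x3), data = dat, method = "REML")

cc <- concurvity(fit, full = TRUE)
round(cc, 2)   # rows: worst, observed, estimate; one column per model term
```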
Output from : basis dimension adequacy tests for each smooth term. If a smooth’s \(k\) is too small, this test will flag it with a low p-value, suggesting you should increase \(k\).
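In the R console, mgcv's k.check() returns the same basis-dimension table without drawing diagnostic plots. A sketch with a deliberately small k on simulated wiggly data, where the test is expected to flag the smooth:

```r
library(mgcv)

set.seed(14)
n <- 400
dat <- data.frame(x = runif(n))
dat$y <- sin(6 * pi * dat$x) + rnorm(n, sd = 0.2)

fit_small <- gam(y ~ s(x, k = 5), data = dat, method = "REML")

kc <- k.check(fit_small)
kc   # columns: k', edf, k-index, p-value; a low p-value suggests raising k
```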
After fitting, the Download Output button (sidebar section 7) exports an Excel file with predictions and diagnostics.
In appraisal mode, row 1 (subject) has residual/cqa/cqa_sf set to NA. Rows are sorted by residual_sf descending when a living_area column is designated.
A white checkmark appears on the download button after successful completion.
CQA ranks each row’s residual against all others on a 0–10 scale:
The RCA (Reconciliation by Comparable Adjustment) workflow is available in Appraisal mode only, after fitting the model.
Click the Calculate RCA Adjustments & Download button in sidebar section 8. A modal dialog appears with:
living_area designated). The per-SF option scales the subject’s residual by living area.

Click Generate to compute and download.
A white checkmark appears on the button after successful computation.
The Sales Comparison Grid is available in Appraisal mode only, after computing RCA adjustments. It generates a formatted Excel workbook suitable for inclusion in appraisal reports.
Clicking the Generate Sales Grid & Download button opens a modal dialog listing all comparable sales. Comps are split into two groups:
Select your comps, then click Generate Sales Grid to create the workbook.
Required special types: contract_date
and living_area must be designated. Recommended
special types: latitude, longitude,
and lot_size — the grid will work without them but some
fields will be blank.
The generated workbook contains multiple sheets, each holding 3 comps side by side:
Each sheet is protected to prevent accidental edits, but the residual feature value cells are explicitly unlocked. This allows appraisers to enter values for features not in the model while preserving the formula-driven adjustment calculations.
Three formats are available:
Reports are saved to the output folder specified in Section 3.
Reports include:
gam.check fallback if gratia::appraise fails) and actual vs predicted scatter

Each section is independently error-protected: if one section fails, it shows an inline error message and the rest of the report generates normally.
A white checkmark appears on the Download Report button after successful generation.
mgcvUI can export the fitted GAM’s smooth functions as standalone code in multiple languages. Each smooth is evaluated on a 200-point grid and exported as a lookup table with linear interpolation, allowing the model to be used outside of R without any dependency on .
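The lookup-table idea can be sketched in R itself (the generated Python, C++, and SQLite code is not shown here). The sketch assumes that values beyond the grid ends are clamped to the end values (rule = 2), which may not match mgcvUI's exported code; smooth_lookup is a hypothetical name:

```r
library(mgcv)

set.seed(11)
n <- 300
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)
fit <- gam(y ~ s(x), data = dat, method = "REML")

# 1. Evaluate the smooth on a 200-point grid spanning the data
grid <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 200))
lookup_values <- predict(fit, newdata = grid, type = "terms")[, "s(x)"]

# 2. The exported "function" is linear interpolation in that table
smooth_lookup <- function(x_new) {
  approx(grid$x, lookup_values, xout = x_new, rule = 2)$y
}

# Compare with the exact smooth at points inside the data range
x_new <- seq(0.05, 0.95, by = 0.01)
exact <- predict(fit, newdata = data.frame(x = x_new), type = "terms")[, "s(x)"]
max(abs(smooth_lookup(x_new) - exact))   # small interpolation error
```

With 200 grid points the interpolation error for a smooth curve is typically negligible relative to the model's own uncertainty, which is what makes a dependency-free table practical.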
The Export Functions section (available in the report module) offers checkboxes for R, Python, C++, and SQLite. Click Download Functions (.zip) to generate a zip archive.
File:
File:
File:
File:
mgcvUI, earthUI, and glmnetUI are companion tools for regression modeling. They share the same data format, special column types, RCA workflow, and demo datasets, but use different modeling engines.
All three tools provide:
The demo dataset is shared with earthUI and glmnetUI. It is located in the folder of the mgcvUI source, or if earthUI is installed:
demo_file <- system.file("extdata", "Appraisal_1.csv", package = "earthUI")

The file contains 1,502 residential sales (plus 1 subject property in row 1) from a simulated MLS export. The data represents single-family home sales in a multi-area market with a range of property sizes, ages, and locations.
This is not real data, but is based on a realistic neighborhood in Northern California. All identification information has been altered or removed.
1. Run mgcvUI::mgcvUI()
2. Import Appraisal_1.csv via the file upload
3. Select sale_price as the target
4. Include sale_age, living_sqft, baths_total, lot_size, area_id (as factor), age, latitude, longitude, garage_spaces

mgcvUI runs on macOS, Windows, and Linux. The application is developed and primarily tested on macOS with RStudio. Platform-specific notes are provided below.
R \(\geq\) 4.1.0 is required. RStudio Desktop (2023.06 or later) is strongly recommended — it bundles pandoc (needed for HTML/PDF reports) and provides a convenient environment for launching the app.
The following packages are installed automatically as dependencies:
Fewest issues. Homebrew is recommended for system libraries. If PDF reports are needed:
# In R:
install.packages("tinytex")
tinytex::install_tinytex()
Works well with RStudio Desktop. Key notes:
Most variable across distributions. Ubuntu/Debian users may need system libraries before R packages will compile:
sudo apt install libcurl4-openssl-dev libssl-dev \
  libxml2-dev libsqlite3-dev libfontconfig1-dev

For PDF reports: tinytex::install_tinytex() or install from the system package manager.
For headless servers (no display), plotly interactive plots render as static fallbacks in reports. The Shiny app itself requires a web browser connection.
mgcvUI is designed to work even when optional components are missing:
No LaTeX installation detected. Run in R:
install.packages("tinytex")
tinytex::install_tinytex()

Then restart the app. The PDF option will appear.
The SQLite database directory cannot be created or written to. Check permissions on:
- ~/Library/Application Support/R/mgcvUI/ or ~/.local/share/R/mgcvUI/
- %APPDATA%/R/data/mgcvUI/

The function occasionally fails on complex models (many tensor interactions, non-standard families). This is a known upstream issue. Workaround: run in the R console to see the diagnostic plots directly.
This was a known issue in earlier versions related to the package. Ensure you have the latest version of mgcvUI installed. The current version catches this error and produces the report with a placeholder for the diagnostics section.
If the app fails to start because port 7880 is occupied (from a previous session), run:
# macOS/Linux:
lsof -ti:7880 | xargs kill
# Windows (PowerShell):
Stop-Process -Id (Get-NetTCPConnection -LocalPort 7880).OwningProcess

Missing system libraries. Install the development headers listed in the Linux section above, then retry .