# Page Views

## Page Views Data Requirements

### Overview

Page views are the foundation of attribution analysis. They represent individual website visits and page interactions, and together they capture the customer journey from first touch to conversion. Roadway supports three methods for providing page-view data: **Segment**, **Google Analytics 4 (GA4)**, and **Custom** data sources.

A page view represents a single page visit by a visitor. It carries the attribution parameters and referral information needed for attribution modeling.

### Conceptual Requirements

Roadway expects page-view data to conform to the following schema, regardless of your chosen data source method:

| **Column Name**      | **Data Type**      | **Description**                                                                                                                                                                                                  |
| -------------------- | ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `visitor_id`         | varchar            | **Required.** Unique identifier for the visitor (anonymous or identified). This is typically the `anonymous_id` from Segment, the `user_pseudo_id` from GA4, or a similar visitor identifier from your tracking system. |
| `page_viewed_at`     | timestamp          | **Required.** UTC timestamp of when the page view occurred.                                                                                                                                                      |
| `user_id`            | varchar (nullable) | Optional, but recommended for complete attribution. Unique identifier for authenticated users. This should be NULL for anonymous visitors and populated once a visitor becomes an identified user.              |
| `landing_page_url`   | varchar            | **Required.** The full URL of the page that was visited, including query parameters.                                                                                                                             |
| `referring_page_url` | varchar (nullable) | Optional. The full URL of the page that referred the visitor to this page. This can be NULL for direct traffic or when referrer information is not available.                                                    |
| `city`               | varchar (nullable) | Optional. The city in which the visit occurred, often provided by your tracking implementation library.                                                                                                          |
| `region`             | varchar (nullable) | Optional. The region (for example, the state in U.S. locales) in which the visit occurred, often provided by your tracking implementation library.                                                               |
| `country`            | varchar (nullable) | Optional. The country in which the visit occurred, often provided by your tracking implementation library.                                                                                                       |
| `continent`          | varchar (nullable) | Optional. The continent in which the visit occurred, often provided by your tracking implementation library.                                                                                                     |

***

## Data Source Options

### Option 1: Segment

If you already load page-view data into your warehouse via Segment, Roadway can use the Segment schema to process page views transparently.

**Source Table**: Your complete Segment `pages` table. See [Segment's pages table documentation](https://segment.com/docs/connections/storage/warehouses/schema/#pages) for the complete schema.
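
For orientation, the query below sketches how columns in a Segment `pages` table could line up with the conceptual schema. It is illustrative only: Roadway processes the Segment table itself, so you do not need to run it, and it is not a description of Roadway's internal logic. The column names follow Segment's documented warehouse schema, `<your_segment_schema>` is a placeholder, and the geographic columns are omitted because their availability depends on your tracking setup.

```sql
-- Illustrative mapping only; <your_segment_schema> is a placeholder.
select
    anonymous_id as visitor_id,          -- Segment's anonymous visitor identifier
    timestamp    as page_viewed_at,      -- may need quoting in warehouses that reserve this word
    user_id      as user_id,             -- NULL until the visitor is identified
    url          as landing_page_url,    -- full page URL, including query parameters
    referrer     as referring_page_url   -- NULL for direct traffic
from <your_segment_schema>.pages;
```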

### Option 2: Google Analytics 4 (GA4)

Alternatively, Roadway can transparently process GA4 tracking data. Use this method if you're already landing GA4 data in your warehouse.

**Source**: GA4 `page_view` events collected for an analytics property, exposed via [GA4 Export tables](https://support.google.com/analytics/answer/7029846?hl=en).

{% hint style="info" %}
Often, GA4 data is managed via a [GA4 Export for BigQuery](https://support.google.com/analytics/answer/9823238?hl=en#zippy=%2Cin-this-article), but there are other methods for managing GA4 data.
{% endhint %}
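
As a rough guide to how GA4 data relates to the conceptual schema, the sketch below assumes the BigQuery export and uses BigQuery SQL. It is illustrative only: Roadway processes GA4 data itself, and this is not a description of its internal logic. `<your_project>` and `<your_dataset>` are placeholders; the field names follow the GA4 BigQuery export schema.

```sql
-- Illustrative mapping only; assumes the GA4 BigQuery export.
select
    user_pseudo_id                    as visitor_id,
    timestamp_micros(event_timestamp) as page_viewed_at,  -- event_timestamp is in microseconds (UTC)
    user_id                           as user_id,
    (select value.string_value
       from unnest(event_params)
      where key = 'page_location')    as landing_page_url,
    (select value.string_value
       from unnest(event_params)
      where key = 'page_referrer')    as referring_page_url,
    geo.city                          as city,
    geo.region                        as region,
    geo.country                       as country,
    geo.continent                     as continent
from `<your_project>.<your_dataset>.events_*`
where event_name = 'page_view';
```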

### Option 3: Custom Data Source

{% hint style="info" %}
We highly recommend providing page-view data directly from Segment or GA4 schemas. See the sections above for details.
{% endhint %}

In rare cases, customers run their own tracking systems or maintain custom data models downstream of exports from tracking solutions like Segment or GA4.

Use this method if you have custom page-view tracking data or downstream custom models already landed in your warehouse.

**Required Table Schema**:

```sql
create table <your_schema>.page_views (
    visitor_id varchar not null,       -- your visitor identifier
    page_viewed_at timestamp not null, -- utc timestamp of page view
    user_id varchar,                   -- user identifier (nullable)
    landing_page_url varchar not null, -- full page url with parameters
    referring_page_url varchar,        -- referring page url (nullable)
    city varchar,
    region varchar,
    country varchar,
    continent varchar
);
```
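
To make the shape of the data concrete, here is a minimal sketch of two rows for the same visitor: one anonymous page view with a referrer, and a later identified page view with no referrer. All values are made up for illustration. The UTM query parameters, the identifier formats, and the spelled-out geographic values are assumptions, not formats Roadway prescribes.

```sql
-- Hypothetical sample rows; every value here is illustrative.
insert into <your_schema>.page_views
    (visitor_id, page_viewed_at, user_id,
     landing_page_url, referring_page_url,
     city, region, country, continent)
values
    -- anonymous visit arriving from a search engine
    ('anon-123', timestamp '2024-01-15 14:03:22', null,
     'https://www.example.com/pricing?utm_source=google&utm_medium=cpc',
     'https://www.google.com/',
     'Austin', 'Texas', 'United States', 'North America'),
    -- same visitor, now identified; direct traffic with no geo data
    ('anon-123', timestamp '2024-01-16 09:41:05', 'user-456',
     'https://www.example.com/signup',
     null,
     null, null, null, null);
```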

#### Data Validation

* **Non-null visitor\_id**: Every page view must have a visitor identifier
* **Non-null page\_viewed\_at**: Every page view must have a timestamp
* **Non-null landing\_page\_url**: Every page view must have a URL
* **Unique page views**: The combination of `visitor_id` + `page_viewed_at` should be unique
* **Valid timestamps**: `page_viewed_at` should be a valid UTC timestamp
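
The checks above can be sketched as plain SQL against the custom table. This is a minimal example, not an official validation suite. The future-timestamp check is an added heuristic for catching clock or time-zone mistakes, since SQL alone cannot prove that a value was recorded in UTC.

```sql
-- 1. Required columns: every count should be 0 (or NULL on an empty table).
select
    sum(case when visitor_id       is null then 1 else 0 end) as null_visitor_ids,
    sum(case when page_viewed_at   is null then 1 else 0 end) as null_timestamps,
    sum(case when landing_page_url is null then 1 else 0 end) as null_urls
from <your_schema>.page_views;

-- 2. Uniqueness: this should return no rows.
select visitor_id, page_viewed_at, count(*) as occurrences
from <your_schema>.page_views
group by visitor_id, page_viewed_at
having count(*) > 1;

-- 3. Heuristic timestamp check: page views dated in the future
--    often point to a time-zone or clock issue. Expect 0.
select count(*) as future_page_views
from <your_schema>.page_views
where page_viewed_at > current_timestamp;
```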

#### Data Integrity Expectations

* **Page view volume**: Expect 10x to 1,000x more page views than users, depending on your product
* **Anonymous vs. identified traffic**: A significant portion of page views will have `user_id` as NULL (anonymous visitors)
* **URL quality**: URLs should be well-formed
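
A quick profile of the table can show whether it matches these expectations. The query below is a sketch under two assumptions: "users" is approximated by distinct non-null `user_id` values, and a well-formed URL is taken to start with `http://` or `https://`. Adjust both to fit your data.

```sql
select
    count(*)                                        as page_views,
    count(distinct visitor_id)                      as visitors,
    count(distinct user_id)                         as identified_users,
    -- page views per identified user (NULL if there are none)
    count(*) * 1.0 / nullif(count(distinct user_id), 0) as page_views_per_user,
    -- share of page views from anonymous visitors
    sum(case when user_id is null then 1 else 0 end) * 1.0
        / nullif(count(*), 0)                       as anonymous_share,
    -- URLs that do not start with an http(s) scheme
    sum(case when landing_page_url not like 'http://%'
              and landing_page_url not like 'https://%'
             then 1 else 0 end)                     as suspicious_urls
from <your_schema>.page_views;
```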

***

#### Coming Soon

If your traffic or event analytics stack is based on one of the following technologies, please reach out; we'll be happy to accommodate your needs:

* Amplitude
* Mixpanel
* Rudderstack

