# Users

## Overview

A `users` table is a comprehensive record of the users of your product, along with segmentation information that is useful when analyzing growth metrics. To that end, Roadway uses three broad categories of user data:

* internal identifiers (e.g. `user_id` and `visitor_id`)
* sign-up dates
* segmentation dimensions (e.g. `persona`, `usecase`, etc.)

{% hint style="warning" %}
As a reminder, Roadway does not use PII (e.g. name, address, phone number, email, etc.), even for users.
{% endhint %}

Internally, Roadway uses this `users` table not only to provide metrics about users but also to link [visits](/data-requirements/warehouse-requirements/page-views.md) data to downstream or cross-stream elements of your growth funnel, such as revenue or leads.

## Core Schema Requirements

The table you expose to Roadway should adhere to the following schema:

| **Column Name**                | **Data Type**                        | **Description**                                                                                                                                            |
| ------------------------------ | ------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `user_id`                      | varchar                              | Unique identifier for the user. This should be consistent with the user\_id used in your page views data. **Required.**                                    |
| `signed_up_at`                 | timestamp                            | UTC timestamp of when the user first registered or signed up for your product/service. This represents the conversion event for attribution. **Required.** |
| `<optional_custom_dimensions>` | `varchar` or `boolean` (recommended) | See below section. **Optional.**                                                                                                                           |

### Additional Constraints

The following constraints apply to the above table:

* `user_id` is the primary key
* Do ***NOT*** provide PII in custom user dimensions

### Custom User Dimensions

When analyzing metrics, it is useful to group or filter by specific user segments that provide meaningful business context. Custom dimensions enable these kinds of high-specificity workflows within Roadway. The specific nature (and backing logic) of these dimensions is up to you.

#### Example Custom Dimensions

Here are some examples of custom dimensions that might be applicable for your business:

| Attribute       | Data Type | Description                  | Possible Values                                    |
| --------------- | --------- | ---------------------------- | -------------------------------------------------- |
| `persona`       | varchar   | User segment or persona type | `enterprise`, `smb`, `individual`, `student`       |
| `plan_type`     | varchar   | Subscription or plan level   | `free`, `pro`, `enterprise`, `trial`               |
| `signup_source` | varchar   | Primary signup channel       | `organic`, `paid`, `referral`, `direct`            |
| `industry`      | varchar   | User's industry category     | `technology`, `healthcare`, `finance`, `education` |
| `company_size`  | varchar   | Organization size bracket    | `1-10`, `11-50`, `51-200`, `200+`                  |

#### Best Practices for Custom Dimensions

1. **Avoid PII**: Scrub emails, names, phone numbers, etc. by providing VIEWs on top of internal tables.
2. **Use categorical data**: Avoid high-cardinality fields such as specific company names.
3. **Standardize values**: Ensure consistent naming (e.g., always use lowercase).
4. **Handle NULL values**: Design your analysis to handle missing dimension data.
5. **Limit values per dimension**: Where possible, keep each custom dimension to fewer than 20 distinct values to maintain query performance and practical usability. Roadway supports displaying up to 100 values per dimension, but fewer is easier to work with.
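
As a rough illustration of practices 1 through 4, the sketch below exposes a view on top of an internal table. The source table `internal.app_users` and its columns (`id`, `created_at`, `persona`, `plan`, `email`, `full_name`) are hypothetical, `created_at` is assumed to already be stored in UTC, and bucketing missing values as `'unknown'` is only one possible way to handle NULLs.

```sql
-- sketch only: internal.app_users and its columns are hypothetical
create view your_schema.users as
select
    cast(id as varchar) as user_id
    , created_at as signed_up_at                              -- assumed to be utc already
    , coalesce(lower(trim(persona)), 'unknown') as persona    -- standardized, nulls bucketed
    , coalesce(lower(trim(plan)), 'unknown') as plan_type     -- standardized, nulls bucketed
from internal.app_users;
-- pii columns such as email and full_name are deliberately not selected
```

Because the view selects columns explicitly, PII columns added to the internal table later are not exposed unless someone adds them to the view on purpose.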

#### Example Custom Dimension Queries

```sql
-- good: low-cardinality categorical field
select 
    user_id
    , case 
        when annual_revenue < 1000000 then 'smb'
        when annual_revenue < 10000000 then 'mid_market'  
        else 'enterprise'
    end as company_size
from user_data;

-- avoid: high-cardinality or pii fields
select 
    user_id
    , company_name  -- too specific, could be pii
    , exact_revenue -- too high cardinality
    , full_email  -- pii
from user_data;
```

***

## Data Source Options

### Option 1: Segment

In cases where you are using Segment to land user data in your warehouse, use this method.

**Source Table**: Your Segment `identifies` table. See [Segment's identifies table documentation](https://segment.com/docs/connections/storage/warehouses/schema/#identifies) for the complete schema. Roadway uses the following fields:

* `anonymous_id` - Visitor identifier before signup
* `user_id` - User identifier after signup
* `timestamp` - When the identify call was made
* Custom traits (optional) - Additional user attributes (see [Segment traits documentation](https://segment.com/docs/connections/spec/identify/#traits)) which become custom user dimensions

**Key Notes**:

* Follows Segment's recommended pattern for user identification
* The first identify call per user\_id represents the signup/conversion event
* Additional identify calls for the same user\_id are ignored for signup timing
* Custom user traits can be included if available in your Segment implementation
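
The "first identify call per `user_id`" rule above can be sketched as a window query over the `identifies` table. This is an illustration of the pattern, not necessarily the exact query Roadway runs. The schema name `segment_schema` and the trait column `plan_type` are assumptions, and some warehouses may require quoting the `timestamp` column name.

```sql
-- sketch only: segment_schema and the plan_type trait column are hypothetical
with ranked_identifies as (
    select
        user_id
        , anonymous_id
        , timestamp as identified_at
        , plan_type
        , row_number() over (
            partition by user_id
            order by timestamp
        ) as identify_rank
    from segment_schema.identifies
    where user_id is not null
)

select
    user_id
    , anonymous_id
    , identified_at as signed_up_at  -- earliest identify call per user
    , plan_type                      -- trait value at the time of signup
from ranked_identifies
where identify_rank = 1;
```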

### Option 2: Google Analytics 4 (GA4)

In cases where you are landing GA4 data in your warehouse and have configured signup event tracking, use this method.

**Source**: GA4 signup events (default: `sign_up`)

**Required GA4 Setup**: See [GA4 documentation](https://developers.google.com/analytics/devguides/collection/ga4) for further context. Key requirements:

* `sign_up` events being tracked (or custom signup event configured; see below)
* [User ID tracking enabled in GA4](https://developers.google.com/analytics/devguides/collection/ga4/user-id?client_type=gtag)

**Custom User Properties**: GA4 allows you to capture additional user properties during signup events. See [GA4 custom event parameters documentation](https://developers.google.com/analytics/devguides/collection/ga4/custom-events) for implementation details. Common user properties include:

* User demographics (age range, gender)
* Geographic information (country, region, city)
* Device and platform information (device category, platform, browser) - automatically captured
* Custom business attributes sent as event parameters (plan\_type, persona, etc. - see above for examples)

**Custom Signup Events**: If you're not using the standard `sign_up` event, please provide us with the name of the event you are using to track user sign-ups.

**Key Notes**:

* Signup events must include a user\_id parameter in GA4
* Geographic and device information is automatically captured
* Multiple signup events for the same user\_id are deduplicated by taking the earliest timestamp
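
The deduplication rule above can be sketched as follows. The example assumes the GA4 BigQuery export and is written in BigQuery SQL; the project and dataset names are placeholders, and a different warehouse or loader will likely use different table and column names. If you track signups with a custom event, substitute its name for `sign_up`.

```sql
-- sketch only: assumes the ga4 bigquery export; project and dataset names are placeholders
select
    user_id
    , min(timestamp_micros(event_timestamp)) as signed_up_at  -- earliest signup event wins
from `your_project.your_ga4_dataset.events_*`
where event_name = 'sign_up'
    and user_id is not null
group by user_id;
```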

### Option 3: Custom Data Source

When you have a custom users table in your warehouse (sourced directly from your application tables or built via internal data modeling), use this method.

**Your Table Schema**:

```sql
create table your_schema.users (
    user_id varchar primary key,      -- unique user identifier
    signed_up_at timestamp not null,  -- utc signup timestamp

    -- optional: custom user dimensions (add further columns as needed)
    custom_user_dimension_1 varchar,
    custom_user_dimension_2 varchar
);
```

**Key Notes**:

* You have full control over user identification and custom dimensions
* Ensure user\_id values match those used in your page views data
* Custom dimensions should be categorical with reasonable cardinality (< 50 unique values)
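
One way to check the cardinality of a dimension before exposing it is to count its distinct values. The query below is a minimal sketch against the example table above; `custom_user_dimension_1` stands in for whichever dimension column you want to inspect.

```sql
-- sketch only: replace custom_user_dimension_1 with the dimension you want to inspect
select
    count(distinct custom_user_dimension_1) as distinct_values
    , sum(case when custom_user_dimension_1 is null then 1 else 0 end) as null_rows
from your_schema.users;
```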

#### Data Validation

* **Unique user\_id**: Each user should appear only once in the users table
* **Non-null user\_id**: Every user record must have a non-null identifier
* **Non-null signed\_up\_at**: Every user must have a signup timestamp
* **Valid timestamps**: `signed_up_at` should be a valid UTC timestamp
* **User ID consistency**: user\_id values should match those used in page views data
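
The checks above could be run as queries along these lines. This is a sketch under stated assumptions: `your_schema.page_views` is a hypothetical name for your page views table, and the future-timestamp check is only a heuristic for spotting invalid or non-UTC values. Each query is expected to return zero rows (or a count of zero) on a healthy table.

```sql
-- sketch only: your_schema.page_views is a hypothetical table name

-- unique user_id: should return no rows
select
    user_id
    , count(*) as row_count
from your_schema.users
group by user_id
having count(*) > 1;

-- non-null user_id and signed_up_at: should return 0
select count(*) as rows_with_nulls
from your_schema.users
where user_id is null
    or signed_up_at is null;

-- valid timestamps (heuristic): signups in the future suggest a timezone or parsing issue
select count(*) as future_signups
from your_schema.users
where signed_up_at > current_timestamp;

-- user id consistency: page view user_ids that have no matching users row, should return 0
select count(distinct pv.user_id) as unmatched_user_ids
from your_schema.page_views as pv
left join your_schema.users as u
    on u.user_id = pv.user_id
where pv.user_id is not null
    and u.user_id is null;
```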

