# SMS encoding and message segments

When you send an SMS, the text is encoded using one of two character sets: **GSM-7** or **UCS-2**. The encoding is selected automatically based on the characters in your message text, and it directly determines how many segments your message is split into — which affects the `sms.segments` value you receive in the [Message Status API](/products/sms/enterprise-documentation/developer-documentation/api-references/apistatus-messagestatus) and [webhooks](/products/sms/enterprise-documentation/developer-documentation/wb/wb-how).


```mermaid
 flowchart LR
     A([Message text]) --> B{Any character\noutside GSM-7?}
 
     B -- No --> C[Encoding: GSM-7]
     B -- Yes --> D[Encoding: UCS-2]
```

## Why SMS has a character limit

A single SMS carries at most **140 bytes** of user data. This is a fixed constraint defined by the GSM standard ([GSM 03.38](https://www.etsi.org/deliver/etsi_gts/03/0338/05.00.00_60/gsmts_0338v050000p.pdf)) and has not changed since SMS was designed.

The character limit you see — 160 or 70 — is a direct consequence of how many characters fit into those 140 bytes depending on the encoding used:

- **GSM-7**: each character uses 7 bits → `(140 × 8) / 7 =` **160 characters**
- **UCS-2**: each character uses 16 bits (2 bytes) → `140 / 2 =` **70 characters**


## GSM-7

GSM-7 is the default encoding. It supports the standard Latin alphabet, digits, and a set of common symbols — 128 characters in total.

A single GSM-7 SMS can contain up to **160 characters**.

When a message exceeds 160 characters, it is split into multiple segments. Each segment in a multi-part message can carry up to **153 characters** — the remaining 7 characters per segment are used by a **User Data Header (UDH)**, a metadata block that tells the recipient's device how to reassemble the parts in the correct order.
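The per-segment limits follow from simple bit arithmetic, which the sketch below reproduces. It assumes the common 6-byte concatenation header (a 1-byte length field plus a 5-byte element with an 8-bit reference number); `SMS_PAYLOAD_BYTES`, `UDH_BYTES`, and `capacity` are illustrative names, not part of any API.

```python
SMS_PAYLOAD_BYTES = 140  # fixed user-data size of one SMS
UDH_BYTES = 6            # assumption: concatenation header with an 8-bit reference


def capacity(bits_per_char: int, header_bytes: int = 0) -> int:
    """Whole characters that fit in the payload left after the header."""
    usable_bits = (SMS_PAYLOAD_BYTES - header_bytes) * 8
    return usable_bits // bits_per_char


print(capacity(7))               # 160: single GSM-7 message
print(capacity(7, UDH_BYTES))    # 153: one part of a multi-part GSM-7 message
print(capacity(16))              # 70: single UCS-2 message
print(capacity(16, UDH_BYTES))   # 67: one part of a multi-part UCS-2 message
```

In GSM-7 the 6 header bytes occupy 48 bits, which rounds up to 7 whole 7-bit characters; that is where the 160 to 153 drop comes from.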

| Message length | Segments |
|  --- | --- |
| 1–160 chars | 1 |
| 161–306 chars | 2 |
| 307–459 chars | 3 |
| 460–612 chars | 4 |
| … | … |
| Up to 1,530 chars (10 × 153) | Up to 10 |


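**Formula**: `segments = ceil(length / 153)` for messages longer than 160 characters, where `length` counts each extension-table character (see below) as 2. Messages of 160 characters or fewer always use 1 segment.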
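As a quick check, the formula can be written as a small function and compared against the table. This is only a sketch: `gsm7_segments` is a hypothetical helper, and `length` is assumed to be the message length in GSM-7 characters.

```python
import math


def gsm7_segments(length: int) -> int:
    """Segments needed for a message of `length` GSM-7 characters."""
    if length <= 160:
        return 1  # fits in a single SMS, no User Data Header needed
    return math.ceil(length / 153)  # each part gives up 7 characters to the UDH


for length in (160, 161, 306, 307, 459, 460):
    print(length, gsm7_segments(length))
# 160 1
# 161 2
# 306 2
# 307 3
# 459 3
# 460 4
```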

### GSM-7 character set

The basic GSM-7 alphabet includes:

- Uppercase and lowercase Latin letters (`A–Z`, `a–z`)
- Digits (`0–9`)
- Common punctuation: `! " # $ % & ' ( ) * + , - . / : ; < = > ? @ _`
- Special characters: `£ ¥ ¤ § ¡ ¿ è é ù ì ò Ç Ø ø Å å Δ Φ Γ Λ Ω Π Ψ Σ Θ Ξ ß É Æ æ Ä Ö Ñ Ü ä ö ñ ü à`
- Space, newline, carriage return


### GSM-7 extended characters

Some characters belong to the GSM-7 **extension table** and count as **2 characters** each, because they require an escape sequence:

`{ } [ ] \ | ^ ~ €`

If your message contains even one character that is in neither the basic alphabet nor the extension table, the entire message is automatically re-encoded in UCS-2. This applies to accented characters not in the GSM-7 set (for example `ê` or `â`), emojis, and scripts such as Arabic, Chinese, and Cyrillic.
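To estimate the encoding before sending, the selection rule can be approximated client-side. The sketch below is an assumption about how the check could be written, not the platform's actual implementation; `GSM7_BASIC`, `GSM7_EXTENDED`, `detect_encoding`, and `gsm7_length` are hypothetical names, and the extension set contains only the characters listed above.

```python
# GSM-7 basic alphabet from the GSM 03.38 default table (escape code omitted).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each of these is sent as an escape code plus one character.
GSM7_EXTENDED = set("{}[]\\|^~€")


def detect_encoding(text: str) -> str:
    """Return "GSM-7" if every character is in the GSM-7 tables, else "UCS-2"."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        return "GSM-7"
    return "UCS-2"


def gsm7_length(text: str) -> int:
    """Length in GSM-7 characters, counting extension-table characters twice."""
    return sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)


print(detect_encoding("Your code is 123456"))        # GSM-7
print(detect_encoding("Votre fenêtre est ouverte"))  # UCS-2
print(gsm7_length("Price: 10€"))                     # 11
```

The comparison is done per code point, so text that stores accents as separate combining marks would be reported as UCS-2 even when the composed character is in GSM-7.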

## UCS-2

UCS-2 is used when the message contains characters outside the GSM-7 alphabet. It encodes each character as 16 bits (2 bytes), which allows it to represent up to 65,536 characters — covering virtually all scripts in the Unicode Basic Multilingual Plane.

The trade-off is a significantly reduced character limit: a single UCS-2 SMS can contain up to **70 characters**.

For multi-part messages, each segment carries up to **67 characters** (3 characters per segment are reserved for the User Data Header).

| Message length | Segments |
|  --- | --- |
| 1–70 chars | 1 |
| 71–134 chars | 2 |
| 135–201 chars | 3 |
| 202–268 chars | 4 |
| … | … |
| Up to 670 chars (10 × 67) | Up to 10 |


**Formula**: `segments = ceil(length / 67)` for messages longer than 70 characters.
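The same calculation can be sketched in Python. It assumes `length` is measured in 16-bit code units, so a character outside the Basic Multilingual Plane (most emoji) is assumed to count as 2; this page does not state how such characters are counted, so treat that as an assumption. `ucs2_length` and `ucs2_segments` are hypothetical names.

```python
import math


def ucs2_length(text: str) -> int:
    """Length in 16-bit code units, the unit assumed here for UCS-2 limits."""
    return len(text.encode("utf-16-be")) // 2


def ucs2_segments(text: str) -> int:
    """Segments needed when the whole message is sent as UCS-2."""
    length = ucs2_length(text)
    if length <= 70:
        return 1  # fits in a single SMS
    return math.ceil(length / 67)  # 3 characters per part go to the UDH


print(ucs2_segments("中" * 70))         # 1
print(ucs2_segments("中" * 71))         # 2
print(ucs2_length("😀"))                # 2
print(ucs2_segments("A" * 152 + "😀"))  # 3
```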

## Comparison

|  | GSM-7 | UCS-2 |
|  --- | --- | --- |
| Bits per character | 7 | 16 |
| Supported characters | 128 (Latin + common symbols) | 65,536 (Unicode BMP) |
| Single SMS limit | 160 characters | 70 characters |
| Multi-part segment limit | 153 characters | 67 characters |
| Max concatenated length | 1,530 characters (10 × 153) | 670 characters (10 × 67) |
| Triggered by | Default | Any non-GSM-7 character in the text |


## Practical examples

| Message | Encoding | Length | Segments | Notes |
|  --- | --- | --- | --- | --- |
| `Your code is 123456` | GSM-7 | 19 chars | 1 |  |
| 160 × `A` | GSM-7 | 160 chars | 1 |  |
| 161 × `A` | GSM-7 | 161 chars | 2 |  |
| `Votre fenêtre est ouverte` | UCS-2 | 25 chars | 1 | `ê` is not in GSM-7 |
| `Bâtiment B, salle 3` | UCS-2 | 19 chars | 1 | `â` triggers UCS-2 |
| 70 × `中` | UCS-2 | 70 chars | 1 |  |
| 71 × `中` | UCS-2 | 71 chars | 2 |  |


The `€` symbol is part of the GSM-7 extension table and counts as **2 characters**, not 1. A message containing only `€` signs has an effective limit of 80 symbols per single SMS (160 ÷ 2), not 160.

## Common pitfalls

### A single Unicode character can double your segment count

Because UCS-2 applies to the **entire message**, a single non-GSM-7 character forces re-encoding of all the text. This can have a significant impact on segment count:

Take a message of **152 GSM-7 characters + 1 emoji**. The 152 GSM-7 characters alone fit in 1 segment. Because of the emoji, the whole message is encoded in UCS-2: its 153 characters, at 67 per segment, need **3 segments** instead of 1.

## Reading the segment count in the API

The number of segments used for a delivered message is available in two places:

### Message Status API

The `sms.segments` field is returned in the response body of all three message status endpoints:


```json
{
  "message": {
    "id": "011d9d6e-b5b9-4cb9-be13-2bc336a923ce",
    "channel": "SMS",
    "status": "DELIVERED",
    "sms": {
      "segments": 2
    }
  }
}
```

See [Message Status API](/products/sms/enterprise-documentation/developer-documentation/api-references/apistatus-messagestatus) for the full response reference.

### Webhook

The same `sms.segments` field is included in every webhook status notification:


```json
{
  "type": "STATUS_UPDATE",
  "message": {
    "id": "b31b6607-9c55-48ba-b145-3f40b809d2d2",
    "channel": "SMS",
    "status": "DELIVERED",
    "sms": {
      "segments": 2
    }
  }
}
```

See [Understanding webhook](/products/sms/enterprise-documentation/developer-documentation/wb/wb-how) for the full payload reference.

`sms.segments` is only present when `channel` is `SMS`. It is not included for `RCS` messages.
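Both payloads shown above share the same `message` object, so one helper can read the field from either. The following is a minimal sketch that assumes only the fields visible in those samples; `segments_from_payload` is a hypothetical name, not part of any SDK.

```python
import json
from typing import Optional


def segments_from_payload(raw: str) -> Optional[int]:
    """Return `sms.segments` from a status payload, or None when it is absent."""
    message = json.loads(raw).get("message", {})
    if message.get("channel") != "SMS":
        return None  # e.g. RCS: `sms.segments` is not included
    return message.get("sms", {}).get("segments")


webhook_body = """
{
  "type": "STATUS_UPDATE",
  "message": {
    "id": "b31b6607-9c55-48ba-b145-3f40b809d2d2",
    "channel": "SMS",
    "status": "DELIVERED",
    "sms": {"segments": 2}
  }
}
"""
print(segments_from_payload(webhook_body))  # 2
```

Returning `None` rather than `0` keeps "no segment information" distinct from a real count when the values are summed or stored later.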