# SMS encoding and message segments

When you send an SMS, the text is encoded using one of two character sets: **GSM-7** or **UCS-2**. The encoding is selected automatically based on the characters in your message text, and it directly determines how many segments your message is split into. That segment count is the `sms.segments` value you receive in the [Message Status API](/products/sms/enterprise-documentation/developer-documentation/api-references/apistatus-messagestatus) and [webhooks](/products/sms/enterprise-documentation/developer-documentation/wb/wb-how).

```mermaid
flowchart LR
    A([Message text]) --> B{Any character\noutside GSM-7?}
    B -- No --> C[Encoding: GSM-7]
    B -- Yes --> D[Encoding: UCS-2]
```

## Why SMS has a character limit

An SMS message is transmitted as **140 bytes** of data. This is a fixed constraint defined by the GSM standard ([GSM 03.38](https://www.etsi.org/deliver/etsi_gts/03/0338/05.00.00_60/gsmts_0338v050000p.pdf)) and has not changed since SMS was designed.

The character limit you see (160 or 70) is a direct consequence of how many characters fit into those 140 bytes depending on the encoding used:

- **GSM-7**: each character uses 7 bits → `(140 × 8) / 7 =` **160 characters**
- **UCS-2**: each character uses 16 bits (2 bytes) → `140 / 2 =` **70 characters**

## GSM-7

GSM-7 is the default encoding. It supports the standard Latin alphabet, digits, and a set of common symbols: 128 characters in total. A single GSM-7 SMS can contain up to **160 characters**.

When a message exceeds 160 characters, it is split into multiple segments. Each segment in a multi-part message can carry up to **153 characters**. The remaining 7 characters per segment are used by a **User Data Header (UDH)**, a metadata block that tells the recipient's device how to reassemble the parts in the correct order.

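The per-message and per-segment limits can be checked with a few lines of arithmetic. The following is a minimal Python sketch, assuming the concatenation header occupies 6 bytes (which matches the 7 reserved GSM-7 characters described above); `capacity` and the constant names are illustrative and not part of any API.

```python
SMS_PAYLOAD_BYTES = 140  # fixed SMS payload size
UDH_BYTES = 6            # assumption: 6-byte concatenation header


def capacity(bits_per_char: int, udh_bytes: int = 0) -> int:
    """Characters that fit in one SMS payload after reserving header bytes."""
    usable_bits = (SMS_PAYLOAD_BYTES - udh_bytes) * 8
    return usable_bits // bits_per_char


print(capacity(7))              # 160: single GSM-7 SMS
print(capacity(16))             # 70: single UCS-2 SMS
print(capacity(7, UDH_BYTES))   # 153: GSM-7 segment of a multi-part message
print(capacity(16, UDH_BYTES))  # 67: UCS-2 segment of a multi-part message
```
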

| Message length | Segments |
| --- | --- |
| 1–160 chars | 1 |
| 161–306 chars | 2 |
| 307–459 chars | 3 |
| 460–612 chars | 4 |
| … | … |
| Up to 1,530 chars | Up to 10 |

**Formula**: `segments = ceil(length / 153)` for messages longer than 160 characters, where `length` counts each extension-table character as 2.

### GSM-7 character set

The basic GSM-7 alphabet includes:

- Uppercase and lowercase Latin letters (`A–Z`, `a–z`)
- Digits (`0–9`)
- Common punctuation: `! " # $ % & ' ( ) * + , - . / : ; < = > ? @`
- Special characters: `£ ¥ è é ù ì ò Ç Ø ø Å å Δ Φ Γ Λ Ω Π Ψ Σ Θ Ξ ß É Æ æ Ä Ö Ñ Ü ä ö ñ ü à`
- Space, newline, carriage return

### GSM-7 extended characters

Some characters belong to the GSM-7 **extension table** and count as **2 characters** each, because they require an escape sequence:

`{ } [ ] \ | ^ ~ €`

If your message contains even one character outside the GSM-7 alphabet (basic set and extension table), the entire message is automatically re-encoded in UCS-2. This applies to accented characters not in the GSM-7 set, emojis, Arabic, Chinese, Cyrillic, and any other non-Latin script.

## UCS-2

UCS-2 is used when the message contains characters outside the GSM-7 alphabet. It encodes each character as 16 bits (2 bytes), which allows it to represent up to 65,536 characters, covering virtually all scripts in the Unicode Basic Multilingual Plane.

The trade-off is a significantly reduced character limit: a single UCS-2 SMS can contain up to **70 characters**. For multi-part messages, each segment carries up to **67 characters** (3 characters per segment are reserved for the User Data Header).

| Message length | Segments |
| --- | --- |
| 1–70 chars | 1 |
| 71–134 chars | 2 |
| 135–201 chars | 3 |
| 202–268 chars | 4 |
| … | … |
| Up to 670 chars | Up to 10 |

**Formula**: `segments = ceil(length / 67)` for messages longer than 70 characters.

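The two formulas can be combined into one helper. This is a minimal Python sketch under the limits stated above; `count_segments` and the `LIMITS` table are hypothetical names, and `length` is assumed to be the already-counted message length (with GSM-7 extension characters counted as 2).

```python
import math

# (single-SMS limit, per-segment limit in a multi-part message)
LIMITS = {"GSM-7": (160, 153), "UCS-2": (70, 67)}


def count_segments(length: int, encoding: str) -> int:
    """Number of segments for a message of `length` characters."""
    single, multi = LIMITS[encoding]
    if length <= single:
        return 1  # fits in one SMS, no header needed
    return math.ceil(length / multi)


print(count_segments(161, "GSM-7"))  # 2
print(count_segments(135, "UCS-2"))  # 3
```

Keeping the single-SMS check separate from the division matters: a 160-character GSM-7 message is 1 segment, even though `ceil(160 / 153)` would give 2.
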
## Comparison

| | GSM-7 | UCS-2 |
| --- | --- | --- |
| Bits per character | 7 | 16 |
| Supported characters | 128 (Latin + common symbols) | 65,536 (Unicode BMP) |
| Single SMS limit | 160 characters | 70 characters |
| Multi-part segment limit | 153 characters | 67 characters |
| Max concatenated length (10 segments) | 1,530 characters (10 × 153) | 670 characters (10 × 67) |
| Triggered by | Default | Any non-GSM-7 character in the text |

## Practical examples

| Message | Encoding | Length | Segments |
| --- | --- | --- | --- |
| `Your code is 123456` | GSM-7 | 19 chars | 1 |
| 160 × `A` | GSM-7 | 160 chars | 1 |
| 161 × `A` | GSM-7 | 161 chars | 2 |
| `Votre fenêtre est ouverte` | UCS-2 | 25 chars | 1 (`ê` is not in GSM-7) |
| `Bâtiment B, salle 3` | UCS-2 | 19 chars | 1 (`â` triggers UCS-2) |
| 70 × `中` | UCS-2 | 70 chars | 1 |
| 71 × `中` | UCS-2 | 71 chars | 2 |

The `€` symbol is part of the GSM-7 extension table and counts as **2 characters**, not 1. A message containing only `€` signs has an effective limit of 80 symbols per single SMS (160 ÷ 2), not 160.

## Common pitfalls

### A single Unicode character can double your segment count

Because UCS-2 applies to the **entire message**, a single non-GSM-7 character forces re-encoding of all the text. This can have a significant impact on segment count.

A message of **152 GSM-7 characters** fits in 1 GSM-7 segment. Add **1 emoji** and the whole message is encoded in UCS-2: 153 characters at 67 per segment results in **3 segments** instead of 1.

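To catch this pitfall before sending, you can estimate the encoding and segment count locally. The following Python sketch is an approximation, not the platform's actual implementation: `analyze` is a hypothetical helper, the alphabet is the GSM 03.38 default table, and UCS-2 length is counted in UTF-16 code units (an assumption under which most emoji count as 2 rather than 1). It also ignores the edge case of an escape sequence falling on a segment boundary.

```python
import math

# GSM 03.38 default alphabet (basic table) and extension table.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ ÆæßÉ"
    "!\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
GSM7_EXTENDED = set("{}[]\\|^~€")


def analyze(text: str) -> tuple[str, int, int]:
    """Return (encoding, counted length, estimated segments) for a message."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        # Extension-table characters need an escape, so they count twice.
        length = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
        encoding, single, multi = "GSM-7", 160, 153
    else:
        # Assumption: length is measured in 16-bit UTF-16 code units.
        length = len(text.encode("utf-16-be")) // 2
        encoding, single, multi = "UCS-2", 70, 67
    segments = 1 if length <= single else math.ceil(length / multi)
    return encoding, length, segments


print(analyze("A" * 152))                 # ('GSM-7', 152, 1)
print(analyze("A" * 152 + "\U0001F600"))  # ('UCS-2', 154, 3)
```

With code-unit counting the emoji message measures 154 rather than 153, but the result is the same 3 segments.
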
## Reading the segment count in the API

The number of segments used for a delivered message is available in two places:

### Message Status API

The `sms.segments` field is returned in the response body of all three message status endpoints:

```json
{
  "message": {
    "id": "011d9d6e-b5b9-4cb9-be13-2bc336a923ce",
    "channel": "SMS",
    "status": "DELIVERED",
    "sms": {
      "segments": 2
    }
  }
}
```

See [Message Status API](/products/sms/enterprise-documentation/developer-documentation/api-references/apistatus-messagestatus) for the full response reference.

### Webhook

The same `sms.segments` field is included in every webhook status notification:

```json
{
  "type": "STATUS_UPDATE",
  "message": {
    "id": "b31b6607-9c55-48ba-b145-3f40b809d2d2",
    "channel": "SMS",
    "status": "DELIVERED",
    "sms": {
      "segments": 2
    }
  }
}
```

See [Understanding webhook](/products/sms/enterprise-documentation/developer-documentation/wb/wb-how) for the full payload reference.

`sms.segments` is only present when `channel` is `SMS`. It is not included for `RCS` messages.
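
Because the field is absent for non-SMS channels, client code should read it defensively. A minimal Python sketch, assuming the payload shape shown in the examples above; `segments_from_payload` is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Optional


def segments_from_payload(raw: str) -> Optional[int]:
    """Return sms.segments from a status payload, or None if not an SMS."""
    message = json.loads(raw).get("message", {})
    if message.get("channel") != "SMS":
        return None  # e.g. RCS messages carry no sms.segments field
    return message.get("sms", {}).get("segments")


payload = """
{
  "type": "STATUS_UPDATE",
  "message": {
    "id": "b31b6607-9c55-48ba-b145-3f40b809d2d2",
    "channel": "SMS",
    "status": "DELIVERED",
    "sms": {"segments": 2}
  }
}
"""
print(segments_from_payload(payload))  # 2
```
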