SMS encoding and message segments

When you send an SMS, the text is encoded using one of two character sets: GSM-7 or UCS-2. The encoding is selected automatically based on the characters in your message text, and it directly determines how many segments your message is split into — which affects the sms.segments value you receive in the Message Status API and webhooks.

Why SMS has a character limit

An SMS message carries at most 140 bytes of user data. This is a fixed constraint of the GSM standard (the message format is defined in GSM 03.40, the character sets in GSM 03.38) and has not changed since SMS was designed.

The character limit you see — 160 or 70 — is a direct consequence of how many characters fit into those 140 bytes depending on the encoding used:

GSM-7: each character uses 7 bits → (140 × 8) / 7 = 160 characters
UCS-2: each character uses 16 bits (2 bytes) → 140 / 2 = 70 characters
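
The arithmetic above can be reproduced with a short Python sketch. The names `SMS_PAYLOAD_BYTES` and `max_chars` are illustrative only and are not part of any API.

```python
SMS_PAYLOAD_BYTES = 140  # fixed user-data size of one SMS, in bytes


def max_chars(bits_per_char: int, payload_bytes: int = SMS_PAYLOAD_BYTES) -> int:
    """Return how many whole characters fit into the payload."""
    return (payload_bytes * 8) // bits_per_char


print(max_chars(7))   # GSM-7: 160
print(max_chars(16))  # UCS-2: 70
```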

GSM-7

GSM-7 is the default encoding. It supports the standard Latin alphabet, digits, and a set of common symbols — 128 characters in total.

A single GSM-7 SMS can contain up to 160 characters.

When a message exceeds 160 characters, it is split into multiple segments. Each segment in a multi-part message can carry up to 153 characters — the remaining 7 characters per segment are used by a User Data Header (UDH), a metadata block that tells the recipient's device how to reassemble the parts in the correct order.

Message length	Segments
1–160 chars	1
161–306 chars	2
307–459 chars	3
460–612 chars	4
…	…
Up to 1,530 chars	Up to 10

Formula: segments = ceil(length / 153) for messages longer than 160 characters.
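
As a minimal sketch, the formula and the table above translate into Python as follows. The function name `gsm7_segments` is hypothetical, and `length` is assumed to already count each extension-table character as 2.

```python
import math

GSM7_SINGLE_LIMIT = 160  # characters in a single, non-concatenated SMS
GSM7_MULTI_LIMIT = 153   # characters per segment once a UDH is present


def gsm7_segments(length: int) -> int:
    """Segment count for a message of `length` GSM-7 characters."""
    if length <= GSM7_SINGLE_LIMIT:
        return 1
    return math.ceil(length / GSM7_MULTI_LIMIT)


for n in (160, 161, 306, 307, 612):
    print(n, gsm7_segments(n))
# 160 1
# 161 2
# 306 2
# 307 3
# 612 4
```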

GSM-7 character set

The basic GSM-7 alphabet includes:

Uppercase and lowercase Latin letters (A–Z, a–z)
Digits (0–9)
Common punctuation: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ _
Special characters: £ ¥ ¤ § ¡ ¿ è é ù ì ò Ç Ø ø Å å Δ Φ Γ Λ Ω Π Ψ Σ Θ Ξ ß É Æ æ Ä Ö Ñ Ü ä ö ñ ü à
Space, newline, carriage return

GSM-7 extended characters

Some characters belong to the GSM-7 extension table and count as 2 characters each, because they require an escape sequence:

{ } [ ] \ | ^ ~ €

If your message contains even one character that is in neither the basic GSM-7 alphabet nor the extension table, the entire message is automatically re-encoded in UCS-2. This applies to accented characters not in the GSM-7 set (such as ê or â), emojis, and scripts such as Arabic, Chinese, and Cyrillic.

UCS-2

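UCS-2 is used when the message contains characters outside the GSM-7 alphabet. It encodes each character as 16 bits (2 bytes), which allows it to represent up to 65,536 code points, covering the Unicode Basic Multilingual Plane (BMP) and therefore most scripts in everyday use. Characters outside the BMP, such as most emojis, are typically sent as a pair of 16-bit units and count as 2 characters.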

The trade-off is a significantly reduced character limit: a single UCS-2 SMS can contain up to 70 characters.

For multi-part messages, each segment carries up to 67 characters (3 characters per segment are reserved for the User Data Header).

Message length	Segments
1–70 chars	1
71–134 chars	2
135–201 chars	3
202–268 chars	4
…	…
Up to 670 chars	Up to 10

Formula: segments = ceil(length / 67) for messages longer than 70 characters.
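
A minimal Python sketch of the UCS-2 calculation follows. It assumes that length is measured in 16-bit units, so a character outside the BMP counts as 2; whether a given provider counts this way should be verified. The names `ucs2_length` and `ucs2_segments` are hypothetical.

```python
import math

UCS2_SINGLE_LIMIT = 70  # characters in a single, non-concatenated SMS
UCS2_MULTI_LIMIT = 67   # characters per segment once a UDH is present


def ucs2_length(text: str) -> int:
    """Number of 16-bit units the text occupies (assumption: UTF-16 units)."""
    return len(text.encode("utf-16-be")) // 2


def ucs2_segments(text: str) -> int:
    """Segment count for a message that is sent in UCS-2."""
    length = ucs2_length(text)
    if length <= UCS2_SINGLE_LIMIT:
        return 1
    return math.ceil(length / UCS2_MULTI_LIMIT)


print(ucs2_segments("中" * 70))  # 1
print(ucs2_segments("中" * 71))  # 2
```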

Comparison

	GSM-7	UCS-2
Bits per character	7	16
Supported characters	128 (Latin + common symbols)	65,536 (Unicode BMP)
Single SMS limit	160 characters	70 characters
Multi-part segment limit	153 characters	67 characters
Max concatenated length	1,530 characters (10 segments)	670 characters (10 segments)
Triggered by	Default	Any non-GSM-7 character in the text

Practical examples

Message	Encoding	Length	Segments
`Your code is 123456`	GSM-7	19 chars	1
160 × `A`	GSM-7	160 chars	1
161 × `A`	GSM-7	161 chars	2
`Votre fenêtre est ouverte`	UCS-2	25 chars	1 — `ê` is not in GSM-7
`Bâtiment B, salle 3`	UCS-2	19 chars	1 — `â` triggers UCS-2
70 × `中`	UCS-2	70 chars	1
71 × `中`	UCS-2	71 chars	2

The € symbol is part of the GSM-7 extension table and counts as 2 characters, not 1. A message containing only € signs has an effective limit of 80 symbols per single SMS (160 ÷ 2), not 160.

Common pitfalls

A single non-GSM-7 character can more than double your segment count

Because UCS-2 applies to the entire message, a single non-GSM-7 character forces re-encoding of all the text. This can have a significant impact on segment count:

Take a message of 152 GSM-7 characters plus 1 emoji, 153 characters in total. Without the emoji, the text fits in 1 GSM-7 segment. With the emoji, the whole message is encoded in UCS-2: 153 characters at 67 per segment results in 3 segments instead of 1.

Reading the segment count in the API

The number of segments used for a delivered message is available in two places:

Message Status API

The sms.segments field is returned in the response body of all three message status endpoints:

{
  "message": {
    "id": "011d9d6e-b5b9-4cb9-be13-2bc336a923ce",
    "channel": "SMS",
    "status": "DELIVERED",
    "sms": {
      "segments": 2
    }
  }
}

See Message Status API for the full response reference.

Webhook

The same sms.segments field is included in every webhook status notification:

{
  "type": "STATUS_UPDATE",
  "message": {
    "id": "b31b6607-9c55-48ba-b145-3f40b809d2d2",
    "channel": "SMS",
    "status": "DELIVERED",
    "sms": {
      "segments": 2
    }
  }
}

See Understanding webhook for the full payload reference.

sms.segments is only present when channel is SMS. It is not included for RCS messages.
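
Because the field is absent for non-SMS channels, code that reads it should tolerate a missing `sms` object. The sketch below assumes only the payload shape shown in the two samples above; the function name `segments_from_payload` is hypothetical.

```python
import json
from typing import Optional


def segments_from_payload(raw: str) -> Optional[int]:
    """Extract sms.segments from a status response or webhook body.

    Returns None when the message is not an SMS or the field is missing.
    """
    payload = json.loads(raw)
    message = payload.get("message", {})
    if message.get("channel") != "SMS":
        return None
    return message.get("sms", {}).get("segments")


webhook_body = """
{
  "type": "STATUS_UPDATE",
  "message": {
    "id": "b31b6607-9c55-48ba-b145-3f40b809d2d2",
    "channel": "SMS",
    "status": "DELIVERED",
    "sms": {"segments": 2}
  }
}
"""
print(segments_from_payload(webhook_body))  # 2
```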
