Kotlin regular expressions are powerful tools for pattern matching and text manipulation in Kotlin programming. Whether you’re validating user input, parsing data, or searching for specific patterns in strings, Kotlin regex provides an elegant solution. In this comprehensive guide, we’ll explore everything you need to know about Kotlin regular expressions, from basic syntax to advanced pattern matching techniques.
Kotlin regular expressions, commonly referred to as Kotlin regex, are sequences of characters that define search patterns. The Kotlin standard library provides robust support for regular expressions through the Regex class, making it easy to work with pattern matching in your Kotlin applications.
Kotlin offers several ways to create regex objects. The most common approach is using the Regex constructor or the toRegex() extension function.
// Using Regex constructor
val regex1 = Regex("hello")
// Using toRegex() extension function
val regex2 = "world".toRegex()
// Using a raw (triple-quoted) string, which avoids double-escaping backslashes
val regex3 = """\d+""".toRegex()
The Regex class in Kotlin provides a clean and intuitive way to work with regular expressions. When you create a Regex object, Kotlin compiles the pattern for efficient reuse.
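Because patterns tend to be full of backslashes, a raw string is often easier to read than an escaped one. The following is a minimal sketch; the pattern and the helper name samePattern are illustrative assumptions, not part of the Kotlin API.

```kotlin
// Two ways to write the same pattern: an escaped string and a raw string.
val escaped = Regex("\\d{3}-\\d{4}")
val raw = Regex("""\d{3}-\d{4}""")

// Hypothetical helper: true when both Regex objects hold the same pattern text.
fun samePattern(a: Regex, b: Regex): Boolean = a.pattern == b.pattern

fun main() {
    println(samePattern(escaped, raw))  // true
    println(raw.matches("555-1234"))    // true
}
```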
Pattern matching is the core functionality of Kotlin regular expressions. You can check if a string matches a pattern using various methods provided by the Regex class.
val emailPattern = Regex("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}")
val email = "user@example.com"
// Check if string matches pattern
val isValidEmail = emailPattern.matches(email)
println(isValidEmail) // true
The matches() function returns a Boolean indicating whether the entire string matches the regex pattern. This is particularly useful for validation scenarios in Kotlin applications.
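The difference between a whole-string check and a substring check is easy to get wrong, so here is a small sketch that puts matches(), containsMatchIn(), and matchEntire() side by side. The helper names (isAllDigits, hasDigits, parseNumber) are made up for this illustration.

```kotlin
val digits = Regex("""\d+""")

// Whole-string check: every character must be part of the match.
fun isAllDigits(input: String): Boolean = digits.matches(input)

// Substring check: true if the pattern occurs anywhere in the input.
fun hasDigits(input: String): Boolean = digits.containsMatchIn(input)

// matchEntire returns a MatchResult?, so the matched text can be used directly.
fun parseNumber(input: String): Int? = digits.matchEntire(input)?.value?.toIntOrNull()

fun main() {
    println(isAllDigits("12345"))     // true
    println(isAllDigits("order 66"))  // false
    println(hasDigits("order 66"))    // true
    println(parseNumber("42"))        // 42
    println(parseNumber("42a"))       // null
}
```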
Kotlin regex provides several methods to find matches within strings. The find() method returns the first match, while findAll() returns all matches as a sequence.
val text = "The quick brown fox jumps over the lazy dog"
val wordPattern = Regex("\\b\\w{5}\\b") // Matches 5-letter words
// Find first match
val firstMatch = wordPattern.find(text)
println(firstMatch?.value) // "quick"
// Find all matches
val allMatches = wordPattern.findAll(text)
allMatches.forEach { match ->
    println(match.value)
}
The find() method returns a MatchResult? (null when nothing matches) that contains information about the match, including the matched text and its position in the original string.
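As a rough sketch of what a MatchResult carries, the snippet below reads the matched text and its start index, then steps to the following match with next(). The helper describeFirstTwo is a hypothetical name used only here.

```kotlin
val word = Regex("""\b\w{5}\b""")

// Hypothetical helper: describes the first match and the one after it, if any.
fun describeFirstTwo(text: String): List<String> {
    val first = word.find(text) ?: return emptyList()
    val second = first.next()  // MatchResult? for the following match
    return listOfNotNull(first, second).map { "${it.value}@${it.range.first}" }
}

fun main() {
    println(describeFirstTwo("The quick brown fox"))  // [quick@4, brown@10]
    // find() also accepts a start index for the search.
    println(word.find("The quick brown fox", startIndex = 9)?.value)  // brown
}
```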
Groups in Kotlin regular expressions allow you to capture specific parts of a match. You can create groups using parentheses in your regex pattern.
val phonePattern = Regex("(\\d{3})-(\\d{3})-(\\d{4})")
val phoneNumber = "123-456-7890"
val matchResult = phonePattern.find(phoneNumber)
if (matchResult != null) {
    println("Full match: ${matchResult.value}")
    println("Area code: ${matchResult.groups[1]?.value}")
    println("Exchange: ${matchResult.groups[2]?.value}")
    println("Number: ${matchResult.groups[3]?.value}")
}
Groups provide a powerful way to extract specific information from matched text. The groups property returns a MatchGroupCollection where index 0 contains the full match, and subsequent indices contain captured groups.
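When every group is expected to be present, the destructured and groupValues properties of MatchResult can be more convenient than indexing into groups. A minimal sketch, with splitPhone as an illustrative helper name:

```kotlin
val phonePattern = Regex("""(\d{3})-(\d{3})-(\d{4})""")

// Hypothetical helper: returns the three captured parts, or null if there is no match.
fun splitPhone(input: String): Triple<String, String, String>? {
    val match = phonePattern.find(input) ?: return null
    // destructured exposes groups 1..n (not the full match) as components.
    val (area, exchange, number) = match.destructured
    return Triple(area, exchange, number)
}

fun main() {
    println(splitPhone("Call 123-456-7890 today"))  // (123, 456, 7890)
    // groupValues is a List<String>; index 0 is the full match.
    println(phonePattern.find("123-456-7890")?.groupValues)  // [123-456-7890, 123, 456, 7890]
}
```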
Kotlin supports named groups, which make your regular expressions more readable and maintainable. Named groups are created using the (?<name>pattern) syntax.
val urlPattern = Regex("(?<protocol>https?)://(?<domain>[^/]+)(?<path>/.*)?")
val url = "https://www.example.com/path/to/resource"
val match = urlPattern.find(url)
match?.let {
    println("Protocol: ${it.groups["protocol"]?.value}")
    println("Domain: ${it.groups["domain"]?.value}")
    println("Path: ${it.groups["path"]?.value}")
}
Named groups enhance code readability by allowing you to reference captured groups by name rather than numeric index.
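The same idea applies to other structured text. The sketch below assumes a made-up log format ("LEVEL: message") and a hypothetical helper parseLogLine; it also assumes a reasonably recent Kotlin standard library, since looking groups up by name is not available in very old versions.

```kotlin
val logLine = Regex("""(?<level>[A-Z]+): (?<message>.+)""")

// Hypothetical helper: maps a line such as "ERROR: disk full" to a (level, message) pair.
fun parseLogLine(line: String): Pair<String, String>? {
    val match = logLine.matchEntire(line) ?: return null
    val level = match.groups["level"]?.value ?: return null
    val message = match.groups["message"]?.value ?: return null
    return level to message
}

fun main() {
    println(parseLogLine("ERROR: disk full"))  // (ERROR, disk full)
    println(parseLogLine("not a log line"))    // null
}
```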
Kotlin regex excels at string replacement operations. The replace() method allows you to substitute matched patterns with replacement text.
val text = "Hello World! Welcome to Kotlin programming."
val pattern = Regex("\\b\\w+\\b") // Matches whole words
// Replace all words with uppercase
val upperCaseText = pattern.replace(text) { matchResult ->
    matchResult.value.uppercase()
}
println(upperCaseText) // "HELLO WORLD! WELCOME TO KOTLIN PROGRAMMING."
The replace() method accepts a lambda function that receives a MatchResult and returns the replacement string, providing flexible replacement logic.
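replace() also has an overload that takes a replacement string, in which $1, $2, and so on refer to captured groups, and replaceFirst() limits the change to the first match. A short sketch, where the date format and the helper names are assumptions chosen for illustration:

```kotlin
val isoDate = Regex("""(\d{4})-(\d{2})-(\d{2})""")

// Rewrites every yyyy-MM-dd date as dd/MM/yyyy using group references.
fun toDayFirst(text: String): String = isoDate.replace(text, "$3/$2/$1")

// Rewrites only the first date and leaves later ones untouched.
fun firstToDayFirst(text: String): String = isoDate.replaceFirst(text, "$3/$2/$1")

fun main() {
    val text = "Due 2024-01-15, renew 2025-03-01"
    println(toDayFirst(text))       // Due 15/01/2024, renew 01/03/2025
    println(firstToDayFirst(text))  // Due 15/01/2024, renew 2025-03-01
}
```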
Kotlin regular expressions support various options that modify how patterns are matched. These options are specified using RegexOption enum values.
val caseInsensitivePattern = Regex("hello", RegexOption.IGNORE_CASE)
val multilinePattern = Regex("^start", setOf(RegexOption.MULTILINE, RegexOption.IGNORE_CASE))
val text = "Hello World\nSTART of line"
println(caseInsensitivePattern.find(text)?.value) // "Hello"
println(multilinePattern.find(text)?.value) // "START"
Common regex options include IGNORE_CASE for case-insensitive matching, MULTILINE for multiline mode, and DOT_MATCHES_ALL for making dot match newline characters.
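DOT_MATCHES_ALL is not shown in the example above, so here is a small sketch of its effect. The markup and the helper boldText are invented for the illustration.

```kotlin
// Without DOT_MATCHES_ALL, '.' stops at line breaks.
val singleLine = Regex("<b>(.*?)</b>")
val acrossLines = Regex("<b>(.*?)</b>", RegexOption.DOT_MATCHES_ALL)

// Hypothetical helper: returns the text captured between the tags, or null.
fun boldText(regex: Regex, html: String): String? = regex.find(html)?.groupValues?.get(1)

fun main() {
    val html = "<b>first\nsecond</b>"
    println(boldText(singleLine, html))   // null
    println(boldText(acrossLines, html))  // first, then second on the next line
}
```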
The split() method in Kotlin regex allows you to divide strings based on pattern matches. This is particularly useful for parsing structured data.
val csvData = "apple,banana,orange;grape,kiwi,mango"
val delimiter = Regex("[,;]")
val fruits = delimiter.split(csvData)
fruits.forEach { fruit ->
    println(fruit.trim())
}
The split() method returns a list of strings that were separated by the regex pattern, making it easy to process delimited data.
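split() also takes an optional limit, which helps when only the first delimiter should count. A minimal sketch, assuming a "key: value" line format and a hypothetical helper keyAndValue:

```kotlin
val separator = Regex("""\s*:\s*""")

// Hypothetical helper: splits on the first colon only, so colons in the value survive.
fun keyAndValue(line: String): Pair<String, String>? {
    val parts = separator.split(line, limit = 2)
    return if (parts.size == 2) parts[0] to parts[1] else null
}

fun main() {
    println(keyAndValue("time: 12:30:00"))  // (time, 12:30:00)
    println(keyAndValue("no separator"))    // null
}
```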
Input validation is a common use case for Kotlin regular expressions. You can create comprehensive validation patterns for various data types.
fun validateUserInput(input: String): Boolean {
    val usernamePattern = Regex("^[a-zA-Z0-9_]{3,20}$")
    val passwordPattern = Regex("^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[@$!%*?&])[A-Za-z\\d@$!%*?&]{8,}$")
    return when {
        input.startsWith("user:") -> {
            val username = input.substringAfter("user:")
            usernamePattern.matches(username)
        }
        input.startsWith("pass:") -> {
            val password = input.substringAfter("pass:")
            passwordPattern.matches(password)
        }
        else -> false
    }
}
This validation function demonstrates how Kotlin regex can be used to enforce complex input requirements with precise pattern matching.
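A single combined pattern only answers yes or no. When the caller needs to know which requirement failed, one option is to check each rule separately. The sketch below is an assumption about how that could look, not a replacement for the function above: the rule list, the messages, and the name passwordProblems are all hypothetical, and unlike the combined pattern it does not restrict which characters are allowed.

```kotlin
// Each rule pairs a pattern that must occur somewhere in the password with a message.
val passwordRules = listOf(
    Regex("[a-z]") to "needs a lowercase letter",
    Regex("[A-Z]") to "needs an uppercase letter",
    Regex("""\d""") to "needs a digit",
    Regex("[@\$!%*?&]") to "needs a special character",
)

// Hypothetical helper: returns one message per unmet requirement (empty when all pass).
fun passwordProblems(password: String): List<String> {
    val problems = passwordRules
        .filterNot { (rule, _) -> rule.containsMatchIn(password) }
        .map { (_, message) -> message }
    return if (password.length < 8) problems + "needs at least 8 characters" else problems
}

fun main() {
    println(passwordProblems("Passw0rd!"))  // []
    println(passwordProblems("password"))   // [needs an uppercase letter, needs a digit, needs a special character]
}
```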
Kotlin supports advanced regex features such as lookaheads and lookbehinds, which enable sophisticated pattern matching scenarios. On the JVM, the supported syntax is that of java.util.regex.Pattern.
val complexPattern = Regex("(?<=\\w)\\d+(?=\\w)") // Digits surrounded by word characters
val text = "abc123def456ghi"
val matches = complexPattern.findAll(text)
matches.forEach { match ->
    println("Found: ${match.value} at position ${match.range}")
}
Lookaheads and lookbehinds allow you to match patterns based on context without including the context in the match result.
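To make the "context without capture" point concrete, here is a small sketch with one lookbehind and one lookahead. The inputs and the helper names hashtags and pixelValues are illustrative assumptions.

```kotlin
// Lookbehind: word characters that directly follow '#', without capturing the '#'.
val hashtag = Regex("""(?<=#)\w+""")
// Lookahead: digits that are directly followed by "px", without capturing "px".
val pixels = Regex("""\d+(?=px)""")

fun hashtags(text: String): List<String> = hashtag.findAll(text).map { it.value }.toList()

fun pixelValues(css: String): List<Int> = pixels.findAll(css).map { it.value.toInt() }.toList()

fun main() {
    println(hashtags("Loving #kotlin and #regex"))                  // [kotlin, regex]
    println(pixelValues("width: 100px; height: 50px; z-index: 3"))  // [100, 50]
}
```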
The findAll() method returns a Sequence<MatchResult>, which is evaluated lazily: each match is located only when it is requested. This makes it well suited to scanning large inputs where you may not need every match at once.
val logFile = "Error: Connection failed\nWarning: Low memory\nError: Database timeout\nInfo: Process completed"
val errorPattern = Regex("Error: (.+)")
val errors = errorPattern.findAll(logFile)
    .map { it.groups[1]?.value ?: "Unknown error" }
    .toList()
errors.forEach { error ->
    println("Error detected: $error")
}
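Because findAll() is lazy, matches are produced one at a time as the sequence is consumed. Note that toList() in this example collects every match into memory; on large inputs, consume the sequence directly (for example with forEach or take()) to keep that benefit.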
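One way to benefit from the laziness is to stop early. In the sketch below, which uses a hypothetical helper firstErrors and a made-up log, take() ends the search once enough matches have been produced.

```kotlin
val errorPattern = Regex("Error: (.+)")

// Returns at most `limit` error messages; matching stops once enough are found.
fun firstErrors(log: String, limit: Int): List<String> =
    errorPattern.findAll(log)
        .map { it.groupValues[1] }
        .take(limit)
        .toList()

fun main() {
    val log = "Error: A\nWarning: B\nError: C\nError: D"
    println(firstErrors(log, 2))  // [A, C]
}
```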
When working with literal text that contains regex special characters, you need to escape them properly. Kotlin provides the Regex.escape() function for this purpose.
val specialText = "Cost: $19.99 (includes 5% tax)"
val literalPattern = Regex.escape("$19.99")
val searchPattern = Regex("Cost: $literalPattern")
val found = searchPattern.find(specialText)
println(found?.value) // "Cost: $19.99"
The escape() method ensures that special regex characters are treated as literal text rather than pattern metacharacters.
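Two related companion functions are worth knowing: Regex.fromLiteral() builds a Regex for an entire literal string, and Regex.escapeReplacement() does the equivalent job for replacement strings, where $ would otherwise be read as a group reference. A minimal sketch; the sample text and the helper fillPrice are assumptions made for illustration.

```kotlin
// Regex.fromLiteral builds a Regex that matches the given text character for character.
val literal = Regex.fromLiteral("1+1=2")
// As a plain pattern, '+' is a quantifier, so this does not match the text "1+1=2".
val asPattern = Regex("1+1=2")

// Hypothetical helper: inserts a price that may contain '$' into a template.
fun fillPrice(template: String, price: String): String =
    Regex("PRICE").replace(template, Regex.escapeReplacement(price))

fun main() {
    println(literal.containsMatchIn("so 1+1=2 holds"))    // true
    println(asPattern.containsMatchIn("so 1+1=2 holds"))  // false
    println(fillPrice("Total: PRICE", "\$5.00"))           // Total: $5.00
}
```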
Here’s a comprehensive example that demonstrates various Kotlin regex features in a practical text processing application:
import kotlin.text.Regex

data class ContactInfo(
    val name: String,
    val email: String,
    val phone: String,
    val website: String?
)

class ContactExtractor {
    // Letters and plain spaces only: \s inside the class would let the name run past the line break
    private val namePattern = Regex("Name:\\s*([A-Za-z ]+)")
    private val emailPattern = Regex("Email:\\s*([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})")
    private val phonePattern = Regex("Phone:\\s*(\\(?\\d{3}\\)?[-\\s]?\\d{3}[-\\s]?\\d{4})")
    private val websitePattern = Regex("Website:\\s*(https?://[^\\s]+)")

    fun extractContacts(text: String): List<ContactInfo> {
        val contacts = mutableListOf<ContactInfo>()
        // Split text into potential contact blocks (separated by blank lines)
        val contactBlocks = text.split(Regex("\\n\\s*\\n"))
        for (block in contactBlocks) {
            val name = namePattern.find(block)?.groups?.get(1)?.value?.trim()
            val email = emailPattern.find(block)?.groups?.get(1)?.value
            val phone = phonePattern.find(block)?.groups?.get(1)?.value
            val website = websitePattern.find(block)?.groups?.get(1)?.value
            if (name != null && email != null && phone != null) {
                contacts.add(ContactInfo(name, email, phone, website))
            }
        }
        return contacts
    }

    fun validateAndFormatPhone(phone: String): String? {
        val cleanPhone = phone.replace(Regex("[^\\d]"), "")
        return if (cleanPhone.length == 10) {
            cleanPhone.replace(Regex("(\\d{3})(\\d{3})(\\d{4})"), "($1) $2-$3")
        } else null
    }

    fun maskSensitiveData(text: String): String {
        val emailMask = Regex("([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})")
        val phoneMask = Regex("(\\(?\\d{3}\\)?[-\\s]?)(\\d{3})([-\\s]?\\d{4})")
        return text
            .replace(emailMask) { match ->
                val user = match.groups[1]?.value ?: ""
                val domain = match.groups[2]?.value ?: ""
                "${user.take(2)}***@${domain}"
            }
            .replace(phoneMask) { match ->
                val area = match.groups[1]?.value ?: ""
                val end = match.groups[3]?.value ?: ""
                "${area}***${end}"
            }
    }
}

fun main() {
    val contactExtractor = ContactExtractor()
    // Contact blocks are separated by blank lines, which extractContacts() splits on
    val sampleText = """
        Name: John Smith
        Email: john.smith@example.com
        Phone: (555) 123-4567
        Website: https://johnsmith.dev

        Name: Jane Doe
        Email: jane.doe@company.org
        Phone: 555-987-6543
        Website: https://janedoe.com

        Name: Bob Johnson
        Email: bob@techcorp.net
        Phone: (555) 456-7890
    """.trimIndent()

    // Extract contacts
    val contacts = contactExtractor.extractContacts(sampleText)
    println("=== Extracted Contacts ===")
    contacts.forEach { contact ->
        println("Name: ${contact.name}")
        println("Email: ${contact.email}")
        println("Phone: ${contactExtractor.validateAndFormatPhone(contact.phone) ?: contact.phone}")
        println("Website: ${contact.website ?: "Not provided"}")
        println("---")
    }

    // Demonstrate text masking
    println("\n=== Masked Text ===")
    println(contactExtractor.maskSensitiveData(sampleText))

    // Demonstrate advanced pattern matching
    println("\n=== Pattern Analysis ===")
    val emailDomains = Regex("@([a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})")
        .findAll(sampleText)
        .map { it.groups[1]?.value }
        .distinct()
        .toList()
    println("Email domains found: ${emailDomains.joinToString(", ")}")

    // Validate phone number formatting
    val phoneNumbers = listOf("5551234567", "(555) 123-4567", "555-123-4567")
    phoneNumbers.forEach { phone ->
        val formatted = contactExtractor.validateAndFormatPhone(phone)
        println("$phone -> ${formatted ?: "Invalid format"}")
    }
}
Expected Output:
=== Extracted Contacts ===
Name: John Smith
Email: john.smith@example.com
Phone: (555) 123-4567
Website: https://johnsmith.dev
---
Name: Jane Doe
Email: jane.doe@company.org
Phone: (555) 987-6543
Website: https://janedoe.com
---
Name: Bob Johnson
Email: bob@techcorp.net
Phone: (555) 456-7890
Website: Not provided
---

=== Masked Text ===
Name: John Smith
Email: jo***@example.com
Phone: (555) ***-4567
Website: https://johnsmith.dev

Name: Jane Doe
Email: ja***@company.org
Phone: 555-***-6543
Website: https://janedoe.com

Name: Bob Johnson
Email: bo***@techcorp.net
Phone: (555) ***-7890

=== Pattern Analysis ===
Email domains found: example.com, company.org, techcorp.net
5551234567 -> (555) 123-4567
(555) 123-4567 -> (555) 123-4567
555-123-4567 -> (555) 123-4567
This comprehensive example showcases the power of Kotlin regular expressions in real-world applications. The code demonstrates contact extraction, data validation, text masking, and advanced pattern matching techniques using various Kotlin regex features.