Kotlin Regular Expression

Kotlin regular expressions are a tool for pattern matching and text manipulation. Whether you’re validating user input, parsing data, or searching for specific patterns in strings, the standard library’s Regex class covers the common cases. This guide walks through Kotlin regular expressions, from basic syntax to advanced pattern matching techniques.

Understanding Kotlin Regular Expressions

Kotlin regular expressions, commonly referred to as Kotlin regex, are sequences of characters that define search patterns. The Kotlin standard library provides robust support for regular expressions through the Regex class, making it easy to work with pattern matching in your Kotlin applications.

Creating Regex Objects in Kotlin

Kotlin offers several ways to create regex objects. The most common approach is using the Regex constructor or the toRegex() extension function.

// Using Regex constructor
val regex1 = Regex("hello")

// Using toRegex() extension function
val regex2 = "world".toRegex()

// Using a raw string, which avoids double-escaping backslashes
val regex3 = """\d+""".toRegex()

The Regex class in Kotlin provides a clean and intuitive way to work with regular expressions. When you create a Regex object, Kotlin compiles the pattern for efficient reuse.

Basic Pattern Matching with Kotlin Regex

Pattern matching is the core functionality of Kotlin regular expressions. You can check if a string matches a pattern using various methods provided by the Regex class.

val emailPattern = Regex("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}")
val email = "user@example.com"

// Check if string matches pattern
val isValidEmail = emailPattern.matches(email)
println(isValidEmail) // true

The matches() function returns a boolean indicating whether the entire string matches the regex pattern. This is particularly useful for validation scenarios in Kotlin applications.
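Because matches() requires the whole input to match, it can be useful to contrast it with containsMatchIn() and matchEntire(). The following is a minimal sketch; the helper names isExactZip and containsZip and the five-digit pattern are illustrative assumptions, not part of the original example.

```kotlin
// Hypothetical five-digit pattern used only for illustration.
val zipPattern = Regex("\\d{5}")

// True only when the entire input is exactly five digits.
fun isExactZip(input: String): Boolean = zipPattern.matches(input)

// True when five consecutive digits appear anywhere in the input.
fun containsZip(input: String): Boolean = zipPattern.containsMatchIn(input)

fun main() {
    println(isExactZip("90210"))       // true
    println(isExactZip("ZIP: 90210"))  // false
    println(containsZip("ZIP: 90210")) // true

    // matchEntire() behaves like matches() but returns the MatchResult (or null).
    println(zipPattern.matchEntire("90210")?.value) // 90210
}
```

For validation, matches() or matchEntire() is usually the safer choice, since containsMatchIn() accepts input with extra leading or trailing text.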

Finding Matches in Strings

Kotlin regex provides several methods to find matches within strings. The find() method returns the first match, while findAll() returns all matches as a sequence.

val text = "The quick brown fox jumps over the lazy dog"
val wordPattern = Regex("\\b\\w{5}\\b") // Matches 5-letter words

// Find first match
val firstMatch = wordPattern.find(text)
println(firstMatch?.value) // "quick"

// Find all matches
val allMatches = wordPattern.findAll(text)
allMatches.forEach { match ->
    println(match.value)
}

The find() method returns a MatchResult? object that contains information about the match, including the matched text and its position in the original string.
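As a small sketch of the position information mentioned above, the following walks through matches one by one using MatchResult.range and MatchResult.next(). The function name describeMatches is a hypothetical helper introduced for this illustration.

```kotlin
// Collects "value at start..end" for every match, stepping with next().
fun describeMatches(pattern: Regex, text: String): List<String> {
    val lines = mutableListOf<String>()
    var match = pattern.find(text)
    while (match != null) {
        // range is an IntRange of character indices into the original string
        lines.add("${match.value} at ${match.range.first}..${match.range.last}")
        match = match.next()
    }
    return lines
}

fun main() {
    val wordPattern = Regex("\\b\\w{5}\\b")
    describeMatches(wordPattern, "The quick brown fox jumps over the lazy dog")
        .forEach(::println)
    // quick at 4..8
    // brown at 10..14
    // jumps at 20..24
}
```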

Kotlin Regex Groups and Capturing

Groups in Kotlin regular expressions allow you to capture specific parts of a match. You can create groups using parentheses in your regex pattern.

val phonePattern = Regex("(\\d{3})-(\\d{3})-(\\d{4})")
val phoneNumber = "123-456-7890"

val matchResult = phonePattern.find(phoneNumber)
if (matchResult != null) {
    println("Full match: ${matchResult.value}")
    println("Area code: ${matchResult.groups[1]?.value}")
    println("Exchange: ${matchResult.groups[2]?.value}")
    println("Number: ${matchResult.groups[3]?.value}")
}

Groups provide a powerful way to extract specific information from matched text. The groups property returns a MatchGroupCollection where index 0 contains the full match, and subsequent indices contain captured groups.
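When every group is expected to participate in the match, the destructured property offers a more compact alternative to indexing into groups. This is a sketch under that assumption; PhoneParts and parsePhone are hypothetical names introduced here.

```kotlin
// Hypothetical holder for the three captured parts.
data class PhoneParts(val areaCode: String, val exchange: String, val number: String)

fun parsePhone(input: String): PhoneParts? {
    val pattern = Regex("(\\d{3})-(\\d{3})-(\\d{4})")
    // matchEntire() returns null unless the whole input fits the pattern.
    val match = pattern.matchEntire(input) ?: return null
    // destructured exposes groups 1..n as non-null strings.
    val (area, exchange, number) = match.destructured
    return PhoneParts(area, exchange, number)
}

fun main() {
    println(parsePhone("123-456-7890")) // PhoneParts(areaCode=123, exchange=456, number=7890)
    println(parsePhone("12-34"))        // null
}
```

The related groupValues property returns the same data as a List<String>, with the full match at index 0.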

Named Groups in Kotlin Regex

Kotlin supports named groups, which make your regular expressions more readable and maintainable. Named groups are created using the (?<name>pattern) syntax.

val urlPattern = Regex("(?<protocol>https?)://(?<domain>[^/]+)(?<path>/.*)?")
val url = "https://www.example.com/path/to/resource"

val match = urlPattern.find(url)
match?.let {
    println("Protocol: ${it.groups["protocol"]?.value}")
    println("Domain: ${it.groups["domain"]?.value}")
    println("Path: ${it.groups["path"]?.value}")
}

Named groups enhance code readability by allowing you to reference captured groups by name rather than numeric index.

String Replacement with Kotlin Regex

Kotlin regex excels at string replacement operations. The replace() method allows you to substitute matched patterns with replacement text.

val text = "Hello World! Welcome to Kotlin programming."
val pattern = Regex("\\b\\w+\\b") // Matches whole words

// Replace all words with uppercase
val upperCaseText = pattern.replace(text) { matchResult ->
    matchResult.value.uppercase()
}
println(upperCaseText) // "HELLO WORLD! WELCOME TO KOTLIN PROGRAMMING."

The replace() method accepts either a replacement string or a lambda that receives a MatchResult and returns the replacement text, which allows flexible replacement logic.
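replace() also has an overload that takes a replacement string, in which $1, $2, and so on refer to captured groups, and replaceFirst() limits the substitution to the first match. The sketch below assumes ISO-style dates in the input; swapDateOrder and censorFirst are hypothetical helper names.

```kotlin
// Rewrites yyyy-MM-dd as dd/MM/yyyy using group references in the replacement.
fun swapDateOrder(text: String): String {
    val isoDate = Regex("(\\d{4})-(\\d{2})-(\\d{2})")
    // \$ keeps the dollar sign literal in the Kotlin string; the regex engine
    // then reads $3, $2, $1 as group references.
    return isoDate.replace(text, "\$3/\$2/\$1")
}

// Replaces only the first occurrence of a literal word.
fun censorFirst(text: String, word: String): String =
    Regex(Regex.escape(word)).replaceFirst(text, "****")

fun main() {
    println(swapDateOrder("Released 2024-03-15")) // Released 15/03/2024
    println(censorFirst("spam and spam", "spam")) // **** and spam
}
```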

Kotlin Regex Options and Flags

Kotlin regular expressions support various options that modify how patterns are matched. These options are specified using RegexOption enum values.

val caseInsensitivePattern = Regex("hello", RegexOption.IGNORE_CASE)
val multilinePattern = Regex("^start", setOf(RegexOption.MULTILINE, RegexOption.IGNORE_CASE))

val text = "Hello World\nSTART of line"
println(caseInsensitivePattern.find(text)?.value) // "Hello"
println(multilinePattern.find(text)?.value) // "START"

Common regex options include IGNORE_CASE for case-insensitive matching, MULTILINE for multiline mode, and DOT_MATCHES_ALL for making dot match newline characters.
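The effect of DOT_MATCHES_ALL is easiest to see on input that spans lines. The following sketch extracts the body of a block comment; extractComment is a hypothetical helper and the comment syntax is just an illustrative choice.

```kotlin
// Returns the text between /* and */, or null when nothing matches.
fun extractComment(source: String, options: Set<RegexOption> = emptySet()): String? =
    Regex("/\\*(.*?)\\*/", options).find(source)?.groupValues?.get(1)

fun main() {
    val source = "/* first line\nsecond line */"

    // By default the dot does not match a line break, so nothing is found.
    println(extractComment(source)) // null

    // With DOT_MATCHES_ALL the dot also matches '\n', so the body is captured.
    val body = extractComment(source, setOf(RegexOption.DOT_MATCHES_ALL))
    println(body?.trim()) // prints both comment lines
}
```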

Splitting Strings with Kotlin Regex

The split() method in Kotlin regex allows you to divide strings based on pattern matches. This is particularly useful for parsing structured data.

val csvData = "apple,banana,orange;grape,kiwi,mango"
val delimiter = Regex("[,;]")

val fruits = delimiter.split(csvData)
fruits.forEach { fruit ->
    println(fruit.trim())
}

The split() method returns a list of strings that were separated by the regex pattern, making it easy to process delimited data.
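split() also takes an optional limit parameter, which caps the number of resulting pieces. A sketch of one use, parsing key = value lines where the value may itself contain the separator, is shown below; parseKeyValue is a hypothetical helper name.

```kotlin
// Splits on the first '=' only, so '=' inside the value is preserved.
fun parseKeyValue(line: String): Pair<String, String>? {
    val parts = Regex("\\s*=\\s*").split(line.trim(), limit = 2)
    return if (parts.size == 2) parts[0] to parts[1] else null
}

fun main() {
    println(parseKeyValue("url = https://example.com/?a=1")) // (url, https://example.com/?a=1)
    println(parseKeyValue("no separator"))                   // null
}
```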

Validating Input with Kotlin Regex

Input validation is a common use case for Kotlin regular expressions. You can create comprehensive validation patterns for various data types.

fun validateUserInput(input: String): Boolean {
    val usernamePattern = Regex("^[a-zA-Z0-9_]{3,20}$")
    val passwordPattern = Regex("^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[@$!%*?&])[A-Za-z\\d@$!%*?&]{8,}$")
    
    return when {
        input.startsWith("user:") -> {
            val username = input.substringAfter("user:")
            usernamePattern.matches(username)
        }
        input.startsWith("pass:") -> {
            val password = input.substringAfter("pass:")
            passwordPattern.matches(password)
        }
        else -> false
    }
}

This validation function demonstrates how Kotlin regex can be used to enforce complex input requirements with precise pattern matching.
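A single pattern with several lookaheads only says whether the input passed. One possible variation, sketched here, splits the password requirements into separate patterns so the caller can report which ones failed. PasswordRule, passwordRules, and missingPasswordRules are hypothetical names, and the rule wording is an assumption that mirrors the pattern above.

```kotlin
// Hypothetical pairing of a human-readable description with a whole-input pattern.
data class PasswordRule(val description: String, val pattern: Regex)

val passwordRules = listOf(
    PasswordRule("at least 8 characters", Regex(".{8,}")),
    PasswordRule("a lowercase letter", Regex(".*[a-z].*")),
    PasswordRule("an uppercase letter", Regex(".*[A-Z].*")),
    PasswordRule("a digit", Regex(".*\\d.*")),
    PasswordRule("a special character", Regex(".*[@\$!%*?&].*")),
    PasswordRule("only allowed characters", Regex("[A-Za-z\\d@\$!%*?&]*"))
)

// Returns the descriptions of every rule the password does not satisfy.
fun missingPasswordRules(password: String): List<String> =
    passwordRules
        .filterNot { it.pattern.matches(password) }
        .map { it.description }

fun main() {
    println(missingPasswordRules("Passw0rd!")) // []
    println(missingPasswordRules("password"))  // [an uppercase letter, a digit, a special character]
}
```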

Advanced Kotlin Regex Patterns

Kotlin supports advanced regex features such as lookaheads and lookbehinds, which enable context-sensitive pattern matching. Conditional patterns of the form (?(condition)yes|no) are not supported by java.util.regex, the engine behind Regex on the JVM.

val complexPattern = Regex("(?<=\\w)\\d+(?=\\w)") // Digits surrounded by word characters
val text = "abc123def456ghi"

val matches = complexPattern.findAll(text)
matches.forEach { match ->
    println("Found: ${match.value} at position ${match.range}")
}

Lookaheads and lookbehinds allow you to match patterns based on context without including the context in the match result.
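Since lookarounds consume no characters, a pattern made only of lookarounds matches positions rather than text. The sketch below uses that to insert thousands separators; addThousandsSeparators is a hypothetical helper and it assumes the input consists solely of digits.

```kotlin
// Inserts ',' at every position that has a digit before it and a
// multiple of three digits after it, up to the end of the input.
fun addThousandsSeparators(digits: String): String =
    Regex("(?<=\\d)(?=(?:\\d{3})+\$)").replace(digits, ",")

fun main() {
    println(addThousandsSeparators("1234567")) // 1,234,567
    println(addThousandsSeparators("1000"))    // 1,000
    println(addThousandsSeparators("999"))     // 999
}
```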

Working with Regex Sequences

The findAll() method returns a Sequence<MatchResult>, which is evaluated lazily: each match is located only when the sequence is consumed. This suits pipelines that filter or transform matches, or that stop early, when processing large amounts of text.

val logFile = "Error: Connection failed\nWarning: Low memory\nError: Database timeout\nInfo: Process completed"
val errorPattern = Regex("Error: (.+)")

val errors = errorPattern.findAll(logFile)
    .map { it.groups[1]?.value ?: "Unknown error" }
    .toList()

errors.forEach { error ->
    println("Error detected: $error")
}

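Because findAll() is lazy, matches are produced one at a time as the sequence is consumed. The toList() call above collects every match into memory, so omit it, or limit the sequence with take() or firstOrNull(), when only some matches are needed. The input text itself must still be available as a CharSequence.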
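To make the lazy behavior concrete, the following sketch stops scanning once a requested number of matches has been found. firstErrors is a hypothetical helper, and the log format is the one assumed in the example above.

```kotlin
// Returns at most `limit` error messages; take() ends the search early.
fun firstErrors(log: String, limit: Int): List<String> =
    Regex("Error: (.+)")
        .findAll(log)                 // lazy: nothing is matched yet
        .map { it.groupValues[1] }    // still lazy
        .take(limit)                  // stop after `limit` matches
        .toList()                     // terminal operation triggers the search

fun main() {
    val log = "Error: Connection failed\nWarning: Low memory\n" +
        "Error: Database timeout\nInfo: Process completed"
    println(firstErrors(log, 1)) // [Connection failed]
    println(firstErrors(log, 5)) // [Connection failed, Database timeout]
}
```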

Escaping Special Characters in Kotlin Regex

When working with literal text that contains regex special characters, you need to escape them properly. Kotlin provides the escape() method for this purpose.

val specialText = "Cost: $19.99 (includes 5% tax)"
val literalPattern = Regex.escape("$19.99")
val searchPattern = Regex("Cost: $literalPattern")

val found = searchPattern.find(specialText)
println(found?.value) // "Cost: $19.99"

The escape() method ensures that special regex characters are treated as literal text rather than pattern metacharacters.
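Two related companion functions are worth knowing alongside escape(): Regex.fromLiteral() builds a Regex that matches a string literally, and Regex.escapeReplacement() protects replacement text that contains $ or backslashes. The helper names countOccurrences and replacePrice below are hypothetical, and the price format is an assumption made for the sketch.

```kotlin
// Counts literal occurrences; metacharacters such as '.' are not special here.
fun countOccurrences(text: String, literal: String): Int =
    Regex.fromLiteral(literal).findAll(text).count()

// Swaps a dollar amount for new text. Without escapeReplacement(), a '$' in
// newPrice would be read as the start of a group reference.
fun replacePrice(text: String, newPrice: String): String =
    Regex("\\\$\\d+\\.\\d{2}").replace(text, Regex.escapeReplacement(newPrice))

fun main() {
    println(countOccurrences("a.b a.b axb", "a.b"))     // 2
    println(replacePrice("Cost: \$19.99", "\$24.50"))   // Cost: $24.50
}
```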

Complete Kotlin Regular Expressions Example

Here’s a comprehensive example that demonstrates various Kotlin regex features in a practical text processing application:

import kotlin.text.Regex

data class ContactInfo(
    val name: String,
    val email: String,
    val phone: String,
    val website: String?
)

class ContactExtractor {
    // A literal space is used instead of \s so the name cannot run past the line break into "Email"
    private val namePattern = Regex("Name:\\s*([A-Za-z ]+)")
    private val emailPattern = Regex("Email:\\s*([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})")
    private val phonePattern = Regex("Phone:\\s*(\\(?\\d{3}\\)?[-\\s]?\\d{3}[-\\s]?\\d{4})")
    private val websitePattern = Regex("Website:\\s*(https?://[^\\s]+)")
    
    fun extractContacts(text: String): List<ContactInfo> {
        val contacts = mutableListOf<ContactInfo>()
        
        // Split text into potential contact blocks
        val contactBlocks = text.split(Regex("\\n\\s*\\n"))
        
        for (block in contactBlocks) {
            val name = namePattern.find(block)?.groups?.get(1)?.value?.trim()
            val email = emailPattern.find(block)?.groups?.get(1)?.value
            val phone = phonePattern.find(block)?.groups?.get(1)?.value
            val website = websitePattern.find(block)?.groups?.get(1)?.value
            
            if (name != null && email != null && phone != null) {
                contacts.add(ContactInfo(name, email, phone, website))
            }
        }
        
        return contacts
    }
    
    fun validateAndFormatPhone(phone: String): String? {
        val cleanPhone = phone.replace(Regex("[^\\d]"), "")
        return if (cleanPhone.length == 10) {
            cleanPhone.replace(Regex("(\\d{3})(\\d{3})(\\d{4})"), "($1) $2-$3")
        } else null
    }
    
    fun maskSensitiveData(text: String): String {
        val emailMask = Regex("([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})")
        val phoneMask = Regex("(\\(?\\d{3}\\)?[-\\s]?)(\\d{3})([-\\s]?\\d{4})")
        
        return text
            .replace(emailMask) { match ->
                val user = match.groups[1]?.value ?: ""
                val domain = match.groups[2]?.value ?: ""
                "${user.take(2)}***@${domain}"
            }
            .replace(phoneMask) { match ->
                val area = match.groups[1]?.value ?: ""
                val end = match.groups[3]?.value ?: ""
                "${area}***${end}"
            }
    }
}

fun main() {
    val contactExtractor = ContactExtractor()
    
    val sampleText = """
        Name: John Smith
        Email: john.smith@example.com
        Phone: (555) 123-4567
        Website: https://johnsmith.dev
        
        Name: Jane Doe
        Email: jane.doe@company.org
        Phone: 555-987-6543
        Website: https://janedoe.com
        
        Name: Bob Johnson
        Email: bob@techcorp.net
        Phone: (555) 456-7890
    """.trimIndent()
    
    // Extract contacts
    val contacts = contactExtractor.extractContacts(sampleText)
    
    println("=== Extracted Contacts ===")
    contacts.forEach { contact ->
        println("Name: ${contact.name}")
        println("Email: ${contact.email}")
        println("Phone: ${contactExtractor.validateAndFormatPhone(contact.phone) ?: contact.phone}")
        println("Website: ${contact.website ?: "Not provided"}")
        println("---")
    }
    
    // Demonstrate text masking
    println("\n=== Masked Text ===")
    println(contactExtractor.maskSensitiveData(sampleText))
    
    // Demonstrate advanced pattern matching
    println("\n=== Pattern Analysis ===")
    val emailDomains = Regex("@([a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})")
        .findAll(sampleText)
        .map { it.groups[1]?.value }
        .distinct()
        .toList()
    
    println("Email domains found: ${emailDomains.joinToString(", ")}")
    
    // Validate phone number formatting
    val phoneNumbers = listOf("5551234567", "(555) 123-4567", "555-123-4567")
    phoneNumbers.forEach { phone ->
        val formatted = contactExtractor.validateAndFormatPhone(phone)
        println("$phone -> ${formatted ?: "Invalid format"}")
    }
}

Expected Output:

=== Extracted Contacts ===
Name: John Smith
Email: john.smith@example.com
Phone: (555) 123-4567
Website: https://johnsmith.dev
---
Name: Jane Doe
Email: jane.doe@company.org
Phone: (555) 987-6543
Website: https://janedoe.com
---
Name: Bob Johnson
Email: bob@techcorp.net
Phone: (555) 456-7890
Website: Not provided
---

=== Masked Text ===
Name: John Smith
Email: jo***@example.com
Phone: (555) ***-4567
Website: https://johnsmith.dev

Name: Jane Doe
Email: ja***@company.org
Phone: 555-***-6543
Website: https://janedoe.com

Name: Bob Johnson
Email: bo***@techcorp.net
Phone: (555) ***-7890

=== Pattern Analysis ===
Email domains found: example.com, company.org, techcorp.net
5551234567 -> (555) 123-4567
(555) 123-4567 -> (555) 123-4567
555-123-4567 -> (555) 123-4567

This comprehensive example showcases the power of Kotlin regular expressions in real-world applications. The code demonstrates contact extraction, data validation, text masking, and advanced pattern matching techniques using various Kotlin regex features.