@kalpit-sharma-dev ・ Dec 18,2021 ・ 2 min read ・ 625 views ・ Originally posted on faun.pub
Golang strings are immutable.
In general, immutable data is simpler to reason about, but it also means your program must allocate more memory to “change” that data. Sometimes, your program can’t afford that luxury. For example, there might not be any more memory to allocate. Another reason: you don’t want to create more work for the garbage collector.
In C, a string is a null-terminated sequence of chars: char*. Each char is a single byte, and the string keeps going until there’s a '\0' character. If you pointed at an arbitrary memory location and called it a C string, you’d see every byte in order until you hit a zero.
In Go, string is its own data type. At its core, it’s still a sequence of bytes, but runes may span multiple bytes. So string in Go carries some additional structure compared to char* in C. How does it do this? It’s actually a struct with two fields: Data (an unsafe.Pointer) and Len (an int).
Data here is analogous to the C string, and Len is the length. The Go compiler lays out struct fields in declaration order, so if you were to look at a string under the microscope, you’d see a pointer to the string’s contents first and then Len. (You can find documentation of these header structs in the reflect package.)
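To make the layout concrete, here is a minimal sketch that declares a local StringHeader matching the description above and prints where each field sits. The type is a stand-in written for this example, not the runtime’s own definition; reflect.StringHeader declares Data as uintptr rather than unsafe.Pointer and is marked deprecated in recent Go releases. The helper name layout is hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// StringHeader is a local stand-in for the string representation described
// in the text: a pointer to the bytes followed by the length.
type StringHeader struct {
	Data unsafe.Pointer
	Len  int
}

// layout reports the byte offsets of Data and Len inside StringHeader,
// plus the total size of the header.
func layout() (dataOff, lenOff, size uintptr) {
	var h StringHeader
	return unsafe.Offsetof(h.Data), unsafe.Offsetof(h.Len), unsafe.Sizeof(h)
}

func main() {
	dataOff, lenOff, size := layout()
	fmt.Println("Data offset:", dataOff)
	fmt.Println("Len offset: ", lenOff)
	fmt.Println("header size:", size, "string size:", unsafe.Sizeof(""))
}
```

With the standard compiler, Data is reported at offset 0 and Len one machine word later, and the header is the same size as a string value.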
Before we start inspecting strings by looking at their StringHeader fields, how do we cast a string to a StringHeader in the first place? When you really need to convert from one Go type to another, use the unsafe package:
import (
	"unsafe"
)

s := "hello"
header := (*StringHeader)(unsafe.Pointer(&s))
unsafe.Pointer is an untyped pointer. It can point to any kind of value. It’s a way to tell the compiler, “Step aside. I know what I’m doing.” In this case, what we’re doing is converting a *string into an unsafe.Pointer into a *StringHeader.
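The conversion chain can be tried as a complete program. This is a sketch under the same assumption as the article: StringHeader is declared locally with a Data pointer followed by Len. The helper inspect is a hypothetical name, and it assumes a non-empty string so that reading the first byte behind Data is valid.

```go
package main

import (
	"fmt"
	"unsafe"
)

// StringHeader is a local stand-in for the string representation.
type StringHeader struct {
	Data unsafe.Pointer
	Len  int
}

// inspect reinterprets the string's memory as a StringHeader and returns
// the recorded length together with the first byte behind Data.
// It assumes s is non-empty.
func inspect(s string) (length int, first byte) {
	header := (*StringHeader)(unsafe.Pointer(&s)) // *string -> unsafe.Pointer -> *StringHeader
	return header.Len, *(*byte)(header.Data)
}

func main() {
	n, b := inspect("hello")
	fmt.Printf("%d %c\n", n, b) // 5 h
}
```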
Now we have access to the underlying representation of the string. Ever wondered how len("hello") works? We can implement it ourselves:
func strLen(s string) int {
	header := (*StringHeader)(unsafe.Pointer(&s))
	return header.Len
}
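A runnable version of strLen, compared against the built-in len and against the rune count, shows that Len counts bytes rather than characters. The local StringHeader declaration and the sample strings are assumptions added for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
	"unsafe"
)

// StringHeader is a local stand-in for the string representation.
type StringHeader struct {
	Data unsafe.Pointer
	Len  int
}

// strLen reads the length straight out of the string header.
func strLen(s string) int {
	header := (*StringHeader)(unsafe.Pointer(&s))
	return header.Len
}

func main() {
	// "h\u00e9llo" contains one two-byte rune, so it has 6 bytes but 5 runes.
	for _, s := range []string{"hello", "h\u00e9llo", ""} {
		fmt.Println(strLen(s), len(s), utf8.RuneCountInString(s))
	}
}
```

The first two columns always agree; the third differs as soon as a rune spans more than one byte.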
Getting the length of a string is nice, but what about setting it? Here’s what happens if we artificially extend the length of a string:
s := "hello"
header := (*StringHeader)(unsafe.Pointer(&s))
header.Len = 100
// cast the header back to 'string' and print it
fmt.Print(*(*string)(unsafe.Pointer(header)))
/* on stdout:
helloint16int32int64panicslicestartuint8write (MB)
Value addr= code= ctxt: curg= list= m->p= p->m=
*/
By changing the Len field of the string header, we can expand the string to include other parts of memory. It’s interesting to observe this behavior, but it’s not something you’d actually want to use.
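Going the other way, shrinking Len, stays inside the string’s own bytes and is well defined. The sketch below uses a hypothetical helper, prefix, with a bounds check added for the example. It is shown only to illustrate the header; the ordinary slice expression s[:n] gives the same result without unsafe and also does not copy.

```go
package main

import (
	"fmt"
	"unsafe"
)

// StringHeader is a local stand-in for the string representation.
type StringHeader struct {
	Data unsafe.Pointer
	Len  int
}

// prefix returns the first n bytes of s by rewriting Len on a local copy
// of the header. Out-of-range values of n leave the string unchanged.
func prefix(s string, n int) string {
	if n < 0 || n > len(s) {
		return s
	}
	header := (*StringHeader)(unsafe.Pointer(&s))
	header.Len = n // only the local copy of the header is modified
	return s
}

func main() {
	s := "hello"
	fmt.Println(prefix(s, 2), s[:2], s) // he he hello
}
```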
Data: unsafe.Pointer
You may have noticed that StringHeader has an unsafe.Pointer field which points to the string’s sequence of bytes. []byte also has a sequence of bytes. In fact, we can build a []byte from this pointer. Here’s what a slice actually looks like:
type SliceHeader struct {
	Data unsafe.Pointer
	Len  int
	Cap  int
}
It’s a lot like StringHeader, except it also has a Cap (capacity) field. What happens if we build a SliceHeader from the fields of a StringHeader?
func strToBytes(s string) []byte {
	header := (*StringHeader)(unsafe.Pointer(&s))
	bytesHeader := &SliceHeader{
		Data: header.Data,
		Len:  header.Len,
		Cap:  header.Len,
	}
	return *(*[]byte)(unsafe.Pointer(bytesHeader))
}

fmt.Print(strToBytes("hello")) // [104 101 108 108 111]
We’ve converted a string into a []byte. It’s just as easy to go the other direction:
func bytesToStr(b []byte) string {
	header := (*SliceHeader)(unsafe.Pointer(&b))
	strHeader := &StringHeader{
		Data: header.Data,
		Len:  header.Len,
	}
	return *(*string)(unsafe.Pointer(strHeader))
}

fmt.Print(bytesToStr([]byte{104, 101, 108, 108, 111})) // "hello"
Both string and []byte headers are using the same Data pointer, so they share memory. If you ever need to convert between string and []byte but there isn’t enough memory to perform a copy, this might be useful.
A word of caution, however: string is meant to be immutable, but []byte is not. If you cast a string to []byte and try to modify the bytes, the program crashes with a segmentation fault, because a string literal’s bytes typically live in read-only memory.
s := "hello"
b := strToBytes(s)
b[0] = 100
// panic: runtime error: invalid memory address or nil pointer dereference
// [signal SIGSEGV: segmentation violation code=0xffffffff addr=0x0 pc=0xd56a2]
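When the bytes really do need to change, the safe route is the ordinary conversion, which copies. A minimal sketch follows; the helper name withFirstByte is hypothetical. It costs one allocation for the []byte and one for the resulting string, which is exactly the price the unsafe cast was trying to avoid.

```go
package main

import "fmt"

// withFirstByte returns a new string equal to s with its first byte replaced.
// []byte(s) copies the contents, so the original string is left untouched.
func withFirstByte(s string, c byte) string {
	if s == "" {
		return s
	}
	b := []byte(s) // copy into writable memory
	b[0] = c
	return string(b)
}

func main() {
	s := "hello"
	fmt.Println(withFirstByte(s, 'd'), s) // dello hello
}
```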
Casting in the other direction doesn’t cause a segmentation fault, but then your supposedly immutable string can change:
b := []byte{104, 101, 108, 108, 111}
s := bytesToStr(b)
fmt.Print(s) // "hello"
b[0] = 100
fmt.Print(s) // "dello"
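If the goal is to assemble a string from bytes without keeping a writable alias around, the standard library’s strings.Builder is one safe option: its documentation describes it as minimizing memory copying, and it exposes no way to modify bytes already written. The sketch below is an alternative to bytesToStr, not part of the original article; buildFromBytes is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFromBytes appends each chunk to a strings.Builder and returns the
// accumulated string. The caller's slices are copied into the builder, so
// later writes to them cannot affect the result.
func buildFromBytes(chunks ...[]byte) string {
	var sb strings.Builder
	for _, c := range chunks {
		sb.Write(c)
	}
	return sb.String()
}

func main() {
	b := []byte{104, 101, 108, 108, 111}
	s := buildFromBytes(b)
	b[0] = 100
	fmt.Println(s, string(b)) // hello dello
}
```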
Try it: https://goplay.tools/snippet/PAjwbct_ohF